woollama

Web Over Ollama (and Llamas). An MCP + OpenAI router for AI desktops.

📖 Documentation: woollama.readthedocs.io

woollama sits between AI clients (Cursor, the OpenAI SDK, Claude Desktop, cosmic-fabric, anything that speaks OpenAI or MCP) and AI backends (Ollama, Anthropic, fabric, lackpy, filesystem MCPs, anything that speaks OpenAI or MCP). It composes them into orchestrated calls without inventing a new protocol.

                          ┌─────────────────────┐
                          │   AI clients        │
                          │   (any OpenAI or    │
                          │    MCP client)      │
                          └──────────┬──────────┘
                                     │
                  ┌──────────────────┴───────────────────┐
                  │            woollama                  │
                  │  OpenAI server  +  MCP server        │
                  │  ───────────────────────────────     │
                  │  routes models, tools, executors     │
                  │  composes patterns + tools + models  │
                  │  into named recipes                  │
                  └──────────────────┬───────────────────┘
                                     │
                  ┌──────────────────┴───────────────────┐
                  │                                      │
              ┌───┴────┐                            ┌────┴────┐
              │ MCP    │  tools, prompts, resources │ OpenAI  │  inference
              │ tool   │                            │ compat  │
              │ servers│                            │ backends│
              └────────┘                            └─────────┘
              fabric-mcp, lackpy,                   Ollama, Anthropic,
              filesystem, git, …                    vLLM, llama.cpp, …

Status

Python prototype: multi-backend router, both surfaces live. woollama works end-to-end as:

an OpenAI-compatible server: /v1/chat/completions (pass-through and hidden chat-loop orchestration of recipes, both with stream:true → OpenAI SSE), /v1/models, /v1/tools, and a stateful surface, /v1/responses + /v1/conversations (OpenAI Responses/Conversations shape; see below);
an MCP server to its own clients, over stdio (woollama mcp) and over Streamable HTTP at /mcp, mounted on the same port as /v1/*. It re-exports every discovered downstream tool (namespaced, with output_schema) plus a chat verb that emits live tool-progress notifications, i.e. it's an MCP aggregator.
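With stream:true the response arrives as OpenAI-style server-sent events. The following is a minimal sketch of how a client might reassemble the text, assuming the standard chat-completions chunk shape (`data: {...}` lines carrying `choices[].delta.content`, terminated by `data: [DONE]`). The helper name `iter_sse_deltas` and the sample lines are illustrative, not taken from woollama.

```python
import json
from typing import Iterable, Iterator


def iter_sse_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines (illustrative helper)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hand-written sample stream, not captured from a real server.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "1, 2"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": ", 3, 4"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_sse_deltas(sample)))
```

In practice the OpenAI SDK does this parsing for you when you pass `stream=True`; the sketch only shows what travels over the wire.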

It routes inference across multiple backends by <provider>/<model>: ollama (local), anthropic, openai, groq, together, openrouter, and any OpenAI-compatible endpoint you add in inferencers.toml (e.g. self-hosted vLLM), plus claude-code/<model>, a keyless path to Claude via the local CLI (tool-less, or as an executor that runs a recipe's allow-listed MCP tools itself, i.e. tool delegation). Config is file-driven (mcp.json, recipes.toml, inferencers.toml).
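As a rough illustration of <provider>/<model> addressing, the sketch below splits a model id at the first slash. It is not woollama's actual routing code; `split_model_id` is a hypothetical helper, and splitting at the first slash only is an assumption made so that model names which themselves contain slashes stay intact.

```python
def split_model_id(model: str) -> tuple[str, str]:
    """Split '<provider>/<model>' at the first slash only (hypothetical helper)."""
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected '<provider>/<model>', got {model!r}")
    return provider, name


print(split_model_id("ollama/qwen3:14b-iq4xs"))
print(split_model_id("woollama/streamer"))
```

A router built this way would look up the provider part in its backend table and forward the remainder as the backend's own model name.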

Stateful conversations: woollama routes handles and backends own the state; woollama never stores transcripts in its own system. There are two state-owning backends: claude-resume (claude --resume, for claude-code models; keyless, the Claude session owns the bytes) and managed-agents (Anthropic's Managed Agents, for claude-agent models; requires ANTHROPIC_API_KEY; Anthropic hosts the session and exposes the transcript, so /v1/conversations/{id}/items works). Models with no state-owning backend (ollama/cloud/recipe) are stateless: the caller owns history (store:false).

MCP connections are long-lived. woollama is served on both a Unix socket ($XDG_RUNTIME_DIR/woollama.sock, mode 0600, the default for local MCP clients) and an ephemeral loopback TCP port; never 0.0.0.0 without explicit opt-in.

Not production-ready. Current status and what's next live in docs/roadmap.md.

Implementation note: woollama will be a Rust program at v1.0. The Python in src/woollama/ is a prototype used to iterate on the architecture quickly. The Rust port lands when the design surface is stable. See docs/rust-transition.md for the explicit transition criteria.

See docs/architecture.md for the full target design and docs/build-log.md for the slice-by-slice history.

Quick taste

The router is OpenAI-compatible, so any OpenAI client can drive it:

import openai
c = openai.OpenAI(base_url="http://127.0.0.1:<port>/v1", api_key="x")

# Pass-through to Ollama
r = c.chat.completions.create(
    model="ollama/qwen3:14b-iq4xs",
    messages=[{"role": "user", "content": "Hi"}],
)

# Orchestrated: a recipe (system prompt + tools + model), transparent to the
# client. The chat-loop happens inside woollama; client sees only the final answer.
r = c.chat.completions.create(
    model="woollama/streamer",
    messages=[{"role": "user", "content": "Please count to 4."}],
)

woollama serves on two transports at once: a Unix socket at $XDG_RUNTIME_DIR/woollama.sock (mode 0600, the default for local MCP clients, since a connectable socket can spend the router's API keys) and an ephemeral loopback TCP port written to $XDG_RUNTIME_DIR/woollama.addr for clients to discover. The <port> above is that ephemeral port. Same pattern as a local fabric --serve instance.
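For clients that want the Unix socket rather than the TCP port, a request can be made with the standard library alone. The sketch below rests on assumptions: `UnixHTTPConnection`, `default_socket_path`, and `list_models` are hypothetical helpers (not part of woollama), the socket is assumed to speak plain HTTP for the same /v1/* routes, and the /tmp fallback when XDG_RUNTIME_DIR is unset is a guess.

```python
import http.client
import json
import os
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that dials a Unix domain socket instead of TCP (hypothetical helper)."""

    def __init__(self, socket_path: str, timeout: float = 10.0) -> None:
        # "localhost" only fills the Host header; the real target is socket_path.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def default_socket_path() -> str:
    # Assumption: fall back to /tmp when XDG_RUNTIME_DIR is unset or empty.
    runtime_dir = os.environ.get("XDG_RUNTIME_DIR") or "/tmp"
    return os.path.join(runtime_dir, "woollama.sock")


def list_models(socket_path: str) -> dict:
    """GET /v1/models over the Unix socket and decode the JSON body."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", "/v1/models")
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()
```

With a router running, `list_models(default_socket_path())` would be the call to try; nothing connects until that function is invoked.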

Install (development)

git clone https://github.com/<you>/woollama
cd woollama
uv sync                           # creates .venv and installs deps
uv run woollama                   # starts the router; prints its address

In a second shell:

# Discover the address
cat "${XDG_RUNTIME_DIR:-/tmp}/woollama.addr"
# Then point an OpenAI client at it (see Quick taste above).
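The same discovery step can be done from Python. The format of woollama.addr is not specified here, so this sketch assumes the file holds a single line that is either `host:port` or a full `http://` URL; `addr_file_path` and `base_url_from_addr` are hypothetical helper names.

```python
import os
from pathlib import Path


def addr_file_path() -> Path:
    # Mirrors the shell fallback ${XDG_RUNTIME_DIR:-/tmp} (unset or empty -> /tmp).
    runtime_dir = os.environ.get("XDG_RUNTIME_DIR") or "/tmp"
    return Path(runtime_dir) / "woollama.addr"


def base_url_from_addr(text: str) -> str:
    """Turn the address file's contents into an OpenAI base_url (assumed format)."""
    addr = text.strip()
    if not addr:
        raise ValueError("address file is empty")
    if "://" not in addr:
        addr = f"http://{addr}"  # assumption: bare host:port means plain HTTP
    return addr.rstrip("/") + "/v1"


# Usage, once the router is running:
#   base_url = base_url_from_addr(addr_file_path().read_text())
#   client = openai.OpenAI(base_url=base_url, api_key="x")
print(base_url_from_addr("127.0.0.1:43211\n"))
```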

Tests & lint

uv run --extra dev pytest        # hermetic suite (live tests are opt-in: -m integration)
uv run ruff check .              # lint — the CI gate

CI (.github/workflows/ci.yml) runs both on every push to main and every PR. For the same lint gate locally on commit, opt into the pre-commit hook:

uv tool install pre-commit && pre-commit install

Lint only: the project does not use ruff format (lines are hand-wrapped, E501 is ignored), so there is no formatter step in either gate.

Design principles

Two standards, neither extended. MCP for tool/prompt/resource discovery and execution; OpenAI chat-completions for the inference primitive. woollama is a router between them.
Local-only, ephemeral by default. Random loopback port, persisted address file for discovery, never 0.0.0.0 without explicit opt-in. The router holds API keys and routes to local resources, so it should not be LAN-reachable.
The model namespace is the universal addressing scheme. Raw inferencers (<provider>/<model>, e.g. ollama/X, anthropic/X, claude-code/X) and full recipes (woollama/<recipe>) are all addressable through OpenAI's standard model field. No new wire format.
woollama owns routing, not inference or tools. It uses other people's inference engines (Ollama, Anthropic, …) and other people's tool servers (any MCP server: filesystem, git, lackpy, …). It composes them.
she talks to llamas.
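The "local-only, ephemeral by default" principle comes down to binding to the loopback interface with port 0, so the kernel picks a free port. The sketch below shows that technique with the standard library; it is an illustration rather than woollama's own server setup, and `bind_ephemeral_loopback` is a hypothetical name.

```python
import socket


def bind_ephemeral_loopback() -> socket.socket:
    """Listen on 127.0.0.1 with a kernel-assigned port (never 0.0.0.0)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", 0))  # port 0: the OS chooses a free ephemeral port
    sock.listen()
    return sock


with bind_ephemeral_loopback() as s:
    host, port = s.getsockname()
    # A router would persist this address somewhere clients can discover it.
    print(f"{host}:{port}")
```

Because the port changes on every start, clients cannot hard-code it, which is why an address file for discovery goes hand in hand with this approach.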

What works today

OpenAI surface: /v1/models, /v1/chat/completions (pass-through + recipe orchestration, both with stream:true → OpenAI SSE), /v1/tools introspection
Stateful surface: /v1/responses (stateless subset + stateful) and /v1/conversations (create/list/get/delete, plus items where the backend exposes its transcript). woollama routes conversation handles; backends own state (woollama never stores transcripts itself): claude-resume for claude-code models, managed-agents (Anthropic Managed Agents) for claude-agent models; models with no state-owning backend are stateless (store:false)
Multi-backend routing by <provider>/<model>: ollama, anthropic, openai, groq, together, openrouter, claude-code, + any OpenAI-compatible endpoint via inferencers.toml
Tool delegation: a claude-code recipe with tools runs as an executor. Claude owns the agentic loop and calls the recipe's allow-listed MCP tools itself (per-recipe --mcp-config + --allowedTools containment)
MCP server side: stdio (woollama mcp) and Streamable HTTP at /mcp on the same port, exposing recipes as prompts, a chat verb (with live tool-progress notifications), and every downstream tool re-exported with its output_schema (aggregator)
File-driven config (mcp.json, recipes.toml, inferencers.toml), multi-MCP-server discovery + unified tool registry, long-lived MCP connections
Recipe allow-list enforced as a security boundary (in-loop AND in delegation); served on a Unix socket + loopback TCP, address discovery file; CI (ruff + hermetic suite, 3.11/3.12)
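To make the allow-list boundary concrete, here is a minimal sketch of the kind of check a chat loop could run before dispatching a tool call. Everything in it is hypothetical: the `ToolCall` type, the `enforce_allow_list` function, and the `server.tool` naming are illustrative assumptions, not woollama's internals.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """A tool invocation requested by the model (hypothetical shape)."""
    name: str
    arguments: dict = field(default_factory=dict)


def enforce_allow_list(call: ToolCall, allowed: frozenset[str]) -> ToolCall:
    """Refuse any tool the recipe did not explicitly allow."""
    if call.name not in allowed:
        raise PermissionError(f"tool {call.name!r} is not allow-listed for this recipe")
    return call


# Assumed namespacing scheme "<server>.<tool>", for illustration only.
allowed = frozenset({"filesystem.read_file", "git.log"})
enforce_allow_list(ToolCall("git.log"), allowed)  # passes through unchanged

try:
    enforce_allow_list(ToolCall("filesystem.write_file"), allowed)
except PermissionError as exc:
    print(exc)
```

The point of failing closed is that a model asking for an unlisted tool gets an error rather than a silent fallback.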

Not yet (next on the roadmap)

The live, interactive Claude-in-tmux session backend (a separate Rust session driver) and the interactive requires_action path, both gated on spikes that need a real terminal
cosmic-fabric actually consuming the conversations surface (the last v1.0 gate)
The Rust v1.0 port

Full scorecard, ordering, and pending verifications: docs/roadmap.md.

Origin

woollama is the production-grade rewrite of an architecture co-designed in cosmic-fabric, which remains a frontend (and will use woollama as its router engine). The design docs that brought woollama here:

docs/architecture.md — the model/tool/executor router design
docs/naming.md — how we landed on this name

License

MIT — see LICENSE.
