woollama
Web Over Ollama (and Llamas). An MCP + OpenAI router for AI desktops.
Documentation: woollama.readthedocs.io
woollama sits between AI clients (Cursor, the OpenAI SDK, Claude Desktop, cosmic-fabric, anything that speaks OpenAI or MCP) and AI backends (Ollama, Anthropic, fabric, lackpy, filesystem MCPs, anything that speaks OpenAI or MCP). It composes them into orchestrated calls without inventing a new protocol.
                ┌──────────────────────┐
                │      AI clients      │
                │   (any OpenAI or     │
                │     MCP client)      │
                └──────────┬───────────┘
                           │
        ┌──────────────────┴───────────────────┐
        │               woollama               │
        │      OpenAI server + MCP server      │
        │  ──────────────────────────────────  │
        │   routes models, tools, executors    │
        │  composes patterns + tools + models  │
        │          into named recipes          │
        └──────────────────┬───────────────────┘
                           │
        ┌──────────────────┴───────────────────┐
        │                                      │
    ┌───┴────┐                            ┌────┴─────┐
    │  MCP   │ tools, prompts, resources  │  OpenAI  │ inference
    │  tool  │                            │  compat  │
    │ servers│                            │ backends │
    └────────┘                            └──────────┘
  fabric-mcp, lackpy,                  Ollama, Anthropic,
  filesystem, git, …                   vLLM, llama.cpp, …
Status
Python prototype: multi-backend router, both surfaces live. woollama works end-to-end as:
- an OpenAI-compatible server: /v1/chat/completions (pass-through and hidden chat-loop orchestration of recipes, both with stream:true as OpenAI SSE), /v1/models, /v1/tools, and a stateful surface, /v1/responses + /v1/conversations (OpenAI Responses/Conversations shape; see below);
- an MCP server to its own clients, over stdio (woollama mcp) and over Streamable HTTP at /mcp, mounted on the same port as /v1/*. It re-exports every discovered downstream tool (namespaced, with output_schema) plus a chat verb that emits live tool-progress notifications; in other words, it's an MCP aggregator.
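Both completion paths stream in OpenAI's server-sent-events shape. Each event is a "data:" line carrying a chat-completion chunk as JSON, and the stream ends with "data: [DONE]". The official SDK parses this for you when you pass stream=True. The stdlib sketch below only makes the wire format concrete. It assumes woollama emits standard OpenAI chunks, as stated above, and it reads only choices[].delta.content. The helper name iter_deltas is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from an OpenAI-style chat-completions SSE stream.

    Hypothetical helper for illustration; the openai SDK normally does this.
    """
    for raw in lines:
        line = raw.strip()
        # Blank lines separate events; lines starting with ":" are SSE comments.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # end-of-stream sentinel
            return
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # The first chunk usually carries only {"role": "assistant"}.
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


if __name__ == "__main__":
    sample = [
        'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
        "",
        'data: {"choices": [{"index": 0, "delta": {"content": "1, 2, "}}]}',
        "",
        'data: {"choices": [{"index": 0, "delta": {"content": "3, 4"}}]}',
        "",
        "data: [DONE]",
    ]
    print("".join(iter_deltas(sample)))  # prints: 1, 2, 3, 4
```

With the SDK, you consume the same stream by iterating over c.chat.completions.create(..., stream=True) and reading each chunk's choices[0].delta.content.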
woollama routes inference across multiple backends by <provider>/<model>: ollama (local), anthropic, openai, groq, together, openrouter, and any OpenAI-compatible endpoint you add in inferencers.toml (e.g. self-hosted vLLM). It also offers claude-code/<model>, a keyless path to Claude via the local CLI. That path runs either tool-less or as an executor that runs a recipe's allow-listed MCP tools itself (tool delegation). Config is file-driven (mcp.json, recipes.toml, inferencers.toml).
For stateful conversations, woollama routes handles while backends own the state; woollama never stores transcripts itself. There are two state-owning backends:
- claude-resume (claude --resume), for claude-code models. It is keyless, and the Claude session owns the bytes.
- managed-agents (Anthropic's Managed Agents), for claude-agent models. It requires ANTHROPIC_API_KEY. Anthropic hosts the session and exposes the transcript, so /v1/conversations/{id}/items works.

Models with no state-owning backend (ollama/cloud/recipe) are stateless, and the caller owns history (store:false). MCP connections are long-lived. woollama is served on both a Unix socket ($XDG_RUNTIME_DIR/woollama.sock, mode 0600, the default for local MCP clients) and an ephemeral loopback TCP port. It never binds 0.0.0.0 without explicit opt-in.
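For a model with no state-owning backend, "the caller owns history" means the client resends the whole transcript on every turn, with store:false. The sketch below shows that bookkeeping against the Responses request shape: model, input as a list of role/content messages, and store. CallerHistory and the injected send callable are hypothetical. send stands in for whatever actually POSTs to /v1/responses, which keeps the sketch testable without a running router.

```python
from typing import Callable

# send(body) -> assistant text. In real use this might wrap the openai SDK, e.g.
# lambda body: client.responses.create(**body).output_text
Send = Callable[[dict], str]


class CallerHistory:
    """Caller-owned transcript for stateless models (hypothetical helper)."""

    def __init__(self, model: str, send: Send) -> None:
        self.model = model
        self.send = send
        self.items: list[dict] = []

    def ask(self, text: str) -> str:
        self.items.append({"role": "user", "content": text})
        body = {
            "model": self.model,
            "input": list(self.items),  # a copy of the full history goes out on every call
            "store": False,             # ask the server not to keep anything
        }
        reply = self.send(body)
        self.items.append({"role": "assistant", "content": reply})
        return reply
```

For claude-code or claude-agent models, the backend holds the transcript instead, so a client keeps only the conversation handle. With a recent openai SDK, that means creating one via client.conversations.create() and passing conversation=conv.id to client.responses.create(). Whether woollama accepts every parameter of that shape is an assumption here.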
Not production-ready. Current status and what's next live in docs/roadmap.md.
Implementation note: woollama will be a Rust program at v1.0. The Python in src/woollama/ is a prototype used to iterate on the architecture quickly. The Rust port lands when the design surface is stable. See docs/rust-transition.md for the explicit transition criteria.
See docs/architecture.md for the full target design and docs/build-log.md for the slice-by-slice history.
Quick taste
The router is OpenAI-compatible, so any OpenAI client can drive it:
import openai
c = openai.OpenAI(base_url="http://127.0.0.1:<port>/v1", api_key="x")
# Pass-through to Ollama
r = c.chat.completions.create(
model="ollama/qwen3:14b-iq4xs",
messages=[{"role": "user", "content": "Hi"}],
)
# Orchestrated: a recipe (system prompt + tools + model), transparent to the
# client. The chat-loop happens inside woollama; client sees only the final answer.
r = c.chat.completions.create(
model="woollama/streamer",
messages=[{"role": "user", "content": "Please count to 4."}],
)
woollama serves on two transports at once. The first is a Unix socket at $XDG_RUNTIME_DIR/woollama.sock, the default for local MCP clients. It is created with mode 0600 because anyone who can connect to it can spend the router's API keys. The second is an ephemeral loopback TCP port, written to $XDG_RUNTIME_DIR/woollama.addr for clients to discover; the <port> above is that port. This is the same pattern as a local fabric --serve instance.
Install (development)
git clone https://github.com/<you>/woollama
cd woollama
uv sync # creates .venv and installs deps
uv run woollama # starts the router; prints its address
In a second shell:
# Discover the address
cat "${XDG_RUNTIME_DIR:-/tmp}/woollama.addr"
# Then point an OpenAI client at it (see Quick taste above).
Tests & lint
uv run --extra dev pytest # hermetic suite (live tests are opt-in: -m integration)
uv run ruff check . # lint: the CI gate
CI (.github/workflows/ci.yml) runs both on every push to main and on every PR. To run the same lint gate locally on each commit, opt into the pre-commit hook:
uv tool install pre-commit && pre-commit install
Lint only: the project does not use ruff format (lines are hand-wrapped, and E501 is ignored), so neither gate has a formatter step.
Design principles
- Two standards, neither extended. MCP for tool/prompt/resource discovery and execution; OpenAI chat-completions for the inference primitive. woollama is a router between them.
- Local-only, ephemeral by default. Random loopback port, persisted address file for discovery, never 0.0.0.0 without explicit opt-in. The router holds API keys and routes to local resources, so it should not be LAN-reachable.
- The model namespace is the universal addressing scheme. Raw inferencers (<provider>/<model>, e.g. ollama/X, anthropic/X, claude-code/X) and full recipes (woollama/<recipe>) are all addressable through OpenAI's standard model field. No new wire format.
- woollama owns routing, not inference or tools. It uses other people's inference engines (Ollama, Anthropic, …) and other people's tool servers (any MCP server: filesystem, git, lackpy, …). It composes them.
- she talks to llamas.
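The addressing principle can be illustrated with a simple parser: the prefix before the first slash picks the route, and woollama/ selects a recipe. This sketch is a hypothetical parser, not woollama's code. It splits on the first slash only, so provider-side names that themselves contain slashes (as OpenRouter model ids do) pass through intact.

```python
from typing import NamedTuple


class Route(NamedTuple):
    kind: str      # "recipe" for woollama/<recipe>, otherwise "inferencer"
    provider: str  # e.g. "ollama", "anthropic", "claude-code"
    name: str      # everything after the first "/"


def parse_model(model: str) -> Route:
    """Split an OpenAI 'model' field into a route (hypothetical sketch)."""
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected <provider>/<model>, got {model!r}")
    kind = "recipe" if provider == "woollama" else "inferencer"
    return Route(kind, provider, name)


if __name__ == "__main__":
    print(parse_model("ollama/qwen3:14b-iq4xs"))
    # Route(kind='inferencer', provider='ollama', name='qwen3:14b-iq4xs')
    print(parse_model("woollama/streamer"))
    # Route(kind='recipe', provider='woollama', name='streamer')
```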
What works today
- OpenAI surface: /v1/models, /v1/chat/completions (pass-through + recipe orchestration, both with stream:true as OpenAI SSE), /v1/tools introspection
- Stateful surface: /v1/responses (stateless subset + stateful) and /v1/conversations (create/list/get/delete, plus items where the backend exposes its transcript). woollama routes conversation handles and backends own state (woollama never stores transcripts itself): claude-resume for claude-code models, managed-agents (Anthropic Managed Agents) for claude-agent models. Models with no state-owning backend are stateless (store:false)
- Multi-backend routing by <provider>/<model>: ollama, anthropic, openai, groq, together, openrouter, claude-code, plus any OpenAI-compatible endpoint via inferencers.toml
- Tool delegation: a claude-code recipe with tools runs as an executor. Claude owns the agentic loop and calls the recipe's allow-listed MCP tools itself (per-recipe --mcp-config + --allowedTools containment)
- MCP server side: stdio (woollama mcp) and Streamable HTTP at /mcp on the same port. It offers recipes as prompts, a chat verb (with live tool-progress notifications), and every downstream tool re-exported with its output_schema (aggregator)
- File-driven config (mcp.json, recipes.toml, inferencers.toml), multi-MCP-server discovery + unified tool registry, long-lived MCP connections
- Recipe allow-list enforced as a security boundary (in-loop AND in delegation); served on a Unix socket + loopback TCP, with an address discovery file; CI (ruff + hermetic suite, 3.11/3.12)
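A deny-by-default allow-list is easiest to trust when it is checked twice. It applies once when tools are offered to the model and again when a call comes back, since a model can name a tool it was never shown. The sketch below illustrates only that idea for the in-loop case. Recipe, offered_tools and authorize_call are hypothetical, and for claude-code delegation the README attributes containment to Claude Code's own --mcp-config and --allowedTools flags instead.

```python
from dataclasses import dataclass


class ToolNotAllowed(PermissionError):
    """Raised when a recipe tries to call a tool outside its allow-list."""


@dataclass(frozen=True)
class Recipe:
    name: str
    model: str
    tools: frozenset[str] = frozenset()  # empty allow-list => no tools at all


def offered_tools(recipe: Recipe, registry: dict[str, dict]) -> dict[str, dict]:
    """Filter the unified tool registry down to what this recipe may see."""
    return {name: spec for name, spec in registry.items() if name in recipe.tools}


def authorize_call(recipe: Recipe, tool_name: str) -> None:
    """Re-check at call time: the model may request a tool it was never offered."""
    if tool_name not in recipe.tools:
        raise ToolNotAllowed(f"recipe {recipe.name!r} may not call {tool_name!r}")
```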
Not yet (next on the roadmap)
- The live, interactive Claude-in-tmux session backend (a separate Rust session driver) and the interactive requires_action path, both gated on spikes that need a real terminal
- cosmic-fabric actually consuming the conversations surface (the last v1.0 gate)
- The Rust v1.0 port
Full scorecard, ordering, and pending verifications: docs/roadmap.md.
Origin
woollama is the production-grade rewrite of an architecture co-designed in cosmic-fabric, which remains a frontend (and will use woollama as its router engine). The design docs that brought woollama here:
- docs/architecture.md: the model/tool/executor router design
- docs/naming.md: how we landed on this name
License
MIT โ see LICENSE.