teaguesterling

woollama


MCP + OpenAI-compatible router for Ollama: ollama/<name> pass-through + woollama/<recipe> orchestrated chat-loop recipes

woollama

Web Over Ollama (and Llamas). An MCP + OpenAI router for AI desktops.

📖 Documentation: woollama.readthedocs.io

woollama sits between AI clients (Cursor, the OpenAI SDK, Claude Desktop, cosmic-fabric, anything that speaks OpenAI or MCP) and AI backends (Ollama, Anthropic, fabric, lackpy, filesystem MCPs, anything that speaks OpenAI or MCP). It composes them into orchestrated calls without inventing a new protocol.

                          ┌─────────────────────┐
                          │   AI clients        │
                          │   (any OpenAI or    │
                          │    MCP client)      │
                          └──────────┬──────────┘
                                     │
                  ┌──────────────────┴───────────────────┐
                  │            woollama                  │
                  │  OpenAI server  +  MCP server        │
                  │  ───────────────────────────────     │
                  │  routes models, tools, executors     │
                  │  composes patterns + tools + models  │
                  │  into named recipes                  │
                  └──────────────────┬───────────────────┘
                                     │
                  ┌──────────────────┴───────────────────┐
                  │                                      │
              ┌───┴────┐                            ┌────┴────┐
              │ MCP    │  tools, prompts, resources │ OpenAI  │  inference
              │ tool   │                            │ compat  │
              │ servers│                            │ backends│
              └────────┘                            └─────────┘
              fabric-mcp, lackpy,                   Ollama, Anthropic,
              filesystem, git, …                    vLLM, llama.cpp, …

Status

Python prototype: a multi-backend router with both surfaces live. woollama works end-to-end as:

  • an OpenAI-compatible server: /v1/chat/completions (pass-through and hidden chat-loop orchestration of recipes, both with stream:true → OpenAI SSE), /v1/models, /v1/tools, and a stateful surface (/v1/responses + /v1/conversations, in the OpenAI Responses/Conversations shape; see below);
  • an MCP server to its own clients, over stdio (woollama mcp) and over Streamable HTTP at /mcp, mounted on the same port as /v1/*. It re-exports every discovered downstream tool (namespaced, with output_schema) plus a chat verb that emits live tool-progress notifications; in other words, it is an MCP aggregator.
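With stream:true, responses use OpenAI's server-sent-events framing. Each event is a data: line that holds a chat.completion.chunk JSON object, and the stream ends with data: [DONE]. The OpenAI SDK already decodes this for you. The stdlib-only sketch below just shows what a client reassembles. iter_sse_content is a hypothetical helper name, and the sample stream is hand-written, not captured from woollama.

```python
import json
from typing import Iterable, Iterator


def iter_sse_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from an OpenAI-style chat-completions SSE stream."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank event separators and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:  # role-only and finish chunks carry no content
                yield text


# Hand-written sample in the chat.completion.chunk shape.
sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"1, 2, "}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"3, 4"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_sse_content(sample)))  # prints: 1, 2, 3, 4
```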

It routes inference across multiple backends by <provider>/<model>: ollama (local), anthropic, openai, groq, together, openrouter, and any OpenAI-compatible endpoint you add in inferencers.toml (e.g. self-hosted vLLM). It also offers claude-code/<model>, a keyless path to Claude via the local CLI, either tool-less or as an executor that runs a recipe's allow-listed MCP tools itself (tool delegation). Config is file-driven (mcp.json, recipes.toml, inferencers.toml).
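Routing by <provider>/<model> comes down to splitting the model field on its first slash. The sketch below is a minimal illustration under stated assumptions. route_model, Route and KNOWN_PROVIDERS are hypothetical names, not woollama's actual code. The provider set is just the list above, and the hypothetical extra_providers argument stands in for endpoints added through inferencers.toml.

```python
from __future__ import annotations

from dataclasses import dataclass

# Providers named in this README; extra OpenAI-compatible ones would come from
# inferencers.toml (modelled here by the hypothetical extra_providers argument).
KNOWN_PROVIDERS = frozenset(
    {"ollama", "anthropic", "openai", "groq", "together", "openrouter", "claude-code"}
)


@dataclass(frozen=True)
class Route:
    kind: str    # "recipe" or "inferencer"
    target: str  # provider name, or "woollama" for recipes
    name: str    # model or recipe name, forwarded unchanged


def route_model(model: str, extra_providers: frozenset[str] = frozenset()) -> Route:
    # Split on the FIRST slash only; the model half may contain more slashes.
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected <provider>/<model>, got {model!r}")
    if provider == "woollama":
        return Route("recipe", "woollama", name)
    if provider in KNOWN_PROVIDERS or provider in extra_providers:
        return Route("inferencer", provider, name)
    raise LookupError(f"unknown provider {provider!r}")


print(route_model("ollama/qwen3:14b-iq4xs"))
# prints: Route(kind='inferencer', target='ollama', name='qwen3:14b-iq4xs')
```

Splitting only on the first slash keeps tagged or multi-segment ids such as ollama/qwen3:14b-iq4xs or an OpenRouter vendor/model id intact on the model side.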

For stateful conversations, woollama routes conversation handles and the backends own the state; woollama never stores transcripts itself. There are two state-owning backends. claude-resume (claude --resume, for claude-code models) is keyless, and the Claude session owns the bytes. managed-agents (Anthropic's Managed Agents, for claude-agent models) requires ANTHROPIC_API_KEY; Anthropic hosts the session and exposes the transcript, so /v1/conversations/{id}/items works. Models with no state-owning backend (ollama/cloud/recipe) are stateless: the caller owns the history (store:false). MCP connections are long-lived. woollama is served on both a Unix socket ($XDG_RUNTIME_DIR/woollama.sock, mode 0600, the default for local MCP clients) and an ephemeral loopback TCP port; it never binds 0.0.0.0 without explicit opt-in.
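One way to read "backends own the state" is as a lookup from model prefix to the backend that owns the conversation. This is a hypothetical sketch. The STATE_BACKENDS table, state_backend and check_store are illustrative names. Rejecting store=true for stateless models is an assumed policy, not documented woollama behavior.

```python
from __future__ import annotations

# Hypothetical table mirroring the prose: model prefix -> backend that owns
# conversation state. woollama keeps only handle routing, never transcripts.
STATE_BACKENDS = {
    "claude-code": "claude-resume",    # claude --resume; the Claude session owns the bytes
    "claude-agent": "managed-agents",  # Anthropic Managed Agents hosts the session
}


def state_backend(model: str) -> str | None:
    """Return the state-owning backend for a model, or None if it is stateless."""
    return STATE_BACKENDS.get(model.partition("/")[0])


def check_store(model: str, store: bool) -> None:
    """Assumed policy: refuse store=true when no backend can own the history."""
    if store and state_backend(model) is None:
        raise ValueError(
            f"{model!r} is stateless: send store=false and resend the history yourself"
        )


print(state_backend("claude-code/sonnet"))      # prints: claude-resume
print(state_backend("ollama/qwen3:14b-iq4xs"))  # prints: None
```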

Not production-ready. Current status and what's next live in docs/roadmap.md.

Implementation note: woollama will be a Rust program at v1.0. The Python in src/woollama/ is a prototype used to iterate on the architecture quickly. The Rust port lands when the design surface is stable. See docs/rust-transition.md for the explicit transition criteria.

See docs/architecture.md for the full target design and docs/build-log.md for the slice-by-slice history.

Quick taste

The router is OpenAI-compatible, so any OpenAI client can drive it:

import openai
c = openai.OpenAI(base_url="http://127.0.0.1:<port>/v1", api_key="x")

# Pass-through to Ollama
r = c.chat.completions.create(
    model="ollama/qwen3:14b-iq4xs",
    messages=[{"role": "user", "content": "Hi"}],
)

# Orchestrated: a recipe (system prompt + tools + model), transparent to the
# client. The chat-loop happens inside woollama; client sees only the final answer.
r = c.chat.completions.create(
    model="woollama/streamer",
    messages=[{"role": "user", "content": "Please count to 4."}],
)
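The hidden chat-loop behind a woollama/<recipe> call can be pictured as the usual tool-calling loop. It calls the model, runs any requested tools, appends their results as role "tool" messages, and repeats until the model answers without tool calls. This sketch makes several assumptions. call_model is a stub standing in for a chat-completions call. A plain dict of Python callables stands in for MCP tools. run_chat_loop and its parameters are hypothetical. Per the feature list, woollama's own loop also enforces the recipe's allow-list and streams tool progress.

```python
from __future__ import annotations

import json
from typing import Callable

Message = dict  # an OpenAI-style chat message


def run_chat_loop(
    call_model: Callable[[list[Message]], Message],
    tools: dict[str, Callable[..., object]],
    system: str,
    messages: list[Message],
    max_turns: int = 8,
) -> str:
    # The recipe's system prompt goes first; the client's messages follow.
    history: list[Message] = [{"role": "system", "content": system}, *messages]
    for _ in range(max_turns):
        reply = call_model(history)
        history.append(reply)
        calls = reply.get("tool_calls") or []
        if not calls:
            return reply.get("content") or ""  # only this reaches the client
        for call in calls:
            fn = call["function"]
            args = json.loads(fn.get("arguments") or "{}")
            result = tools[fn["name"]](**args)  # real code would check the allow-list here
            history.append(
                {"role": "tool", "tool_call_id": call["id"], "content": json.dumps(result)}
            )
    raise RuntimeError("chat-loop did not finish within max_turns")


# Usage with a stub model that requests one tool call, then answers.
def fake_model(history: list[Message]) -> Message:
    if history[-1]["role"] == "user":
        return {
            "role": "assistant",
            "content": None,
            "tool_calls": [
                {"id": "call_1", "type": "function",
                 "function": {"name": "count", "arguments": '{"n": 4}'}}
            ],
        }
    return {"role": "assistant", "content": "Counted: " + history[-1]["content"]}


answer = run_chat_loop(
    fake_model,
    {"count": lambda n: list(range(1, n + 1))},
    "You count things.",
    [{"role": "user", "content": "Please count to 4."}],
)
print(answer)  # prints: Counted: [1, 2, 3, 4]
```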

woollama serves on two transports at once. The first is a Unix socket at $XDG_RUNTIME_DIR/woollama.sock. It is the default for local MCP clients and uses mode 0600, since anyone who can connect to the socket can spend the router's API keys. The second is an ephemeral loopback TCP port, written to $XDG_RUNTIME_DIR/woollama.addr for clients to discover. The <port> above is that ephemeral port. This is the same pattern as a local fabric --serve instance.
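A client that would rather not hard-code <port> can read the address file instead. This README does not specify the format of woollama.addr, so the sketch assumes it holds either a bare host:port or a base URL. discover_base_url is a hypothetical helper. The /tmp fallback mirrors the cat command in the install steps.

```python
from __future__ import annotations

import os
from pathlib import Path


def discover_base_url(addr_file: Path | None = None) -> str:
    """Build an OpenAI base_url from woollama's address file (format assumed)."""
    if addr_file is None:
        runtime = os.environ.get("XDG_RUNTIME_DIR") or "/tmp"
        addr_file = Path(runtime) / "woollama.addr"
    addr = addr_file.read_text().strip()
    # Assumption: the file holds either "host:port" or a full http:// URL.
    if "://" not in addr:
        addr = "http://" + addr
    addr = addr.rstrip("/")
    return addr if addr.endswith("/v1") else addr + "/v1"
```

A client could then call openai.OpenAI(base_url=discover_base_url(), api_key="x"), as in the Quick taste example.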

Install (development)

git clone https://github.com/<you>/woollama
cd woollama
uv sync                           # creates .venv and installs deps
uv run woollama                   # starts the router; prints its address

In a second shell:

# Discover the address
cat "${XDG_RUNTIME_DIR:-/tmp}/woollama.addr"
# Then point an OpenAI client at it (see Quick taste above).

Tests & lint

uv run --extra dev pytest        # hermetic suite (live tests are opt-in: -m integration)
uv run ruff check .              # lint (the CI gate)

CI (.github/workflows/ci.yml) runs both on every push to main and on every PR. To run the same lint gate locally on each commit, opt into the pre-commit hook:

uv tool install pre-commit && pre-commit install

Lint only: the project does not use ruff format (lines are hand-wrapped and E501 is ignored), so there is no formatter step in either gate.

Design principles

  1. Two standards, neither extended. MCP for tool/prompt/resource discovery and execution; OpenAI chat-completions for the inference primitive. woollama is a router between them.
  2. Local-only, ephemeral by default. Random loopback port, persisted address file for discovery, never 0.0.0.0 without explicit opt-in. The router holds API keys and routes to local resources, so it should not be LAN-reachable.
  3. The model namespace is the universal addressing scheme. Raw inferencers (<provider>/<model>, e.g. ollama/X, anthropic/X, claude-code/X) and full recipes (woollama/<recipe>) are all addressable through OpenAI's standard model field. No new wire format.
  4. woollama owns routing, not inference or tools. It uses other people's inference engines (Ollama, Anthropic, …) and other people's tool servers (any MCP server: filesystem, git, lackpy, …). It composes them.
  5. she talks to llamas.
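Principle 2 maps onto ordinary socket calls. Bind to 127.0.0.1 with port 0 so the kernel assigns a free ephemeral port, and refuse the wildcard address unless the caller explicitly opts in. bind_listener and its allow_lan flag are hypothetical. This is a sketch of the principle, not woollama's server startup code.

```python
import socket

WILDCARDS = {"0.0.0.0", ""}  # addresses that would listen on every interface


def bind_listener(host: str = "127.0.0.1", allow_lan: bool = False) -> socket.socket:
    """Listen on an ephemeral loopback port; wildcard binds need explicit opt-in."""
    if host in WILDCARDS and not allow_lan:
        raise PermissionError("refusing to listen on all interfaces without opt-in")
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))  # port 0: the kernel picks a free ephemeral port
    sock.listen()
    return sock


listener = bind_listener()
bound_host, bound_port = listener.getsockname()
print(f"listening on {bound_host}:{bound_port}")  # the port differs on every run
listener.close()
```

The port reported by getsockname() is what would then be written to the address file for discovery.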

What works today

  • OpenAI surface: /v1/models, /v1/chat/completions (pass-through + recipe orchestration, both with stream:true → OpenAI SSE), /v1/tools introspection
  • Stateful surface: /v1/responses (stateless subset + stateful) and /v1/conversations (create/list/get/delete, plus items where the backend exposes its transcript). woollama routes conversation handles; backends own state (woollama never stores transcripts itself): claude-resume for claude-code models, managed-agents (Anthropic Managed Agents) for claude-agent models; models with no state-owning backend are stateless (store:false)
  • Multi-backend routing by <provider>/<model>: ollama, anthropic, openai, groq, together, openrouter, claude-code, + any OpenAI-compatible endpoint via inferencers.toml
  • Tool delegation: a claude-code recipe with tools runs as an executor. Claude owns the agentic loop and calls the recipe's allow-listed MCP tools itself (per-recipe --mcp-config + --allowedTools containment)
  • MCP server side: stdio (woollama mcp) and Streamable HTTP at /mcp on the same port, with recipes as prompts, a chat verb (with live tool-progress notifications), and every downstream tool re-exported with its output_schema (aggregator)
  • File-driven config (mcp.json, recipes.toml, inferencers.toml), multi-MCP-server discovery + unified tool registry, long-lived MCP connections
  • Recipe allow-list enforced as a security boundary (in-loop AND in delegation); served on a Unix socket + loopback TCP with an address discovery file; CI (ruff + hermetic suite, Python 3.11/3.12)
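The unified tool registry and the recipe allow-list can be sketched together. Tools from each MCP server are re-exported under a namespaced name. A recipe offers the model only the tools it allow-lists. Every call is checked again at dispatch time, so a request for an unlisted tool is refused rather than trusted. The "<server>.<tool>" naming scheme, Recipe, build_registry, visible_tools and resolve_tool are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


def build_registry(servers: dict[str, list[str]]) -> dict[str, tuple[str, str]]:
    """Merge tool lists from several MCP servers under "<server>.<tool>" names."""
    registry: dict[str, tuple[str, str]] = {}
    for server, tool_names in servers.items():
        for tool in tool_names:
            registry[f"{server}.{tool}"] = (server, tool)
    return registry


@dataclass(frozen=True)
class Recipe:
    name: str
    allowed_tools: frozenset[str]


def visible_tools(recipe: Recipe, registry: dict[str, tuple[str, str]]) -> list[str]:
    """Only allow-listed tools are ever offered to the model."""
    return sorted(name for name in registry if name in recipe.allowed_tools)


def resolve_tool(
    recipe: Recipe, registry: dict[str, tuple[str, str]], name: str
) -> tuple[str, str]:
    """Re-check at call time: a model asking for an unlisted tool is refused."""
    if name not in recipe.allowed_tools:
        raise PermissionError(f"recipe {recipe.name!r} does not allow {name!r}")
    if name not in registry:
        raise LookupError(f"no downstream server exports {name!r}")
    return registry[name]


registry = build_registry({"fs": ["read_file", "write_file"], "git": ["log"]})
reader = Recipe("reader", frozenset({"fs.read_file", "git.log"}))
print(visible_tools(reader, registry))  # prints: ['fs.read_file', 'git.log']
```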

Not yet (next on the roadmap)

  • The live, interactive Claude-in-tmux session backend (a separate Rust session driver) and the interactive requires_action path, both gated on spikes that need a real terminal
  • cosmic-fabric actually consuming the conversations surface (the last v1.0 gate)
  • The Rust v1.0 port

Full scorecard, ordering, and pending verifications: docs/roadmap.md.

Origin

woollama is the production-grade rewrite of an architecture co-designed in cosmic-fabric, which remains a frontend (and will use woollama as its router engine). The design docs that brought woollama here:

  • docs/architecture.md: the model/tool/executor router design
  • docs/naming.md: how we landed on this name

License

MIT (see LICENSE).
