tarundattagondi

mcp-curate


Turn an OpenAPI/Swagger spec into a high-quality, curated MCP server — consolidates endpoints into clear tools and proves it with an eval harness.


Turn an OpenAPI spec into a curated MCP server an LLM can actually use — and prove it with an eval.

A naive OpenAPI→MCP generator dumps one tool per endpoint. Point it at GitHub's API and the model drowns in 1190 tools and picks the wrong one. mcp-curate consolidates those endpoints into a small set of clear, well-described meta-tools, and ships an eval harness that measures whether the model picks the right tool, raw vs curated, on your own spec.

Before / after

Spec              Raw tools  Curated tools  Reduction
Swagger Petstore         19              3        84%
Stripe API              587             40        93%
GitHub REST API        1190             40        97%

$ mcp-curate curate examples/github.json
raw tools:     1190
curated tools: 40  (budget 40)
reduction:     97%

Curated tools (actions consolidated):
  - repos: 202 actions  [repos]
  - actions: 187 actions  [actions]
  - orgs: 108 actions  [orgs]
  - issues: 55 actions  [issues]
  ...
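
The reduction figures follow directly from the raw and curated counts. A minimal sketch, assuming the percentage is rounded to the nearest whole number (the `reduction_pct` helper is a hypothetical name, not part of mcp-curate):

```python
def reduction_pct(raw_tools: int, curated_tools: int) -> int:
    """Share of tools removed by curation, rounded to a whole percent."""
    return round(100 * (1 - curated_tools / raw_tools))


# Counts taken from the before/after table above.
for spec, raw, curated in [
    ("Swagger Petstore", 19, 3),
    ("Stripe API", 587, 40),
    ("GitHub REST API", 1190, 40),
]:
    print(f"{spec}: {reduction_pct(raw, curated)}%")
```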

Each curated tool exposes an action argument that selects the underlying operation, so 1190 flat choices become 40 namespaced ones.
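
To make the action selector concrete, here is a rough sketch of how one curated tool might map an `action` argument onto an underlying operation. The `Operation` dataclass, the `resolve_action` helper, and the action names are hypothetical and chosen for illustration; they are not mcp-curate's actual internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    """One underlying HTTP operation (hypothetical model for this sketch)."""
    method: str
    path: str


# A curated "issues" tool: action name -> underlying operation.
# Action names and paths are illustrative only.
ISSUES_TOOL = {
    "list": Operation("GET", "/repos/{owner}/{repo}/issues"),
    "get": Operation("GET", "/repos/{owner}/{repo}/issues/{issue_number}"),
    "create": Operation("POST", "/repos/{owner}/{repo}/issues"),
}


def resolve_action(tool: dict, action: str, params: dict) -> tuple:
    """Pick the operation named by `action` and fill in its path parameters."""
    if action not in tool:
        raise ValueError(f"unknown action {action!r}; expected one of {sorted(tool)}")
    op = tool[action]
    return op.method, op.path.format(**params)


method, url_path = resolve_action(
    ISSUES_TOOL, "get", {"owner": "octocat", "repo": "hello", "issue_number": 7}
)
print(method, url_path)
```

The model first chooses among a handful of tools, then among the actions inside one tool, instead of scanning a single flat list.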

Oversized tags get split, not stuffed. When the tool budget has headroom, a giant tag is broken into focused sub-tools by path instead of one bloated tool. With more budget, GitHub's 202-operation repos tag splits cleanly:

$ mcp-curate curate examples/github.json --max-tools 120 --max-actions 30
  - repos: ...            repos_branches, repos_commits, repos_collaborators,
  - repos_branches: 36    repos_comments, repos_compare, ... (focused sub-tools)

At a tight budget (the default 40), curation keeps tags whole and clean rather than forcing unrelated tags together; raise --max-tools to trade tool count for smaller, more focused tools.
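
Splitting by path can be pictured roughly as below. This sketch assumes the sub-tool key is the first literal (non-`{param}`) segment after the resource root, and it ignores the overall tool budget for brevity. `split_by_path` is a hypothetical helper, not the project's real splitting logic.

```python
from collections import defaultdict


def split_by_path(tag: str, paths: list, max_actions: int) -> dict:
    """Split an oversized tag into sub-tools keyed by a path segment."""
    if len(paths) <= max_actions:
        return {tag: list(paths)}  # small enough: keep the tag whole

    groups = defaultdict(list)
    for path in paths:
        literals = [s for s in path.strip("/").split("/") if s and not s.startswith("{")]
        # literals[0] is the resource root (e.g. "repos"); the next literal,
        # if present, names the sub-resource (e.g. "branches").
        sub = literals[1] if len(literals) > 1 else None
        groups[f"{tag}_{sub}" if sub else tag].append(path)
    return dict(groups)


paths = [
    "/repos/{owner}/{repo}",
    "/repos/{owner}/{repo}/branches",
    "/repos/{owner}/{repo}/branches/{branch}",
    "/repos/{owner}/{repo}/commits",
]
for name, members in split_by_path("repos", paths, max_actions=2).items():
    print(name, len(members))
```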

Does curation actually help? (the eval)

mcp-curate eval runs natural-language requests against both the raw and the curated tool set using your LLM key, and reports how often the model routes to the correct tool.

$ export ANTHROPIC_API_KEY=...
$ mcp-curate eval examples/stripe.json --cases examples/eval_cases/stripe.yaml

Eval: raw vs curated tool selection
cases: 11   raw tools: 587   curated tools: 40

raw     correct-tool selection: <run it>%
curated correct-tool selection: <run it>%
  -> improvement: <run it> points

The harness uses your key on your spec, so the numbers aren't hard-coded. Run the command above to reproduce them. Golden sets ship for Petstore and Stripe (examples/eval_cases/); add your own as a small YAML file.

The eval is deliberately honest. Beyond correct-tool selection it also reports:

  • curated tool + action accuracy, so curation can't "win" just by offering fewer, broader tools (it must still route to the right operation);
  • argument construction accuracy (raw vs curated): for cases that declare expected arguments, whether the model filled the right parameters (e.g. petId: 42 from "look up pet 42").
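
The three metrics above can be sketched as follows. This is an illustration under assumptions, not the harness's actual scoring code: the `CaseResult` fields and the `score` function are hypothetical, and argument accuracy is taken to mean "every expected argument is present with the expected value".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CaseResult:
    """Outcome of one golden case (hypothetical shape for this sketch)."""
    expected_tool: str
    chosen_tool: str
    expected_action: Optional[str] = None
    chosen_action: Optional[str] = None
    expected_args: dict = field(default_factory=dict)
    chosen_args: dict = field(default_factory=dict)


def score(results: list) -> dict:
    """Return tool, tool+action, and argument accuracy as fractions."""
    if not results:
        raise ValueError("no eval cases to score")
    n = len(results)
    tool_hits = sum(r.expected_tool == r.chosen_tool for r in results)
    action_hits = sum(
        r.expected_tool == r.chosen_tool and r.expected_action == r.chosen_action
        for r in results
    )
    # Argument accuracy only counts cases that declare expected arguments.
    with_args = [r for r in results if r.expected_args]
    arg_hits = sum(
        all(r.chosen_args.get(k) == v for k, v in r.expected_args.items())
        for r in with_args
    )
    return {
        "tool": tool_hits / n,
        "tool_action": action_hits / n,
        "args": arg_hits / len(with_args) if with_args else None,
    }


results = [
    CaseResult("pet", "pet", "getPetById", "getPetById", {"petId": 42}, {"petId": 42}),
    CaseResult("store", "store", "getInventory", "placeOrder"),
]
print(score(results))
```

In the second case the model reaches the right tool but the wrong action, so tool accuracy stays at 1.0 while tool + action accuracy drops to 0.5. That gap is what keeps broad tools from scoring well by default.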

Install

git clone <repo> && cd mcp-curate
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev,llm]"
./examples/fetch_specs.sh        # petstore is committed; this also grabs GitHub + Stripe

Usage

# Inspect a spec's raw tool count.
mcp-curate parse examples/petstore.json

# See the before/after curation report.
mcp-curate curate examples/github.json --max-tools 40

# Serve the curated MCP server over stdio (bring-your-own auth header).
mcp-curate serve examples/petstore.json --curated \
  --header "Authorization: Bearer $TOKEN"

# A/B the tool selection with your LLM key.
mcp-curate eval examples/petstore.json --cases examples/eval_cases/petstore.yaml

Add --llm-descriptions to curate/serve/eval to let the LLM polish the curated tool names and descriptions (otherwise they're generated deterministically, with no API key required).

How it works

  1. Parse: load OpenAPI 3.x (JSON/YAML), resolve $ref with cycle cutting, flatten each operation into a spec-agnostic model.
  2. Curate: group operations by tag (path-segment fallback), merge the smallest related groups to fit a tool budget, split any oversized group into focused sub-tools using leftover headroom, and collapse each group into one meta-tool with an action selector.
  3. Serve: expose either tool set over the MCP stdio transport; tool calls become real HTTP requests against the spec's server URL.
  4. Eval: force the model to pick a tool for each golden request and score raw vs curated routing.
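
The grouping and merging part of the Curate step can be sketched as below. It is a simplification under stated assumptions: operations are plain dicts with `path` and optional `tags` keys, only the first tag is used, and the two smallest groups are merged without checking how related they are. Both function names are hypothetical, not mcp-curate's API.

```python
def group_operations(ops: list) -> dict:
    """Group operations by first tag, falling back to the first literal path segment."""
    groups = {}
    for op in ops:
        tags = op.get("tags") or []
        if tags:
            key = tags[0]
        else:
            segments = op["path"].strip("/").split("/")
            key = next((s for s in segments if s and not s.startswith("{")), "default")
        groups.setdefault(key, []).append(op)
    return groups


def merge_to_budget(groups: dict, max_tools: int) -> dict:
    """Repeatedly merge the two smallest groups until the budget is met."""
    merged = {name: list(ops) for name, ops in groups.items()}
    while len(merged) > max_tools and len(merged) > 1:
        # Sort by size, then name, so the result is deterministic.
        a, b = sorted(merged, key=lambda k: (len(merged[k]), k))[:2]
        combined = merged.pop(a) + merged.pop(b)
        merged.setdefault(f"{a}_{b}", []).extend(combined)
    return merged


ops = [
    {"path": "/pet", "tags": ["pet"]},
    {"path": "/pet/{petId}", "tags": ["pet"]},
    {"path": "/pet/findByStatus", "tags": ["pet"]},
    {"path": "/store/inventory", "tags": ["store"]},
    {"path": "/user/{username}"},  # untagged: falls back to "user"
]
tools = merge_to_budget(group_operations(ops), max_tools=2)
print({name: len(members) for name, members in tools.items()})
```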

Development

python -m pytest        # 35 tests: parser, curation, server roundtrip, eval

Tests are offline: the parser/curation suites need no network, and the eval suite uses a scripted LLM client (no API key).

License

MIT
