mqfarooqi1

agentic-rag-mcp

Community mqfarooqi1
Updated

agentic-rag-mcp

A weekend build to get the three buzzwords straight in my own head by actuallywiring them up: agentic RAG, MCP, and a small multi-agent pipeline,all over the same tiny document set. It's a working sketch, not a framework, andnot production code — the point was to see the moving parts and where each ideaearns its keep (and where it doesn't).

The knowledge base is five short docs about a fictional library called Halyard(docs/). They're written so a couple of questions genuinely need more than onelookup — which is the whole reason "agentic" retrieval is interesting here.

What runs without an API key, and what needs one

File What it is API key?
knowledge_base.py chunk → embed (MiniLM) → FAISS search. The retrieval backend everything else uses. no
mcp_server.py an MCP server exposing the KB as tools (kb_search, kb_fetch, kb_sources) no
mcp_client_demo.py launches that server over stdio and calls its tools no
agentic_rag.py single agent that drives its own retrieval via a search tool yes (offline: naive baseline)
multi_agent.py retriever → synthesizer → verifier pipeline with a revise loop yes (offline: stub roles)
llm.py the manual tool-use loop the agents share

The LLM pieces use Claude (claude-opus-4-8, adaptive thinking, prompt caching onthe system/tools prefix). Without a key, agentic_rag.py falls back to a naivesingle-shot retrieval baseline and multi_agent.py runs stub roles, so you canstill watch the control flow.

Retrieval (knowledge_base.py)

Nothing clever: fixed-size character chunks with overlap, MiniLM embeddings, aFAISS flat inner-product index. No reranker, no hybrid BM25, no semantic chunking.Those are the obvious quality upgrades, but they'd be beside the point — this repois about the orchestration on top, so the retriever is intentionally the dumb part.

Agentic RAG (agentic_rag.py)

Naive RAG retrieves once and answers. That breaks on questions like "how do Iauthenticate SqlSource after 3.0, and is the old way still supported?" — theanswer is split across the migration guide and the changelog, and a single top-ksearch usually grabs one and misses the other.

Agentic RAG hands the model a kb_search tool and lets it decide: search, readthe results, notice the second half of the question isn't covered, search again,then answer with citations. Same FAISS backend; the model just does multi-hoplookups itself. agentic_rag.py prints each query it chooses so you can see thehops.

The trade-off is real and worth stating: the agent costs several modelround-trips and can occasionally wander or over-search. For single-fact questionsnaive RAG is cheaper and just as good. The agent earns its cost only when onelookup genuinely isn't enough.

MCP (mcp_server.py, mcp_client_demo.py)

The agents above could just import knowledge_base. MCP is about not doing that.Model Context Protocol is a standard way for aserver to advertise tools and for any client — Claude Desktop, an IDE, your ownagent — to discover and call them without custom glue. Run one retrieval server,point many clients at it.

mcp_server.py is that server (built with the official SDK's FastMCP), servingthree tools over stdio. mcp_client_demo.py is the easiest way to see it work: itspawns the server as a subprocess, does the MCP handshake, lists the tools, andcalls them — all locally, no key:

python mcp_client_demo.py

To use the same server from Claude Desktop, add it to that app'sclaude_desktop_config.json:

{
  "mcpServers": {
    "halyard-kb": {
      "command": "python",
      "args": ["C:/Users/you/.../agentic-rag-mcp/mcp_server.py"]
    }
  }
}

Then Claude can call kb_search directly. The point: the retrieval logic lives inone place behind a stable interface, and the consumer doesn't know or care thatit's FAISS underneath.

Multi-agent (multi_agent.py)

Same task, decomposed into three focused roles, orchestrated in plain Python:

question -> RETRIEVER (has the search tool) -> evidence
         -> SYNTHESIZER (no tools)          -> draft answer
         -> VERIFIER (structured output)    -> supported? / issues
              supported -> return
              not       -> feed issues back to the synthesizer, retry (<= N)

That's two standard patterns — orchestrator/workers and evaluator/optimizer (theverify-and-revise loop). The flow is ordinary code; I'm deliberately notletting agents spawn each other freely. For a task this well-shaped, a fixedworkflow is more predictable, cheaper, and far easier to debug than an autonomousswarm, and you can read exactly what happened. Autonomous multi-agent is worth itwhen the task genuinely can't be scripted up front — this one can, so it's aworkflow with LLM steps, and I think that's the honest default for most "agent"problems.

The three roles are just functions, so the LLM version and an offline stub versionshare the same orchestrate() loop.

Running it

pip install -r requirements.txt

python knowledge_base.py     # retrieval sanity check
python mcp_client_demo.py    # MCP server + client, end to end, offline

# with a key, the LLM versions:
export ANTHROPIC_API_KEY=...        # PowerShell: $env:ANTHROPIC_API_KEY="..."
python agentic_rag.py
python multi_agent.py

First run downloads the ~90 MB embedding model. Requirements: Python 3.9+, numpy,faiss-cpu, sentence-transformers, mcp, and (for the LLM steps) anthropic + pydantic.

Caveats / what I'd do differently for real work

  • It's five toy docs. Retrieval quality, chunking, and reranking would matter a lotmore on a real corpus and are barely exercised here.
  • I never measured anything. To make claims about "agentic beats naive" I'd build asmall eval set of multi-hop questions with reference answers and actually scorethem, including tokens and round-trips, not just eyeball the traces.
  • The verifier is an LLM judging against retrieved text; it can be wrong and tendsto be lenient. I'd calibrate it against a few human-checked answers.
  • The agent loop has no budget cap beyond max_turns; a real one wants a tokenbudget and a cost ceiling.
  • In a real deployment the agents would call the KB through the MCP server (theSDK has helpers to bridge MCP tools into the tool loop); here they call the KBdirectly for simplicity, and MCP is shown as its own slice.

Muhammad Farooqi · https://github.com/mqfarooqi1

MCP Server · Populars

MCP Server · New

    bobmatnyc

    MCP Vector Search

    CLI-first semantic code search with MCP integration. Modern, fast, and intelligent code search powered by ChromaDB and AST parsing.

    Community bobmatnyc
    ptbsare

    MCP Proxy Server

    This server acts as a central hub for Model Context Protocol (MCP) resource servers.

    Community ptbsare
    docling-project

    Docling MCP: making docling agentic

    Making docling agentic through MCP

    Community docling-project
    SouravRoy-ETL

    duckle

    Local-first ETL/ELT studio: a drag-and-drop visual pipeline designer that compiles to SQL and runs on DuckDB. Tiny desktop app, no servers, git-friendly workspaces.

    Community SouravRoy-ETL
    ksylvan

    Fabric MCP Server

    Fabric MCP Server: Seamlessly integrate Fabric AI capabilities into MCP-enabled tools like IDEs and chat interfaces.

    Community ksylvan