MCP Agent Toolkit
An agent built on the Model Context Protocol (MCP) that discovers and calls external tools for multi-step research, data lookup, and code-execution tasks, returning reliable structured JSON.
Overview
This project implements a real, runnable MCP server (using the official mcp Python SDK) exposing four distinct tools: a read-only SQL tool over a seeded SQLite database, a REST API tool, a sandboxed Python code interpreter, and a set of file-resource lifecycle tools. An MCP client/agent connects to the server over stdio, dynamically discovers the available tools (nothing is hardcoded), and drives a plan-act loop with a pluggable LLM (OpenAI, Anthropic, or a deterministic offline mock) to satisfy multi-step natural-language tasks such as "look up customer X's orders, sum their total, and tell me if it exceeds $500." The agent's final answer is validated against a Pydantic schema before being returned, and every tool call made along the way is recorded in a trace. A small FastAPI layer exposes this agent over HTTP as POST /agent/run. The SQL tool also demonstrates MCP "context negotiation": callers request a page size, the server clamps it to a safe maximum and reports whether more data is available.
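The page-size negotiation mentioned at the end of the overview comes down to a few lines of arithmetic. This is a standalone sketch, not code from mcp_server/server.py: negotiate_page and PageInfo are hypothetical names, the cap of 25 comes from the Architecture notes, and the field names mirror the raw_result keys in the example response under Usage.

```python
from dataclasses import dataclass
from typing import Optional

MAX_PAGE_SIZE = 25  # server-side cap; requested sizes are clamped to [1, 25]


@dataclass
class PageInfo:
    requested_page_size: int
    effective_page_size: int
    offset: int
    row_count_returned: int
    total_rows_matched: int
    has_more: bool
    next_offset: Optional[int]


def negotiate_page(requested_page_size: int, offset: int, total_rows: int) -> PageInfo:
    """Clamp the caller's requested page size and compute pagination metadata (hypothetical helper)."""
    effective = max(1, min(requested_page_size, MAX_PAGE_SIZE))
    offset = max(0, offset)
    # Rows actually returned on this page (0 if the offset is past the end).
    returned = max(0, min(effective, total_rows - offset))
    end = offset + returned
    has_more = end < total_rows
    return PageInfo(
        requested_page_size=requested_page_size,
        effective_page_size=effective,
        offset=offset,
        row_count_returned=returned,
        total_rows_matched=total_rows,
        has_more=has_more,
        next_offset=end if has_more else None,
    )
```

A caller that asks for 100 rows out of 60 gets 25 back with next_offset 25; with the example request (page_size 25, 3 matching rows) the result is effective_page_size 25, has_more false, and next_offset null, matching the raw_result shown under Usage.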
Key Features
- MCP server with 4 distinct tools, built on the official mcp Python SDK (FastMCP):
  - sql_query: read-only SQL access to a bundled SQLite sample database (customers, products, orders), with context-negotiated pagination (page_size/offset, server-clamped, has_more/next_offset reported back).
  - rest_exchange_rate: calls a REST API (a bundled local mock FastAPI server; see "REST tool: real API vs. mock" below) to fetch currency exchange rates.
  - run_python: a sandboxed Python code interpreter. Snippets are statically checked against an import allowlist and a name denylist, then executed in an isolated subprocess with a wall-clock timeout (and, on POSIX, CPU/memory rlimits).
  - list_resources / open_resource / read_resource / release_resource: explicit MCP resource lifecycle management (list available file resources, open a handle, read through that handle one or more times, then explicitly release it; reads after release fail).
- MCP client/agent (agent/client.py) that performs live tool discovery via tools/list (no hardcoded tool list) and runs a bounded plan-act loop, calling MCP tools and observing results until it produces a final answer.
- Pluggable LLM (agent/llm.py): LLM_PROVIDER=openai|anthropic|mock. Defaults to a deterministic, rule-based mock LLM that requires no API key and no network access, so the whole project (including all tests) runs fully offline.
- Structured (JSON-mode) output: the agent's final answer is always validated against the AgentAnswer Pydantic schema (agent/schemas.py), including a typed tool-call trace.
- Context negotiation: the SQL tool clamps requested page sizes to a server-side maximum and reports has_more/next_offset, demonstrating negotiated, paginated tool results instead of unbounded responses.
- FastAPI wrapper (api/main.py) exposing POST /agent/run, which runs the full client-server-agent interaction per request and returns the structured result plus the discovered tool names.
- pytest suite (18 tests) covering tool discovery, SQL correctness and pagination, sandbox accept/reject behaviour, the REST tool against the bundled mock server, resource lifecycle enforcement, and end-to-end structured agent output; all offline, no API key required.
- Docker image that seeds the database at build time and runs the mock REST server plus the FastAPI app.
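The open/read/release lifecycle listed above can be modelled as a small handle registry. This is a hypothetical in-memory sketch (the ResourceRegistry class and uuid-based handle ids are assumptions, not the server's implementation, which serves files from data/resources/); it only illustrates the rule that reads succeed any number of times while a handle is open and fail after release.

```python
import uuid
from typing import Dict, List


class ResourceRegistry:
    """Hypothetical in-memory model of the open/read/release handle lifecycle."""

    def __init__(self, files: Dict[str, str]) -> None:
        self._files = dict(files)           # filename -> text contents
        self._handles: Dict[str, str] = {}  # open handle_id -> filename

    def list_resources(self) -> List[str]:
        return sorted(self._files)

    def open_resource(self, filename: str) -> str:
        if filename not in self._files:
            raise KeyError(f"unknown resource: {filename}")
        handle_id = uuid.uuid4().hex
        self._handles[handle_id] = filename
        return handle_id

    def read_resource(self, handle_id: str) -> str:
        # Any number of reads are allowed while the handle stays open.
        filename = self._handles.get(handle_id)
        if filename is None:
            raise ValueError(f"handle {handle_id!r} is not open (never opened or already released)")
        return self._files[filename]

    def release_resource(self, handle_id: str) -> None:
        if self._handles.pop(handle_id, None) is None:
            raise ValueError(f"handle {handle_id!r} is not open")
```

Keeping handles separate from filenames is what makes "reads after release fail" enforceable: the file still exists, but the released handle no longer maps to it.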
Tech Stack
- Python 3.11+
- mcp (official Model Context Protocol Python SDK; FastMCP high-level server API, stdio transport)
- FastAPI + Uvicorn
- Pydantic v2 (schemas / structured output)
- SQLite (stdlib sqlite3, no external DB server)
- httpx (REST calls)
- OpenAI SDK / Anthropic SDK (optional, pluggable)
- pytest
- Docker
Architecture
                    ┌──────────────────────────────────┐
 HTTP POST          │ FastAPI (api/main.py)            │
 /agent/run ──────▶ │ POST /agent/run                  │
                    └─────────────────┬────────────────┘
                                      │ run_agent_task(task)
                                      ▼
                    ┌──────────────────────────────────┐
                    │ Agent / MCP client               │
                    │ (agent/client.py)                │
                    │                                  │
                    │ 1. spawn MCP server (stdio)      │
                    │ 2. tools/list -> dynamic         │
                    │    tool discovery                │
                    │ 3. loop:                         │
                    │    LLMClient.plan_step(...)      │
                    │    -> call_tool | final_answer   │
                    │ 4. validate AgentAnswer          │
                    │    (Pydantic)                    │
                    └────────┬─────────────────┬───────┘
                             │                 │
                   MCP stdio │                 │ pluggable LLM
                   protocol  │                 │ (agent/llm.py)
                             ▼                 ▼
┌──────────────────────────────────┐   OpenAI / Anthropic /
│ MCP server                       │   deterministic MockLLM
│ (mcp_server/server.py, FastMCP)  │
│                                  │
│ sql_query ───────────────────────┼──▶ data/sample.db (SQLite)
│ rest_exchange_rate ──────────────┼──▶ mock_rest_server.py (FastAPI, localhost)
│ run_python ──────────────────────┼──▶ sandbox.py (subprocess, allowlist, timeout)
│ list/open/read/                  │
│ release_resource ────────────────┼──▶ data/resources/*.txt|*.md
└──────────────────────────────────┘
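Step 3 of the diagram (plan, act, observe, repeat) can be sketched without MCP at all. This is a synchronous illustration under stated assumptions: the real agent/client.py drives an async mcp.ClientSession, while here the planner and tool caller are plain callables, and run_plan_act_loop, the step bound of 5, and raising on exhaustion are all hypothetical choices.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

MAX_AGENT_STEPS = 5  # hypothetical bound; the real value lives in agent/client.py


@dataclass
class PlanStep:
    action: str  # "call_tool" or "final_answer"
    tool_name: Optional[str] = None
    arguments: Dict[str, Any] = field(default_factory=dict)
    answer: Optional[Dict[str, Any]] = None


Planner = Callable[[str, List[str], List[Dict[str, Any]]], PlanStep]
ToolCaller = Callable[[str, Dict[str, Any]], Any]


def run_plan_act_loop(task: str, tools: List[str], plan_step: Planner,
                      call_tool: ToolCaller) -> Dict[str, Any]:
    history: List[Dict[str, Any]] = []
    for step in range(1, MAX_AGENT_STEPS + 1):
        plan = plan_step(task, tools, history)
        if plan.action == "final_answer":
            return {"answer": plan.answer, "tool_calls": history}
        # Only tools returned by discovery may be called.
        if plan.tool_name not in tools:
            raise ValueError(f"planner chose an undiscovered tool: {plan.tool_name!r}")
        result = call_tool(plan.tool_name, plan.arguments)
        history.append({"step": step, "tool_name": plan.tool_name,
                        "arguments": plan.arguments, "raw_result": result})
    raise RuntimeError(f"no final answer within {MAX_AGENT_STEPS} steps")
```

Passing the discovered tool names into both the planner and the guard is what keeps the loop free of a hardcoded tool list; the history entries play the role of the ToolCallRecord trace.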
- MCP server (mcp_server/server.py): a single FastMCP instance registering the four tools below plus one MCP resource template (resource://mcp-agent-toolkit/{filename}), served over stdio. The agent spawns it as a subprocess (python -m mcp_server.server) per run, so each HTTP request gets an isolated server process.
- The four tools:
  - sql_query opens a fresh read-only sqlite3 connection (PRAGMA query_only = ON) per call against data/sample.db, rejects anything that isn't a single SELECT, and implements context negotiation: it accepts a requested page_size/offset, clamps page_size to [1, 25] server-side regardless of what was asked for, and always reports back effective_page_size, total_rows_matched, has_more, and next_offset so a caller can page through a large result set safely.
  - rest_exchange_rate makes a real HTTP call (via httpx) to a REST API for currency exchange rates (by default, the bundled mock server at MOCK_REST_BASE_URL).
  - run_python statically validates a Python snippet with ast (import allowlist, denylisted names such as open/eval/exec, and dunder-attribute access), then runs it in a brand-new subprocess (never exec()-ed in-process) with a wall-clock timeout and, on POSIX, CPU-time/address-space rlimits.
  - list_resources / open_resource / read_resource / release_resource model the explicit MCP resource lifecycle: list available files, open a named handle (handle_id), read through it any number of times, then release it; after release, further reads on that handle fail. The same files are also exposed as standard MCP resources via resource://mcp-agent-toolkit/{filename} for resources/list and resources/read.
- MCP client/agent (agent/client.py): connects over stdio with mcp.ClientSession and calls session.list_tools() to discover tools dynamically (the tool list is never hardcoded in the agent). It then loops: ask the LLMClient for a PlanStep (call a tool, or produce a final answer) given the task, the discovered tools, and the history of calls/results so far; execute the chosen tool via session.call_tool(...); repeat, bounded by MAX_AGENT_STEPS. The final answer dict from the LLM is validated into an AgentAnswer Pydantic model, with a ToolCallRecord for each tool call made.
- Pluggable LLM (agent/llm.py): build_llm_client() reads LLM_PROVIDER and returns an OpenAILLMClient, AnthropicLLMClient, or the default MockLLMClient. All three implement the same plan_step(task, available_tools, history) -> PlanStep interface. The real providers are prompted to return one JSON object matching the call_tool/final_answer shape described in SYSTEM_INSTRUCTIONS. The MockLLMClient is a small deterministic rule-based planner: it recognizes the reference task shape ("customer X's orders... exceeds $N"), issues one sql_query call, then computes the sum and threshold comparison itself and returns a final_answer. That is enough to drive the full agent loop, including the FastAPI layer and the entire pytest suite, with zero network access and no API key.
- FastAPI layer (api/main.py): POST /agent/run takes {"task": "..."}, calls run_agent_task(task) (which owns the full spawn-discover-loop-validate lifecycle above), and returns {"result": AgentAnswer, "available_tools": [...]}.
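The read-only guard and LIMIT/OFFSET paging described for sql_query can be sketched with the stdlib sqlite3 module. This is an illustration, not the project's code: run_readonly_select is a hypothetical name, and the "single SELECT" check here is deliberately crude (it also rejects WITH ... SELECT and any ';' inside string literals). PRAGMA query_only = ON is real SQLite behaviour and acts as a second line of defence if a write ever slipped past the text check.

```python
import sqlite3
from typing import Any, Dict, List, Tuple


def run_readonly_select(db_path: str, sql: str, limit: int,
                        offset: int) -> Tuple[List[Dict[str, Any]], int]:
    """Run one SELECT on a read-only connection; return (page_rows, total_rows_matched)."""
    statement = sql.strip().rstrip(";").strip()
    # Crude single-statement check: no inner ';' and it must start with SELECT.
    if ";" in statement or not statement.lower().startswith("select"):
        raise ValueError("only a single SELECT statement is allowed")

    conn = sqlite3.connect(db_path)  # a fresh connection per call
    try:
        conn.execute("PRAGMA query_only = ON")  # SQLite now refuses writes on this connection
        conn.row_factory = sqlite3.Row
        total = conn.execute(f"SELECT COUNT(*) FROM ({statement})").fetchone()[0]
        cursor = conn.execute(f"SELECT * FROM ({statement}) LIMIT ? OFFSET ?", (limit, offset))
        rows = [dict(row) for row in cursor.fetchall()]
        return rows, total
    finally:
        conn.close()
```

Wrapping the caller's query as a subquery lets the server apply its own LIMIT/OFFSET (and count total matches) without parsing or rewriting the caller's SQL.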
REST tool: real API vs. mock (read this)
The rest_exchange_rate tool calls a bundled local mock REST server (mcp_server/mock_rest_server.py, a small FastAPI app), not an external public API. This was a deliberate choice per the project brief: the demo (and the pytest suite) should be fully self-contained and deterministic, and should not depend on outbound internet access in whatever environment the repo is cloned, tested, or run in CI. The mock server is exercised over real HTTP (via httpx, on 127.0.0.1) exactly as a real public API would be, and the tool code has no special-casing for mock vs. real: pointing MOCK_REST_BASE_URL at a real exchange-rate API with a compatible response shape would work with no code changes to the tool itself.
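The "real HTTP, no special-casing" point can be demonstrated with the standard library alone. This sketch swaps in http.server and urllib for FastAPI and httpx, and its /rates route, base/target query parameters, and rates are hypothetical, not the bundled mock server's actual API; what it shows is that the client side only ever sees a base URL read from MOCK_REST_BASE_URL.

```python
import json
import os
import threading
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Made-up demo rates, not real market data.
RATES = {"USD": {"EUR": 0.9, "GBP": 0.8}}


class MockRatesHandler(BaseHTTPRequestHandler):
    """Stand-in for a mock REST server; the /rates route is hypothetical."""

    def do_GET(self) -> None:
        parsed = urllib.parse.urlparse(self.path)
        query = urllib.parse.parse_qs(parsed.query)
        base = query.get("base", ["USD"])[0]
        target = query.get("target", ["EUR"])[0]
        rate = RATES.get(base, {}).get(target)
        found = parsed.path == "/rates" and rate is not None
        body = json.dumps({"base": base, "target": target, "rate": rate}).encode()
        self.send_response(200 if found else 404)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args: object) -> None:
        pass  # silence per-request logging


def start_mock_server(port: int = 0) -> ThreadingHTTPServer:
    """Serve on 127.0.0.1 in a daemon thread; port 0 lets the OS pick a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), MockRatesHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_rate(base: str, target: str) -> float:
    """The tool side: it only knows a base URL, never whether the server is a mock."""
    base_url = os.environ.get("MOCK_REST_BASE_URL", "http://127.0.0.1:8811")
    query = urllib.parse.urlencode({"base": base, "target": target})
    with urllib.request.urlopen(f"{base_url}/rates?{query}", timeout=5) as resp:
        return json.load(resp)["rate"]
```

Swapping the mock for a real service is then purely a configuration change to MOCK_REST_BASE_URL, provided the real API returns the same JSON shape.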
Sandboxing approach and its real limits (read this)
The run_python tool is a portfolio-grade sandbox, explicitly not a hardened, production-grade isolation boundary. What it actually does:
- Static AST check (mcp_server/sandbox.py::check_code_safety): parses the snippet with Python's ast module and rejects it before execution if it imports anything outside a small allowlist (math, statistics, json, itertools, collections, datetime, re, random, textwrap), references denylisted names (open, exec, eval, compile, __import__, getattr/setattr/delattr, globals/locals/vars, etc.), or accesses any dunder attribute (a common sandbox-escape vector, e.g. ().__class__).
- Process isolation: the snippet is written to a temp file and run via subprocess.run([sys.executable, "-I", "-S", script]), i.e. in a brand-new OS process; it is never exec()-ed in the server's own process.
- Resource limits: a wall-clock timeout (subprocess.run(timeout=...), default 5 s, max 10 s) always applies. On POSIX, a preexec_fn additionally applies resource.setrlimit for CPU time and address space. On Windows there is no OS-level CPU/memory rlimit; only the wall-clock timeout and the static allowlist apply. This is a real, documented gap, not an oversight.
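The static check in the first bullet can be approximated with ast.walk. This is a simplified sketch, not mcp_server/sandbox.py: find_violations is a hypothetical name, the allowlist is copied from the bullet, and the denylist contains only the names listed there (the real one is longer, per "etc."). Like any AST denylist it can be bypassed, which is exactly the limitation described next.

```python
import ast
from typing import List

# Taken from the README's sandbox description; the real denylist is longer ("etc.").
ALLOWED_IMPORTS = {"math", "statistics", "json", "itertools", "collections",
                   "datetime", "re", "random", "textwrap"}
DENIED_NAMES = {"open", "exec", "eval", "compile", "__import__", "getattr",
                "setattr", "delattr", "globals", "locals", "vars"}


def find_violations(code: str) -> List[str]:
    """Return human-readable violations; an empty list means the snippet may run."""
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    problems: List[str] = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] not in ALLOWED_IMPORTS:
                    problems.append(f"import not allowed: {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            root = (node.module or "").split(".")[0]
            # Relative imports (level > 0) are rejected outright.
            if node.level or root not in ALLOWED_IMPORTS:
                problems.append(f"import not allowed: {node.module}")
        elif isinstance(node, ast.Name) and node.id in DENIED_NAMES:
            problems.append(f"name not allowed: {node.id}")
        elif (isinstance(node, ast.Attribute)
              and node.attr.startswith("__") and node.attr.endswith("__")):
            problems.append(f"dunder attribute not allowed: {node.attr}")
    return problems
```

For example, find_violations("import math\nprint(math.sqrt(16))") returns an empty list, while "().__class__" is flagged as a dunder-attribute access.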
What this does not protect against: a sufficiently creative snippet may find static-analysis bypasses that this simple allowlist/denylist does not catch (this is a known-hard problem for a dynamic language like Python); there is no OS-level container/seccomp/VM boundary (no gVisor/Firecracker/Docker-level isolation for the interpreter itself — the Docker image in this repo containerizes the whole app, not each snippet individually); and on Windows, memory/CPU exhaustion inside the timeout window is not prevented. For genuinely untrusted/adversarial code, use a real container- or VM-based sandbox instead of this module.
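The process-isolation and resource-limit bullets translate to roughly the following. This is a sketch under stated assumptions: run_snippet, the 5 s CPU limit, and the 512 MiB address-space limit are illustrative values, not the project's configuration. The interpreter flags (-I -S), the wall-clock timeout capped at 10 s, and the POSIX-only preexec_fn follow the description above.

```python
import functools
import os
import subprocess
import sys
import tempfile
from typing import Dict, Union

try:
    import resource  # POSIX-only stdlib module
except ImportError:  # e.g. Windows: only the wall-clock timeout applies
    resource = None


def _apply_rlimits(cpu_seconds: int, memory_bytes: int) -> None:
    """Runs in the child just before exec (POSIX only)."""
    for which, value in ((resource.RLIMIT_CPU, cpu_seconds),
                         (resource.RLIMIT_AS, memory_bytes)):
        try:
            resource.setrlimit(which, (value, value))
        except (ValueError, OSError):
            pass  # some platforms (e.g. macOS) reject RLIMIT_AS; keep the other limit


def run_snippet(code: str, timeout: float = 5.0, cpu_seconds: int = 5,
                memory_bytes: int = 512 * 1024 * 1024) -> Dict[str, Union[bool, str]]:
    timeout = min(timeout, 10.0)  # hard cap, mirroring "default 5 s, max 10 s"
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
        handle.write(code)
        script = handle.name
    preexec = (functools.partial(_apply_rlimits, cpu_seconds, memory_bytes)
               if resource is not None else None)
    try:
        # -I: isolated mode (ignores PYTHON* env vars and user site-packages); -S: skip site.
        proc = subprocess.run([sys.executable, "-I", "-S", script],
                              capture_output=True, text=True,
                              timeout=timeout, preexec_fn=preexec)
        return {"ok": proc.returncode == 0, "stdout": proc.stdout, "stderr": proc.stderr}
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        return {"ok": False, "stdout": "", "stderr": f"timed out after {timeout}s"}
    finally:
        os.remove(script)
```

In a real sandbox this would run only after the static check passes; neither layer alone is a security boundary.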
Setup / Installation
# 1. Create and activate a virtual environment (recommended)
python -m venv venv
venv\Scripts\activate
# (macOS/Linux: source venv/bin/activate)
# 2. Install dependencies
pip install -r requirements.txt
# 3. Seed the sample SQLite database (creates data/sample.db)
python scripts/seed_db.py
# 4. Copy the env template (optional -- defaults work with no keys set)
copy .env.example .env
# (macOS/Linux: cp .env.example .env)
Usage
Run the FastAPI server
uvicorn api.main:app --reload
(Optional) Run the bundled mock REST server, for the REST tool
uvicorn mcp_server.mock_rest_server:app --port 8811
If this is not running, rest_exchange_rate calls will fail with a connection error — everything else (SQL tool, sandbox tool, resource tools, the reference agent task) works without it.
Example request/response
POST /agent/run
Content-Type: application/json
{
"task": "Look up customer Alice Nguyen's orders, sum their total, and tell me if it exceeds $500."
}
{
"result": {
"task": "Look up customer Alice Nguyen's orders, sum their total, and tell me if it exceeds $500.",
"status": "completed",
"answer": "Alice Nguyen's completed orders total $334.00, which does not exceed $500.00.",
"key_facts": {
"customer_name": "Alice Nguyen",
"order_total": 334.0,
"threshold": 500.0,
"exceeds_threshold": false,
"num_completed_orders": 3
},
"tool_calls": [
{
"step": 1,
"tool_name": "sql_query",
"arguments": {
"sql": "SELECT c.name AS customer_name, o.order_id, o.quantity, p.unit_price, (o.quantity * p.unit_price) AS line_total, o.status FROM orders o JOIN customers c ON c.customer_id = o.customer_id JOIN products p ON p.product_id = o.product_id WHERE c.name = 'Alice Nguyen'",
"page_size": 25
},
"result_summary": "{\"columns\": [...], \"rows\": [...], ...}",
"raw_result": { "columns": ["customer_name", "order_id", "quantity", "unit_price", "line_total", "status"], "rows": [ { "customer_name": "Alice Nguyen", "order_id": 1, "quantity": 1, "unit_price": 59.5, "line_total": 59.5, "status": "completed" }, { "..." : "..." } ], "row_count_returned": 3, "total_rows_matched": 3, "offset": 0, "requested_page_size": 25, "effective_page_size": 25, "has_more": false, "next_offset": null }
}
]
},
"available_tools": ["sql_query", "rest_exchange_rate", "run_python", "list_resources", "open_resource", "read_resource", "release_resource"]
}
This example runs with the default LLM_PROVIDER=mock and requires no API key.
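The key_facts in this response are plain arithmetic over the sql_query rows. The sketch below shows one way a rule-based planner could derive them; summarize_orders and TASK_PATTERN are hypothetical, and counting only rows with status "completed" and reading "exceeds" as a strict ">" are assumptions inferred from the example answer, not confirmed details of MockLLMClient. The test rows are illustrative, not Alice Nguyen's actual orders.

```python
import re
from typing import Any, Dict, List

# Hypothetical pattern for the reference task shape "customer X's orders ... exceeds $N".
TASK_PATTERN = re.compile(
    r"customer (?P<name>.+?)'s orders.*?exceeds \$(?P<threshold>\d[\d,]*(?:\.\d+)?)",
    re.IGNORECASE,
)


def summarize_orders(task: str, rows: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build key_facts from sql_query rows (each with 'line_total' and 'status')."""
    match = TASK_PATTERN.search(task)
    if match is None:
        raise ValueError("task does not match the reference shape")
    threshold = float(match.group("threshold").replace(",", ""))
    completed = [row for row in rows if row["status"] == "completed"]
    total = round(sum(row["line_total"] for row in completed), 2)
    return {
        "customer_name": match.group("name"),
        "order_total": total,
        "threshold": threshold,
        "exceeds_threshold": total > threshold,  # strict ">" reading of "exceeds"
        "num_completed_orders": len(completed),
    }
```

Doing this arithmetic in code rather than asking an LLM to add numbers is what makes the mock's output deterministic and testable.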
Running the seed script and tests
python scripts/seed_db.py
pytest -v
The test suite (18 tests) seeds the database automatically via a session fixture if it does not already exist, spawns the real MCP server as a subprocess for discovery/end-to-end tests, and starts the bundled mock REST server in-process for the REST tool test. No test requires a real API key or outbound network access.
Environment Variables
| Variable | Required | Default | Description |
|---|---|---|---|
| LLM_PROVIDER | No | mock | One of mock, openai, anthropic. Selects the agent's planning LLM. |
| OPENAI_API_KEY | Only if LLM_PROVIDER=openai | (none) | OpenAI API key. |
| OPENAI_MODEL | No | gpt-4o-mini | OpenAI model name. |
| ANTHROPIC_API_KEY | Only if LLM_PROVIDER=anthropic | (none) | Anthropic API key. |
| ANTHROPIC_MODEL | No | claude-3-5-haiku-latest | Anthropic model name. |
| MOCK_REST_BASE_URL | No | http://127.0.0.1:8811 | Base URL the rest_exchange_rate tool calls. |
See .env.example for a copyable template (placeholder values only — no real secrets are stored in this repo).
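The table maps naturally onto a small settings loader. This is a hypothetical sketch (load_settings is not the project's build_llm_client, which may read these variables differently); the defaults are copied from the table, and failing fast when a provider's API key is missing is an assumed design choice.

```python
import os
from typing import Dict, Mapping, Optional

# Defaults copied from the Environment Variables table.
DEFAULTS: Dict[str, str] = {
    "LLM_PROVIDER": "mock",
    "OPENAI_MODEL": "gpt-4o-mini",
    "ANTHROPIC_MODEL": "claude-3-5-haiku-latest",
    "MOCK_REST_BASE_URL": "http://127.0.0.1:8811",
}
REQUIRED_KEYS: Dict[str, str] = {"openai": "OPENAI_API_KEY", "anthropic": "ANTHROPIC_API_KEY"}
PROVIDERS = ("mock", "openai", "anthropic")


def load_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Resolve settings from env (default: os.environ), applying table defaults."""
    source = os.environ if env is None else env
    # Empty strings count as unset, so a blank line in .env falls back to the default.
    settings = {name: source.get(name) or default for name, default in DEFAULTS.items()}
    provider = settings["LLM_PROVIDER"].strip().lower()
    if provider not in PROVIDERS:
        raise ValueError(f"LLM_PROVIDER must be one of {PROVIDERS}, got {provider!r}")
    settings["LLM_PROVIDER"] = provider
    key_name = REQUIRED_KEYS.get(provider)
    if key_name is not None:
        api_key = source.get(key_name)
        if not api_key:
            raise ValueError(f"{key_name} is required when LLM_PROVIDER={provider}")
        settings[key_name] = api_key
    return settings
```

With no variables set, load_settings({}) resolves to the mock provider and the local mock REST URL, which is why the project runs with no keys configured.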
Project Layout
mcp_server/
  server.py            MCP server (FastMCP): tool + resource definitions
  sandbox.py           Sandboxed code interpreter (AST check + subprocess)
  mock_rest_server.py  Bundled mock REST API (FastAPI)
agent/
  client.py            MCP client / agent loop, tool discovery
  llm.py               Pluggable LLM client (openai/anthropic/mock)
  schemas.py           Pydantic structured-output schemas
api/
  main.py              FastAPI wrapper: POST /agent/run
scripts/
  seed_db.py           SQLite seed script
data/
  resources/           Sample files exposed via the resource lifecycle tools
  sample.db            Generated by seed_db.py (not committed, see .gitignore)
tests/                 pytest suite
License
MIT License, Copyright (c) 2026 Mahnoor Amjad. See LICENSE.