MCP Agent Toolkit

An agent built on the Model Context Protocol (MCP) that discovers and calls external tools for multi-step research, data lookup, and code-execution tasks, and returns schema-validated structured JSON.

Overview

This project implements a runnable MCP server (built on the official mcp Python SDK) that exposes four kinds of tools: a read-only SQL tool over a seeded SQLite database, a REST API tool, a sandboxed Python code interpreter, and a set of file-resource lifecycle tools. An MCP client/agent connects to the server over stdio, discovers the available tools at runtime (nothing is hardcoded), and drives a plan-act loop with a pluggable LLM (OpenAI, Anthropic, or a deterministic offline mock) to satisfy multi-step natural-language tasks such as "look up customer X's orders, sum their total, and tell me if it exceeds $500."

The agent's final answer is validated against a Pydantic schema before being returned, and every tool call made along the way is recorded in a trace. A small FastAPI layer exposes the agent over HTTP as POST /agent/run.

The SQL tool also demonstrates MCP "context negotiation": callers request a page size, the server clamps it to a safe maximum, and the response reports whether more data is available.

Key Features

MCP server with 4 distinct tools, built on the official mcp Python SDK (FastMCP):
- sql_query — read-only SQL access to a bundled SQLite sample database (customers, products, orders), with context-negotiated pagination (page_size/offset, server-clamped, has_more/next_offset reported back).
- rest_exchange_rate — calls a REST API (a bundled local mock FastAPI server, see "REST tool: real API vs. mock" below) to fetch currency exchange rates.
- run_python — a sandboxed Python code interpreter: snippets are statically checked against an import/name allowlist-denylist, then executed in an isolated subprocess with a wall-clock timeout (and, on POSIX, CPU/memory rlimits).
- list_resources / open_resource / read_resource / release_resource — explicit MCP resource lifecycle management (list available file resources, open a handle, read through that handle one or more times, then explicitly release it; reads after release fail).
MCP client/agent (agent/client.py) that performs live tool discovery via tools/list (no hardcoded tool list) and runs a bounded plan-act loop, calling MCP tools and observing results until it produces a final answer.
Pluggable LLM (agent/llm.py): LLM_PROVIDER=openai|anthropic|mock. Defaults to a deterministic, rule-based mock LLM that requires no API key and no network access, so the whole project (including all tests) runs fully offline.
Structured (JSON-mode) output: the agent's final answer is always validated against the AgentAnswer Pydantic schema (agent/schemas.py), including a typed tool-call trace.
Context negotiation: the SQL tool clamps requested page sizes to a server-side maximum and reports has_more/next_offset, demonstrating negotiated, paginated tool results instead of unbounded responses.
FastAPI wrapper (api/main.py) exposing POST /agent/run, which runs the full client-server-agent interaction per request and returns the structured result plus the discovered tool names.
pytest suite (18 tests) covering tool discovery, SQL correctness and pagination, sandbox accept/reject behaviour, the REST tool against the bundled mock server, resource lifecycle enforcement, and end-to-end structured agent output — all offline, no API key required.
Docker image that seeds the database at build time and runs the mock REST server plus the FastAPI app.
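
The resource lifecycle listed above (list, open a handle, read through it, release it, with reads after release failing) can be illustrated with a small in-memory model. This is only a sketch: the ResourceRegistry class, its dict-backed storage, and the uuid-based handle ids are assumptions made for illustration, not the toolkit's actual implementation.

```python
import uuid


class ResourceRegistry:
    """Hypothetical in-memory model of the open/read/release handle lifecycle."""

    def __init__(self, files):
        self._files = dict(files)   # filename -> text content
        self._handles = {}          # handle_id -> filename

    def list_resources(self):
        return sorted(self._files)

    def open_resource(self, filename):
        if filename not in self._files:
            raise FileNotFoundError(filename)
        handle_id = uuid.uuid4().hex  # assumed id scheme
        self._handles[handle_id] = filename
        return handle_id

    def read_resource(self, handle_id):
        # Reads only succeed while the handle is still open.
        if handle_id not in self._handles:
            raise KeyError(f"unknown or released handle: {handle_id}")
        return self._files[self._handles[handle_id]]

    def release_resource(self, handle_id):
        # Releasing an unknown or already released handle is an error here.
        if self._handles.pop(handle_id, None) is None:
            raise KeyError(f"unknown or released handle: {handle_id}")
```

Keeping the handle table separate from the file table is what makes "read after release" detectable: the file still exists, but the handle no longer maps to it.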

Tech Stack

Python 3.11+
mcp (official Model Context Protocol Python SDK, FastMCP high-level server API, stdio transport)
FastAPI + Uvicorn
Pydantic v2 (schemas / structured output)
SQLite (stdlib sqlite3, no external DB server)
httpx (REST calls)
OpenAI SDK / Anthropic SDK (optional, pluggable)
pytest
Docker

Architecture

                    ┌──────────────────────────────────────┐
  HTTP POST         │  FastAPI (api/main.py)               │
  /agent/run  ────▶ │  POST /agent/run                     │
                    └──────────────────┬───────────────────┘
                                       │ run_agent_task(task)
                                       ▼
                    ┌──────────────────────────────────────┐
                    │  Agent / MCP client                  │
                    │  (agent/client.py)                   │
                    │                                      │
                    │  1. spawn MCP server (stdio)         │
                    │  2. tools/list -> dynamic            │
                    │     tool discovery                   │
                    │  3. loop:                            │
                    │       LLMClient.plan_step(...)       │
                    │       -> call_tool | final_answer    │
                    │  4. validate AgentAnswer (Pydantic)  │
                    └────────┬───────────────────┬─────────┘
                             │                   │
                   MCP stdio │                   │ pluggable LLM
                   protocol  │                   │ (agent/llm.py)
                             ▼                   ▼
        ┌───────────────────────────────────┐  OpenAI / Anthropic /
        │  MCP server                       │  deterministic MockLLM
        │  (mcp_server/server.py, FastMCP)  │
        │                                   │
        │  sql_query ───────────────────────┼──▶ data/sample.db (SQLite)
        │  rest_exchange_rate ──────────────┼──▶ mock_rest_server.py (FastAPI, localhost)
        │  run_python ──────────────────────┼──▶ sandbox.py (subprocess, allowlist, timeout)
        │  list/open/read/                  │
        │  release_resource ────────────────┼──▶ data/resources/*.txt|*.md
        └───────────────────────────────────┘

MCP server (mcp_server/server.py): a single FastMCP instance registering the four tools below plus one MCP resource template (resource://mcp-agent-toolkit/{filename}), served over stdio. The agent spawns it as a subprocess (python -m mcp_server.server) per run, so each HTTP request gets an isolated server process.
The four tools:
1. sql_query opens a fresh read-only sqlite3 connection (PRAGMA query_only = ON) per call against data/sample.db, rejects anything that isn't a single SELECT, and implements context negotiation: it accepts a requested page_size/offset, clamps page_size to [1, 25] server-side regardless of what was asked for, and always reports back effective_page_size, total_rows_matched, has_more, and next_offset so a caller can page through a large result set safely.
2. rest_exchange_rate makes a real HTTP call (via httpx) to a REST API for currency exchange rates.
3. run_python statically validates a Python snippet with ast (an import allowlist, plus a denylist covering names such as open, eval, and exec and any dunder-attribute access), then runs it in a brand-new subprocess (never exec()-ed in-process) with a wall-clock timeout and, on POSIX, CPU-time/address-space rlimits.
4. list_resources / open_resource / read_resource / release_resource model explicit MCP resource lifecycle: list available files, open a named handle (handle_id), read through it any number of times, then release it; after release, further reads on that handle fail. The same files are also exposed as standard MCP resources via resource://mcp-agent-toolkit/{filename}, so they can be reached through resources/list and resources/read.
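
The context negotiation described for sql_query in item 1 can be sketched with the standard library alone. The field names, the [1, 25] clamp, the single-SELECT rule, and PRAGMA query_only come from the description above; the paged_select function, the subquery wrapping used for counting and paging, and the simple semicolon check are assumptions for illustration. The sketch also takes an existing connection, whereas the tool is described as opening a fresh one per call.

```python
import sqlite3

MAX_PAGE_SIZE = 25  # server-side cap, per the [1, 25] range described above


def paged_select(conn, sql, page_size=10, offset=0):
    """Run one SELECT with a clamped page size and report paging metadata."""
    statement = sql.strip().rstrip(";").strip()
    # Simplistic single-SELECT guard (assumption): any remaining semicolon is
    # rejected, which also rejects semicolons inside string literals.
    if not statement.lower().startswith("select") or ";" in statement:
        raise ValueError("only a single SELECT statement is allowed")

    effective = max(1, min(int(page_size), MAX_PAGE_SIZE))
    offset = max(0, int(offset))

    conn.execute("PRAGMA query_only = ON")  # block writes on this connection
    total = conn.execute(f"SELECT COUNT(*) FROM ({statement})").fetchone()[0]
    cursor = conn.execute(
        f"SELECT * FROM ({statement}) LIMIT ? OFFSET ?", (effective, offset)
    )
    columns = [d[0] for d in cursor.description]
    rows = [dict(zip(columns, r)) for r in cursor.fetchall()]

    next_offset = offset + len(rows)
    has_more = next_offset < total
    return {
        "columns": columns,
        "rows": rows,
        "row_count_returned": len(rows),
        "total_rows_matched": total,
        "offset": offset,
        "requested_page_size": page_size,
        "effective_page_size": effective,
        "has_more": has_more,
        "next_offset": next_offset if has_more else None,
    }
```

A caller that asks for page_size=100 gets effective_page_size=25 back and can keep requesting next_offset until has_more is false, so no single response is unbounded.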
MCP client/agent (agent/client.py): connects over stdio with mcp.ClientSession, calls session.list_tools() to discover tools dynamically (the tool list is never hardcoded in the agent), then loops: ask the LLMClient for a PlanStep (call a tool, or produce a final answer) given the task, the discovered tools, and the history of calls/results so far; execute the chosen tool via session.call_tool(...); repeat (bounded by MAX_AGENT_STEPS). The final answer dict from the LLM is validated into an AgentAnswer Pydantic model, with a ToolCallRecord per tool call made.
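
The bounded plan-act loop described above can be sketched without the MCP SDK. In this sketch a plain dict of callables stands in for session.call_tool, and the PlanStep fields, the run_loop function, the step bound of 6, and the "max_steps_exceeded" status are all assumptions for illustration rather than the project's actual definitions.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

MAX_AGENT_STEPS = 6  # assumed bound; the real value is defined by the project


@dataclass
class PlanStep:
    action: str                                   # "call_tool" or "final_answer"
    tool_name: Optional[str] = None
    arguments: Dict[str, Any] = field(default_factory=dict)
    answer: Optional[Dict[str, Any]] = None


def run_loop(task: str,
             tools: Dict[str, Callable[..., Any]],
             plan_step: Callable[[str, List[str], List[dict]], PlanStep],
             max_steps: int = MAX_AGENT_STEPS) -> dict:
    """Ask the planner for a step, execute it, and repeat until a final answer."""
    history: List[dict] = []
    for step_no in range(1, max_steps + 1):
        step = plan_step(task, sorted(tools), history)
        if step.action == "final_answer":
            return {"status": "completed", "answer": step.answer,
                    "tool_calls": history}
        if step.tool_name in tools:
            result = tools[step.tool_name](**step.arguments)
        else:
            # Unknown tools are reported back to the planner, not raised.
            result = {"error": f"unknown tool: {step.tool_name}"}
        history.append({"step": step_no, "tool_name": step.tool_name,
                        "arguments": step.arguments, "raw_result": result})
    # The planner never produced a final answer within the step budget.
    return {"status": "max_steps_exceeded", "answer": None, "tool_calls": history}
```

Because the planner only sees tool names passed in at call time, the loop itself has no hardcoded tool list, which mirrors the dynamic discovery described above.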
Pluggable LLM (agent/llm.py): build_llm_client() reads LLM_PROVIDER and returns an OpenAILLMClient, AnthropicLLMClient, or the default MockLLMClient. All three implement the same plan_step(task, available_tools, history) -> PlanStep interface. The real providers are prompted to return one JSON object matching the call_tool / final_answer shape described in SYSTEM_INSTRUCTIONS. The MockLLMClient is a small deterministic rule-based planner: it recognizes the reference task shape ("customer X's orders... exceeds $N"), issues one sql_query call, then computes the sum/threshold comparison itself and returns a final_answer — enough to drive the full agent loop, including the FastAPI layer and the entire pytest suite, with zero network access and no API key.
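
The rule-based planning that MockLLMClient is described as doing (recognize the reference task shape, then compute the sum and threshold comparison) might look roughly like the sketch below. The regular expression, the function names, and the choice to count only rows whose status is "completed" are assumptions inferred from the example response later in this README, not the project's actual code.

```python
import re

# Matches tasks shaped like "customer X's orders ... exceeds $N" (assumed pattern).
TASK_PATTERN = re.compile(
    r"customer\s+(?P<name>.+?)'s\s+orders.*?exceeds?\s+\$(?P<threshold>\d+(?:\.\d+)?)",
    re.IGNORECASE | re.DOTALL,
)


def parse_reference_task(task):
    """Return (customer_name, threshold) or None if the task shape is unknown."""
    match = TASK_PATTERN.search(task)
    if match is None:
        return None
    return match.group("name").strip(), float(match.group("threshold"))


def summarize_orders(rows, threshold):
    """Sum line totals of completed orders and compare against the threshold."""
    completed = [row for row in rows if row["status"] == "completed"]
    total = round(sum(row["line_total"] for row in completed), 2)
    return {
        "order_total": total,
        "threshold": threshold,
        "exceeds_threshold": total > threshold,
        "num_completed_orders": len(completed),
    }
```

A deterministic planner of this kind is what lets the agent loop and the tests run with no network access: the same task text always yields the same tool call and the same final answer.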
FastAPI layer (api/main.py): POST /agent/run takes {"task": "..."}, calls run_agent_task(task) (which owns the full spawn-discover-loop-validate lifecycle above), and returns {"result": AgentAnswer, "available_tools": [...]}.

REST tool: real API vs. mock (read this)

The rest_exchange_rate tool calls a bundled local mock REST server (mcp_server/mock_rest_server.py, a small FastAPI app), not an external public API. This was a deliberate choice per the project brief: the demo and the pytest suite should be fully self-contained and deterministic, with no dependence on outbound internet access wherever the repo is cloned, tested, or run in CI. The mock server is exercised over real HTTP (via httpx, over 127.0.0.1), exactly as a real public API would be. The tool code has no special-casing for "mock vs. real", so pointing MOCK_REST_BASE_URL at a real exchange-rate API with a compatible response shape would work with no code changes to the tool itself.

Sandboxing approach and its real limits (read this)

The run_python tool is a portfolio-grade sandbox, explicitly not a hardened, production-grade isolation boundary. What it actually does:

Static AST check (mcp_server/sandbox.py::check_code_safety): parses the snippet with Python's ast module and rejects it before execution if it imports anything outside a small allowlist (math, statistics, json, itertools, collections, datetime, re, random, textwrap), or references denylisted names (open, exec, eval, compile, __import__, getattr/setattr/delattr, globals/locals/vars, etc.), or accesses any dunder attribute (a common sandbox-escape vector, e.g. ().__class__).
Process isolation: the snippet is written to a temp file and run via subprocess.run([sys.executable, "-I", "-S", script]) — a brand-new OS process, never exec()-ed in the server's own process.
Resource limits: a wall-clock timeout (subprocess.run(timeout=...), default 5s, max 10s) always applies. On POSIX, a preexec_fn additionally applies resource.setrlimit for CPU time and address space. On Windows there is no OS-level CPU/memory rlimit — only the wall-clock timeout and the static allowlist apply; this is a real, documented gap, not an oversight.
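
The first two layers above (static AST check, then a fresh interpreter process with a wall-clock timeout) can be sketched as follows. The allowlist and the interpreter flags are taken from the description above; the denylist here covers only the names spelled out in the text, and the static_check and run_isolated functions and their return shapes are assumptions for illustration. The POSIX rlimits are omitted to keep the sketch portable, and the same caveat applies: this is not a hardened isolation boundary.

```python
import ast
import subprocess
import sys
import tempfile
from pathlib import Path

ALLOWED_IMPORTS = {"math", "statistics", "json", "itertools", "collections",
                   "datetime", "re", "random", "textwrap"}
DENIED_NAMES = {"open", "exec", "eval", "compile", "__import__", "getattr",
                "setattr", "delattr", "globals", "locals", "vars"}


def static_check(code):
    """Return a list of violations; an empty list means the snippet passed."""
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]
    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [(node.module or "").split(".")[0]]
        else:
            modules = []
        problems += [f"import not allowed: {m}" for m in modules
                     if m not in ALLOWED_IMPORTS]
        if isinstance(node, ast.Name) and node.id in DENIED_NAMES:
            problems.append(f"name not allowed: {node.id}")
        if (isinstance(node, ast.Attribute)
                and node.attr.startswith("__") and node.attr.endswith("__")):
            problems.append(f"dunder attribute not allowed: {node.attr}")
    return problems


def run_isolated(code, timeout_s=5.0):
    """Reject unsafe snippets, otherwise run them in a separate interpreter."""
    problems = static_check(code)
    if problems:
        return {"ok": False, "error": "; ".join(problems)}
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "snippet.py"
        script.write_text(code, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, "-I", "-S", str(script)],
                capture_output=True, text=True,
                timeout=min(timeout_s, 10.0),  # 10s cap, as described above
            )
        except subprocess.TimeoutExpired:
            return {"ok": False, "error": "timeout"}
    return {"ok": proc.returncode == 0, "stdout": proc.stdout,
            "stderr": proc.stderr}
```

Running the check before the subprocess is created means a rejected snippet never reaches an interpreter at all, and the timeout bounds whatever the static check misses.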

What this does not protect against: a sufficiently creative snippet may find static-analysis bypasses that this simple allowlist/denylist does not catch (this is a known-hard problem for a dynamic language like Python); there is no OS-level container/seccomp/VM boundary (no gVisor/Firecracker/Docker-level isolation for the interpreter itself — the Docker image in this repo containerizes the whole app, not each snippet individually); and on Windows, memory/CPU exhaustion inside the timeout window is not prevented. For genuinely untrusted/adversarial code, use a real container- or VM-based sandbox instead of this module.

Setup / Installation

# 1. Create and activate a virtual environment (recommended)
python -m venv venv
# Windows:
venv\Scripts\activate
# macOS / Linux:
# source venv/bin/activate

# 2. Install dependencies
pip install -r requirements.txt

# 3. Seed the sample SQLite database (creates data/sample.db)
python scripts/seed_db.py

# 4. Copy the env template (optional -- defaults work with no keys set)
# Windows:
copy .env.example .env
# macOS / Linux:
# cp .env.example .env

Usage

Run the FastAPI server

uvicorn api.main:app --reload

(Optional) Run the bundled mock REST server, for the REST tool

uvicorn mcp_server.mock_rest_server:app --port 8811

If this is not running, rest_exchange_rate calls will fail with a connection error — everything else (SQL tool, sandbox tool, resource tools, the reference agent task) works without it.

Example request/response

POST /agent/run
Content-Type: application/json

{
  "task": "Look up customer Alice Nguyen's orders, sum their total, and tell me if it exceeds $500."
}

{
  "result": {
    "task": "Look up customer Alice Nguyen's orders, sum their total, and tell me if it exceeds $500.",
    "status": "completed",
    "answer": "Alice Nguyen's completed orders total $334.00, which does not exceed $500.00.",
    "key_facts": {
      "customer_name": "Alice Nguyen",
      "order_total": 334.0,
      "threshold": 500.0,
      "exceeds_threshold": false,
      "num_completed_orders": 3
    },
    "tool_calls": [
      {
        "step": 1,
        "tool_name": "sql_query",
        "arguments": {
          "sql": "SELECT c.name AS customer_name, o.order_id, o.quantity, p.unit_price, (o.quantity * p.unit_price) AS line_total, o.status FROM orders o JOIN customers c ON c.customer_id = o.customer_id JOIN products p ON p.product_id = o.product_id WHERE c.name = 'Alice Nguyen'",
          "page_size": 25
        },
        "result_summary": "{\"columns\": [...], \"rows\": [...], ...}",
        "raw_result": { "columns": ["customer_name", "order_id", "quantity", "unit_price", "line_total", "status"], "rows": [ { "customer_name": "Alice Nguyen", "order_id": 1, "quantity": 1, "unit_price": 59.5, "line_total": 59.5, "status": "completed" }, { "..." : "..." } ], "row_count_returned": 3, "total_rows_matched": 3, "offset": 0, "requested_page_size": 25, "effective_page_size": 25, "has_more": false, "next_offset": null }
      }
    ]
  },
  "available_tools": ["sql_query", "rest_exchange_rate", "run_python", "list_resources", "open_resource", "read_resource", "release_resource"]
}

This example runs with the default LLM_PROVIDER=mock and requires no API key.
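
For scripted use, the same request can be sent from Python with the standard library. This is a minimal sketch: the helper names are hypothetical, and the base URL assumes the server was started with the uvicorn command above on its default host and port (127.0.0.1:8000).

```python
import json
import urllib.request


def build_agent_request(task, base_url="http://127.0.0.1:8000"):
    """Build the POST /agent/run request for a natural-language task."""
    body = json.dumps({"task": task}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/agent/run",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_task(task, base_url="http://127.0.0.1:8000", timeout_s=60):
    """Send the task to a running server and return the decoded JSON body."""
    request = build_agent_request(task, base_url)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)
```

With the server running, run_task("...")["result"]["key_facts"] would give the structured facts shown in the example response.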

Running the seed script and tests

python scripts/seed_db.py
pytest -v

The test suite (18 tests) seeds the database automatically via a session fixture if it does not already exist, spawns the real MCP server as a subprocess for discovery/end-to-end tests, and starts the bundled mock REST server in-process for the REST tool test. No test requires a real API key or outbound network access.

Environment Variables

Variable	Required	Default	Description
`LLM_PROVIDER`	No	`mock`	One of `mock`, `openai`, `anthropic`. Selects the agent's planning LLM.
`OPENAI_API_KEY`	Only if `LLM_PROVIDER=openai`	—	OpenAI API key.
`OPENAI_MODEL`	No	`gpt-4o-mini`	OpenAI model name.
`ANTHROPIC_API_KEY`	Only if `LLM_PROVIDER=anthropic`	—	Anthropic API key.
`ANTHROPIC_MODEL`	No	`claude-3-5-haiku-latest`	Anthropic model name.
`MOCK_REST_BASE_URL`	No	`http://127.0.0.1:8811`	Base URL the `rest_exchange_rate` tool calls.

See .env.example for a copyable template (placeholder values only — no real secrets are stored in this repo).
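
The table above translates fairly directly into a small settings loader. The variable names and defaults are the ones listed in the table; the Settings dataclass, the load_settings function, and the case-insensitive handling of LLM_PROVIDER are assumptions for illustration and may differ from how the project actually reads its configuration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Which API key each non-mock provider requires (from the table above).
PROVIDER_KEY_VARS = {"openai": "OPENAI_API_KEY", "anthropic": "ANTHROPIC_API_KEY"}


@dataclass(frozen=True)
class Settings:
    llm_provider: str
    api_key: Optional[str]
    openai_model: str
    anthropic_model: str
    mock_rest_base_url: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (defaults to os.environ) with table defaults."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "mock").strip().lower()
    if provider != "mock" and provider not in PROVIDER_KEY_VARS:
        raise ValueError(f"unsupported LLM_PROVIDER: {provider!r}")
    key_var = PROVIDER_KEY_VARS.get(provider)
    api_key = env.get(key_var) if key_var else None
    if key_var and not api_key:
        raise ValueError(f"{key_var} is required when LLM_PROVIDER={provider}")
    return Settings(
        llm_provider=provider,
        api_key=api_key,
        openai_model=env.get("OPENAI_MODEL", "gpt-4o-mini"),
        anthropic_model=env.get("ANTHROPIC_MODEL", "claude-3-5-haiku-latest"),
        mock_rest_base_url=env.get("MOCK_REST_BASE_URL", "http://127.0.0.1:8811"),
    )
```

Failing fast when a real provider is selected without its key keeps the "no keys needed" default path separate from misconfigured real-provider runs.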

Project Layout

mcp_server/
  server.py             MCP server (FastMCP): tool + resource definitions
  sandbox.py            Sandboxed code interpreter (AST check + subprocess)
  mock_rest_server.py   Bundled mock REST API (FastAPI)
agent/
  client.py             MCP client / agent loop, tool discovery
  llm.py                Pluggable LLM client (openai/anthropic/mock)
  schemas.py            Pydantic structured-output schemas
api/
  main.py               FastAPI wrapper: POST /agent/run
scripts/
  seed_db.py             SQLite seed script
data/
  resources/             Sample files exposed via the resource lifecycle tools
  sample.db              Generated by seed_db.py (not committed, see .gitignore)
tests/                   pytest suite
