# MCP Production — Model Context Protocol + OpenAI

Production-ready MCP server with FastAPI, Redis session memory, streaming, retries, rate limiting, and structured logging. Managed with uv.
## Project Structure

```
mcp_production/
├── app/
│   ├── main.py              # FastAPI app factory
│   ├── config.py            # Centralised settings (pydantic-settings)
│   ├── logger.py            # Structured JSON logging (structlog)
│   ├── api/
│   │   ├── routes.py        # All route handlers
│   │   └── schemas.py       # Pydantic request/response models
│   ├── core/
│   │   ├── mcp_loop.py      # Agentic loop (blocking + streaming)
│   │   └── openai_client.py # OpenAI client with retry
│   ├── tools/
│   │   ├── base.py          # BaseTool + ToolRegistry
│   │   ├── weather.py       # get_weather tool
│   │   ├── calculator.py    # calculate tool (sympy)
│   │   └── wiki.py          # search_wiki tool
│   └── memory/
│       └── session.py       # Redis-backed session memory
├── tests/
│   ├── test_tools.py        # Tool unit tests
│   └── test_api.py          # API integration tests
├── scripts/
│   ├── run_dev.sh           # Dev server (hot reload)
│   ├── run_prod.sh          # Production server (multi-worker)
│   └── test.sh              # Run test suite
├── pyproject.toml           # uv project + dependencies
├── .env.example             # Environment variable template
├── docker-compose.yml       # Local dev stack (app + Redis)
└── Dockerfile               # Multi-stage Docker build
```
## Quick Start

### 1. Install uv

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

### 2. Clone and set up

```bash
git clone <repo>
cd mcp_production

# Install all dependencies
uv sync

# Copy and fill in env vars
cp .env.example .env
# Edit .env — add OPENAI_API_KEY at minimum
```

### 3. Start Redis

```bash
# Option A: Docker Compose (recommended)
docker-compose up redis -d

# Option B: Local Redis
brew install redis && redis-server
```

### 4. Run the server

```bash
# Development (hot reload)
bash scripts/run_dev.sh

# Or directly:
uv run uvicorn app.main:app --reload
```

The server starts at http://localhost:8000; the API docs are at http://localhost:8000/docs.
## API Endpoints

### POST /api/v1/chat — Blocking

```bash
curl -X POST http://localhost:8000/api/v1/chat \
  -H "Content-Type: application/json" \
  -d '{
    "message": "What is the weather in Tokyo and calculate 17 * 4?",
    "session_id": "user-123"
  }'
```

Response:

```json
{
  "answer": "The weather in Tokyo is 22°C, sunny. And 17 * 4 = 68.",
  "session_id": "user-123",
  "turns": 2,
  "tools_called": ["get_weather", "calculate"],
  "total_tokens": 312
}
```
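The request and response shapes above can be wrapped in a couple of small helpers on the client side. This is an illustrative sketch, not code from the repo: the field names (`message`, `session_id`, `turns`, `tools_called`, `total_tokens`) are taken from the curl example, while the helper names are hypothetical.

```python
# Offline sketch of the documented /api/v1/chat request and response shapes.
# Field names mirror the curl example; the helpers themselves are assumptions.
import json


def build_chat_payload(message: str, session_id: str) -> str:
    """Serialize the JSON body expected by POST /api/v1/chat."""
    return json.dumps({"message": message, "session_id": session_id})


def summarize_chat_response(body: dict) -> str:
    """Condense a chat response into a one-line log entry."""
    tools = ", ".join(body.get("tools_called", [])) or "none"
    return (
        f"[{body['session_id']}] {body['turns']} turn(s), "
        f"tools: {tools}, tokens: {body.get('total_tokens', 0)}"
    )
```

Run against the sample response above, `summarize_chat_response` yields `[user-123] 2 turn(s), tools: get_weather, calculate, tokens: 312`.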
### POST /api/v1/chat/stream — Streaming SSE

```bash
curl -N -X POST http://localhost:8000/api/v1/chat/stream \
  -H "Content-Type: application/json" \
  -d '{"message": "Search Wikipedia for Python", "session_id": "user-123"}'
```

Events:

```
data: {"type": "tool_call", "name": "search_wiki", "args": {"query": "Python"}}
data: {"type": "tool_result", "name": "search_wiki", "content": "Python is..."}
data: {"type": "token", "content": "Python "}
data: {"type": "token", "content": "is a..."}
data: {"type": "done", "turns": 2, "tools": ["search_wiki"]}
```
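A client consuming this stream only needs to handle the `data: <json>` framing shown above. A minimal parser, assuming exactly that framing (it deliberately ignores the rest of the SSE spec: multi-line `data`, `id`, and `retry` fields):

```python
# Parse the "data: <json>" lines emitted by /api/v1/chat/stream and
# reassemble the streamed answer from "token" events.
import json


def parse_sse_events(raw: str) -> list[dict]:
    """Extract the JSON payload of each 'data: ...' line in an SSE stream."""
    events = []
    for line in raw.splitlines():
        if line.startswith("data: "):
            events.append(json.loads(line[len("data: "):]))
    return events


def collect_answer(events: list[dict]) -> str:
    """Concatenate 'token' events into the final answer text."""
    return "".join(e["content"] for e in events if e["type"] == "token")
```

Fed the example stream above, `collect_answer` returns `"Python is a..."`.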
### DELETE /api/v1/session/{session_id} — Clear History

```bash
curl -X DELETE http://localhost:8000/api/v1/session/user-123
```

### GET /api/v1/tools — List Tools

```bash
curl http://localhost:8000/api/v1/tools
```

### GET /api/v1/health — Health Check

```bash
curl http://localhost:8000/api/v1/health
```
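In deployment scripts it is handy to block until the health check passes before sending traffic. A hypothetical helper, written against a `fetch` callable so it can be tested without a running server; the `{"status": "ok"}` response shape is an assumption, not taken from the actual `/api/v1/health` handler:

```python
# Poll a health-returning callable until it reports {"status": "ok"}.
# The response shape is assumed; adapt it to the real handler's output.
import time
from typing import Callable


def wait_until_healthy(
    fetch: Callable[[], dict],
    attempts: int = 10,
    delay: float = 0.5,
) -> bool:
    """Return True once fetch() reports healthy; False after all attempts."""
    for i in range(attempts):
        try:
            if fetch().get("status") == "ok":
                return True
        except Exception:
            pass  # server not up yet; keep polling
        if i < attempts - 1:
            time.sleep(delay)
    return False
```

In practice `fetch` would wrap an HTTP GET of `/api/v1/health`; injecting it keeps the retry logic unit-testable.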
## Adding a New Tool

1. Create `app/tools/my_tool.py`:

```python
from app.tools.base import BaseTool


class MyTool(BaseTool):
    name = "my_tool"

    def schema(self) -> dict:
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": "Does something useful.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "input": {"type": "string", "description": "Input value"}
                    },
                    "required": ["input"],
                },
            },
        }

    async def execute(self, input: str) -> str:
        return f"Result for: {input}"
```

2. Register it in `app/tools/__init__.py`:

```python
from app.tools.my_tool import MyTool

registry.register(MyTool())
```

That's it — the tool is automatically included in all API calls.
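For orientation, here is a sketch of the `BaseTool`/`ToolRegistry` contract the example above implies. The real classes live in `app/tools/base.py`; `register` appears in the registration step, while `schemas` and `dispatch` are assumed names for illustration only.

```python
# Assumed shape of the BaseTool / ToolRegistry contract. Only register()
# is confirmed by the README; schemas() and dispatch() are hypothetical.
import asyncio


class BaseTool:
    name: str = ""

    def schema(self) -> dict:
        raise NotImplementedError

    async def execute(self, **kwargs) -> str:
        raise NotImplementedError


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, BaseTool] = {}

    def register(self, tool: BaseTool) -> None:
        self._tools[tool.name] = tool

    def schemas(self) -> list[dict]:
        """OpenAI-format tool schemas for every registered tool."""
        return [t.schema() for t in self._tools.values()]

    async def dispatch(self, name: str, args: dict) -> str:
        """Run a registered tool by name with model-supplied arguments."""
        return await self._tools[name].execute(**args)
```

Keeping tools behind one registry is what lets a new tool show up in `/api/v1/tools` and the agent loop with no further wiring.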
## Running Tests

```bash
bash scripts/test.sh

# Or with uv directly:
uv run pytest tests/ -v
```
## Docker (Full Stack)

```bash
docker-compose up --build
```
## Environment Variables

| Variable | Default | Description |
|---|---|---|
| `OPENAI_API_KEY` | *required* | Your OpenAI API key |
| `OPENAI_MODEL` | `gpt-4o-mini` | Model to use |
| `REDIS_URL` | `redis://localhost:6379` | Redis connection URL |
| `SESSION_TTL_SECONDS` | `3600` | Session memory TTL |
| `APP_ENV` | `development` | `development` or `production` |
| `RATE_LIMIT_PER_MINUTE` | `20` | Requests per minute per IP |
| `OPENWEATHER_API_KEY` | (mock used) | Real weather API key |
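The project loads these via pydantic-settings in `app/config.py`; as a dependency-free illustration of the same table, here is a stdlib-only sketch with the defaults above (the class and method names are hypothetical):

```python
# Stdlib-only mirror of the env-var table. The real app uses
# pydantic-settings; defaults below match the README table.
import os
from dataclasses import dataclass


@dataclass
class Settings:
    openai_api_key: str
    openai_model: str = "gpt-4o-mini"
    redis_url: str = "redis://localhost:6379"
    session_ttl_seconds: int = 3600
    app_env: str = "development"
    rate_limit_per_minute: int = 20
    openweather_api_key: str = ""  # empty -> mock weather backend

    @classmethod
    def from_env(cls) -> "Settings":
        env = os.environ
        return cls(
            openai_api_key=env["OPENAI_API_KEY"],  # required: no default
            openai_model=env.get("OPENAI_MODEL", cls.openai_model),
            redis_url=env.get("REDIS_URL", cls.redis_url),
            session_ttl_seconds=int(
                env.get("SESSION_TTL_SECONDS", cls.session_ttl_seconds)
            ),
            app_env=env.get("APP_ENV", cls.app_env),
            rate_limit_per_minute=int(
                env.get("RATE_LIMIT_PER_MINUTE", cls.rate_limit_per_minute)
            ),
            openweather_api_key=env.get(
                "OPENWEATHER_API_KEY", cls.openweather_api_key
            ),
        )
```

Note that `OPENAI_API_KEY` is read with `env[...]`, so a missing key fails fast at startup rather than at the first model call.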