# MCP Context Hub

Local MCP server (Node.js + TypeScript) for context optimization, RAG memory, semantic caching, and sub-MCP proxying. Designed to run on a machine with a GPU (RTX 3060 Ti) and Ollama, acting as a single MCP endpoint for Claude.
## Architecture

```
     Claude (Remote)
            |
            |  HTTP POST/GET/DELETE + Bearer Token
            |
+-----------v-----------+
|    Express (:3100)    |
|  Auth + IP Allowlist  |
+-----------+-----------+
            |
+-----------v-----------+
|  McpServer (SDK v1)   |
|                       |
|  Tools:               |
|    context_pack       |
|    memory_search      |
|    memory_upsert      |
|    context_compress   |
|    proxy_call         |
+-+------+------+-----+-+
  |      |      |     |
  v      v      v     v
Ollama  SQLite  Cache  ProxyMgr
Client  Vector  LRU    (stdio
(chat   Store   +TTL   sub-MCP)
+embed  +FTS5
+fallback)
```
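The HTTP layer is thin: Express terminates auth, and the SDK's Streamable HTTP transport bridges requests to the `McpServer`. A minimal sketch of that wiring (simplified; the real code in `src/server/transport.ts` also manages per-session transports and the IP allowlist):

```typescript
import express from "express";
import { randomUUID } from "node:crypto";
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";

// Simplified wiring: bearer auth in front of a single Streamable HTTP
// transport. The real transport.ts keeps one transport per session.
const app = express();
app.use(express.json());

app.use("/mcp", (req, res, next) => {
  const token = req.headers.authorization?.replace(/^Bearer /, "");
  if (token !== process.env.MCP_AUTH_TOKEN) {
    res.status(401).json({ error: "unauthorized" });
    return;
  }
  next();
});

const server = new McpServer({ name: "mcp-context-hub", version: "1.0.0" });
const transport = new StreamableHTTPServerTransport({
  sessionIdGenerator: () => randomUUID(),
});
await server.connect(transport);

app.all("/mcp", (req, res) => transport.handleRequest(req, res, req.body));
app.listen(3100);
```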
## Features

- `context_pack` — Combines semantic + text search, deduplication, and LLM synthesis into a structured context bundle (summary, facts, next actions)
- `memory_search` — Semantic similarity search over stored documents using vector embeddings
- `memory_upsert` — Stores documents with automatic chunking, embedding, and indexing
- `context_compress` — Compresses text into bullet, JSON, step, or summary format to reduce token usage
- `proxy_call` — Calls tools on sub-MCP servers (e.g., filesystem) with optional post-processing (summarize, compress)
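A `context_pack` request is an ordinary `tools/call` payload. As a rough sketch of what one might look like; the argument names (`query`, `top_k`, `max_tokens`) are hypothetical, since the actual schema lives in `src/tools/schemas.ts`:

```typescript
// Hypothetical context_pack invocation via the MCP "tools/call" method.
// Argument names below are illustrative; check src/tools/schemas.ts for
// the actual Zod schema.
const contextPackRequest = {
  jsonrpc: "2.0",
  method: "tools/call",
  params: {
    name: "context_pack",
    arguments: {
      query: "deployment checklist for the payments service", // topic to pack context for
      top_k: 8,         // hypothetical: candidates from semantic + FTS5 search
      max_tokens: 1500, // hypothetical: budget for the synthesized bundle
    },
  },
  id: 4,
};
```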
## Requirements

- Node.js >= 20
- Ollama with the following models:
  - `llama3.1:8b-instruct-q4_K_M` (primary chat)
  - `qwen2.5:7b-instruct-q4_K_M` (fallback chat)
  - `nomic-embed-text:v1.5` (embeddings, 768 dims)
## Quick Start

```bash
# 1. Clone and install
git clone https://github.com/DiegoNogueiraDev/mcp-context-hub.git
cd mcp-context-hub
npm install

# 2. Pull Ollama models
ollama pull llama3.1:8b-instruct-q4_K_M
ollama pull qwen2.5:7b-instruct-q4_K_M
ollama pull nomic-embed-text:v1.5

# 3. Configure environment
cp .env.example .env
# Edit .env and set MCP_AUTH_TOKEN to a secure random value

# 4. Start the server
npm run dev
```

Or use the setup script:

```bash
chmod +x scripts/setup.sh
./scripts/setup.sh
npm run dev
```
## Usage

### Health Check

```bash
curl http://localhost:3100/health
# {"status":"healthy","timestamp":"..."}
```
### MCP Protocol

The server uses the Streamable HTTP transport at `/mcp`. Initialize a session first:

```bash
# Initialize session
curl -X POST http://localhost:3100/mcp \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "Authorization: Bearer <your-token>" \
  -d '{
    "jsonrpc": "2.0",
    "method": "initialize",
    "params": {
      "protocolVersion": "2025-03-26",
      "capabilities": {},
      "clientInfo": { "name": "my-client", "version": "1.0.0" }
    },
    "id": 1
  }'
```
Then call tools, passing both the bearer token and the `mcp-session-id` header returned by the initialize response:

```bash
# Store a document
curl -X POST http://localhost:3100/mcp \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "Authorization: Bearer <your-token>" \
  -H "mcp-session-id: <session-id>" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
      "name": "memory_upsert",
      "arguments": {
        "document_id": "my-doc",
        "content": "Your document text here...",
        "scope": "project",
        "tags": ["example"]
      }
    },
    "id": 2
  }'

# Search memories
curl -X POST http://localhost:3100/mcp \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "Authorization: Bearer <your-token>" \
  -H "mcp-session-id: <session-id>" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
      "name": "memory_search",
      "arguments": {
        "query": "your search query",
        "top_k": 5
      }
    },
    "id": 3
  }'
```
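If you prefer a programmatic client, the official SDK handles initialization and the session header for you. A minimal sketch, assuming the default port and the SDK's Streamable HTTP client transport:

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

// Sketch: connect to the hub and run a memory_search call.
// The bearer token is whatever you set in MCP_AUTH_TOKEN.
const transport = new StreamableHTTPClientTransport(
  new URL("http://localhost:3100/mcp"),
  { requestInit: { headers: { Authorization: "Bearer <your-token>" } } }
);

const client = new Client({ name: "my-client", version: "1.0.0" });
await client.connect(transport); // performs initialize + session handshake

const result = await client.callTool({
  name: "memory_search",
  arguments: { query: "your search query", top_k: 5 },
});
console.log(result.content);

await client.close();
```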
### Sub-MCP Proxy

Configure sub-MCP servers via the `PROXY_SERVERS` environment variable:

```bash
PROXY_SERVERS='{"filesystem":{"command":"node","args":["node_modules/@modelcontextprotocol/server-filesystem/dist/index.js","/tmp"]}}' npm run dev
```

Then call tools on them via `proxy_call`:

```json
{
  "name": "proxy_call",
  "arguments": {
    "server": "filesystem",
    "tool": "read_file",
    "arguments": { "path": "/tmp/example.txt" },
    "post_process": "none"
  }
}
```
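Under the hood, `src/services/proxy-manager.ts` keeps stdio connections to each configured sub-server. A minimal sketch of that pattern using the SDK's stdio client transport (simplified; the real implementation also handles connection reuse and post-processing):

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Shape of one PROXY_SERVERS entry.
interface ProxyServerConfig {
  command: string;
  args: string[];
}

// Sketch: spawn a sub-MCP server and forward a single tool call to it.
// Simplified; proxy-manager.ts reuses connections instead of spawning
// a fresh process per call.
async function proxyCall(
  config: ProxyServerConfig,
  tool: string,
  args: Record<string, unknown>
) {
  const transport = new StdioClientTransport({
    command: config.command,
    args: config.args,
  });
  const client = new Client({ name: "context-hub-proxy", version: "1.0.0" });
  await client.connect(transport);
  try {
    return await client.callTool({ name: tool, arguments: args });
  } finally {
    await client.close();
  }
}
```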
## Configuration

All settings are provided via environment variables (see `.env.example`):

| Variable | Default | Description |
|---|---|---|
| `MCP_AUTH_TOKEN` | (required) | Bearer token for authentication |
| `MCP_ALLOWED_IPS` | `127.0.0.1,::1` | Comma-separated allowed IPs |
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama API URL |
| `PRIMARY_MODEL` | `llama3.1:8b-instruct-q4_K_M` | Primary chat model |
| `FALLBACK_MODEL` | `qwen2.5:7b-instruct-q4_K_M` | Fallback chat model |
| `EMBEDDING_MODEL` | `nomic-embed-text:v1.5` | Embedding model |
| `PORT` | `3100` | Server port |
| `HOST` | `0.0.0.0` | Server host |
| `DB_PATH` | `./data/context-hub.db` | SQLite database path |
| `CACHE_TTL_MS` | `300000` | Cache TTL (5 minutes) |
| `CACHE_MAX_ENTRIES` | `100` | Max cache entries |
| `LOG_LEVEL` | `info` | Log level (debug, info, warn, error) |
| `PROXY_SERVERS` | `{}` | Sub-MCP server configs (JSON) |
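For reference, `src/config.ts` presumably validates these variables with Zod; a minimal sketch of that approach (field names and defaults mirror the table above, everything else is illustrative):

```typescript
import { z } from "zod";

// Sketch of env validation in the spirit of src/config.ts.
// Defaults mirror the table above; the actual schema may differ.
const EnvSchema = z.object({
  MCP_AUTH_TOKEN: z.string().min(1, "MCP_AUTH_TOKEN is required"),
  MCP_ALLOWED_IPS: z.string().default("127.0.0.1,::1"),
  OLLAMA_BASE_URL: z.string().default("http://localhost:11434"),
  PRIMARY_MODEL: z.string().default("llama3.1:8b-instruct-q4_K_M"),
  FALLBACK_MODEL: z.string().default("qwen2.5:7b-instruct-q4_K_M"),
  EMBEDDING_MODEL: z.string().default("nomic-embed-text:v1.5"),
  PORT: z.coerce.number().default(3100),
  HOST: z.string().default("0.0.0.0"),
  DB_PATH: z.string().default("./data/context-hub.db"),
  CACHE_TTL_MS: z.coerce.number().default(300_000),
  CACHE_MAX_ENTRIES: z.coerce.number().default(100),
  LOG_LEVEL: z.enum(["debug", "info", "warn", "error"]).default("info"),
  PROXY_SERVERS: z.string().default("{}"), // parsed as JSON downstream
});

export const config = EnvSchema.parse(process.env);
export const allowedIps = config.MCP_ALLOWED_IPS.split(",");
```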
## Commands

```bash
npm run dev        # Start dev server (HTTP on :3100)
npm run dev:stdio  # Start in stdio mode (for local MCP testing)
npm run build      # Compile TypeScript
npm start          # Run compiled output
npm test           # Run tests (31 tests, 6 files)
npm run typecheck  # Type-check without emitting
npm run health     # Run health check script
```
## Project Structure

```
src/
  config.ts                 # Environment configuration
  index.ts                  # Entry point + graceful shutdown
  db/
    connection.ts           # SQLite singleton (WAL mode)
    migrations.ts           # Table definitions (documents, chunks, FTS5, audit)
    cosine.ts               # Cosine similarity + embedding serialization
  server/
    mcp-server.ts           # McpServer setup + tool registration
    transport.ts            # Express + Streamable HTTP transport
    session.ts              # Session management
    middleware/
      auth.ts               # Bearer token validation
      ip-allowlist.ts       # IP restriction
      audit.ts              # Tool call logging
  tools/
    schemas.ts              # Zod schemas for all tools
    context-pack.ts         # context_pack implementation
    memory-search.ts        # memory_search implementation
    memory-upsert.ts        # memory_upsert implementation
    context-compress.ts     # context_compress implementation
    proxy-call.ts           # proxy_call implementation
  services/
    ollama-client.ts        # Ollama API (chat + embed + fallback)
    sqlite-vector-store.ts  # Vector store (SQLite + brute-force cosine)
    text-search.ts          # FTS5 full-text search
    chunker.ts              # Recursive text splitter
    dedup.ts                # Content hashing + Jaccard dedup
    semantic-cache.ts       # LRU + TTL in-memory cache
    proxy-manager.ts        # Sub-MCP stdio connections
  utils/
    logger.ts               # Pino structured logging
    metrics.ts              # In-memory call metrics
    retry.ts                # Exponential backoff retry
    tokens.ts               # Token estimation
  types/
    index.ts                # Type re-exports
    ollama.ts               # Ollama API types
    vector-store.ts         # VectorStore interface
tests/
  unit/                     # cosine, chunker, dedup, cache
  integration/              # sqlite vector store
  e2e/                      # Express server
```
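As noted above, `sqlite-vector-store.ts` ranks chunks by brute-force cosine similarity rather than an ANN index. The core of that approach is small; a minimal sketch (function names are illustrative, not the repo's actual exports):

```typescript
// Sketch of brute-force cosine ranking over stored embeddings.
// Names are illustrative; see src/db/cosine.ts and
// src/services/sqlite-vector-store.ts for the real code.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank every stored chunk against the query embedding and keep the
// top k. A linear scan is fine at local scale; swap in an ANN index
// only if the corpus grows large.
function topK(
  query: Float32Array,
  chunks: { id: string; embedding: Float32Array }[],
  k: number
) {
  return chunks
    .map((c) => ({ id: c.id, score: cosineSimilarity(query, c.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```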
## Tech Stack

- Runtime: Node.js 20, TypeScript
- MCP SDK: `@modelcontextprotocol/sdk` v1.26
- HTTP: Express v5 + Streamable HTTP transport
- Database: SQLite (`better-sqlite3`) with WAL mode, FTS5
- Embeddings: Ollama `nomic-embed-text:v1.5` (768 dimensions)
- Chat: Ollama with automatic model fallback
- Validation: Zod v4
- Logging: Pino
- Testing: Vitest
## License

MIT