# Context Rot Detection
MCP service that gives AI agents self-awareness about their cognitive state.
Every long-running AI agent suffers from context rot — measurable performance degradation as the context window fills up. Research from Chroma, Stanford ("lost-in-the-middle"), and Redis documents this as one of the most common practical failure modes in production agent systems.
An agent experiencing context rot doesn't know it's degrading — it just starts making worse decisions. This tool gives agents real-time visibility into their own cognitive health.
## Features
- Health score (0–100) based on token utilization, retrieval accuracy, and session fatigue
- Model-specific degradation curves for 15+ curated models (Claude, GPT, Gemini, o-series)
- Auto-resolves any HuggingFace model — pass a repo ID like `meta-llama/Llama-3.1-70B` and the context window is detected automatically, with results cached in SQLite
- Lost-in-the-middle risk scoring based on Stanford research
- Tool-call burden and session fatigue analysis
- Actionable recovery recommendations — compact context, offload to memory, checkpoint, break into subtasks
- Per-agent health history tracking (SQLite)
- Service-wide utilization statistics
## Quick Start

### npx (zero install)

```bash
npx context-rot-detection
```

### npm (global install)

```bash
npm install -g context-rot-detection
context-rot-detection
```
## MCP Client Configuration

### Claude Code

Add to `.mcp.json` in your project root:

```json
{
  "mcpServers": {
    "context-rot-detection": {
      "command": "npx",
      "args": ["-y", "context-rot-detection"],
      "env": {
        "HEALTH_HISTORY_DB": "./health.db"
      }
    }
  }
}
```
### Claude Desktop

Add to `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "context-rot-detection": {
      "command": "npx",
      "args": ["-y", "context-rot-detection"],
      "env": {
        "HEALTH_HISTORY_DB": "/path/to/health.db"
      }
    }
  }
}
```
### Docker

```json
{
  "mcpServers": {
    "context-rot-detection": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "-v", "context-rot-data:/data",
        "ghcr.io/milos-product-maker/context-rot-detection:latest"
      ]
    }
  }
}
```
## Configuration

| Environment Variable | Description | Default |
|---|---|---|
| `HEALTH_HISTORY_DB` | Path to the SQLite database for health history. Use `:memory:` for ephemeral storage. | `:memory:` |
| `LOG_FILE` | Path to append structured JSON log lines to. Omit to disable file logging. | (none) |
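For example, to run with a persistent database and a log file (paths are illustrative):

```bash
HEALTH_HISTORY_DB=./health.db LOG_FILE=./server.log npx context-rot-detection
```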
## Tools

### `check_my_health`
Analyze the current context window health. Call this periodically during long sessions or before critical decisions.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `token_count` | integer | Yes | Current estimated token count in the context window |
| `model` | string | No | LLM model identifier — a curated name (e.g., `claude-opus-4`, `gpt-4o`), a HuggingFace repo ID (e.g., `meta-llama/Llama-3.1-70B`), or any string (falls back to conservative defaults) |
| `session_duration_minutes` | integer | No | How long this session has been running |
| `tool_calls_count` | integer | No | Number of tool calls made in this session |
| `context_summary` | string | No | Brief summary of the current task and recent actions |
| `agent_id` | string | No | Unique agent identifier for history tracking |
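Example call (values are illustrative):

```json
{
  "token_count": 155000,
  "model": "claude-opus-4",
  "session_duration_minutes": 45,
  "tool_calls_count": 23,
  "context_summary": "Refactoring auth module; migration scripts just completed",
  "agent_id": "refactor-agent-01"
}
```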
Example response:
```json
{
  "health_score": 62,
  "status": "warning",
  "token_utilization": {
    "current": 155000,
    "max_effective": 170000,
    "percentage": 91.2,
    "danger_zone_starts_at": 170000
  },
  "quality_estimate": {
    "retrieval_accuracy": "degrading",
    "middle_content_risk": "high",
    "estimated_hallucination_risk": "moderate"
  },
  "session_fatigue": {
    "tool_call_burden": "moderate",
    "session_length_risk": "low",
    "recommendation": "Consider breaking into sub-tasks if complexity increases."
  },
  "recommendations": [
    {
      "priority": "high",
      "action": "compact_context",
      "reason": "You are approaching the effective quality threshold. Summarize older context and remove completed task details.",
      "estimated_quality_gain": 15
    },
    {
      "priority": "high",
      "action": "offload_to_memory",
      "reason": "High risk of lost-in-the-middle effect. Store critical information to external memory before it is effectively lost.",
      "estimated_quality_gain": 8
    }
  ]
}
```
### `get_health_history`
Retrieve health check history for a specific agent.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `agent_id` | string | Yes | Unique agent identifier |
| `limit` | integer | No | Max records to return (default: 20, max: 100) |
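Example call (the `agent_id` value is illustrative):

```json
{
  "agent_id": "refactor-agent-01",
  "limit": 10
}
```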
### `get_service_stats`
Get service-wide utilization statistics. No parameters required.
Returns total calls, unique agents, average health score, model distribution, status distribution, and recent activity (last hour / last 24h).
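An illustrative response shape (field names are indicative of the summary above, not a schema guarantee):

```json
{
  "total_calls": 1240,
  "unique_agents": 37,
  "average_health_score": 71.4,
  "model_distribution": { "claude-opus-4": 812, "gpt-4o": 305, "other": 123 },
  "status_distribution": { "healthy": 840, "warning": 310, "critical": 90 },
  "recent_activity": { "last_hour": 18, "last_24h": 402 }
}
```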
## Supported Models

| Model | Max Tokens | Danger Zone | Middle-Loss Risk |
|---|---|---|---|
| `claude-opus-4-5` | 200K | 175K | Low |
| `claude-opus-4` | 200K | 170K | Low |
| `claude-sonnet-4` | 200K | 165K | Low |
| `claude-3.7-sonnet` | 200K | 160K | Low–Medium |
| `claude-3.5-sonnet` | 200K | 152K | Medium |
| `claude-haiku-3.5` | 200K | 130K | Medium |
| `gpt-4.1` | 1M | 500K | Medium |
| `gpt-4.1-mini` | 1M | 450K | Medium |
| `gpt-4o` | 128K | 105K | Medium |
| `gpt-4o-mini` | 128K | 95K | Medium–High |
| `o3` | 200K | 160K | Low–Medium |
| `o4-mini` | 200K | 150K | Medium |
| `gemini-2.5-pro` | 1M | 600K | Medium |
| `gemini-2.5-flash` | 1M | 520K | Medium–High |
| `gemini-2.0-flash` | 1M | 500K | High |
### HuggingFace Auto-Resolution

Any model string containing `/` is treated as a HuggingFace repo ID. The server fetches `config.json` from the repo, extracts the context window size (`max_position_embeddings`, `n_positions`, or `max_seq_len`), and generates a conservative degradation profile:
- 65% of max tokens → degradation onset
- 80% of max tokens → danger zone
Results are cached in SQLite — subsequent lookups are instant.
```
model: "meta-llama/Llama-3.1-70B"  → 131K context, danger at 105K
model: "mistralai/Mistral-7B-v0.1" → 32K context, danger at 26K
model: "mosaicml/mpt-7b"           → 65K context, danger at 52K
```
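A minimal sketch of this resolution logic, assuming Node 18+ (global `fetch`); names like `resolveHuggingFaceModel` and `ModelProfile` are illustrative, not the server's actual internals:

```typescript
interface ModelProfile {
  maxTokens: number;
  degradationOnset: number; // 65% of max tokens
  dangerZone: number;       // 80% of max tokens
}

async function resolveHuggingFaceModel(repoId: string): Promise<ModelProfile | null> {
  const url = `https://huggingface.co/${repoId}/resolve/main/config.json`;
  try {
    const res = await fetch(url);
    if (!res.ok) return null; // gated model or missing config -> conservative defaults
    const config = await res.json();
    // The context window key varies by architecture.
    const maxTokens =
      config.max_position_embeddings ?? config.n_positions ?? config.max_seq_len;
    if (typeof maxTokens !== "number") return null;
    // The real server also caches the resolved profile in SQLite.
    return {
      maxTokens,
      degradationOnset: Math.floor(maxTokens * 0.65),
      dangerZone: Math.floor(maxTokens * 0.8),
    };
  } catch {
    return null; // network error -> conservative defaults
  }
}
```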
If the fetch fails (network error, gated model, missing config), the server falls back silently to conservative defaults.
### Fallback

Any unrecognized model string without `/` falls back to conservative defaults (128K max, 100K danger zone).
## How It Works
The health score is a weighted composite of four signals:
| Signal | Weight | Source |
|---|---|---|
| Token utilization quality | 40% | Model-specific sigmoid degradation curve |
| Retrieval accuracy | 25% | Base accuracy minus lost-in-the-middle penalty |
| Tool-call burden | 20% | Compounding quality loss after 10+ tool calls |
| Session length | 15% | Time-based fatigue heuristic |
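A sketch of the weighting, assuming each signal has already been normalized to a 0–100 quality value by its own curve (only the documented 40/25/20/15 split is shown; the per-signal curves live in the server):

```typescript
interface HealthSignals {
  tokenUtilizationQuality: number; // model-specific sigmoid degradation curve
  retrievalAccuracy: number;       // base accuracy minus lost-in-the-middle penalty
  toolCallQuality: number;         // compounding loss after 10+ tool calls
  sessionQuality: number;          // time-based fatigue heuristic
}

function healthScore(s: HealthSignals): number {
  return Math.round(
    0.40 * s.tokenUtilizationQuality +
    0.25 * s.retrievalAccuracy +
    0.20 * s.toolCallQuality +
    0.15 * s.sessionQuality
  );
}
```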
The degradation curves are derived from empirical research:
- Chroma: Context Rot — quality degrades around 147K–152K tokens on 200K models
- Stanford: Lost in the Middle — retrieval accuracy drops for information in the middle of the context window
- Redis: Context Rot — compounding degradation effects in long-running agents
## Development

```bash
git clone https://github.com/milos-product-maker/context-rot-detection.git
cd context-rot-detection
npm install
npm run dev    # Run with tsx (hot reload)
npm test       # Run unit tests
npm run build  # Compile TypeScript
```
### Testing with MCP Inspector

```bash
npx @modelcontextprotocol/inspector node dist/index.js
```
## License
MIT