# Context Rot Detection
MCP service that gives AI agents self-awareness about their cognitive state.
Every long-running AI agent suffers from context rot — measurable performance degradation as the context window fills up. Research from Chroma, Stanford ("lost-in-the-middle"), and Redis documents this as one of the most common practical failure modes in production agent systems.
An agent experiencing context rot doesn't know it's degrading — it just starts making worse decisions. This tool gives agents real-time visibility into their own cognitive health.
## Features
- Health score (0–100) based on token utilization, retrieval accuracy, and session fatigue
- Model-specific degradation curves for 15+ curated models (Claude, GPT, Gemini, o-series)
- Auto-resolves any HuggingFace model — pass a repo ID like `meta-llama/Llama-3.1-70B` and the context window is detected automatically, with results cached in SQLite
- Lost-in-the-middle risk scoring based on Stanford research
- Tool-call burden and session fatigue analysis
- Actionable recovery recommendations — compact context, offload to memory, checkpoint, break into subtasks
- Per-agent health history tracking (SQLite)
- Service-wide utilization statistics
## Quick Start

### npx (zero install)

```bash
npx context-rot-detection
```

### npm (global install)

```bash
npm install -g context-rot-detection
context-rot-detection
```
## MCP Client Configuration

### Claude Code

Add to `.mcp.json` in your project root:

```json
{
  "mcpServers": {
    "context-rot-detection": {
      "command": "npx",
      "args": ["-y", "context-rot-detection"],
      "env": {
        "HEALTH_HISTORY_DB": "./health.db"
      }
    }
  }
}
```
### Claude Desktop

Add to `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "context-rot-detection": {
      "command": "npx",
      "args": ["-y", "context-rot-detection"],
      "env": {
        "HEALTH_HISTORY_DB": "/path/to/health.db"
      }
    }
  }
}
```
### Docker

```json
{
  "mcpServers": {
    "context-rot-detection": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "-v", "context-rot-data:/data",
        "ghcr.io/milos-product-maker/context-rot-detection:latest"
      ]
    }
  }
}
```
## Configuration

| Environment Variable | Description | Default |
|---|---|---|
| `HEALTH_HISTORY_DB` | Path to the SQLite database for health history. Use `:memory:` for ephemeral storage. | `:memory:` |
| `LOG_FILE` | Path to append structured JSON log lines to. Omit to disable file logging. | (none) |
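For example, to run with a persistent database and a log file (paths are illustrative):

```bash
HEALTH_HISTORY_DB=./health.db LOG_FILE=./server.log npx context-rot-detection
```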
## Tools

### `check_my_health`
Analyze the current context window health. Call this periodically during long sessions or before critical decisions.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `token_count` | integer | Yes | Current estimated token count in the context window |
| `model` | string | No | LLM model identifier — a curated name (e.g., `claude-opus-4`, `gpt-4o`), a HuggingFace repo ID (e.g., `meta-llama/Llama-3.1-70B`), or any string (falls back to conservative defaults) |
| `session_duration_minutes` | integer | No | How long this session has been running |
| `tool_calls_count` | integer | No | Number of tool calls made in this session |
| `context_summary` | string | No | Brief summary of the current task and recent actions |
| `agent_id` | string | No | Unique agent identifier for history tracking |
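Example call (values are illustrative):

```json
{
  "token_count": 155000,
  "model": "claude-opus-4",
  "session_duration_minutes": 45,
  "tool_calls_count": 23,
  "context_summary": "Refactoring auth module; migration scripts just completed",
  "agent_id": "refactor-agent-01"
}
```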
Example response:
```json
{
  "health_score": 62,
  "status": "warning",
  "token_utilization": {
    "current": 155000,
    "max_effective": 170000,
    "percentage": 91.2,
    "danger_zone_starts_at": 170000
  },
  "quality_estimate": {
    "retrieval_accuracy": "degrading",
    "middle_content_risk": "high",
    "estimated_hallucination_risk": "moderate"
  },
  "session_fatigue": {
    "tool_call_burden": "moderate",
    "session_length_risk": "low",
    "recommendation": "Consider breaking into sub-tasks if complexity increases."
  },
  "recommendations": [
    {
      "priority": "high",
      "action": "compact_context",
      "reason": "You are approaching the effective quality threshold. Summarize older context and remove completed task details.",
      "estimated_quality_gain": 15
    },
    {
      "priority": "high",
      "action": "offload_to_memory",
      "reason": "High risk of lost-in-the-middle effect. Store critical information to external memory before it is effectively lost.",
      "estimated_quality_gain": 8
    }
  ]
}
```
### `get_health_history`
Retrieve health check history for a specific agent.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `agent_id` | string | Yes | Unique agent identifier |
| `limit` | integer | No | Max records to return (default: 20, max: 100) |
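Example call (the `agent_id` value is illustrative):

```json
{
  "agent_id": "refactor-agent-01",
  "limit": 10
}
```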
### `get_service_stats`
Get service-wide utilization statistics. No parameters required.
Returns total calls, unique agents, average health score, model distribution, status distribution, and recent activity (last hour / last 24h).
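An illustrative response shape (field names are indicative of the summary above, not a schema guarantee):

```json
{
  "total_calls": 1240,
  "unique_agents": 37,
  "average_health_score": 71.4,
  "model_distribution": { "claude-opus-4": 812, "gpt-4o": 305, "other": 123 },
  "status_distribution": { "healthy": 840, "warning": 310, "critical": 90 },
  "recent_activity": { "last_hour": 18, "last_24h": 402 }
}
```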
## Supported Models

| Model | Max Tokens | Danger Zone | Middle-Loss Risk |
|---|---|---|---|
| `claude-opus-4-5` | 200K | 175K | Low |
| `claude-opus-4` | 200K | 170K | Low |
| `claude-sonnet-4` | 200K | 165K | Low |
| `claude-3.7-sonnet` | 200K | 160K | Low–Medium |
| `claude-3.5-sonnet` | 200K | 152K | Medium |
| `claude-haiku-3.5` | 200K | 130K | Medium |
| `gpt-4.1` | 1M | 500K | Medium |
| `gpt-4.1-mini` | 1M | 450K | Medium |
| `gpt-4o` | 128K | 105K | Medium |
| `gpt-4o-mini` | 128K | 95K | Medium–High |
| `o3` | 200K | 160K | Low–Medium |
| `o4-mini` | 200K | 150K | Medium |
| `gemini-2.5-pro` | 1M | 600K | Medium |
| `gemini-2.5-flash` | 1M | 520K | Medium–High |
| `gemini-2.0-flash` | 1M | 500K | High |
### HuggingFace Auto-Resolution

Any model string containing `/` is treated as a HuggingFace repo ID. The server fetches `config.json` from the repo, extracts the context window size (`max_position_embeddings`, `n_positions`, or `max_seq_len`), and generates a conservative degradation profile:
- 65% of max tokens → degradation onset
- 80% of max tokens → danger zone
Results are cached in SQLite — subsequent lookups are instant.
```
model: "meta-llama/Llama-3.1-70B"  → 131K context, danger at 105K
model: "mistralai/Mistral-7B-v0.1" → 32K context, danger at 26K
model: "mosaicml/mpt-7b"           → 65K context, danger at 52K
```
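A minimal sketch of this resolution logic, assuming Node 18+ (global `fetch`); names like `resolveHuggingFaceModel` and `ModelProfile` are illustrative, not the server's actual internals:

```typescript
interface ModelProfile {
  maxTokens: number;
  degradationOnset: number; // 65% of max tokens
  dangerZone: number;       // 80% of max tokens
}

async function resolveHuggingFaceModel(repoId: string): Promise<ModelProfile | null> {
  const url = `https://huggingface.co/${repoId}/resolve/main/config.json`;
  try {
    const res = await fetch(url);
    if (!res.ok) return null; // gated model or missing config -> conservative defaults
    const config = await res.json();
    // The context window key varies by architecture.
    const maxTokens =
      config.max_position_embeddings ?? config.n_positions ?? config.max_seq_len;
    if (typeof maxTokens !== "number") return null;
    // The real server also caches the resolved profile in SQLite.
    return {
      maxTokens,
      degradationOnset: Math.floor(maxTokens * 0.65),
      dangerZone: Math.floor(maxTokens * 0.8),
    };
  } catch {
    return null; // network error -> conservative defaults
  }
}
```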
If the fetch fails (network error, gated model, missing config), the server falls back silently to conservative defaults.
### Fallback

Any unrecognized model string without `/` falls back to conservative defaults (128K max, 100K danger zone).
## How It Works
The health score is a weighted composite of four signals:
| Signal | Weight | Source |
|---|---|---|
| Token utilization quality | 40% | Model-specific sigmoid degradation curve |
| Retrieval accuracy | 25% | Base accuracy minus lost-in-the-middle penalty |
| Tool-call burden | 20% | Compounding quality loss after 10+ tool calls |
| Session length | 15% | Time-based fatigue heuristic |
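A sketch of the weighting, assuming each signal has already been normalized to a 0–100 quality value by its own curve (only the documented 40/25/20/15 split is shown; the per-signal curves live in the server):

```typescript
interface HealthSignals {
  tokenUtilizationQuality: number; // model-specific sigmoid degradation curve
  retrievalAccuracy: number;       // base accuracy minus lost-in-the-middle penalty
  toolCallQuality: number;         // compounding loss after 10+ tool calls
  sessionQuality: number;          // time-based fatigue heuristic
}

function healthScore(s: HealthSignals): number {
  return Math.round(
    0.40 * s.tokenUtilizationQuality +
    0.25 * s.retrievalAccuracy +
    0.20 * s.toolCallQuality +
    0.15 * s.sessionQuality
  );
}
```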
The degradation curves are derived from empirical research:
- Chroma: Context Rot — quality degrades around 147K–152K tokens on 200K models
- Stanford: Lost in the Middle — retrieval accuracy drops for information in the middle of the context window
- Redis: Context Rot — compounding degradation effects in long-running agents
## Development

```bash
git clone https://github.com/milos-product-maker/context-rot-detection.git
cd context-rot-detection
npm install
npm run dev    # Run with tsx (hot reload)
npm test       # Run unit tests
npm run build  # Compile TypeScript
```
### Testing with MCP Inspector

```bash
npx @modelcontextprotocol/inspector node dist/index.js
```
## License
MIT