milos-product-maker

Context Rot Detection

Updated

Context Rot Detection & Healing MCP Service — gives AI agents self-awareness about their cognitive state

Context Rot Detection

MCP service that gives AI agents self-awareness about their cognitive state.

Every long-running AI agent suffers from context rot — measurable performance degradation as the context window fills up. Research from Chroma, Stanford ("lost-in-the-middle"), and Redis confirms this is the #1 practical failure mode in production agent systems.

An agent experiencing context rot doesn't know it's degrading — it just starts making worse decisions. This tool gives agents real-time visibility into their own cognitive health.

Features

  • Health score (0–100) based on token utilization, retrieval accuracy, and session fatigue
  • Model-specific degradation curves for 15+ curated models (Claude, GPT, Gemini, o-series)
  • Auto-resolves any HuggingFace model — pass a repo ID like meta-llama/Llama-3.1-70B and the context window is detected automatically, with results cached in SQLite
  • Lost-in-the-middle risk scoring based on Stanford research
  • Tool-call burden and session fatigue analysis
  • Actionable recovery recommendations — compact context, offload to memory, checkpoint, break into subtasks
  • Per-agent health history tracking (SQLite)
  • Service-wide utilization statistics

Quick Start

npx (zero install)

npx context-rot-detection

npm (global install)

npm install -g context-rot-detection
context-rot-detection

MCP Client Configuration

Claude Code

Add to .mcp.json in your project root:

{
  "mcpServers": {
    "context-rot-detection": {
      "command": "npx",
      "args": ["-y", "context-rot-detection"],
      "env": {
        "HEALTH_HISTORY_DB": "./health.db"
      }
    }
  }
}

Claude Desktop

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "context-rot-detection": {
      "command": "npx",
      "args": ["-y", "context-rot-detection"],
      "env": {
        "HEALTH_HISTORY_DB": "/path/to/health.db"
      }
    }
  }
}

Docker

{
  "mcpServers": {
    "context-rot-detection": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "-v", "context-rot-data:/data",
        "ghcr.io/milos-product-maker/context-rot-detection:latest"
      ]
    }
  }
}

Configuration

Environment Variable Description Default
HEALTH_HISTORY_DB Path to SQLite database for health history. Use :memory: for ephemeral storage. :memory:
LOG_FILE Path to append structured JSON log lines. Omit to disable file logging. (none)

Tools

check_my_health

Analyze the current context window health. Call this periodically during long sessions or before critical decisions.

Parameters:

Parameter Type Required Description
token_count integer Yes Current estimated token count in context window
model string No LLM model identifier — a curated name (e.g., claude-opus-4, gpt-4o), a HuggingFace repo ID (e.g., meta-llama/Llama-3.1-70B), or any string (falls back to conservative defaults)
session_duration_minutes integer No How long this session has been running
tool_calls_count integer No Number of tool calls made in this session
context_summary string No Brief summary of current task and recent actions
agent_id string No Unique agent identifier for history tracking

Example response:

{
  "health_score": 62,
  "status": "warning",
  "token_utilization": {
    "current": 155000,
    "max_effective": 170000,
    "percentage": 91.2,
    "danger_zone_starts_at": 170000
  },
  "quality_estimate": {
    "retrieval_accuracy": "degrading",
    "middle_content_risk": "high",
    "estimated_hallucination_risk": "moderate"
  },
  "session_fatigue": {
    "tool_call_burden": "moderate",
    "session_length_risk": "low",
    "recommendation": "Consider breaking into sub-tasks if complexity increases."
  },
  "recommendations": [
    {
      "priority": "high",
      "action": "compact_context",
      "reason": "You are approaching the effective quality threshold. Summarize older context and remove completed task details.",
      "estimated_quality_gain": 15
    },
    {
      "priority": "high",
      "action": "offload_to_memory",
      "reason": "High risk of lost-in-the-middle effect. Store critical information to external memory before it is effectively lost.",
      "estimated_quality_gain": 8
    }
  ]
}

get_health_history

Retrieve health check history for a specific agent.

Parameters:

Parameter Type Required Description
agent_id string Yes Unique agent identifier
limit integer No Max records to return (default: 20, max: 100)

get_service_stats

Get service-wide utilization statistics. No parameters required.

Returns total calls, unique agents, average health score, model distribution, status distribution, and recent activity (last hour / last 24h).

Supported Models

Model Max Tokens Danger Zone Middle-Loss Risk
claude-opus-4-5 200K 175K Low
claude-opus-4 200K 170K Low
claude-sonnet-4 200K 165K Low
claude-3.7-sonnet 200K 160K Low–Medium
claude-3.5-sonnet 200K 152K Medium
claude-haiku-3.5 200K 130K Medium
gpt-4.1 1M 500K Medium
gpt-4.1-mini 1M 450K Medium
gpt-4o 128K 105K Medium
gpt-4o-mini 128K 95K Medium–High
o3 200K 160K Low–Medium
o4-mini 200K 150K Medium
gemini-2.5-pro 1M 600K Medium
gemini-2.5-flash 1M 520K Medium–High
gemini-2.0-flash 1M 500K High

HuggingFace Auto-Resolution

Any model string containing / is treated as a HuggingFace repo ID. The server fetches config.json from the repo, extracts the context window size (max_position_embeddings, n_positions, or max_seq_len), and generates a conservative degradation profile:

  • 65% of max tokens → degradation onset
  • 80% of max tokens → danger zone

Results are cached in SQLite — subsequent lookups are instant.

model: "meta-llama/Llama-3.1-70B"       → 131K context, danger at 105K
model: "mistralai/Mistral-7B-v0.1"      → 32K context, danger at 26K
model: "mosaicml/mpt-7b"                → 65K context, danger at 52K

If the fetch fails (network error, gated model, missing config), the server falls back silently to conservative defaults.

Fallback

Any unrecognized model string without / falls back to conservative defaults (128K max, 100K danger zone).

How It Works

The health score is a weighted composite of four signals:

Signal Weight Source
Token utilization quality 40% Model-specific sigmoid degradation curve
Retrieval accuracy 25% Base accuracy minus lost-in-the-middle penalty
Tool-call burden 20% Compounding quality loss after 10+ tool calls
Session length 15% Time-based fatigue heuristic

The degradation curves are derived from empirical research:

Development

git clone https://github.com/milos-product-maker/context-rot-detection.git
cd context-rot-detection
npm install
npm run dev        # Run with tsx (hot reload)
npm test           # Run unit tests
npm run build      # Compile TypeScript

Testing with MCP Inspector

npx @modelcontextprotocol/inspector node dist/index.js

License

MIT

MCP Server · Populars

MCP Server · New