
ARSR MCP Server

Adaptive Retrieval-Augmented Self-Refinement — a closed-loop MCP server that lets LLMs iteratively verify and correct their own claims using uncertainty-guided retrieval.

What it does

Unlike one-shot RAG (retrieve → generate), ARSR runs a refinement loop:

Generate draft → Decompose claims → Score uncertainty
       ↑                                    ↓
   Decide stop ← Revise with evidence ← Retrieve for low-confidence claims

The key insight: retrieval is guided by uncertainty. Only claims the model is unsure about trigger evidence fetching, and the queries are adversarial — designed to disprove the claim, not just confirm it.
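For intuition, adversarial retrieval can be pictured as query templates that hunt for disconfirming evidence rather than restating the claim. The function below is purely illustrative — the server generates these queries with its inner LLM, and `buildAdversarialQueries` is not part of the ARSR API:

```typescript
// Illustrative only: a toy sketch of adversarial query generation.
// The real server delegates this step to the inner LLM.
function buildAdversarialQueries(claim: string): string[] {
  return [
    `is it true that ${claim}`,        // direct verification
    `${claim} debunked OR incorrect`,  // hunt for disconfirming sources
    `evidence against "${claim}"`,     // explicitly adversarial framing
  ];
}

const queries = buildAdversarialQueries("Tesla was founded in 2003");
```

The point of the adversarial framing is that confirmatory queries tend to surface pages that echo the claim, while disconfirming queries surface corrections if any exist.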

Architecture

The server exposes 6 MCP tools. The outer LLM (Claude, GPT, etc.) orchestrates the loop by calling them in sequence:

| # | Tool | Purpose |
|---|------|---------|
| 1 | arsr_draft_response | Generate initial candidate answer (returns is_refusal flag) |
| 2 | arsr_decompose_claims | Split into atomic verifiable claims |
| 3 | arsr_score_uncertainty | Estimate confidence via semantic entropy |
| 4 | arsr_retrieve_evidence | Web search for low-confidence claims |
| 5 | arsr_revise_response | Rewrite draft with evidence |
| 6 | arsr_should_continue | Decide: iterate or finalize |

Inner LLM: Tools 1-5 use Claude Haiku internally for intelligence (query generation, claim extraction, evidence evaluation). This keeps costs low while the outer model handles orchestration.
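Semantic entropy works by sampling several answers to rephrasings of the same question, grouping semantically equivalent answers, and measuring the entropy of the group distribution: consistent answers mean low entropy and high confidence. A minimal sketch, using exact string match as a stand-in for the inner LLM's semantic-equivalence check (the helper name and the normalization are illustrative, not the server's actual code):

```typescript
// Confidence from sampled answers: low entropy across samples => high confidence.
// Exact-match grouping stands in for the LLM-based semantic clustering
// the server actually performs.
function confidenceFromSamples(answers: string[]): number {
  const counts = new Map<string, number>();
  for (const a of answers) {
    const key = a.trim().toLowerCase();
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  // Shannon entropy over the cluster distribution, in nats.
  let entropy = 0;
  for (const n of counts.values()) {
    const p = n / answers.length;
    entropy -= p * Math.log(p);
  }
  // Normalize by the maximum possible entropy and invert:
  // identical answers => 1.0, all-distinct answers => 0.0.
  const maxEntropy = Math.log(answers.length);
  return maxEntropy === 0 ? 1 : 1 - entropy / maxEntropy;
}
```

With ARSR_ENTROPY_SAMPLES=3, three rephrasings that all yield "2003" would score confidence 1.0 and skip retrieval; three different answers would score near 0 and trigger it.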

Refusal detection: arsr_draft_response returns a structured is_refusal flag (classified by the inner LLM) indicating whether the draft is a non-answer. When is_refusal is true, downstream tools (decompose, revise) pivot to extracting claims from the original query and building an answer from retrieved evidence instead of trying to refine a refusal.
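The refusal pivot amounts to a branch on where claims come from. A hypothetical sketch (the shapes and the `claimSource` helper are illustrative, not the server's actual code):

```typescript
// Hypothetical sketch of the refusal pivot described above.
interface Draft {
  draft: string;
  is_refusal: boolean;
}

function claimSource(d: Draft, originalQuery: string): string {
  // On a refusal there is nothing to verify in the draft itself, so claims
  // are extracted from the user's question and answered from retrieved evidence.
  return d.is_refusal ? originalQuery : d.draft;
}
```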

Web Search: arsr_retrieve_evidence uses the Anthropic API's built-in web search tool — no external search API keys needed.
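For reference, a Messages API request using Anthropic's server-side web search tool looks roughly like this. The tool type string is the version current at the time of writing and may change; the payload below is only constructed, not sent:

```typescript
// Request payload for Anthropic's built-in web search server tool.
// Constructed only; sending it requires the Anthropic SDK and an API key.
const searchRequest = {
  model: "claude-haiku-4-5-20251001",
  max_tokens: 1024,
  tools: [
    {
      type: "web_search_20250305", // server-tool version string as of writing
      name: "web_search",
      max_uses: 3, // cap searches per request to bound cost
    },
  ],
  messages: [
    {
      role: "user",
      content: 'Find sources that contradict: "Tesla was founded in 2003"',
    },
  ],
};
```

Because search runs server-side inside the Anthropic API, results come back as tool-result content blocks in the same response — no SerpAPI, Brave, or Google keys to manage.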

Setup

Prerequisites

  • Node.js 18+
  • An Anthropic API key

Install & Build

cd arsr-mcp-server
npm install
npm run build

Environment

export ANTHROPIC_API_KEY="sk-ant-..."

Run

stdio mode (for Claude Desktop, Cursor, etc.):

npm start

HTTP mode (for remote access):

TRANSPORT=http PORT=3001 npm start

Claude Desktop Configuration

Add to your claude_desktop_config.json:

Published npm package:

{
  "mcpServers": {
    "arsr": {
      "command": "npx",
      "args": ["@jayarrowz/mcp-arsr"],
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-...",
        "ARSR_MAX_ITERATIONS": "3",
        "ARSR_ENTROPY_SAMPLES": "3",
        "ARSR_RETRIEVAL_STRATEGY": "adversarial",
        "ARSR_INNER_MODEL": "claude-haiku-4-5-20251001"
      }
    }
  }
}

Local build:

{
  "mcpServers": {
    "arsr": {
      "command": "node",
      "args": ["/path/to/arsr-mcp-server/dist/src/index.js"],
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-...",
        "ARSR_MAX_ITERATIONS": "3",
        "ARSR_ENTROPY_SAMPLES": "3",
        "ARSR_RETRIEVAL_STRATEGY": "adversarial",
        "ARSR_INNER_MODEL": "claude-haiku-4-5-20251001"
      }
    }
  }
}

How the outer LLM uses it

The orchestrating LLM calls the tools in sequence:

1. draft = arsr_draft_response({ query: "When was Tesla founded?" })
   // draft.is_refusal indicates if the inner LLM refused to answer
2. claims = arsr_decompose_claims({ draft: draft.draft, original_query: "When was Tesla founded?", is_refusal: draft.is_refusal })
3. scored = arsr_score_uncertainty({ claims: claims.claims })
4. low = scored.scored.filter(c => c.confidence < 0.85)
5. evidence = arsr_retrieve_evidence({ claims_to_check: low })
6. revised = arsr_revise_response({ draft: draft.draft, evidence: evidence.evidence, scored: scored.scored, original_query: "When was Tesla founded?", is_refusal: draft.is_refusal })
7. decision = arsr_should_continue({ iteration: 1, scored: revised_scores })
   // revised_scores = arsr_score_uncertainty re-run on the revised draft
   → if "continue": go to step 2 with the revised text
   → if "stop": return revised.revised to the user
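Put together, the steps above can be sketched as a single loop. `callTool` stands in for however your MCP client invokes tools, and the `decision` field name is an assumption — this is a sketch of the orchestration, not the server's actual client code:

```typescript
// Hypothetical orchestration sketch; callTool stands in for an MCP client.
async function refine(
  query: string,
  callTool: (name: string, args: unknown) => Promise<any>,
  maxIterations = 3, // mirrors ARSR_MAX_ITERATIONS
): Promise<string> {
  let draft = await callTool("arsr_draft_response", { query });
  for (let iteration = 1; ; iteration++) {
    const claims = await callTool("arsr_decompose_claims", {
      draft: draft.draft,
      original_query: query,
      is_refusal: draft.is_refusal,
    });
    const scored = await callTool("arsr_score_uncertainty", { claims: claims.claims });
    const low = scored.scored.filter((c: any) => c.confidence < 0.85);
    const evidence = await callTool("arsr_retrieve_evidence", { claims_to_check: low });
    const revised = await callTool("arsr_revise_response", {
      draft: draft.draft,
      evidence: evidence.evidence,
      scored: scored.scored,
      original_query: query,
      is_refusal: draft.is_refusal,
    });
    const decision = await callTool("arsr_should_continue", {
      iteration,
      scored: scored.scored,
    });
    if (decision.decision === "stop" || iteration >= maxIterations) {
      return revised.revised;
    }
    // "continue": loop again, treating the revised text as the new draft.
    draft = { draft: revised.revised, is_refusal: false };
  }
}
```

The `iteration >= maxIterations` guard enforces the budget even if the server never says "stop".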

Configuration

All settings can be overridden via environment variables, falling back to defaults if unset:

| Setting | Env var | Default | Description |
|---------|---------|---------|-------------|
| max_iterations | ARSR_MAX_ITERATIONS | 3 | Budget limit for refinement loops |
| confidence_threshold | ARSR_CONFIDENCE_THRESHOLD | 0.85 | Claims above this skip retrieval |
| entropy_samples | ARSR_ENTROPY_SAMPLES | 3 | Rephrasings for semantic entropy |
| retrieval_strategy | ARSR_RETRIEVAL_STRATEGY | adversarial | adversarial, confirmatory, or balanced |
| inner_model | ARSR_INNER_MODEL | claude-haiku-4-5-20251001 | Model for internal intelligence |
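A sketch of how these settings resolve — the loader below is illustrative (pass `process.env` in real use); the server's own implementation may differ:

```typescript
// Illustrative env-var resolution for the settings table above.
function loadConfig(env: Record<string, string | undefined>) {
  return {
    maxIterations: Number(env.ARSR_MAX_ITERATIONS ?? 3),
    confidenceThreshold: Number(env.ARSR_CONFIDENCE_THRESHOLD ?? 0.85),
    entropySamples: Number(env.ARSR_ENTROPY_SAMPLES ?? 3),
    retrievalStrategy: env.ARSR_RETRIEVAL_STRATEGY ?? "adversarial",
    innerModel: env.ARSR_INNER_MODEL ?? "claude-haiku-4-5-20251001",
  };
}
```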

Cost estimate

Per refinement loop iteration (assuming ~5 claims, 3 low-confidence):

  • Inner LLM calls: ~6-10 Haiku calls ≈ $0.002-0.005
  • Web searches: 6-9 queries ≈ included in API
  • Typical total for 2 iterations: < $0.02
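A quick sanity check on the arithmetic, using only the per-iteration figures estimated above (these are this README's estimates, not quoted API rates):

```typescript
// Sanity-check the "< $0.02 for 2 iterations" claim from the per-iteration
// inner-LLM estimate ($0.002-0.005). Web search is treated as bundled into
// the API calls, per the note above.
const perIterationUsd: [number, number] = [0.002, 0.005];
const iterations = 2;
const totalRange = perIterationUsd.map((c) => c * iterations);
const underBudget = totalRange[1] < 0.02;
```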


License

MIT
