wasp-mcp
Web Agent Semantic Protocol — MCP Server
wasp-mcp is a Model Context Protocol server that lets Claude (or any MCP client) query arbitrary webpages with token-efficient, structure-aware retrieval. Instead of dumping raw HTML into the context window, WASP builds a lightweight structural index (the manifest) from a page's headings, then fetches content only for the sections relevant to a query.
The result: answers grounded in real page content at a fraction of the token cost of naive scraping.
See the WASP Whitepaper for the full protocol specification.
How It Works
Every webpage has two useful layers:
- Structure — headings and section anchors that form a table of contents. Small, cheap to index.
- Content — the text under each heading. Expensive to send in full; most is irrelevant to any given query.
WASP exploits this split with a two-tier pipeline:
Tier 1 — get_manifest(url)
↓ Try GET /.well-known/wasp.json (site-native manifest, 3 s timeout)
↓ Fall back: fetch HTML → parse headings → generate manifest client-side
→ Returns: structured index (headings, anchors, depth, token estimates)
Tier 2 — fetch_chunk(url, anchor)
↓ Resolve anchor → DOM element (getElementById → querySelector → fuzzy match)
↓ Extract section text via Range API / heading-sibling walk
→ Returns: plain-text body of that section only
query_page(url, query)
↓ get_manifest → score chunks by keyword match → fetch_chunk for top results
↓ Build numbered [1. Heading] context → call LLM provider (Claude by default) → inline [N] citations
→ Returns: { answer, sources[] }
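The heading-sibling walk in Tier 2 can be illustrated on a simplified page model. This is a minimal sketch that assumes the page has already been flattened into an ordered list of heading and text blocks. The Block type and sectionText function are hypothetical, and a DOM or Range API implementation would look different. It also assumes that a section ends at the next heading of the same or higher rank, so nested subsections are included.

```typescript
// Simplified page model: an ordered list of heading and text blocks
// (hypothetical; the real extractor works on the DOM).
type Block =
  | { kind: "heading"; level: number; text: string }
  | { kind: "text"; text: string };

// Collect everything after blocks[start] (which must be a heading) up to the
// next heading of the same or higher rank. Nested subheadings are kept as text.
function sectionText(blocks: Block[], start: number): string {
  const head = blocks[start];
  if (head === undefined || head.kind !== "heading") {
    throw new Error("start must index a heading block");
  }
  const parts: string[] = [];
  for (let i = start + 1; i < blocks.length; i++) {
    const b = blocks[i];
    if (b.kind === "heading" && b.level <= head.level) break; // next sibling-or-higher section
    parts.push(b.text);
  }
  return parts.join("\n").trim();
}

// Usage with content from the example profile page.
const page: Block[] = [
  { kind: "heading", level: 1, text: "Andreas Klappenecker" },
  { kind: "heading", level: 2, text: "Research Interests" },
  { kind: "text", text: "Quantum computing, image processing, cryptography." },
  { kind: "heading", level: 2, text: "Selected Publications" },
];
console.log(sectionText(page, 1)); // Quantum computing, image processing, cryptography.
```

Starting from the h1 instead would return the whole page body, because every later heading is of lower rank.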
A naive full-page scrape of a typical faculty profile costs ~16,700 tokens; the same query via WASP costs ~2,700, roughly a 6× reduction.
Install
Requirements: Node.js ≥ 18, an Anthropic API key.
git clone https://github.com/seanfeeney/wasp-mcp
cd wasp-mcp
npm install
npm run build
Set your API key:
export ANTHROPIC_API_KEY=sk-ant-...
Run the server (stdio transport, for Claude Desktop / Claude Code):
node dist/index.js
Add to Claude Code
Add wasp-mcp as a local MCP server in your Claude Code project config:
claude mcp add wasp -- node /absolute/path/to/wasp-mcp/dist/index.js
Or edit .claude/settings.json manually:
{
"mcpServers": {
"wasp": {
"command": "node",
"args": ["/absolute/path/to/wasp-mcp/dist/index.js"],
"env": {
"ANTHROPIC_API_KEY": "sk-ant-..."
}
}
}
}
Restart Claude Code after saving. Confirm the server is live:
/mcp
MCP Tools
get_manifest
Fetches the structural index for a URL. Tries the site's own /.well-known/wasp.json first; falls back to client-side DOM generation from the fetched HTML.
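The discovery order can be illustrated with a minimal sketch that assumes the HTTP fetch and the HTML-based generator are passed in as functions. DiscoveryDeps, getManifestSketch and the other names are hypothetical, not wasp-mcp's API. The sketch resolves /.well-known/wasp.json against the page's origin and applies the 3 s timeout by racing the fetch against a timer.

```typescript
// Hypothetical manifest shape: only the fields this sketch inspects are typed.
type Manifest = { wasp: string; url: string; generated: string; [key: string]: unknown };

// Hypothetical injected dependencies, not wasp-mcp's real API.
interface DiscoveryDeps {
  fetchJson: (url: string) => Promise<unknown>; // rejects on HTTP or network error
  generateFromHtml: (pageUrl: string) => Promise<Manifest>;
}

// /.well-known/ lives at the origin root, so resolve against the page URL.
function wellKnownUrl(pageUrl: string): string {
  return new URL("/.well-known/wasp.json", pageUrl).href;
}

// Reject if `p` has not settled within `ms` milliseconds.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    p.then(
      (v) => { clearTimeout(timer); resolve(v); },
      (e) => { clearTimeout(timer); reject(e); },
    );
  });
}

// Minimal shape check; a real validator would check every field.
function looksLikeManifest(x: unknown): x is Manifest {
  return typeof x === "object" && x !== null && typeof (x as { wasp?: unknown }).wasp === "string";
}

async function getManifestSketch(pageUrl: string, deps: DiscoveryDeps, timeoutMs = 3000): Promise<Manifest> {
  try {
    const json = await withTimeout(deps.fetchJson(wellKnownUrl(pageUrl)), timeoutMs);
    if (looksLikeManifest(json)) return json;
  } catch {
    // Missing file, network error or timeout: fall through to generation.
  }
  return deps.generateFromHtml(pageUrl);
}
```

For the example profile, wellKnownUrl yields https://engineering.tamu.edu/.well-known/wasp.json. The timer here only stops waiting and does not cancel the request; a production version would more likely pass an AbortSignal to fetch.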
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| url | string | yes | Fully-qualified URL of the page |
Example
get_manifest("https://engineering.tamu.edu/cse/profiles/aklappenecker.html")
{
"wasp": "1.0",
"url": "https://engineering.tamu.edu/cse/profiles/aklappenecker.html",
"title": "Andreas Klappenecker — Texas A&M CSE",
"summary": "Faculty profile for Andreas Klappenecker.",
"keywords": ["quantum computing", "cryptography", "image processing"],
"chunks": [
{ "id": "chunk_001", "heading": "Andreas Klappenecker", "anchor": "#wasp-001", "depth": 1, "tokens": 5, "order": 1 },
{ "id": "chunk_002", "heading": "Research Interests", "anchor": "#wasp-002", "depth": 2, "tokens": 4, "order": 2 },
{ "id": "chunk_003", "heading": "Selected Publications","anchor": "#wasp-003", "depth": 2, "tokens": 5, "order": 3 }
],
"generated": "client"
}
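The manifest shape above maps naturally onto TypeScript types. The sketch below infers those types from the example (the project's own types.ts may differ) and shows one way the client-side fallback could turn headings into chunks, reusing the chunk_NNN and #wasp-NNN numbering seen in the example. Heading extraction uses a regular expression purely for brevity; that is an assumption, and real pages call for an HTML parser. The token figure is likewise a placeholder chars/4 heuristic applied to the heading text only.

```typescript
// Types inferred from the example manifest (hypothetical; types.ts may differ).
interface WaspChunk {
  id: string;      // "chunk_001", "chunk_002", ...
  heading: string;
  anchor: string;  // "#wasp-001", "#wasp-002", ...
  depth: number;   // heading level: 1 for <h1>, 2 for <h2>, ...
  tokens: number;  // rough size estimate
  order: number;   // 1-based position in the document
}

interface WaspManifest {
  wasp: string;       // protocol version, e.g. "1.0"
  url: string;
  title: string;
  summary: string;
  keywords: string[];
  chunks: WaspChunk[];
  generated: string;  // "client" when built by the fallback path
}

interface Heading { level: number; text: string; }

// Regex-based heading extraction, for illustration only: it ignores edge cases
// (headings inside comments or scripts, malformed markup) that a parser handles.
function extractHeadings(html: string): Heading[] {
  const out: Heading[] = [];
  for (const m of html.matchAll(/<h([1-6])\b[^>]*>([\s\S]*?)<\/h\1>/gi)) {
    const text = m[2].replace(/<[^>]+>/g, "").replace(/\s+/g, " ").trim();
    if (text) out.push({ level: Number(m[1]), text });
  }
  return out;
}

const pad3 = (n: number): string => String(n).padStart(3, "0");

// Number chunks in document order; tokens uses a placeholder chars/4 estimate
// of the heading text (an assumption, not the project's estimator).
function buildChunks(headings: Heading[]): WaspChunk[] {
  return headings.map((h, i) => ({
    id: `chunk_${pad3(i + 1)}`,
    heading: h.text,
    anchor: `#wasp-${pad3(i + 1)}`,
    depth: h.level,
    tokens: Math.ceil(h.text.length / 4),
    order: i + 1,
  }));
}
```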
fetch_chunk
Retrieves the plain-text body of a single section identified by its anchor. Anchor resolution uses a three-stage fallback: getElementById → querySelector → fuzzy heading match.
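A rough sketch of the three-stage fallback follows, assuming headings are available as plain objects rather than DOM elements. Because there is no DOM here, stage 1 stands in for getElementById with an exact id match, stage 2 approximates a looser selector lookup with a normalized-id comparison, and stage 3 uses word-overlap (Jaccard) similarity as the fuzzy metric. All three substitutions, the 0.5 threshold, and the names HeadingNode and resolveAnchor are assumptions for illustration.

```typescript
// Simplified stand-in for the DOM: each heading has an optional id and its text.
interface HeadingNode { id?: string; text: string; }

// Lowercase, drop a leading "#", and collapse non-alphanumeric runs to "-".
const slug = (s: string): string =>
  s.toLowerCase().replace(/^#/, "").replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");

const words = (s: string): Set<string> => new Set(slug(s).split("-").filter(Boolean));

// Jaccard similarity between two word sets (the assumed fuzzy metric).
function jaccard(a: Set<string>, b: Set<string>): number {
  let inter = 0;
  for (const w of a) if (b.has(w)) inter++;
  const union = a.size + b.size - inter;
  return union === 0 ? 0 : inter / union;
}

function resolveAnchor(anchor: string, nodes: HeadingNode[], minScore = 0.5): HeadingNode | undefined {
  const id = anchor.replace(/^#/, "");
  // Stage 1: exact id (stands in for getElementById).
  const exact = nodes.find((n) => n.id === id);
  if (exact) return exact;
  // Stage 2: normalized id match (a loose stand-in for querySelector).
  const loose = nodes.find((n) => n.id !== undefined && slug(n.id) === slug(id));
  if (loose) return loose;
  // Stage 3: fuzzy match of the anchor's words against each heading's text.
  const target = words(id);
  let best: HeadingNode | undefined;
  let bestScore = 0;
  for (const n of nodes) {
    const score = jaccard(target, words(n.text));
    if (score > bestScore) { best = n; bestScore = score; }
  }
  return bestScore >= minScore ? best : undefined;
}
```

With this ordering an anchor such as "#selected-publications" still resolves to a heading that carries no id at all, while an unrelated anchor returns undefined instead of an arbitrary section.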
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| url | string | yes | Page URL (used for cache lookup; re-fetches if not cached) |
| anchor | string | yes | CSS anchor string from the manifest (e.g. "#research-interests") |
Example
fetch_chunk(
"https://engineering.tamu.edu/cse/profiles/aklappenecker.html",
"#wasp-002"
)
Quantum computing, image processing, cryptography.
query_page
Full end-to-end retrieval: builds the manifest, scores chunks against the query, fetches the relevant section bodies, calls the selected LLM provider (Claude by default), and returns a cited answer.
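The chunk-scoring step is described only as a keyword match. One minimal interpretation, sketched below under that assumption, counts how many non-stopword query terms appear in each heading and keeps the top k. The stopword list, the default k = 3, and the topChunks name are all hypothetical.

```typescript
interface ChunkRef { heading: string; anchor: string; }

// Hypothetical stopword list; a real one would be larger.
const STOPWORDS = new Set(["a", "an", "the", "what", "are", "is", "of", "this", "to", "in", "and", "for", "how", "does"]);

// Lowercase, split on non-alphanumerics, drop 1-letter fragments and stopwords.
function terms(s: string): string[] {
  return s.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 1 && !STOPWORDS.has(w));
}

// Score = number of heading terms that also occur in the query (assumed rule).
// Ties keep document order; chunks with score 0 are dropped.
function topChunks<T extends ChunkRef>(chunks: T[], query: string, k = 3): T[] {
  const q = new Set(terms(query));
  return chunks
    .map((c, i) => ({ c, i, score: terms(c.heading).filter((w) => q.has(w)).length }))
    .filter((x) => x.score > 0)
    .sort((a, b) => b.score - a.score || a.i - b.i)
    .slice(0, k)
    .map((x) => x.c);
}
```

With the example query, only "Research Interests" shares terms (research, interests), so it is the single chunk returned. A fuller scorer might also consult the manifest keywords and decide what to do when nothing matches; this sketch simply returns an empty list.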
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| url | string | yes | Page to query |
| query | string | yes | Natural-language question |
| provider | string | no | "claude" (default), "openai", or "ollama" |
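The provider parameter can be validated before any network call. This sketch assumes a simple adapter interface with a single complete(prompt) method; Provider, parseProvider and selectProvider are hypothetical names, and the actual providers.ts may be structured differently. It falls back to "claude" when the parameter is omitted, as the table documents. Accepting mixed case is a design choice of the sketch.

```typescript
type ProviderName = "claude" | "openai" | "ollama";

// Hypothetical adapter shape; the real providers.ts may differ.
interface Provider {
  name: ProviderName;
  complete(prompt: string): Promise<string>;
}

const PROVIDERS: readonly ProviderName[] = ["claude", "openai", "ollama"];

// Map the optional tool argument to a known provider, defaulting to "claude".
function parseProvider(value: string | undefined): ProviderName {
  if (value === undefined || value === "") return "claude";
  const v = value.toLowerCase(); // case-insensitive by choice
  if ((PROVIDERS as readonly string[]).includes(v)) return v as ProviderName;
  throw new Error(`unknown provider "${value}"; expected one of ${PROVIDERS.join(", ")}`);
}

// Look up the adapter for the requested provider in a caller-supplied registry.
function selectProvider(registry: Record<ProviderName, Provider>, value?: string): Provider {
  return registry[parseProvider(value)];
}
```

Failing fast on an unknown name gives the MCP client a clear error instead of a confusing failure deep inside an API call.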
Example
query_page(
"https://engineering.tamu.edu/cse/profiles/aklappenecker.html",
"What are this professor's research interests?"
)
{
"answer": "Professor Klappenecker's research interests are quantum computing [1], image processing [1], and cryptography [1].",
"sources": [
{ "heading": "Research Interests", "anchor": "#wasp-002" }
]
}
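The numbered-context and citation steps can be illustrated with two small helpers. This is a sketch under two assumptions: the context uses the [1. Heading] label format from the pipeline description, and sources lists only the sections the answer actually cites. buildContext and citedSources are hypothetical names.

```typescript
interface Section { heading: string; anchor: string; text: string; }

// Build the numbered context block: "[N. Heading]" followed by the section text.
function buildContext(sections: Section[]): string {
  return sections.map((s, i) => `[${i + 1}. ${s.heading}]\n${s.text}`).join("\n\n");
}

// Collect the distinct [N] markers in the answer (ignoring out-of-range numbers)
// and map them back to the sections they refer to, in ascending order.
function citedSources(answer: string, sections: Section[]): { heading: string; anchor: string }[] {
  const seen = new Set<number>();
  for (const m of answer.matchAll(/\[(\d+)\]/g)) {
    const n = Number(m[1]);
    if (n >= 1 && n <= sections.length) seen.add(n);
  }
  return [...seen]
    .sort((a, b) => a - b)
    .map((n) => ({ heading: sections[n - 1].heading, anchor: sections[n - 1].anchor }));
}
```

Applied to the example answer, the three [1] markers collapse to a single source, { heading: "Research Interests", anchor: "#wasp-002" }, which matches the sources array shown above.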
Token Efficiency
| Approach | Tokens sent to LLM | Example page |
|---|---|---|
| Raw HTML scrape | ~16,700 | TAMU faculty profile |
| WASP query_page | ~2,700 | same page, same query |
| Reduction | 6.1× | |
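Token savings grow with page length, because the text sent to the LLM depends mainly on the few relevant sections rather than on the size of the whole page. A 50,000-token documentation page may see a 20–40× reduction when only 2–3 sections are relevant.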
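The reduction factor is the ratio of tokens a full scrape would send to tokens WASP sends. Working the 50,000-token estimate backwards shows how much content the stated range implies actually reaches the LLM:

```latex
R = \frac{T_{\text{full}}}{T_{\text{WASP}}}
\qquad\Longrightarrow\qquad
T_{\text{WASP}} = \frac{50{,}000}{R} =
\begin{cases}
2{,}500 & R = 20\\
1{,}250 & R = 40
\end{cases}
```

So a 20–40× reduction on that page corresponds to sending roughly 1,250–2,500 tokens.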
Project Structure
wasp-mcp/
index.ts MCP server entry — registers tools
manifest.ts get_manifest() — discovery + DOM generation
chunks.ts fetch_chunk() — anchor resolution + text extraction
retrieval.ts query_page() — scoring, enrichment, LLM call
providers.ts claude / openai / ollama provider adapters
cache.ts In-memory URL → { manifest, html } cache with TTL
types.ts Shared TypeScript types
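cache.ts is described as an in-memory URL → { manifest, html } cache with a TTL. A minimal sketch of such a cache follows. The class name, the lazy eviction on read, and the injectable clock (used so expiry can be tested without waiting) are assumptions, not the project's code.

```typescript
interface CacheEntry<V> { value: V; expiresAt: number; }

// Minimal in-memory TTL cache keyed by URL (hypothetical, not cache.ts itself).
class TtlCache<V> {
  private readonly entries = new Map<string, CacheEntry<V>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const e = this.entries.get(key);
    if (!e) return undefined;
    if (this.now() >= e.expiresAt) {
      this.entries.delete(key); // evict lazily when an expired entry is read
      return undefined;
    }
    return e.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```

Lazy eviction keeps the sketch timer-free; entries that are never read again stay in memory until overwritten, which a long-running server might address with a periodic sweep or a size cap.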
License
MIT © Sean Feeney, 2026