Leluth

UnderPixel

Community Leluth
Updated

Chrome extension + MCP server — record, replay, and understand what's behind the pixels. Timestamped visual-API correlation for Claude Code and any MCP client.

UnderPixel

Record, replay, and understand what's behind the pixels.

Timestamped visual–API correlation for Claude Code and any MCP client.

What is it • Features • Quick Start • Tools • Architecture • Roadmap

Status: Phases 1–3 complete (core capture, replay UI, dependency graph & session export). Phase 4 (auto-docs, performance overlays, dependency-graph UI) in progress. Active development.

🎯 What is UnderPixel?

UnderPixel is a Chrome extension + MCP server that gives AI coding assistants — Claude Code, Cursor, Claude Desktop, VS Code Copilot, Windsurf, or anything that speaks MCP — the one piece of context every other browser tool is missing: which API calls feed which UI elements.

Existing browser MCPs treat network capture and visual capture as separate, unlinked streams. UnderPixel bundles them by timestamp:

Snapshot Bundle @ T = 1712345678000
├── screenshot        PNG
├── dom_state         rrweb incremental snapshot
├── api_calls         [ { url, method, status, headers, body, timing }, … ]
├── trigger           "fetch response: GET /api/okrs"
└── correlation       "DOM #okr-table updated from GET /api/okrs"

Ask Claude "what API feeds the user table?" and get an answer grounded in actual recorded traffic — not guesswork from page source.

✨ Key Features

  • 🔗 Visual–API correlation — every screenshot, DOM mutation, and API call indexed on a shared timeline. The core differentiator.
  • 🧠 API dependency graph — auto-detects call chains via value propagation (JWT → request, ID → URL, token → header). Maps your auth flow without you writing a line.
  • 🎬 Session replay with synced timeline — rrweb-player on the left, event-grouped API timeline on the right. Click a request, jump to the moment it fired.
  • 📡 Full network capture via CDP — request and response bodies, headers, timing. Not just URLs. Powered by chrome.debugger.
  • 📸 Smart screenshot gate — 2-layer system (rrweb event stream + stability wait → pixelmatch diff). Captures only frames where pixels actually changed.
  • 🌐 Works in your real browser — your cookies, your logins, your extensions. No --remote-debugging-port, no separate profile, no Playwright relaunch.
  • 🤖 AI action audit trail — when Claude Code drives the browser via MCP, UnderPixel silently records everything. Replay exactly what the agent did.
  • 📦 Session export & share.underpixel files (gzipped JSON). Hand a teammate a full reproduction of a bug — DOM, network, screenshots, all of it.
  • 🔌 Client agnostic — Streamable HTTP or stdio MCP transport. Works with any compliant client.
  • 🎯 ~12 focused tools, not 27 — opinionated surface, less token overhead in tool definitions, more context for actual work.

🆚 How UnderPixel Compares

Capability mcp-chrome chrome-devtools-mcp browser-tools-mcp UnderPixel
Network capture w/ response bodies ⚠️ partial
Screenshots
Works in your real browser ❌ needs flags
Network ↔ DOM correlation
DOM mutation recording ✅ (rrweb)
Pixel-level visual change detection ✅ (pixelmatch)
Synced session replay ✅ (rrweb-player)
API dependency graph
AI action audit / replay
Session export & share ✅ (.underpixel)
Tool count 27 ~25 ~15 ~12 (focused)

📝 UnderPixel builds on the proven infrastructure patterns of mcp-chrome (Native Messaging bridge, CDP capture, Streamable HTTP) and uses rrweb directly as a dependency for DOM recording and replay. The novel contribution is the correlation layer — the thing that turns raw streams into context an LLM can reason about.

🚀 Quick Start

Prerequisites

  • Node.js ≥ 20
  • Chrome (or any Chromium browser — Edge, Brave, Arc work)
  • An MCP-compatible client: Claude Code, Claude Desktop, Cursor, VS Code Copilot, Windsurf…

1. Install the bridge

# npm
npm install -g underpixel-bridge

# pnpm (postinstall scripts must be enabled)
pnpm config set enable-pre-post-scripts true
pnpm install -g underpixel-bridge

# fallback: register manually if postinstall didn't run
underpixel-bridge register

The bridge auto-registers itself as a Chrome Native Messaging host. It is a thin stdio-↔-Native-Messaging translator (~200 lines) — all logic lives in the extension, so the bridge package rarely needs updating.

2. Load the Chrome extension

  1. Download the latest extension build from GitHub Releases.
  2. Open chrome://extensions/ and enable Developer mode.
  3. Click Load unpacked and select the unzipped folder.
  4. Click the UnderPixel toolbar icon → Connect to view your local MCP endpoint.

Once stable, UnderPixel will be published to the Chrome Web Store for one-click install.

3. Wire it up to your MCP client

Streamable HTTP (recommended — supports per-session transports, no subprocess spawn):

{
  "mcpServers": {
    "underpixel": {
      "type": "streamableHttp",
      "url": "http://127.0.0.1:PORT/mcp"
    }
  }
}

stdio (for clients that don't speak HTTP yet):

{
  "mcpServers": {
    "underpixel": {
      "command": "npx",
      "args": ["-y", "underpixel-bridge"]
    }
  }
}

The exact port number is shown in the extension popup after Connect.

4. Try it

In Claude Code (or any MCP client), ask:

"Open the Hacker News front page, capture network for 5 seconds, then tell me which API delivered the story list and what its response shape looks like."

UnderPixel will navigate, record, correlate, and hand back a structured answer with the actual endpoint URL, request method, response schema, and a timestamped DOM snapshot showing the result rendered on screen.

🛠️ MCP Tools

UnderPixel exposes ~12 tools organized by purpose. The full schemas are in packages/shared/src/tool-schemas.ts.

🔗 Correlation (the differentiator)
Tool Description
underpixel_correlate(query) "What API feeds the user table?" Forward (URL/body text search), reverse (DOM element → APIs via rrweb snapshots), and value-level (DOM text → specific JSON response field). Supports CSS selectors, attribute queries, and free text.
underpixel_timeline(startTime?, endTime?, limit?) Chronological correlation bundles — API + DOM + visual state, joined on timestamp.
underpixel_snapshot_at(timestamp) Closest screenshot + active API calls at a specific moment.
📡 Network
Tool Description
underpixel_capture_start(filter?) Start recording network + DOM + visual state. Configurable URL/domain filter.
underpixel_capture_stop() Stop capture, return correlated summary.
underpixel_api_calls(filter?) Query captured API calls with full headers, request and response bodies, timing.
underpixel_api_dependencies() Auto-detected call chain — typed edges (bearer_token, id, session, …).
📸 Visual & Replay
Tool Description
underpixel_screenshot(selector?) On-demand screenshot — viewport, full page, or single element.
underpixel_dom_text(selector) Current text content of elements (TreeWalker-based, safe for any markup).
underpixel_replay(timeRange) Open the replay tab in the browser; returns the session bundle.
🎯 Browser Control (minimal)
Tool Description
underpixel_navigate(url) Open a URL (new tab or update existing).
underpixel_interact(action) Click, fill, scroll, type, press key.
underpixel_page_read(filter?) Accessibility tree of visible elements ('all' or 'interactive').

🧩 Architecture

┌──────────────────────────────────────────────────────────────┐
│  Chrome Extension (Manifest V3 · WXT · Svelte 5)             │
│                                                              │
│  Content (MAIN)                                              │
│    └─ rrweb.record()           → DOM event stream            │
│    └─ PerformanceObserver      → layout-shift signals        │
│                                                              │
│  Background (Service Worker)                                 │
│    ├─ chrome.debugger          → CDP network capture         │
│    ├─ Correlation Engine       → timestamp-window matching   │
│    ├─ Screenshot Gate          → rrweb + stability + diff    │
│    ├─ IndexedDB                → sessions · events · bodies  │
│    └─ Native Messaging         → stdio to bridge             │
│                                                              │
│  Offscreen Document                                          │
│    └─ pixelmatch               → canvas-based pixel diff     │
│                                                              │
│  Replay Page (chrome-extension://…/replay.html)              │
│    ├─ rrweb-player             → interactive replay          │
│    └─ Event-grouped API timeline (synced via Svelte store)   │
└─────────────────────────┬────────────────────────────────────┘
                          │  Native Messaging
┌─────────────────────────┴────────────────────────────────────┐
│  underpixel-bridge (npm package · Fastify · ~200 LOC)        │
│    └─ stdio ↔ Native Messaging · MCP transport routing       │
└─────────────────────────┬────────────────────────────────────┘
                          │  MCP JSON-RPC (Streamable HTTP or stdio)
┌─────────────────────────┴────────────────────────────────────┐
│  Claude Code · Cursor · Claude Desktop · any MCP client      │
└──────────────────────────────────────────────────────────────┘

Why this shape:

  • All logic in the extension. The bridge is a dumb pipe. The extension auto-updates via the Web Store; the npm package rarely changes. Single source of truth, no syncing issues.
  • Per-session MCP transports. Each MCP client gets its own StreamableHTTPServerTransport + McpServer instance — matches the official SDK pattern and supports concurrent clients.
  • IndexedDB everywhere. Long sessions with hundreds of API calls + rrweb events would exhaust memory. IndexedDB persists across MV3 service-worker restarts (30s idle timeout) and indexes by timestamp + URL for fast queries.
  • CDP, not webRequest. chrome.webRequest cannot read response bodies. Since "what data did this API return" is the heart of correlation, chrome.debugger is required.

📖 How the correlation works

The trick isn't capturing things — it's joining them.

  1. Three independent streams flow into a single per-tab buffer:
    • rrweb DOM events (mutation, layout-shift, input, etc.)
    • CDP network events (Network.requestWillBeSent / responseReceived / getResponseBody)
    • Smart screenshots (gated by rrweb activity + stability + pixelmatch diff threshold)
  2. The correlation engine groups them within a configurable window (default 500 ms): an API response at T, DOM mutations at T + 20 ms, and a screenshot at T + 100 ms become a single CorrelationBundle.
  3. The dependency engine extracts trackable values from each response (JWTs, UUIDs, hex tokens, high-entropy strings, numeric IDs) and searches for them in subsequent request URLs, auth headers, and bodies — emitting a typed edge list.
  4. MCP tools query that bundle store. correlate(query) does forward, reverse, and value-level matching. The LLM does deeper reasoning on top of pre-joined data — instead of paginating through raw HAR files.

📦 Repository Layout

underpixel/
├── extension/                  Chrome extension (WXT, Manifest V3)
│   ├── entrypoints/            background · content · popup · replay · offscreen
│   └── lib/
│       ├── network/            CDP capture, ref-counted debugger session
│       ├── correlation/        timestamp matching, rrweb DOM walker
│       ├── screenshot/         2-layer gate + pipeline
│       ├── recording/          batched rrweb event persistence
│       ├── storage/            IndexedDB schema (idb)
│       └── tools/              MCP tool handlers
├── bridge/                     npm: underpixel-bridge (Fastify + Native Messaging)
├── packages/shared/            shared types, MCP tool schemas, constants
└── docs/                       high-level vision, tech design, feature specs

🛠️ Development

pnpm install        # from monorepo root
pnpm build          # build shared → bridge → extension
pnpm dev            # WXT dev mode with HMR (extension only)
pnpm test           # vitest across all packages
pnpm lint           # ESLint
pnpm format:check   # Prettier

See CLAUDE.md for non-obvious project conventions (e.g. rrweb runs in the MAIN world content script and is bridged via window.postMessage; response bodies > 100 KB are stored in a separate IndexedDB store).

🗺️ Roadmap

  • Phase 1 — Core MVP: network capture, rrweb integration, correlation engine, 8 MCP tools, basic popup
  • Phase 2 — 2-layer screenshot gate, offscreen pixelmatch, replay page (rrweb-player + event-based API timeline), timeline / snapshot_at / replay / dom_text tools
  • Phase 3 — Value-propagation dependency graph, .underpixel session export/import (gzipped, with header-masking and body-stripping options)
  • Phase 4 — Auto-generated API docs, performance annotations on replay, visual dependency-graph UI (elkjs), advanced filters
  • Phase 5 — Edge / Brave / Arc support (Chromium-trivial), Firefox port (browser.devtools.network), browser-API abstraction layer
  • Future — Chrome Web Store listing, push-based "Explain this page" once MCP supports server→client push

🙏 Acknowledgments

UnderPixel stands on the shoulders of two excellent MIT-licensed projects:

  • mcp-chrome by @hangwin — reference implementation for the Native Messaging bridge, CDP network capture pipeline, screenshot stitching, and Streamable HTTP MCP server. UnderPixel re-implements these patterns rather than depending on the package directly (mcp-chrome is an extension, not a library), but the architectural debt is significant and gratefully acknowledged.
  • rrweb — DOM recording and replay. Used directly as an npm dependency. rrweb's smart mutation batching is also what made the screenshot gate simple enough to ship — see Design decision #5 for details.

Also leaning on pixelmatch (ISC), @modelcontextprotocol/sdk (MIT), and the WXT extension framework.

📄 License

MIT — same as our upstreams.

Made with rrweb, mcp, and a lot of pixel-counting.

MCP Server · Populars

MCP Server · New

    marcindulak

    Functionality overview

    Local speech-to-text MCP server for Tmux on Linux (for use not only with Claude Code)

    Community marcindulak
    louchi1984-coder

    DeepSeek Code Worker MCP

    DeepSeek V4 code worker MCP for Codex Desktop, powered by Claude Code

    Community louchi1984-coder
    Moeblack

    ComfyUI-AnimaTool

    AI Tool Use API for Anima anime/illustration image generation. Supports MCP Server, HTTP API, and CLI.

    Community Moeblack
    RohanAnandPandit

    Trading212 MCP Server

    The Trading212 MCP server is a Model Context Protocol server implementation that provides seamless data connectivity to the Trading212 trading platform enabling advanced interaction capabilities via the public beta API.

    Community RohanAnandPandit
    Olanetsoft

    Midnight MCP Server

    Midnight MCP server giving AI assistants access to Midnight blockchain — search contracts, analyze code, explore docs

    Community Olanetsoft