semantic-dom-mcp

Community helmif

Local MCP server (stdio, Node.js + TypeScript) that drives a real Chromium browser via Playwright to extract a live page's DOM into compact, factual Semantic JSON with Playwright-native locators, so AI-generated Playwright tests are consistent across the whole QA team, not just accurate.

Same page → same extraction → same conventions → same test style, regardless of who runs it.

Evidence: benchmark/RESULTS.md. On real pages the Semantic JSON is 92–97% smaller than the raw DOM an agent would otherwise consume, with every locator uniqueness-verified by Playwright's engine and byte-identical output across runs.

Docs: Team guide (setup + connecting your agent) · Benchmark methodology · Roadmap

Quickstart (5 lines)

npm install
npx playwright install chromium
npm run build
export QA_MCP_ALLOWED_HOSTS="staging.yourapp.internal"   # PowerShell: $env:QA_MCP_ALLOWED_HOSTS="..."
node ./dist/index.js    # or add the server to your MCP client (below) and let it launch this

Claude Code / client connection

{
  "mcpServers": {
    "semantic-dom": {
      "command": "node",
      "args": ["./semantic-dom-mcp/dist/index.js"],
      "env": {
        "QA_MCP_ALLOWED_HOSTS": "staging.yourapp.internal,staging.admin.internal",
        "QA_MCP_STORAGE_STATE": "./.auth/staging.json"
      }
    }
  }
}

During development, "command": "npx", "args": ["tsx", "./semantic-dom-mcp/src/index.ts"] may replace the built path.

Workflow

  1. Ask your agent: "extract the checkout page and write a success-path test."
  2. The agent calls extract_semantic_dom({ url }): the server navigates a real Chromium page, runs the extractor inside the page, and returns Semantic JSON, in which every interactive node has a ready-to-paste Playwright locator whose uniqueness is verified by Playwright's own engine.
  3. The agent uses the write_playwright_test prompt (scenario + the JSON), which injects the team conventions.
  4. The result is a Playwright test in team style, grounded in real locators — never guessed ones.
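
One way to picture "grounded, never guessed" in step 4 is a lookup that only returns a locator the extraction actually contains. The sketch below is illustrative, not the server's code: the README names is_unique, is_visible, is_disabled, and truncated, while the role, name, locator, and nodes fields (and the groundedLocator helper) are assumptions made for the example.

```typescript
// Hypothetical node shape: role, name, locator, and nodes are assumed names;
// is_unique, is_visible, is_disabled, and truncated appear in this README.
interface SemanticNode {
  role: string;
  name: string;
  locator: string; // e.g. "getByRole('button', { name: 'Pay now' })"
  is_unique: boolean;
  is_visible: boolean;
  is_disabled: boolean;
}

interface SemanticDom {
  truncated: boolean;
  nodes: SemanticNode[];
}

// Returns a locator expression only when the extraction contains a matching,
// uniqueness-verified node; otherwise it throws instead of inventing one.
function groundedLocator(dom: SemanticDom, role: string, name: string): string {
  const node = dom.nodes.find((n) => n.role === role && n.name === name);
  if (!node) {
    const hint = dom.truncated ? " (extraction was truncated)" : "";
    throw new Error(`No extracted node for ${role} "${name}"${hint}`);
  }
  if (!node.is_unique) {
    throw new Error(`Locator for ${role} "${name}" is not unique`);
  }
  return `page.${node.locator}`;
}
```

A test writer built this way fails loudly when the scenario mentions an element the page does not have, which is the behavior the workflow is aiming for.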

MCP surface

| Kind | Name | Purpose |
| --- | --- | --- |
| Tool | extract_semantic_dom | Extract a URL into Semantic JSON (url, wait_for, wait_selector, include_hidden, max_nodes). Read-only: never clicks, submits, or mutates the page. |
| Tool | extract_semantic_dom_after | Same, but first runs a short declared action list (fill/click/press/wait, max 20) in the main frame and snapshots the resulting state. Intended for toasts, validation errors, and opened dialogs. Refuses to extract if the actions navigated off the allowlist. |
| Tool | check_auth | Diagnostic: navigates with the configured storageState and reports whether the session bounced to a login-looking page (expired auth shows up as an answer, not a mystery). |
| Tool | list_frames | Diagnostic frame tree with same-origin/reachability classification. |
| Prompt | write_playwright_test | Team-standard test-writing prompt (scenario, extract_json, team_name?, framework_note?). |
| Resource | conventions://playwright | The same team conventions as read-only text. |

Errors (navigation failure, denied host, missing selector) come back as structured JSON in the tool result, so the agent can react instead of crashing.
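
As a rough illustration of reacting to such an error, the sketch below parses the text item of a tool result and picks a follow-up. The README does not document the error payload, so the envelope shape (error.code, error.message), the code strings, and the NextStep names are all assumptions for this example.

```typescript
// MCP tool results carry a content array; text items hold the JSON string.
interface ToolResult {
  content: { type: "text"; text: string }[];
}

// Hypothetical error envelope: field names and code values are assumptions.
interface ExtractError {
  error: { code: string; message: string };
}

function isExtractError(value: unknown): value is ExtractError {
  if (typeof value !== "object" || value === null) return false;
  const err = (value as { error?: unknown }).error;
  if (typeof err !== "object" || err === null) return false;
  const { code, message } = err as { code?: unknown; message?: unknown };
  return typeof code === "string" && typeof message === "string";
}

type NextStep = "proceed" | "fix_allowlist" | "adjust_wait_selector" | "retry";

// Maps a tool result to a follow-up instead of letting the agent crash.
function decideNextStep(result: ToolResult): NextStep {
  const first = result.content[0];
  if (!first) return "retry";
  let payload: unknown;
  try {
    payload = JSON.parse(first.text);
  } catch {
    return "retry"; // not JSON at all
  }
  if (!isExtractError(payload)) return "proceed";
  switch (payload.error.code) {
    case "host_denied":
      return "fix_allowlist";
    case "selector_missing":
      return "adjust_wait_selector";
    default:
      return "retry"; // e.g. a navigation failure
  }
}
```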

Configuration (environment variables)

| Variable | Meaning |
| --- | --- |
| QA_MCP_ALLOWED_HOSTS | Required. Comma-separated hostnames the server may navigate to. Navigation is denied by default. Supports host, host:port, and *.domain entries. |
| QA_MCP_STORAGE_STATE | Optional path to a Playwright storageState JSON for pre-authenticated staging sessions. This file holds a live session; it is gitignored and must never be committed. |
| QA_MCP_TEAM_NAME | Optional team name used in the write_playwright_test prompt (default QA). |
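
To make the three QA_MCP_ALLOWED_HOSTS entry forms concrete, here is a minimal matching sketch. It is not the server's implementation, and the README does not spell out the exact rules: the sketch assumes that a bare host matches on any port, that *.domain matches subdomains but not the apex, and that an empty list denies everything. The parseAllowlist and isAllowed names are hypothetical.

```typescript
// Hypothetical sketch: the real server's matching rules may differ.
interface Target {
  hostname: string;
  port: string; // "" when the URL carries no explicit port
}

function parseAllowlist(raw: string): string[] {
  return raw
    .split(",")
    .map((entry) => entry.trim().toLowerCase())
    .filter((entry) => entry.length > 0);
}

function isAllowed(target: Target, allowlist: string[]): boolean {
  const hostname = target.hostname.toLowerCase();
  const hostPort = target.port ? `${hostname}:${target.port}` : hostname;
  // An empty allowlist matches nothing, so navigation is denied by default.
  return allowlist.some((entry) => {
    if (entry.startsWith("*.")) {
      // Assumption: "*.example.com" matches subdomains, not "example.com".
      return hostname.endsWith(entry.slice(1));
    }
    if (entry.includes(":")) {
      // host:port entries must match both parts exactly.
      return entry === hostPort;
    }
    // Assumption: a bare host entry matches that hostname on any port.
    return entry === hostname;
  });
}
```

Keeping the leading dot in the wildcard suffix matters: it stops an entry like *.admin.internal from matching a lookalike host such as eviladmin.internal.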

Security posture

  • Tool inputs are untrusted (they arrive via an LLM): strict schemas (additionalProperties: false), http/https only, host allowlist enforced before any navigation.
  • extract_semantic_dom only reads the DOM: it never clicks, submits, or mutates the page. The one sanctioned exception is extract_semantic_dom_after, which executes only an explicit, bounded, schema-validated action list, never logs fill values, and aborts without extracting if the page leaves the allowlisted hosts.
  • No network egress beyond navigating to the target URL. No telemetry. Page contents are never logged (stderr carries only high-level events) and are not stored beyond the current call.
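
The "explicit, bounded, schema-validated action list" idea can be sketched as a small hand-written validator. The four action kinds and the limit of 20 come from this README; the property names (type, selector, value, key, ms) are assumptions, and the real server presumably relies on its JSON schemas rather than code like this.

```typescript
// Hypothetical action shapes: property names are assumed for illustration.
type Action =
  | { type: "fill"; selector: string; value: string }
  | { type: "click"; selector: string }
  | { type: "press"; selector: string; key: string }
  | { type: "wait"; ms: number };

const MAX_ACTIONS = 20;

// Exact property set per action kind, mirroring additionalProperties: false.
const ALLOWED_KEYS: Record<Action["type"], string[]> = {
  fill: ["type", "selector", "value"],
  click: ["type", "selector"],
  press: ["type", "selector", "key"],
  wait: ["type", "ms"],
};

function validateActions(input: unknown): Action[] {
  if (!Array.isArray(input)) throw new Error("actions must be an array");
  if (input.length > MAX_ACTIONS) {
    throw new Error(`at most ${MAX_ACTIONS} actions are allowed`);
  }
  return input.map((raw: unknown, i: number) => {
    if (typeof raw !== "object" || raw === null) {
      throw new Error(`action ${i}: not an object`);
    }
    const obj = raw as Record<string, unknown>;
    const kind = obj["type"];
    // Own-property check so inherited names like "toString" are rejected.
    if (typeof kind !== "string" || !Object.prototype.hasOwnProperty.call(ALLOWED_KEYS, kind)) {
      throw new Error(`action ${i}: unknown type`);
    }
    const allowed = ALLOWED_KEYS[kind as Action["type"]];
    for (const key of Object.keys(obj)) {
      if (!allowed.includes(key)) {
        throw new Error(`action ${i}: unexpected property "${key}"`);
      }
    }
    for (const key of allowed) {
      const expected = key === "ms" ? "number" : "string";
      // Messages name the property only; fill values never appear in errors.
      if (typeof obj[key] !== expected) {
        throw new Error(`action ${i}: "${key}" must be a ${expected}`);
      }
    }
    return obj as unknown as Action;
  });
}
```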

Semantics worth knowing

  • Snapshot honesty: the JSON is a single moment. A disabled submit button is reported is_disabled: true with a note; the conventions instruct the model to write the interactions that change state, not to assume it stays disabled.
  • Hidden nodes are included and flagged is_visible: false (tests often assert hidden-ness); pass include_hidden: false to drop them (the count dropped is noted, never silent).
  • Open shadow DOM is traversed and flagged in_shadow. Locators pierce it natively, so no >>> / ::shadow CSS is ever emitted. Closed shadow roots appear as shadow_boundary marker nodes (detected via pre-navigation attachShadow instrumentation; closed roots created by declarative shadow DOM parse before scripts run and cannot be detected).
  • Same-origin iframes are extracted per-frame with frame_path set (chain frameLocator() in that order). Cross-origin iframes are recorded as opaque cross_origin_frame nodes with URL/name only; their DOM is never touched.
  • Notification & dialog surfaces (role="alert", role="status", dialogs) are extracted like interactive nodes. When a toast library keeps the live region empty and renders the message in a sibling (a common pattern across UI libraries), the message text is pulled from the enclosing container and flagged. Since those ARIA roles take names from the author (not contents), their role locator is getByRole('alert'), or includes the aria-label name when one exists. For UI that only appears after an interaction (login-success toast), use extract_semantic_dom_after; for UI that renders late after the interaction, its wait_selector_after option waits deterministically instead of guessing settle_ms.
  • Links carry href (schema 1.1) so agents can discover which page to extract next without scraping. Framework-generated ids (rc_select_*, React useId, Radix, MUI...) are detected and demoted to last-resort with a note: they change between builds and must never be primary.
  • viewport: "mobile" (375×812, touch) snapshots responsive states; visibility flags reflect the active media queries.
  • Truncation is loud: max_nodes / depth caps set truncated: true plus a note. Non-unique locators carry is_unique: false and disambiguation guidance.
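
The frame_path rule above (chain frameLocator() in order) can be sketched as a small string builder. This assumes frame_path is an array of iframe selectors, outermost first, and that each node carries a locator expression string; neither detail is confirmed by this README, and renderLocator is a hypothetical helper.

```typescript
// Hypothetical fields: frame_path as selector strings (outermost first) and
// locator as a Playwright expression are assumptions for this sketch.
interface FramedNode {
  locator: string; // e.g. "getByRole('button', { name: 'Pay' })"
  frame_path: string[]; // [] for nodes in the main frame
  in_shadow: boolean;
}

// Builds the source text of a locator, entering each frame in order.
function renderLocator(node: FramedNode): string {
  const frames = node.frame_path
    .map((selector) => `.frameLocator(${JSON.stringify(selector)})`)
    .join("");
  // in_shadow needs no special handling: Playwright locators pierce open
  // shadow roots, so the expression is emitted unchanged.
  return `page${frames}.${node.locator}`;
}
```

For a node with frame_path ["iframe#checkout", "iframe[name=card]"] this produces page.frameLocator("iframe#checkout").frameLocator("iframe[name=card]") followed by the node's own locator, while a main-frame node renders as plain page.getByRole(...).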

Development

npm run dev        # run the server over stdio via tsx
npm run typecheck  # tsc --noEmit (strict)
npm test           # Vitest suites against real fixture pages in headless Chromium

Repo layout: src/index.ts (bootstrap) · src/server.ts (MCP surface) · src/browser.ts (Playwright layer + orchestration) · src/extractor/ (in-page engine + locator resolution) · src/types.ts (frozen v1 contract) · src/conventions.ts (single source of team conventions).
