semantic-dom-mcp
Local MCP server (stdio, Node.js + TypeScript) that drives a real Chromium browser via Playwright to extract a live page's DOM into compact, factual Semantic JSON with Playwright-native locators, so that AI-generated Playwright tests are consistent across the whole QA team, not just accurate.
Same page → same extraction → same conventions → same test style, regardless of who runs it.
Evidence: benchmark/RESULTS.md. On real pages the Semantic JSON is 92–97% smaller than the raw DOM an agent would otherwise consume, with every locator uniqueness-verified by Playwright's engine and byte-identical output across runs.

Docs: Team guide (setup + connecting your agent) · Benchmark methodology · Roadmap
Quickstart (5 lines)
npm install
npx playwright install chromium
npm run build
export QA_MCP_ALLOWED_HOSTS="staging.yourapp.internal" # PowerShell: $env:QA_MCP_ALLOWED_HOSTS="..."
node ./dist/index.js # or add the server to your MCP client (below) and let it launch this
Claude Code / client connection
{
"mcpServers": {
"semantic-dom": {
"command": "node",
"args": ["./semantic-dom-mcp/dist/index.js"],
"env": {
"QA_MCP_ALLOWED_HOSTS": "staging.yourapp.internal,staging.admin.internal",
"QA_MCP_STORAGE_STATE": "./.auth/staging.json"
}
}
}
}
During development, "command": "npx", "args": ["tsx", "./semantic-dom-mcp/src/index.ts"] may replace the built path.
Workflow
- Ask your agent: "extract the checkout page and write a success-path test."
- The agent calls extract_semantic_dom({ url }): the server navigates a real Chromium page, runs the extractor inside the page, and returns Semantic JSON, in which every interactive node has a ready-to-paste Playwright locator whose uniqueness is verified by Playwright's own engine.
- The agent uses the write_playwright_test prompt (scenario + the JSON), which injects the team conventions.
- The result is a Playwright test in team style, grounded in real locators, never guessed ones.
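
On the wire, the second and third steps above are two MCP requests: a `tools/call` for the extraction and a `prompts/get` for the team prompt. The sketch below only builds the request payloads. The `/checkout` URL, the scenario text, and the `{ nodes: [] }` placeholder are illustrative, and passing `extract_json` as a serialized string is an assumption that follows from MCP prompt arguments being strings.

```typescript
// JSON-RPC 2.0 request shape used by MCP for the two calls in the workflow.
interface McpRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call" | "prompts/get";
  params: { name: string; arguments: Record<string, unknown> };
}

// Step 2: ask the server to extract the page at `url`.
function buildExtractRequest(id: number, url: string): McpRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "extract_semantic_dom", arguments: { url } },
  };
}

// Step 3: fetch the team prompt. The Semantic JSON is serialized because
// MCP prompt arguments are string-valued (assumed to apply to extract_json).
function buildPromptRequest(id: number, scenario: string, extraction: unknown): McpRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "prompts/get",
    params: {
      name: "write_playwright_test",
      arguments: { scenario, extract_json: JSON.stringify(extraction) },
    },
  };
}

// Illustrative values only; a real client would send these over stdio.
const extractReq = buildExtractRequest(1, "https://staging.yourapp.internal/checkout");
const promptReq = buildPromptRequest(2, "success-path checkout", { nodes: [] });
console.log(JSON.stringify(extractReq.params));
```

In practice an MCP client library sends these for you; the payloads are shown only to make the order of calls concrete.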
MCP surface
| Kind | Name | Purpose |
|---|---|---|
| Tool | extract_semantic_dom | Extract a URL into Semantic JSON (url, wait_for, wait_selector, include_hidden, max_nodes). Read-only: it never clicks, submits, or mutates the page. |
| Tool | extract_semantic_dom_after | Same, but first runs a short declared action list (fill/click/press/wait, max 20) in the main frame and snapshots the resulting state, for toasts, validation errors, and opened dialogs. Refuses to extract if the actions navigated off the allowlist. |
| Tool | check_auth | Diagnostic: navigates with the configured storageState and reports whether the session bounced to a login-looking page (expired auth shows up as an answer, not a mystery). |
| Tool | list_frames | Diagnostic frame tree with same-origin/reachability classification. |
| Prompt | write_playwright_test | Team-standard test-writing prompt (scenario, extract_json, team_name?, framework_note?). |
| Resource | conventions://playwright | The same team conventions as read-only text. |
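
To make the "short declared action list" of extract_semantic_dom_after concrete, here is a minimal validation sketch. The action field names (`selector`, `value`, `key`, `ms`) are hypothetical and not taken from the server's schema; only the four action kinds and the limit of 20 come from the table above.

```typescript
// Hypothetical action shapes: field names are assumptions, not the server's schema.
type Action =
  | { type: "fill"; selector: string; value: string }
  | { type: "click"; selector: string }
  | { type: "press"; selector: string; key: string }
  | { type: "wait"; ms: number };

const MAX_ACTIONS = 20; // limit stated in the table above

// Keys permitted per action type; anything else is rejected,
// mirroring what an `additionalProperties: false` schema would do.
const ALLOWED_KEYS = new Map<string, string[]>([
  ["fill", ["type", "selector", "value"]],
  ["click", ["type", "selector"]],
  ["press", ["type", "selector", "key"]],
  ["wait", ["type", "ms"]],
]);

function validateActions(input: unknown): Action[] {
  if (!Array.isArray(input)) throw new Error("actions must be an array");
  if (input.length > MAX_ACTIONS) throw new Error(`at most ${MAX_ACTIONS} actions allowed`);
  return input.map((raw: unknown, i: number) => {
    if (typeof raw !== "object" || raw === null) throw new Error(`action ${i}: not an object`);
    const action = raw as Record<string, unknown>;
    const allowed = ALLOWED_KEYS.get(String(action.type));
    if (!allowed) throw new Error(`action ${i}: unknown type`);
    // Reject unexpected properties.
    for (const key of Object.keys(action)) {
      if (!allowed.includes(key)) throw new Error(`action ${i}: unexpected property "${key}"`);
    }
    // Require every expected property with the right primitive type.
    for (const key of allowed) {
      const expected = key === "ms" ? "number" : "string";
      if (typeof action[key] !== expected) throw new Error(`action ${i}: "${key}" must be a ${expected}`);
    }
    return action as unknown as Action;
  });
}
```

Validating the whole list before touching the browser means a malformed list fails without any partial interaction with the page.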
Errors (navigation failure, denied host, missing selector) come back as structured JSON in the tool result, so the agent can react instead of crashing.
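
A client might branch on such a structured error roughly as follows. The result shape, the `code` values, and the suggested reactions are all hypothetical; only the three error kinds come from the sentence above.

```typescript
// Hypothetical result shape: field names and error codes are assumptions,
// chosen to match the three error kinds named in the text.
type ExtractResult =
  | { ok: true; nodes: unknown[] }
  | {
      ok: false;
      error: { code: "navigation_failed" | "host_denied" | "selector_missing"; message: string };
    };

// Parse the JSON text carried in a tool result.
function parseResult(text: string): ExtractResult {
  return JSON.parse(text) as ExtractResult;
}

// Decide what the agent does next instead of crashing on a failure.
function nextStep(result: ExtractResult): string {
  if (result.ok) return "write-test";
  switch (result.error.code) {
    case "host_denied":
      return "ask-user-to-update-allowlist";
    case "selector_missing":
      return "retry-without-wait-selector";
    case "navigation_failed":
      return "report-and-stop";
  }
}

const denied = parseResult(
  '{"ok":false,"error":{"code":"host_denied","message":"host not in allowlist"}}'
);
console.log(nextStep(denied));
```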
Configuration (environment variables)
| Variable | Meaning |
|---|---|
| QA_MCP_ALLOWED_HOSTS | Required. Comma-separated hostnames the server may navigate to. Navigation is denied by default. Supports host, host:port, and *.domain entries. |
| QA_MCP_STORAGE_STATE | Optional path to a Playwright storageState JSON for pre-authenticated staging sessions. This file holds a live session; it is gitignored, and must never be committed. |
| QA_MCP_TEAM_NAME | Optional team name used in the write_playwright_test prompt (default QA). |
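
The three entry forms accepted by QA_MCP_ALLOWED_HOSTS can be illustrated with a small matcher. This is a sketch under assumptions, not the server's code: it assumes a bare host matches any port, host:port matches only that explicit port, and *.domain matches subdomains but not the apex domain. The function names are hypothetical.

```typescript
// Split the comma-separated variable into trimmed, lowercased entries.
function parseAllowedHosts(raw: string | undefined): string[] {
  return (raw ?? "")
    .split(",")
    .map((entry) => entry.trim().toLowerCase())
    .filter((entry) => entry.length > 0);
}

// Return true only when the target is http(s) and matches an allowlist entry.
// An empty allowlist matches nothing, so navigation is denied by default.
function isAllowed(target: string, allowlist: string[]): boolean {
  let url: URL;
  try {
    url = new URL(target);
  } catch {
    return false; // unparseable input is denied
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;

  const host = url.hostname.toLowerCase(); // "staging.yourapp.internal"
  const hostPort = url.host.toLowerCase(); // includes ":port" when it is non-default

  return allowlist.some((entry) => {
    // "*.domain": compare against the ".domain" suffix (assumed: apex excluded).
    if (entry.startsWith("*.")) return host.endsWith(entry.slice(1));
    // "host:port": exact match on host and explicit port.
    if (entry.includes(":")) return hostPort === entry;
    // "host": exact hostname match on any port (assumed).
    return host === entry;
  });
}

const allowlist = parseAllowedHosts("staging.yourapp.internal, *.admin.internal, localhost:3000");
console.log(isAllowed("https://staging.yourapp.internal/checkout", allowlist));
```

Comparing parsed hostnames rather than raw strings avoids being fooled by URLs that merely contain an allowed name in their path or userinfo.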
Security posture
- Tool inputs are untrusted (they arrive via an LLM): strict schemas (additionalProperties: false), http/https only, host allowlist enforced before any navigation.
- extract_semantic_dom only reads the DOM: it never clicks, submits, or mutates the page. The one sanctioned exception is extract_semantic_dom_after, which executes only an explicit, bounded, schema-validated action list, never logs fill values, and aborts without extracting if the page leaves the allowlisted hosts.
- No network egress beyond navigating to the target URL. No telemetry. Page contents are never logged (stderr carries only high-level events) and are not stored beyond the current call.
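
As an illustration of the "never logs fill values" rule, a log line for an action list could be built like this. The action shape and the `summarizeForLog` helper are hypothetical; the point is only that the summary carries action types and selectors, never the typed value.

```typescript
// Hypothetical action shape, used only for this illustration.
type LoggedAction = { type: string; selector?: string; value?: string; key?: string };

// Build a stderr-safe summary: the value of a fill is replaced, never echoed.
function summarizeForLog(actions: LoggedAction[]): string {
  return actions
    .map((a) =>
      a.type === "fill"
        ? `fill(${a.selector ?? "?"}, <redacted>)`
        : `${a.type}(${a.selector ?? ""})`
    )
    .join(" -> ");
}

const summary = summarizeForLog([
  { type: "fill", selector: "#password", value: "hunter2" },
  { type: "click", selector: "#login" },
]);
console.error(summary); // stderr carries only the high-level event
```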
Semantics worth knowing
- Snapshot honesty: the JSON is a single moment. A disabled submit button is reported is_disabled: true with a note; the conventions instruct the model to write the interactions that change state, not to assume it stays disabled.
- Hidden nodes are included and flagged is_visible: false (tests often assert hidden-ness); pass include_hidden: false to drop them (the count dropped is noted, never silent).
- Open shadow DOM is traversed and flagged in_shadow. Locators pierce it natively, so no >>> / ::shadow CSS is ever emitted. Closed shadow roots appear as shadow_boundary marker nodes (detected via pre-navigation attachShadow instrumentation; closed roots created by declarative shadow DOM are parsed before scripts run and cannot be detected).
- Same-origin iframes are extracted per-frame with frame_path set (chain frameLocator() in that order). Cross-origin iframes are recorded as opaque cross_origin_frame nodes with URL/name only; their DOM is never touched.
- Notification & dialog surfaces (role="alert", role="status", dialogs) are extracted like interactive nodes. When a toast library keeps the live region empty and renders the message in a sibling (a common pattern across UI libraries), the message text is pulled from the enclosing container and flagged. Since those ARIA roles take names from the author (not contents), their role locator is getByRole('alert'), or the same with the aria-label name when one exists. For UI that only appears after an interaction (login-success toast), use extract_semantic_dom_after; when that UI renders late, its wait_selector_after option waits deterministically instead of guessing settle_ms.
- Links carry href (schema 1.1) so agents can discover which page to extract next without scraping.
- Framework-generated ids (rc_select_*, React useId, Radix, MUI...) are detected and demoted to last-resort with a note: they change between builds and must never be primary.
- viewport: "mobile" (375×812, touch) snapshots responsive states; visibility flags reflect the active media queries.
- Truncation is loud: max_nodes / depth caps set truncated: true plus a note. Non-unique locators carry is_unique: false and disambiguation guidance.
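
A consumer of the Semantic JSON might combine the frame_path and is_unique flags described above roughly like this. The node shape is a simplified assumption: the `locator` field name and the idea that frame_path holds one iframe selector per nesting level (outermost first) are hypothetical, not part of the documented contract.

```typescript
// Hypothetical, simplified node shape. is_unique, is_visible and frame_path are
// the flags named above; `locator` and the string[] element type are assumptions.
interface SemanticNode {
  locator: string; // e.g. "getByRole('button', { name: 'Pay' })"
  is_unique: boolean;
  is_visible: boolean;
  frame_path?: string[]; // assumed: iframe selectors, outermost first
}

// Chain frameLocator() calls in frame_path order, then append the node's locator.
function toLocatorExpression(node: SemanticNode): string {
  const frames = (node.frame_path ?? []).map((sel) => `frameLocator(${JSON.stringify(sel)})`);
  return ["page", ...frames, node.locator].join(".");
}

// Keep only locators that are safe to paste as-is; non-unique ones
// need the disambiguation guidance first.
function usableLocators(nodes: SemanticNode[]): string[] {
  return nodes.filter((n) => n.is_unique).map(toLocatorExpression);
}

const cardField: SemanticNode = {
  locator: "getByLabel('Card number')",
  is_unique: true,
  is_visible: true,
  frame_path: ["#payment-frame"],
};
console.log(toLocatorExpression(cardField));
```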
Development
npm run dev # run the server over stdio via tsx
npm run typecheck # tsc --noEmit (strict)
npm test # Vitest suites against real fixture pages in headless Chromium
Repo layout: src/index.ts (bootstrap) · src/server.ts (MCP surface) · src/browser.ts (Playwright layer + orchestration) · src/extractor/ (in-page engine + locator resolution) · src/types.ts (frozen v1 contract) · src/conventions.ts (single source of team conventions).