wpa-mcp

An MCP (Model Context Protocol) server that turns Windows WPR .etl traces into structured, LLM-friendly performance insights — using WPAExporter + xperf under the hood, and optionally emitting flamegraph-ready folded stacks.

wpa-mcp bridges two worlds:

  • Windows Performance Analyzer (WPA) — the gold standard for analyzing ETW / WPR traces, but GUI-heavy and hard to automate.
  • LLMs (Claude, Copilot, GPT, …) — great at reasoning across evidence, but blind to .etl files.

This server exposes a small set of MCP tools so an LLM can:

  1. Validate a trace (does it actually contain the events needed for analysis?)
  2. Export the right WPA tables to CSV via predefined profiles
  3. Summarize the CSVs into a compact JSON (Top N processes, hot stacks, ready-thread latency, DPC/ISR offenders, UI jank)
  4. Render a Brendan-Gregg-style folded stack file for flamegraphs — or for the LLM to read directly

Table of contents

  • Architecture
  • Prerequisites
  • Install
  • Capture a trace
  • MCP tools
  • Built-in WPA profiles
  • Analysis examples
    • Example 1: Runaway CPU
    • Example 2: UI hang / "not responding"
    • Example 3: Audio/mouse glitch caused by a driver
    • Example 4: Feeding folded stacks to the LLM
  • Client configuration
  • Release process
  • Troubleshooting
  • FAQ

Architecture

+------------------+       stdio (MCP)        +--------------------+
|  LLM / MCP host  |  <-------------------->  |   wpa-mcp server   |
| (Claude, VSCode) |                          |  (this repo)       |
+------------------+                          +----------+---------+
                                                         |
                                             subprocess  |
                                                         v
                                   +---------------------+---------------------+
                                   |  xperf.exe          |  wpaexporter.exe    |
                                   |  (validate / stats) |  (+ .wpaProfile)    |
                                   +---------------------+---------------------+
                                                         |
                                                         v
                                              CSV tables (per profile)
                                                         |
                                                         v
                                       summarizer -> JSON  /  flamegraph -> .folded

Everything that the LLM sees is structured JSON or compact folded-stack text — never raw gigabyte CSVs.

Prerequisites

  • Windows 10/11 (required; the analysis tools are Windows-only)
  • Windows Performance Toolkit (WPT) installed (ships with Windows ADK / Windows SDK)
    • wpaexporter.exe
    • xperf.exe
  • Python 3.10+

If WPT is installed to a non-default path, set:

setx WPAEXPORTER_PATH "C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\wpaexporter.exe"
setx XPERF_PATH       "C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\xperf.exe"
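For reference, here is a minimal sketch of how a server like this might resolve those tool paths. The helper name `resolve_tool` and the fallback order (env override, then PATH, then the default install directory) are illustrative assumptions, not the repo's actual code:

```python
import os
import shutil

# Default WPT install directory (the same path used in the setx example above).
DEFAULT_WPT_DIR = r"C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit"

def resolve_tool(exe_name: str, env_var: str) -> str:
    """Prefer an env-var override, then PATH, then the default install dir."""
    override = os.environ.get(env_var)
    if override:
        return override
    found = shutil.which(exe_name)
    if found:
        return found
    return os.path.join(DEFAULT_WPT_DIR, exe_name)

# e.g. resolve_tool("xperf.exe", "XPERF_PATH")
```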

Install

Via pipx (recommended — works once published to PyPI)

pipx install wpa-mcp
wpa-mcp    # starts the MCP stdio server

From source

git clone https://github.com/Jialong-zhong/wpr-xperf-mcp-server.git
cd wpr-xperf-mcp-server
pip install -e .
wpa-mcp

Capture a trace

The server's analyses are only as good as the providers you captured. Recommended capture for the four problem classes this server targets:

# Run as Administrator
wpr -start CPU ^
    -start GeneralProfile ^
    -start DesktopComposition ^
    -start Registry ^
    -filemode

# ... reproduce the issue ...

wpr -stop C:\traces\case01.etl "repro notes here"
What each WPR profile adds that wpa-mcp uses:

  • CPU: Sampled CPU, CSwitch, ReadyThread, StackWalk
  • GeneralProfile: Processes, images, DPC/ISR
  • DesktopComposition: DWM frame timing, Window-in-focus (UI hang evidence)
  • Registry: Registry activity (optional; useful for startup/UI hangs)

If you skip CPU, the most valuable analyses (hot stacks, scheduling latency) won't work — validate_trace will tell you so.

MCP tools

  • validate_trace(etl_path): runs xperf -a stats and reports which providers / stacks exist. Typical caller: the LLM, always first.
  • export_tables(etl_path, profile): runs one WPA profile via wpaexporter and returns CSV paths. Typical caller: advanced / targeted use.
  • analyze_etl(etl_path, focus): validate → export (by focus) → summarize; returns one structured JSON. Typical caller: the LLM, as the default entry point.
  • render_flamegraph(out_dir): aggregates CPU Usage (Sampled) stacks into Brendan-Gregg folded format. Typical caller: after analyze_etl with CPU focus.

analyze_etl input schema

{
  "etl_path": "C:\\traces\\case01.etl",
  "focus": "cpu | latency | ui | dpc_isr | all",
  "out_dir": "optional override",
  "top_n": 20
}
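A sketch of how the server might validate that input before doing any work (illustrative only; `check_analyze_args` is a hypothetical helper, not necessarily what server.py does):

```python
VALID_FOCUS = {"cpu", "latency", "ui", "dpc_isr", "all"}

def check_analyze_args(args: dict) -> dict:
    """Check an analyze_etl payload against the schema above; fill defaults."""
    etl = args.get("etl_path")
    if not etl or not etl.lower().endswith(".etl"):
        raise ValueError("etl_path must point to a .etl file")
    focus = args.get("focus", "all")
    if focus not in VALID_FOCUS:
        raise ValueError(f"focus must be one of {sorted(VALID_FOCUS)}")
    return {
        "etl_path": etl,
        "focus": focus,
        "out_dir": args.get("out_dir"),   # optional override
        "top_n": int(args.get("top_n", 20)),
    }
```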

analyze_etl output shape (abbreviated)

{
  "etl": "C:\\traces\\case01.etl",
  "focus": "all",
  "validation": {
    "duration_sec": 42.7,
    "has_cpu_sampling": true,
    "has_cswitch": true,
    "has_readythread": true,
    "has_stacks": true,
    "has_dpc_isr": true,
    "has_dwm": true,
    "warnings": []
  },
  "exports": ["...\\cpu\\CPU Usage (Sampled)_...csv", "..."],
  "summary": {
    "cpu_top_processes": [{"process": "chrome.exe", "weight_ms": 8421.3}],
    "cpu_top_modules":   [{"module":  "ntdll.dll",   "weight_ms": 2310.0}],
    "cpu_hot_stacks":    [{"stack":   "ntdll!... ; app!hot_fn", "weight_ms": 1240.0}],
    "ready_latency_top": [{"process": "explorer.exe", "tid": 1234, "p95_ms": 187.0}],
    "dpc_isr_top":       [{"driver":  "ndis.sys",     "total_ms": 95.2, "max_us": 820}],
    "ui_focus_top":      [{"process": "myapp.exe",    "focus_ms": 5400.0}],
    "dwm_slow_frames":   {"count": 38, "p95_ms": 41.7, "max_ms": 128.0}
  }
}

Built-in WPA profiles

Each profile is a .wpaProfile XML that tells wpaexporter which WPA tables + columns to dump.

Focus key → profile file: tables exported

  • cpu → wpa/profiles/cpu_hotpath.wpaProfile: CPU Usage (Sampled)
  • latency → wpa/profiles/scheduling_latency.wpaProfile: CPU Usage (Precise), Ready Thread
  • ui → wpa/profiles/ui_hang.wpaProfile: Window In Focus, DWM Frame Details
  • dpc_isr → wpa/profiles/dpc_isr.wpaProfile: DPC/ISR Duration

Column sets are deliberately minimal to keep CSVs small and summarizer-friendly.
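The FAQ mentions a PROFILE_MAP in server.py; a plausible shape for it, mirroring the list above (the exact structure in the repo may differ):

```python
# Hypothetical focus-key -> bundled .wpaProfile mapping; file names
# are the ones listed above, the dict shape is an assumption.
PROFILE_MAP = {
    "cpu":     ["wpa/profiles/cpu_hotpath.wpaProfile"],
    "latency": ["wpa/profiles/scheduling_latency.wpaProfile"],
    "ui":      ["wpa/profiles/ui_hang.wpaProfile"],
    "dpc_isr": ["wpa/profiles/dpc_isr.wpaProfile"],
}

def profiles_for(focus: str) -> list[str]:
    """'all' fans out to every profile; otherwise look up the single focus."""
    if focus == "all":
        return [p for plist in PROFILE_MAP.values() for p in plist]
    return PROFILE_MAP[focus]
```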

Analysis examples

These are end-to-end, copy-pasteable walkthroughs. Each shows the user prompt, the tool calls the LLM should make, the JSON shape you can expect, and the conclusions a well-prompted LLM should draw.

Example 1: Runaway CPU

User: "C:\traces\cpu_spike.etl — some process is pinning my CPU at 100%. Find it and tell me which function."

LLM tool calls:

// 1) validate
// 1) validate
validate_trace({ "etl_path": "C:\\traces\\cpu_spike.etl" })

// 2) analyze focused on CPU
analyze_etl({ "etl_path": "C:\\traces\\cpu_spike.etl", "focus": "cpu", "top_n": 10 })

// 3) (optional) generate folded stacks for the hottest paths
render_flamegraph({
  "out_dir": "C:\\traces\\cpu_spike_wpa_out\\cpu",
  "top_n": 200,
  "min_weight_ms": 5
})

Typical summary returned:

{
  "cpu_top_processes": [
    { "process": "myapp.exe",  "weight_ms": 38210.0 },
    { "process": "chrome.exe", "weight_ms":  2100.0 }
  ],
  "cpu_top_modules": [
    { "module": "myapp.exe",    "weight_ms": 31020.0 },
    { "module": "ucrtbase.dll", "weight_ms":  4100.0 }
  ],
  "cpu_hot_stacks": [
    { "stack": "ntdll!RtlUserThreadStart;kernel32!BaseThreadInitThunk;myapp!worker_loop;myapp!compress_chunk",
      "weight_ms": 27800.0 }
  ]
}

What the LLM should conclude:

  • Trace duration 42 s, myapp.exe consumed ~38 s of CPU ≈ ~90% of wall time on a single core equivalent.
  • Nearly all weight is in myapp!compress_chunk via worker_loop → this is a hot loop in the app itself, not the runtime or OS.
  • Recommend: inspect compress_chunk, consider SIMD / chunking, and capture a second trace with -start CPU -start FileIO to see if the loop is also I/O-bound.
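The ~90% figure in the first bullet falls straight out of the summary numbers:

```python
# Trace duration (validation.duration_sec) vs. myapp.exe sampled CPU weight:
duration_ms = 42.7 * 1000
weight_ms = 38210.0
share = weight_ms / duration_ms
print(f"{share:.0%}")  # → 89%, i.e. ~90% of one core for the whole trace
```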

Example 2: UI hang / "not responding"

User: "When the user clicks Save, the window freezes for ~2 seconds. Trace at C:\traces\save_hang.etl."

LLM tool calls:

validate_trace({ "etl_path": "C:\\traces\\save_hang.etl" })
analyze_etl({ "etl_path": "C:\\traces\\save_hang.etl", "focus": "ui" })
analyze_etl({ "etl_path": "C:\\traces\\save_hang.etl", "focus": "latency" })

Expected signals:

{
  "ui_focus_top": [{ "process": "myapp.exe", "focus_ms": 5400 }],
  "dwm_slow_frames": { "count": 42, "p95_ms": 58.0, "max_ms": 1980.0 },
  "ready_latency_top": [
    { "process": "myapp.exe", "tid": 7321, "p95_ms": 14.0 }
  ]
}

Reasoning rubric the LLM should apply:

  • dwm_slow_frames.max_ms ≈ 1980 while the app is in focus → the hang is real and UI-thread-bound.
  • ready_latency_top for the UI thread is low (≈14 ms p95) → the thread wasn't starved for CPU; it was busy on its own work (otherwise we'd see hundreds of ms of ready-wait).
  • Therefore the UI thread is doing synchronous work during Save. Likely culprits: synchronous disk/network I/O, lock contention, or a heavy compute path on the UI thread.
  • Next step: re-capture with -start CPU -start FileIO -start DiskIO and focus on the UI thread's stacks with render_flamegraph to see exactly what it's doing during those 2 seconds.

Contrast: if ready_latency_top had been, say, p95_ms: 900 — then the UI thread was ready but not scheduled. That's a scheduling/contention story, not a "heavy work on UI thread" story.
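To make the p95 intuition concrete, here is an illustrative nearest-rank percentile over per-event ready-to-run waits. This is a sketch of the metric's meaning, not necessarily the percentile method the server uses:

```python
import math

def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile: value at index ceil(0.95 * n) - 1."""
    ordered = sorted(samples)
    idx = math.ceil(0.95 * len(ordered)) - 1
    return ordered[idx]

# Mostly-fast waits with a fat tail of stalls: p95 exposes the stalls.
waits = [1.0] * 90 + [900.0] * 10   # ms
print(p95(waits))  # → 900.0: a scheduling/contention story

# If stalls are under 5% of events, p95 stays low even though they exist.
print(p95([1.0] * 95 + [900.0] * 5))  # → 1.0
```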

Example 3: Audio/mouse glitch caused by a driver

User: "Audio crackles and the mouse skips for a split second every few minutes. C:\traces\glitch.etl."

LLM tool calls:

validate_trace({ "etl_path": "C:\\traces\\glitch.etl" })
analyze_etl({ "etl_path": "C:\\traces\\glitch.etl", "focus": "dpc_isr" })

Expected signal:

{
  "dpc_isr_top": [
    { "driver": "Netwtw10.sys", "total_ms": 312.4, "max_us": 4120, "count": 1820 },
    { "driver": "ndis.sys",     "total_ms":  95.1, "max_us":  820, "count": 4300 },
    { "driver": "nvlddmkm.sys", "total_ms":  60.0, "max_us":  410, "count": 2100 }
  ]
}

What the LLM should conclude:

  • Netwtw10.sys (Intel Wi-Fi driver) has a single DPC over 4 ms — that's well above the ~1 ms "don't cause audio glitches" rule of thumb.
  • Correlation with symptom: Wi-Fi DPC storms typically line up with mouse/audio skips because DPCs run at elevated IRQL and block the audio/HID stack.
  • Recommend: update the Wi-Fi driver; if the problem persists, disable power-saving for the Wi-Fi adapter and re-capture.

Quality rule baked into wpa-mcp's prompting guide: any driver with max_us > 1000 is suspicious; anything >= 500 is worth mentioning.
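That rule of thumb is easy to state as code (thresholds are the microsecond cutoffs from the sentence above; the function name is illustrative):

```python
def classify_driver(max_us: int) -> str:
    """Bucket a driver's worst-case DPC/ISR duration per the rule of thumb."""
    if max_us > 1000:
        return "suspicious"
    if max_us >= 500:
        return "worth mentioning"
    return "ok"

# Applied to the dpc_isr_top example above:
for name, max_us in [("Netwtw10.sys", 4120), ("ndis.sys", 820), ("nvlddmkm.sys", 410)]:
    print(name, classify_driver(max_us))
```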

Example 4: Feeding folded stacks to the LLM

After analyze_etl with focus="cpu", you can ask the LLM to drill deeper:

render_flamegraph({
  "out_dir": "C:\\traces\\cpu_spike_wpa_out\\cpu",
  "output_path": "C:\\traces\\cpu_spike.folded",
  "top_n": 300,
  "min_weight_ms": 2
})

Returns:

{
  "folded_file": "C:\\traces\\cpu_spike.folded",
  "source_csv": "C:\\traces\\cpu_spike_wpa_out\\cpu\\CPU Usage (Sampled)_....csv",
  "line_count": 287,
  "total_weight_ms": 39120.0,
  "preview": "ntdll!RtlUserThreadStart;kernel32!BaseThreadInitThunk;myapp!worker_loop;myapp!compress_chunk 27800\\nntdll!... ; myapp!parse_header 410\\n..."
}

You can now either:

  • Render an SVG flamegraph (requires Perl + Brendan Gregg's script):

    flamegraph.pl C:\traces\cpu_spike.folded > C:\traces\cpu_spike.svg
    
  • Or just let the LLM read the preview — the folded format is already much easier for an LLM than raw CSV.
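For intuition, the folded aggregation itself is simple: merge identical stacks and emit "frame1;frame2 <weight>" lines, which is exactly what flamegraph.pl consumes. A minimal sketch over (stack, weight_ms) rows, not the repo's actual summarizer:

```python
from collections import defaultdict

def to_folded(rows):
    """Merge identical semicolon-joined stacks; emit folded lines, heaviest first."""
    agg = defaultdict(float)
    for stack, weight_ms in rows:
        agg[stack] += weight_ms
    return [f"{stack} {int(w)}" for stack, w in
            sorted(agg.items(), key=lambda kv: -kv[1])]

rows = [("a;b;hot", 100.0), ("a;b;hot", 50.0), ("a;c", 20.0)]
print("\n".join(to_folded(rows)))  # → "a;b;hot 150" then "a;c 20"
```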

Client configuration

Claude Desktop — %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "wpa": {
      "command": "wpa-mcp",
      "env": {
        "WPAEXPORTER_PATH": "C:/Program Files (x86)/Windows Kits/10/Windows Performance Toolkit/wpaexporter.exe",
        "XPERF_PATH":       "C:/Program Files (x86)/Windows Kits/10/Windows Performance Toolkit/xperf.exe"
      }
    }
  }
}

VS Code (GitHub Copilot Chat / MCP) — .vscode/mcp.json

Already included in this repo. It points at server.py in the workspace.

Custom MCP host

Any MCP client that speaks stdio works. Launch wpa-mcp (or python server.py) as a child process and send tools/list + tools/call over stdio.
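For example, assuming MCP's newline-delimited JSON-RPC stdio framing, a hand-rolled client message could be built like this (sketch only; a real client also performs the initialize handshake before calling tools):

```python
import json

def rpc_message(method: str, params: dict, msg_id: int) -> bytes:
    """Encode one JSON-RPC 2.0 request, newline-terminated for a stdio transport."""
    msg = {"jsonrpc": "2.0", "id": msg_id, "method": method, "params": params}
    return (json.dumps(msg) + "\n").encode()

# e.g. with proc = subprocess.Popen(["wpa-mcp"], stdin=PIPE, stdout=PIPE):
#   proc.stdin.write(rpc_message("tools/list", {}, 1)); proc.stdin.flush()
```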

Release process

This repo publishes to PyPI via GitHub Actions + PyPI trusted publishing (OIDC) — no secrets required.

One-time PyPI setup:

  1. Claim the wpa-mcp project on PyPI.
  2. Add a Trusted Publisher:
    • Owner: Jialong-zhong
    • Repository: wpr-xperf-mcp-server
    • Workflow: publish.yml
    • Environment: pypi

Then, to ship a new version:

# bump version in pyproject.toml, commit, then:
git tag v0.2.0
git push origin v0.2.0

The Publish to PyPI workflow (on tag v*) will build the sdist + wheel and publish automatically.

Troubleshooting

  • "wpaexporter not found": WPT not installed or the path is wrong. Fix: install the Windows Performance Toolkit; set WPAEXPORTER_PATH.
  • "xperf stats failed": ETL corrupted or not a WPR trace. Fix: re-capture; ensure you ran wpr -stop <file> successfully.
  • Columns missing in the summarizer: your WPA version renamed columns. Fix: open the corresponding .wpaProfile and adjust <Column Name=...> to match your WPA.
  • has_stacks: false in validation: -start CPU not used during capture, or no admin. Fix: re-capture with -start CPU as Administrator.
  • Empty dwm_slow_frames: the DesktopComposition profile wasn't enabled. Fix: re-capture with -start DesktopComposition.
  • ready_latency_top all near zero during a hang: the thread isn't ready-waiting, it's doing work. Fix: run render_flamegraph on the CPU exports to see what work.

FAQ

Q: Does this need WPA GUI installed?
No. Only wpaexporter.exe and xperf.exe (both from the Windows Performance Toolkit) are called. WPA GUI never launches.

Q: Can I use this on Linux/macOS?
The MCP server itself is pure Python. But wpaexporter / xperf only exist on Windows, so analysis must run on Windows. A common setup is: capture on Windows, copy the ETL to a Windows analysis box, run wpa-mcp there.

Q: Why not parse ETL directly in Python?
ETL parsing is deep. Microsoft already ships an excellent, correct parser (wpaexporter) that understands every kernel + provider schema. Reusing it is cheaper and more accurate than reimplementing.

Q: Can I add my own WPA profile?
Yes. Drop a .wpaProfile into wpa/profiles/, add a key to PROFILE_MAP in server.py, and (optionally) a summarizer in wpa/summarizer.py.

Q: Does the LLM see the full CSV?
No — by design. The LLM sees compact summary JSON plus (optionally) folded-stack text. Raw CSVs stay on disk and are referenced by path.

License

MIT. See LICENSE.
