# wpa-mcp

An MCP (Model Context Protocol) server that turns Windows WPR `.etl` traces into structured, LLM-friendly performance insights — using WPAExporter + xperf under the hood, and optionally emitting flamegraph-ready folded stacks.
wpa-mcp bridges two worlds:

- Windows Performance Analyzer (WPA) — the gold standard for analyzing ETW / WPR traces, but GUI-heavy and hard to automate.
- LLMs (Claude, Copilot, GPT, …) — great at reasoning across evidence, but blind to `.etl` files.
This server exposes a small set of MCP tools so an LLM can:
- Validate a trace (does it actually contain the events needed for analysis?)
- Export the right WPA tables to CSV via predefined profiles
- Summarize the CSVs into a compact JSON (Top N processes, hot stacks, ready-thread latency, DPC/ISR offenders, UI jank)
- Render a Brendan-Gregg-style folded stack file for flamegraphs — or for the LLM to read directly
## Table of contents

- Architecture
- Prerequisites
- Install
- Capture a trace
- MCP tools
- Built-in WPA profiles
- Analysis examples
  - Example 1: Runaway CPU
  - Example 2: UI hang / "not responding"
  - Example 3: Audio/mouse glitch caused by a driver
  - Example 4: Feeding folded stacks to the LLM
- Client configuration
- Release process
- Troubleshooting
- FAQ
## Architecture

```
+------------------+     stdio (MCP)     +--------------------+
|  LLM / MCP host  | <-----------------> |   wpa-mcp server   |
| (Claude, VSCode) |                     |    (this repo)     |
+------------------+                     +----------+---------+
                                                    |
                                         subprocess |
                                                    v
                              +---------------------+---------------------+
                              |      xperf.exe      |   wpaexporter.exe   |
                              |  (validate / stats) |   (+ .wpaProfile)   |
                              +---------------------+---------------------+
                                                    |
                                                    v
                                    CSV tables (per profile)
                                                    |
                                                    v
                         summarizer -> JSON / flamegraph -> .folded
```

Everything the LLM sees is structured JSON or compact folded-stack text — never raw gigabyte CSVs.
## Prerequisites

- Windows 10/11 (required; the analysis tools are Windows-only)
- Windows Performance Toolkit (WPT) installed (ships with the Windows ADK / Windows SDK; provides `wpaexporter.exe` and `xperf.exe`)
- Python 3.10+

If WPT is installed to a non-default path, set:

```
setx WPAEXPORTER_PATH "C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\wpaexporter.exe"
setx XPERF_PATH "C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\xperf.exe"
```
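The lookup can be sketched in Python. This is a hedged illustration of one sensible resolution order — environment override, then `PATH`, then the default WPT install directory; `resolve_tool` is a hypothetical helper, not necessarily the server's actual logic:

```python
import os
import shutil

DEFAULT_WPT_DIR = r"C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit"

def resolve_tool(env_var: str, exe_name: str) -> str:
    """Hypothetical lookup: explicit env override, then PATH, then the default WPT dir."""
    override = os.environ.get(env_var)
    if override:
        return override
    on_path = shutil.which(exe_name)
    if on_path:
        return on_path
    return os.path.join(DEFAULT_WPT_DIR, exe_name)

xperf = resolve_tool("XPERF_PATH", "xperf.exe")
```

The env override comes first so a `setx` (as above) always wins over whatever happens to be on `PATH`.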
## Install

### Via pipx (recommended — works once published to PyPI)

```
pipx install wpa-mcp
wpa-mcp   # starts the MCP stdio server
```

### From source

```
git clone https://github.com/Jialong-zhong/wpr-xperf-mcp-server.git
cd wpr-xperf-mcp-server
pip install -e .
wpa-mcp
```
## Capture a trace

The server's analyses are only as good as the providers you captured. Recommended capture for the four problem classes this server targets:

```
# Run as Administrator
wpr -start CPU ^
    -start GeneralProfile ^
    -start DesktopComposition ^
    -start Registry ^
    -filemode

# ... reproduce the issue ...

wpr -stop C:\traces\case01.etl "repro notes here"
```
| WPR profile | What it adds that wpa-mcp uses |
|---|---|
| `CPU` | Sampled CPU, CSwitch, ReadyThread, StackWalk |
| `GeneralProfile` | Processes, images, DPC/ISR |
| `DesktopComposition` | DWM frame timing, Window-in-focus (UI hang evidence) |
| `Registry` | Registry activity (optional; useful for startup/UI hangs) |
If you skip `CPU`, the most valuable analyses (hot stacks, scheduling latency) won't work — `validate_trace` will tell you so.
## MCP tools

| Tool | Purpose | Typical caller |
|---|---|---|
| `validate_trace(etl_path)` | Run `xperf -a stats` and report which providers / stacks exist | LLM, always first |
| `export_tables(etl_path, profile)` | Run one WPA profile via `wpaexporter` and return CSV paths | Advanced / targeted |
| `analyze_etl(etl_path, focus)` | Validate → export (by focus) → summarize; returns one structured JSON | LLM, default entry point |
| `render_flamegraph(out_dir)` | Aggregate CPU Usage (Sampled) stacks into Brendan-Gregg folded format | After `analyze_etl` with CPU focus |
### analyze_etl input schema

```json
{
  "etl_path": "C:\\traces\\case01.etl",
  "focus": "cpu | latency | ui | dpc_isr | all",
  "out_dir": "optional override",
  "top_n": 20
}
```
### analyze_etl output shape (abbreviated)

```json
{
  "etl": "C:\\traces\\case01.etl",
  "focus": "all",
  "validation": {
    "duration_sec": 42.7,
    "has_cpu_sampling": true,
    "has_cswitch": true,
    "has_readythread": true,
    "has_stacks": true,
    "has_dpc_isr": true,
    "has_dwm": true,
    "warnings": []
  },
  "exports": ["...\\cpu\\CPU Usage (Sampled)_...csv", "..."],
  "summary": {
    "cpu_top_processes": [{"process": "chrome.exe", "weight_ms": 8421.3}],
    "cpu_top_modules": [{"module": "ntdll.dll", "weight_ms": 2310.0}],
    "cpu_hot_stacks": [{"stack": "ntdll!... ; app!hot_fn", "weight_ms": 1240.0}],
    "ready_latency_top": [{"process": "explorer.exe", "tid": 1234, "p95_ms": 187.0}],
    "dpc_isr_top": [{"driver": "ndis.sys", "total_ms": 95.2, "max_us": 820}],
    "ui_focus_top": [{"process": "myapp.exe", "focus_ms": 5400.0}],
    "dwm_slow_frames": {"count": 38, "p95_ms": 41.7, "max_ms": 128.0}
  }
}
```
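Several summary fields (`ready_latency_top.p95_ms`, `dwm_slow_frames.p95_ms`) report 95th percentiles. For readers unfamiliar with the statistic, a nearest-rank p95 over per-event samples looks like this — an illustrative sketch of the statistic itself, not wpa-mcp's exact summarizer code:

```python
import math

def p95_ms(samples_ms: list[float]) -> float:
    """Nearest-rank 95th percentile: the value at rank ceil(0.95 * n) in sorted order."""
    ordered = sorted(samples_ms)
    rank = math.ceil(0.95 * len(ordered))   # 1-based rank
    return ordered[rank - 1]

# e.g. 100 ready-thread waits where the slowest 10% took 187 ms each:
waits = [1.0] * 90 + [187.0] * 10
print(p95_ms(waits))   # 187.0 — a 10% slow tail shows up clearly at p95
```

This is why a single outlier barely moves p95, but a consistent ~10% tail dominates it — useful when reading `ready_latency_top`.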
## Built-in WPA profiles

Each profile is a `.wpaProfile` XML that tells `wpaexporter` which WPA tables and columns to dump.

| Focus key | File | Tables exported |
|---|---|---|
| `cpu` | `wpa/profiles/cpu_hotpath.wpaProfile` | CPU Usage (Sampled) |
| `latency` | `wpa/profiles/scheduling_latency.wpaProfile` | CPU Usage (Precise), Ready Thread |
| `ui` | `wpa/profiles/ui_hang.wpaProfile` | Window In Focus, DWM Frame Details |
| `dpc_isr` | `wpa/profiles/dpc_isr.wpaProfile` | DPC/ISR Duration |

Column sets are deliberately minimal to keep CSVs small and summarizer-friendly.
## Analysis examples

These are end-to-end, copy-pasteable walkthroughs. Each shows the user prompt, the tool calls the LLM should make, the JSON shape you can expect, and the conclusions a well-prompted LLM should draw.
### Example 1: Runaway CPU

User: "C:\traces\cpu_spike.etl — some process is pinning my CPU at 100%. Find it and tell me which function."

LLM tool calls:

```js
// 1) validate
validate_trace({ "etl_path": "C:\\traces\\cpu_spike.etl" })

// 2) analyze focused on CPU
analyze_etl({ "etl_path": "C:\\traces\\cpu_spike.etl", "focus": "cpu", "top_n": 10 })

// 3) (optional) generate folded stacks for the hottest paths
render_flamegraph({
  "out_dir": "C:\\traces\\cpu_spike_wpa_out\\cpu",
  "top_n": 200,
  "min_weight_ms": 5
})
```
Typical summary returned:

```json
{
  "cpu_top_processes": [
    { "process": "myapp.exe", "weight_ms": 38210.0 },
    { "process": "chrome.exe", "weight_ms": 2100.0 }
  ],
  "cpu_top_modules": [
    { "module": "myapp.exe", "weight_ms": 31020.0 },
    { "module": "ucrtbase.dll", "weight_ms": 4100.0 }
  ],
  "cpu_hot_stacks": [
    { "stack": "ntdll!RtlUserThreadStart;kernel32!BaseThreadInitThunk;myapp!worker_loop;myapp!compress_chunk",
      "weight_ms": 27800.0 }
  ]
}
```
What the LLM should conclude:

- Trace duration is 42 s and `myapp.exe` consumed ~38 s of CPU ≈ ~90% of wall time on a single-core equivalent.
- Nearly all weight is in `myapp!compress_chunk` via `worker_loop` → this is a hot loop in the app itself, not the runtime or OS.
- Recommend: inspect `compress_chunk`, consider SIMD / chunking, and capture a second trace with `-start CPU -start FileIO` to see if the loop is also I/O-bound.
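The "~90% of one core" figure follows directly from the summary numbers:

```python
# CPU weight (ms) for myapp.exe from cpu_top_processes, trace duration from validation.
weight_ms = 38210.0
duration_sec = 42.7

# Fraction of one logical core the process kept busy over the whole trace.
core_share = weight_ms / (duration_sec * 1000.0)
print(f"{core_share:.0%}")   # 89%
```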
### Example 2: UI hang / "not responding"

User: "When the user clicks Save, the window freezes for ~2 seconds. Trace at C:\traces\save_hang.etl."

LLM tool calls:

```js
validate_trace({ "etl_path": "C:\\traces\\save_hang.etl" })
analyze_etl({ "etl_path": "C:\\traces\\save_hang.etl", "focus": "ui" })
analyze_etl({ "etl_path": "C:\\traces\\save_hang.etl", "focus": "latency" })
```

Expected signals:

```json
{
  "ui_focus_top": [{ "process": "myapp.exe", "focus_ms": 5400 }],
  "dwm_slow_frames": { "count": 42, "p95_ms": 58.0, "max_ms": 1980.0 },
  "ready_latency_top": [
    { "process": "myapp.exe", "tid": 7321, "p95_ms": 14.0 }
  ]
}
```
Reasoning rubric the LLM should apply:

- `dwm_slow_frames.max_ms ≈ 1980` while the app is in focus → the hang is real and UI-thread-bound.
- `ready_latency_top` for the UI thread is low (≈14 ms p95) → the thread wasn't starved for CPU; it was busy on its own work (otherwise we'd see hundreds of ms of ready-wait).
- Therefore the UI thread is doing synchronous work during Save. Likely culprits: synchronous disk/network I/O, lock contention, or a heavy compute path on the UI thread.
- Next step: re-capture with `-start CPU -start FileIO -start DiskIO` and focus on the UI thread's stacks with `render_flamegraph` to see exactly what it's doing during those 2 seconds.

Contrast: if `ready_latency_top` had been, say, `p95_ms: 900` — then the UI thread was ready but not scheduled. That's a scheduling/contention story, not a "heavy work on UI thread" story.
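The rubric above can be written down as a small decision function. The thresholds here are illustrative rules of thumb, not wpa-mcp constants, and `classify_ui_hang` is a hypothetical helper:

```python
def classify_ui_hang(ready_p95_ms: float, dwm_max_frame_ms: float) -> str:
    """Illustrative triage: distinguish a busy UI thread from a starved one.

    Thresholds are rough rules of thumb, not calibrated wpa-mcp values.
    """
    if dwm_max_frame_ms < 100:
        return "no significant hang in this trace"
    if ready_p95_ms > 100:
        # Thread was runnable but not scheduled: a scheduling/contention story.
        return "UI thread starved: investigate CPU contention and priorities"
    # Thread ran promptly whenever it became ready, yet frames still stalled:
    # it was busy doing synchronous work on the UI thread.
    return "UI thread busy: look for sync I/O or heavy compute on the UI thread"

print(classify_ui_hang(ready_p95_ms=14.0, dwm_max_frame_ms=1980.0))
```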
### Example 3: Audio/mouse glitch caused by a driver

User: "Audio crackles and the mouse skips for a split second every few minutes. C:\traces\glitch.etl."

LLM tool calls:

```js
validate_trace({ "etl_path": "C:\\traces\\glitch.etl" })
analyze_etl({ "etl_path": "C:\\traces\\glitch.etl", "focus": "dpc_isr" })
```

Expected signal:

```json
{
  "dpc_isr_top": [
    { "driver": "Netwtw10.sys", "total_ms": 312.4, "max_us": 4120, "count": 1820 },
    { "driver": "ndis.sys", "total_ms": 95.1, "max_us": 820, "count": 4300 },
    { "driver": "nvlddmkm.sys", "total_ms": 60.0, "max_us": 410, "count": 2100 }
  ]
}
```
What the LLM should conclude:

- `Netwtw10.sys` (Intel Wi-Fi driver) has a single DPC over 4 ms — that's well above the ~1 ms "don't cause audio glitches" rule of thumb.
- Correlation with symptom: Wi-Fi DPC storms typically line up with mouse/audio skips because DPCs run at elevated IRQL and block the audio/HID stack.
- Recommend: update the Wi-Fi driver; if the problem persists, disable power-saving for the Wi-Fi adapter and re-capture.

Quality rules wpa-mcp's prompting guide bakes in: any driver with `max_us > 1000` is suspicious; `>= 500` is worth mentioning.
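Those cutoffs can be applied mechanically to `dpc_isr_top`. A minimal sketch, using the 1000 µs / 500 µs thresholds from the rule of thumb above (`rate_driver` is a hypothetical helper, not wpa-mcp code):

```python
def rate_driver(max_us: int) -> str:
    """Map a driver's worst-case DPC/ISR duration (µs) to a severity label."""
    if max_us > 1000:
        return "suspicious"        # long enough to starve audio/HID processing
    if max_us >= 500:
        return "worth mentioning"
    return "ok"

dpc_isr_top = [
    {"driver": "Netwtw10.sys", "max_us": 4120},
    {"driver": "ndis.sys", "max_us": 820},
    {"driver": "nvlddmkm.sys", "max_us": 410},
]
verdicts = {d["driver"]: rate_driver(d["max_us"]) for d in dpc_isr_top}
```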
### Example 4: Feeding folded stacks to the LLM

After `analyze_etl` with `focus="cpu"`, you can ask the LLM to drill deeper:

```js
render_flamegraph({
  "out_dir": "C:\\traces\\cpu_spike_wpa_out\\cpu",
  "output_path": "C:\\traces\\cpu_spike.folded",
  "top_n": 300,
  "min_weight_ms": 2
})
```

Returns:

```json
{
  "folded_file": "C:\\traces\\cpu_spike.folded",
  "source_csv": "C:\\traces\\cpu_spike_wpa_out\\cpu\\CPU Usage (Sampled)_....csv",
  "line_count": 287,
  "total_weight_ms": 39120.0,
  "preview": "ntdll!RtlUserThreadStart;kernel32!BaseThreadInitThunk;myapp!worker_loop;myapp!compress_chunk 27800\\nntdll!... ; myapp!parse_header 410\\n..."
}
```
You can now either:

- Render an SVG flamegraph (requires Perl + Brendan Gregg's script):

  ```
  flamegraph.pl C:\traces\cpu_spike.folded > C:\traces\cpu_spike.svg
  ```

- Or just let the LLM read the `preview` — the folded format is already much easier for an LLM than raw CSV.
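The folded format is also trivially machine-readable — one `frame1;frame2;... weight` line per stack — so a consumer can re-aggregate it in a few lines. A sketch (the weights are the ms values wpa-mcp emits):

```python
from collections import Counter

def leaf_weights(folded_text: str) -> Counter:
    """Sum folded-stack weights by leaf frame (the function actually on-CPU)."""
    totals: Counter = Counter()
    for line in folded_text.strip().splitlines():
        stack, weight = line.rsplit(" ", 1)   # weight is the last token
        leaf = stack.split(";")[-1]
        totals[leaf] += float(weight)
    return totals

folded = (
    "ntdll!RtlUserThreadStart;myapp!worker_loop;myapp!compress_chunk 27800\n"
    "ntdll!RtlUserThreadStart;myapp!worker_loop;myapp!parse_header 410\n"
    "ntdll!RtlUserThreadStart;myapp!worker_loop;myapp!compress_chunk 1200\n"
)
print(leaf_weights(folded).most_common(1))   # [('myapp!compress_chunk', 29000.0)]
```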
## Client configuration

### Claude Desktop — `%APPDATA%\Claude\claude_desktop_config.json`

```json
{
  "mcpServers": {
    "wpa": {
      "command": "wpa-mcp",
      "env": {
        "WPAEXPORTER_PATH": "C:/Program Files (x86)/Windows Kits/10/Windows Performance Toolkit/wpaexporter.exe",
        "XPERF_PATH": "C:/Program Files (x86)/Windows Kits/10/Windows Performance Toolkit/xperf.exe"
      }
    }
  }
}
```
### VS Code (GitHub Copilot Chat / MCP) — `.vscode/mcp.json`

Already included in this repo. It points at `server.py` in the workspace.
### Custom MCP host

Any MCP client that speaks stdio works. Launch `wpa-mcp` (or `python server.py`) as a child process and send `tools/list` + `tools/call` over stdio.
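For reference, a raw `tools/call` request is a JSON-RPC 2.0 message; building one looks roughly like the sketch below. The shape follows the MCP specification as I understand it — verify framing details against your client library before relying on it:

```python
import json

def make_tools_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as one JSON-RPC 2.0 message."""
    msg = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(msg)

line = make_tools_call(1, "validate_trace", {"etl_path": r"C:\traces\case01.etl"})
```

A real client also performs the `initialize` handshake first; this only shows the per-call message shape.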
## Release process

This repo publishes to PyPI via GitHub Actions + PyPI trusted publishing (OIDC) — no secrets required.

One-time PyPI setup:

1. Claim the `wpa-mcp` project on PyPI.
2. Add a Trusted Publisher:
   - Owner: `Jialong-zhong`
   - Repository: `wpr-xperf-mcp-server`
   - Workflow: `publish.yml`
   - Environment: `pypi`
Then, to ship a new version:

```
# bump version in pyproject.toml, commit, then:
git tag v0.2.0
git push origin v0.2.0
```

The **Publish to PyPI** workflow (on tag `v*`) will build the sdist + wheel and publish automatically.
## Troubleshooting

| Symptom | Likely cause | Fix |
|---|---|---|
| `wpaexporter` not found | WPT not installed or path wrong | Install the Windows Performance Toolkit; set `WPAEXPORTER_PATH` |
| `xperf` stats failed | ETL corrupted or not a WPR trace | Re-capture; ensure you ran `wpr -stop <file>` successfully |
| Columns missing in summarizer | Your WPA version renamed columns | Open the corresponding `.wpaProfile` and adjust `<Column Name=...>` to match your WPA |
| `has_stacks: false` in validation | `-start CPU` not used during capture, or not run as admin | Re-capture with `-start CPU` as Administrator |
| Empty `dwm_slow_frames` | `DesktopComposition` profile wasn't enabled | Re-capture with `-start DesktopComposition` |
| `ready_latency_top` all near zero during a hang | The thread isn't ready-waiting → it's doing work | Run `render_flamegraph` on the CPU exports to see what work |
## FAQ

**Q: Does this need the WPA GUI installed?**
No. Only `wpaexporter.exe` and `xperf.exe` (both from the Windows Performance Toolkit) are called. The WPA GUI never launches.

**Q: Can I use this on Linux/macOS?**
The MCP server itself is pure Python, but `wpaexporter` / `xperf` only exist on Windows, so analysis must run on Windows. A common setup: capture on Windows, copy the ETL to a Windows analysis box, and run wpa-mcp there.

**Q: Why not parse ETL directly in Python?**
ETL parsing is deep. Microsoft already ships an excellent, correct parser (`wpaexporter`) that understands every kernel + provider schema. Reusing it is cheaper and more accurate than reimplementing it.

**Q: Can I add my own WPA profile?**
Yes. Drop a `.wpaProfile` into `wpa/profiles/`, add a key to `PROFILE_MAP` in `server.py`, and (optionally) add a summarizer in `wpa/summarizer.py`.

**Q: Does the LLM see the full CSV?**
No — by design. The LLM sees compact summary JSON plus (optionally) folded-stack text. Raw CSVs stay on disk and are referenced by path.
## License

MIT. See LICENSE.