👁️ Argus: Agentic QA Framework
An agentic QA framework that authors, generates, triages, and self-heals Playwright tests for any web app. It is usable from Claude Code/Desktop as an MCP server or from CI as a CLI, with the whole loop running as a deployment gate in GitHub Actions.
[!NOTE] 🚧 Early development. Build progress is tracked as milestones M0 → M4 (see Roadmap).
Why
Test suites are expensive to write and brittle to maintain. Argus puts a Claude agent in the loop to do the slow parts: explore an app and write real Playwright tests, then, when the UI drifts, diagnose the failure and open a fix PR, while still refusing to paper over genuine bugs.
The idea: one core, two consumers
Argus defines its QA tools once and exposes them twice.
       ┌──────────────────────────────┐
       │         @argus/core          │
       │  Agent loop (Claude)         │  ← Anthropic Messages API + tool use
       │  + single Tool Registry      │  ← browser · dom · fs · playwright · git
       └───────┬───────────────┬──────┘
               │               │
┌──────────────▼──┐         ┌──▼───────────────┐
│  @argus/mcp     │         │  @argus/cli      │
│  MCP server     │         │  npx argus ...   │
│ (Claude Desktop)│         │  (used in CI)    │
└─────────────────┘         └──────────────────┘
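The "define once, expose twice" idea can be sketched in a few lines of TypeScript. This is a minimal illustration, not the actual `@argus/core` code: `ToolRegistry`, `ToolDef`, `describeForMcp`, `runFromCli`, and the stubbed `dom.snapshot` tool are all hypothetical names chosen for the example.

```typescript
// Hypothetical shapes for illustration; none of these names come from the Argus source.
type ToolArgs = Record<string, unknown>;

interface ToolDef {
  name: string;
  description: string;
  handler: (args: ToolArgs) => Promise<string>;
}

class ToolRegistry {
  private readonly tools = new Map<string, ToolDef>();

  // Each tool is registered exactly once; duplicates are rejected.
  register(tool: ToolDef): void {
    if (this.tools.has(tool.name)) {
      throw new Error(`duplicate tool: ${tool.name}`);
    }
    this.tools.set(tool.name, tool);
  }

  list(): ToolDef[] {
    return Array.from(this.tools.values());
  }

  async call(name: string, args: ToolArgs): Promise<string> {
    const tool = this.tools.get(name);
    if (!tool) throw new Error(`unknown tool: ${name}`);
    return tool.handler(args);
  }
}

// The single registry shared by both consumers.
const registry = new ToolRegistry();
registry.register({
  name: "dom.snapshot",
  description: "Return a text snapshot of the current page (stubbed here).",
  handler: async (args) => `snapshot of ${String(args.url)}`,
});

// Consumer 1: an MCP-style server advertises the tool list to a client.
function describeForMcp(reg: ToolRegistry): { name: string; description: string }[] {
  return reg.list().map(({ name, description }) => ({ name, description }));
}

// Consumer 2: a CLI-style entry point dispatches a tool call directly.
function runFromCli(reg: ToolRegistry, name: string, url: string): Promise<string> {
  return reg.call(name, { url });
}
```

Because both consumers read from the same registry, a tool added in one place becomes available to the MCP server and the CLI without any duplicated definitions.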
The loop: four behaviors
| Stage | Input | The agent… | Output |
|---|---|---|---|
| Author | Plain-English intent | compiles intent into a structured test plan | *.plan.json |
| Generate | A URL | explores the app, writes specs with assertions | tests/*.spec.ts |
| Triage | A failed run | classifies real bug vs DOM drift vs flake | root-cause report |
| Heal | A drift verdict | rewrites the locator, verifies green, opens a PR | a pull request |
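The hand-off from Triage to Heal in the table above can be read as a small decision rule. The sketch below is an assumption-laden illustration, not Argus's implementation: the `Verdict` labels, `nextStep`, and `shouldOpenPr` are hypothetical names. Only "drift leads to heal, and a PR is opened after the fix verifies green" comes from the table; the actions for real bugs and flakes are placeholders.

```typescript
// Hypothetical verdict labels mirroring the Triage row: real bug vs DOM drift vs flake.
type Verdict = "real-bug" | "dom-drift" | "flake";

// "heal" follows from the table; "report" and "retry" are assumed placeholders.
type NextStep = "report" | "heal" | "retry";

function nextStep(verdict: Verdict): NextStep {
  switch (verdict) {
    case "dom-drift":
      return "heal"; // the locator is stale, so the test is a candidate for rewriting
    case "flake":
      return "retry"; // assumption: a flake is re-run rather than rewritten
    case "real-bug":
      return "report"; // never healed: a genuine bug must keep the test failing
  }
}

interface HealAttempt {
  verdict: Verdict;
  rerunPassed: boolean; // did the rewritten spec pass when re-run?
}

// A fix PR is only warranted for drift, and only once the rewritten test is green.
function shouldOpenPr(attempt: HealAttempt): boolean {
  return nextStep(attempt.verdict) === "heal" && attempt.rerunPassed;
}
```

Keeping the rule explicit makes the "refuse to paper over genuine bugs" guarantee easy to audit: there is no path from a `real-bug` verdict to a pull request.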
Quickstart
pnpm install
cp .env.example .env # add your ANTHROPIC_API_KEY
pnpm build
Watch the agent run (E2E)
The agent loop is live. Point it at the bundled demo app and watch it explore as it navigates, snapshots the DOM, reads data-testids, and clicks through the login → cart flow:
pnpm --filter @argus/core exec playwright install chromium # one-time
pnpm --filter @argus/sample-shop dev # terminal 1 → http://localhost:3100
node --env-file=.env packages/cli/dist/index.js smoke http://localhost:3100/login # terminal 2
It prints a step-by-step trace and a token/cost line (~$0.05–0.15 per run on the fast model). Requires a real Anthropic API key (a Max subscription doesn't fund the API).
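For intuition about where a per-run cost figure comes from, here is a rough sketch of turning token usage into a cost line. Everything in it is an assumption for illustration: `estimateCostUsd`, `formatCostLine`, the output format, the token counts, and the per-million-token rates are placeholders, not Argus's real output or any published price list.

```typescript
// Placeholder rates in USD per million tokens; substitute the real rates for your model.
interface Rates {
  inputPerMTok: number;
  outputPerMTok: number;
}

interface Usage {
  inputTokens: number;
  outputTokens: number;
}

// Cost is a linear function of input and output tokens, billed at separate rates.
function estimateCostUsd(usage: Usage, rates: Rates): number {
  const input = (usage.inputTokens / 1e6) * rates.inputPerMTok;
  const output = (usage.outputTokens / 1e6) * rates.outputPerMTok;
  return input + output;
}

// Hypothetical one-line summary, similar in spirit to a token/cost trace line.
function formatCostLine(usage: Usage, rates: Rates): string {
  const cost = estimateCostUsd(usage, rates);
  return `tokens in=${usage.inputTokens} out=${usage.outputTokens} cost=$${cost.toFixed(2)}`;
}

// Example with made-up numbers: 60k input tokens, 4k output tokens.
const exampleLine = formatCostLine(
  { inputTokens: 60000, outputTokens: 4000 },
  { inputPerMTok: 1, outputPerMTok: 5 },
);
```

Input tokens tend to dominate in an exploring agent, since every DOM snapshot is fed back into the conversation, so trimming snapshots is usually the first lever for reducing cost.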
Or have it write a test and run it green:
node --env-file=.env packages/cli/dist/index.js generate http://localhost:3100/login --run
It explores the app, writes tests/generated/login.spec.ts, and runs it against sample-shop (3 passed, 0 failed). Defaults to Opus for quality; add --model claude-haiku-4-5 for ~10¢ runs.
Full CLI / MCP usage docs land with milestones M2 and M4.
Repo layout
argus/
├─ packages/
│  ├─ core/          # agent loop, tool registry, Claude client, prompts
│  ├─ mcp/           # MCP server wrapping the registry
│  └─ cli/           # `argus author|generate|triage|heal`
├─ apps/
│  └─ sample-shop/   # Next.js demo target (login + products + cart)
└─ tests/            # generated Playwright specs land here
Roadmap
- M0: Foundations · monorepo, tooling, CI stub
- M1: Core + sample-shop + Generate · the first "AI writes real tests" moment
- M2: CLI + GitHub Actions gate · failing tests block deployment
- M3: Triage + Heal · self-healing PRs
- M4: MCP server + polish · drive Argus from Claude Desktop; demo GIFs
License
MIT © Piyush Pathak