pentest-ai
Autonomous pentests from one command. Real tools. Real PoCs. Real reports.
⚠️ Authorized testing only
pentest-ai is offensive security tooling. It executes real network and host operations against the targets you specify. You are solely responsible for ensuring you have explicit, written authorization to test every target.
By installing or running ptai you agree to the Acceptable Use Policy and the Terms of Service. Testing systems you do not own without written authorization may violate the Computer Fraud and Abuse Act, the Computer Misuse Act 1990, GDPR Article 32, and equivalents in your jurisdiction. Misuse is your sole responsibility.
The first run prompts you to confirm AUP acceptance and persists the choice to ~/.pentest-ai/aup-consent.txt. Set PENTEST_AI_AUP_ACCEPTED=1 in CI to bypass the prompt non-interactively.
Point it at a target. It runs recon, logs into the app, chains vulnerabilities into attack paths, proves every finding with a working PoC, and hands back a report your blue team can act on.
No cloud. No telemetry. Your laptop, your keys, your data.
Table of Contents
- See it run
- Install
- What makes it different
- How it works
- Who uses it for what
- Playbooks
- Drop it into your CI
- vs the field
- What's inside
- Responsible use
- The ecosystem
- Beyond the OSS
- Community
- Contributing
- Star history
- License
See it run
$ ptai auth profile add staging-acme # one-time, password from your secrets manager
$ ptai start https://staging.acme.com --auth-profile staging-acme
[+] engagement eng-e512f47b target=staging.acme.com scope=web
[auth] ✓ Logged in as admin. Session captured, refresh in 14:32.
[recon] ✓ 3 open ports, 7 subdomains, Apache/PHP fingerprint.
[web] ✓ 21 findings behind auth. 3 SQLi, 4 XSS, missing CSP, CSRF gap.
[chain] ✓ Attack path found in 2 hops:
reflected XSS + cookie without Secure flag → admin session hijack
[validate] ✓ 3 findings proven with non-destructive PoCs.
[detect] ✓ Generated Sigma, SPL, KQL rules for the blue team.
[report] ✓ reports/eng-e512f47b.html · 12 pages · client-ready
Total: 4m 18s. Cost: $0.73 in Claude tokens.
That was one command. You were pouring coffee. The password came from an env var, 1Password, HashiCorp Vault, or AWS Secrets Manager — see Credentialed scans.
Install
pip install ptai
Just paid for a workspace at pentestai.xyz? See Getting Started for the 15-minute path from "You're in" to your first engagement.
Use it with your Claude Code account (recommended)
Already pay for Claude Pro or Max? Skip the API key. Wire ptai into Claude Code as an MCP server and your subscription runs the engagement.
Option A — one-line CLI (Claude Code users):
claude mcp add pentest-ai -- ptai mcp
Done. Restart Claude Code and the tools show up.
Option B — interactive wizard (Claude Desktop, Cursor, VS Code Copilot):
ptai setup --mcp
Auto-detects the clients you have installed, writes their config files, and tells you to restart them.
Then, in any of those clients:
Run an authenticated pentest against staging.acme.com. Login is at /login with username admin and password in $APP_PASS. Summarize the high-severity findings when done.
Claude Code (or Cursor, or Copilot) picks up the tools, runs the engagement through your subscription, and streams results back into your conversation. Zero API spend.
Or use an API key
For CI pipelines, scheduled runs, or standalone use without an MCP client:
export ANTHROPIC_API_KEY=sk-ant-... # Claude (best results)
# or
export OPENAI_API_KEY=sk-... # OpenAI
# or, fully local, no cloud
export OLLAMA_HOST=localhost:11434 # Ollama
# or, any of 300+ models via LiteLLM (OpenRouter, Azure, DeepSeek, Groq, Mistral, Together AI, Bedrock, Vertex AI, Cohere, ...)
pip install ptai[litellm]
ptai start <target> --provider litellm --model openrouter/anthropic/claude-sonnet-4
ptai start https://your-target.com
First run installs the tool deps it needs (nmap, nuclei, ffuf, sqlmap, gobuster, and more). No setup afterwards.
No LLM at all (interactive launcher)
Want to drive the underlying tools without an LLM? Run:
ptai menu
Numeric category navigation, search (/term), tag filtering (t web), and a keyword-based recommendation engine. This lowers the friction to zero for first-time users. Real engagements still go through ptai start with full scope confirmation.
HTTP REST API (for dashboards and integrations)
pip install ptai[api]
ptai serve --port 8888
Endpoints: /health, /version, /agents, /tools, /engagements (list, detail, findings, chains, detection rules, SARIF export). Write endpoints (POST /engagements, POST /engagements/{id}/abort) require Authorization: Bearer $PENTEST_AI_API_TOKEN so the server can't accidentally launch real scans if exposed. Live event stream at WS /engagements/{id}/stream.
For non-MCP clients, web dashboards, and CI integrations.
Load other MCP servers as tool sources
Compose pentest-ai with hexstrike or any other MCP-compatible security server. External tools become available to the agents alongside the native ones. Edit ~/.pentest-ai/mcp_servers.json:
{
"servers": [
{"name": "hexstrike", "command": "python3 hexstrike_mcp.py", "transport": "stdio"}
]
}
Take over mid-run (HITL teleoperation)
While an engagement is running, press Ctrl+C twice within 600ms to pause the orchestrator and drop into a REPL: step, inspect findings, inject <instruction>, skip, resume, abort. This acknowledges that current LLMs aren't fully autonomous: the operator owns the call when it matters.
Public benchmarks
Reproducible solve-rate measurements live in benchmarks/. Run them yourself:
./benchmarks/scripts/run_all.sh # writes JSON per run + RESULTS.md
Spec, harness, results all in git. No "98.7% detection rate" claims you can't audit.
Optional: cloud workspace (Pro / Team / Enterprise)
The CLI is free forever and stores everything locally. If you want engagement history, branded client-ready PDF reports, and team collaboration in a dashboard, link the CLI to an app.pentestai.xyz workspace:
# 1. Sign up, then Dashboard → API Keys → Generate → copy ptai_...
ptai auth login # paste the key (hidden prompt)
ptai auth status # confirm link
# or use an env var for CI:
export PENTESTAI_API_KEY=ptai_...
Now every ptai start / ptai scan run auto-syncs findings to your cloud workspace. No cloud = no calls; the integration is silently off unless you log in. To unlink: ptai auth logout.
What makes it different
| | |
|---|---|
| 🤖 Autonomous | Ten agents cover recon, web, AD, cloud, chaining, PoC, detection, and report. They coordinate on their own. |
| 🔐 It logs in | Most scanners die at the login page. This one holds a session, rotates creds, and every downstream tool inherits the cookie. |
| 🔑 Credentials never leak | Auth profiles store references (env vars, op://, Vault paths, AWS Secrets Manager ARNs), never the value. Passwords never enter your shell history, the LLM context, the findings DB, or process argv. |
| 🧪 Every finding is proven | A working proof of concept runs against the target. No more triaging 40 maybes from a noisy scanner. |
| 📋 Your methodology, in YAML | Encode your pentest checklist as a playbook. Share it. Fork someone else's. Like Nuclei templates, for methodology. |
| 🔄 Diff mode | ptai retest <id> shows what's new, fixed, or still broken. The fix → retest → confirm loop becomes one command. |
| ⚡ CI-native | A GitHub Action, GitLab template, severity gates, SARIF output, and PR comments. Works the day you drop it in. |
| 🧠 LLM red team | Probe your AI features for prompt injection, jailbreaks, and OWASP LLM Top 10. Eighty probes built in. |
| 🔌 Works with Claude, Cursor, Copilot | An MCP server with 35+ tools. Talk to your assistant: "diff last week's engagement against today's." |
| 💾 Runs on your laptop | MIT licensed. No cloud calls. Works offline with Ollama. Your findings stay on your disk. |
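The reference-not-value credential design above could look roughly like this; the `env:`/`op://`/ARN/`vault:` prefixes and the `resolve_credential` helper are illustrative assumptions, not the tool's actual storage format:

```python
import os

def resolve_credential(ref: str) -> str:
    """Resolve a credential *reference* to its value at use time.

    Only the reference string is ever stored or logged; the secret itself
    is fetched just-in-time and never written to disk, argv, or LLM context.
    """
    if ref.startswith("env:"):
        return os.environ[ref[4:]]              # plain environment variable
    if ref.startswith("op://"):
        raise NotImplementedError("shell out to `op read` (1Password CLI)")
    if ref.startswith("arn:aws:secretsmanager:"):
        raise NotImplementedError("boto3 secretsmanager get_secret_value")
    if ref.startswith("vault:"):
        raise NotImplementedError("HashiCorp Vault client, read the KV path")
    raise ValueError(f"unrecognized credential reference: {ref!r}")
```

The point of the indirection: a findings database or report can safely contain `env:APP_PASS`, because that string is useless without the environment it resolves in.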
How it works
┌─────────────────────────────────────────────────────────────┐
│ ptai start <target> │
└─────────────────────────────────────────────────────────────┘
│
┌──────────────────┼──────────────────┐
▼ ▼ ▼
┌────────┐ ┌────────┐ ┌─────────┐
│ recon │ → │ auth │ → │ web │
└────────┘ └────────┘ └─────────┘
│
┌────────────────────────────────────┤
▼ ▼
┌────────┐ ┌─────────┐
│ ad │ ┌──────────────────┐ │ cloud │
└────────┘ │ Findings DB │ └─────────┘
│ │ (sqlite + evidence)│ │
└───────▶│ scope-guarded │◀──────┘
│ deduplicated │
└──────────────────┘
│
┌────────────┼────────────┐
▼ ▼ ▼
┌──────┐ ┌─────────┐ ┌──────────┐
│chain │ │validate │ │ detect │
└──────┘ └─────────┘ └──────────┘
│
▼
┌──────────┐
│ report │ md · html · pdf · SARIF · JUnit
└──────────┘
Each agent runs with an LLM when you've set a key, or as a deterministic tool loop when you haven't. Either way the phase order is the same.
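The fixed phase order and the shared, deduplicated findings DB from the diagram can be sketched as follows; `Finding`, `FindingsDB`, and `run_engagement` are illustrative names, not the real internals:

```python
from dataclasses import dataclass, field

# Fixed phase order from the diagram above; agent internals vary, the order doesn't.
PHASE_ORDER = ["recon", "auth", "web", "ad", "cloud",
               "chain", "validate", "detect", "report"]

@dataclass(frozen=True)
class Finding:
    type: str
    target: str
    detail: str

@dataclass
class FindingsDB:
    """Shared store: every phase writes here; duplicates are dropped."""
    _seen: set = field(default_factory=set)

    def add(self, f: Finding) -> bool:
        if f in self._seen:   # dedup on (type, target, detail)
            return False
        self._seen.add(f)
        return True

def run_engagement(agents: dict, db: FindingsDB) -> None:
    """Run each phase in order; a missing agent is simply skipped.

    A real agent would be an LLM loop or a deterministic tool loop; either
    way it yields Finding objects into the shared DB.
    """
    for phase in PHASE_ORDER:
        agent = agents.get(phase)
        if agent is None:
            continue
        for finding in agent():
            db.add(finding)
```

Because the loop is agnostic to what produces the findings, swapping an LLM-driven agent for a deterministic one changes nothing downstream.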
Who uses it for what
AppSec teams. Wire ptai into your CI. Every PR against staging gets an authenticated scan. The build fails on high-severity findings. The fix → retest → confirm loop runs on its own.
Consultants. Scope a week-long engagement, point ptai at the estate, and spend your time on the creative work instead of gluing scanners together and writing the report. The report is already written.
Bug bounty hunters. Run it over breakfast. Come back to a list of validated findings with PoCs ready to paste into HackerOne.
Red teamers. Drop your internal AD methodology into a YAML playbook. Run it against every new engagement. Share it with your team.
Developers shipping AI features. Enable --enable-llm-redteam against your chatbot. Get an OWASP LLM Top 10 report in minutes.
Playbooks
Your methodology as a file. Checked into git. Shared with your team.
name: internal-ad-pentest
inputs:
domain: { required: true, prompt: "AD domain" }
dc_ip: { required: true, prompt: "DC IP" }
phases:
- id: recon
tools: [nmap, masscan]
- id: ad-enum
depends_on: [recon]
condition: "any_finding(type='open_port', port=445)"
tools: [enum4linux, ldapsearch, bloodhound-python]
- id: kerberoast
requires_finding: { type: ad_user_enumerated }
tools: [impacket-getuserspns]
llm_decide: true # let the LLM skip if context says useless
ptai playbook list # show installed playbooks
ptai playbook show web-app-quick # preview before running
ptai playbook run ./my-ad.yaml # execute
Five playbooks ship built-in. A community catalog is coming.
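The `condition: "any_finding(...)"` gate in the YAML above amounts to a predicate over the findings collected so far. A minimal sketch of what such a helper might evaluate (the real implementation is not shown here, and the dict field names are assumptions):

```python
def any_finding(findings: list, **criteria) -> bool:
    """True if at least one finding matches every key=value criterion.

    Mirrors the playbook condition `any_finding(type='open_port', port=445)`:
    the ad-enum phase runs only when recon saw SMB open somewhere.
    """
    return any(
        all(f.get(k) == v for k, v in criteria.items())
        for f in findings
    )

# Gate a phase the way the YAML above does:
findings = [{"type": "open_port", "port": 445, "host": "10.0.0.5"}]
run_ad_enum = any_finding(findings, type="open_port", port=445)
```

Conditions keep playbooks declarative: the phase graph stays static in git while runtime data decides which branches actually execute.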
Drop it into your CI
# .github/workflows/security.yml
name: Security scan
on: [pull_request]
jobs:
ptai:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- run: pip install ptai
- run: |
ptai start ${{ vars.STAGING_URL }} \
--ci \
--fail-on high \
--sarif pentest.sarif
env:
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
- uses: github/codeql-action/upload-sarif@v3
if: always()
with:
sarif_file: pentest.sarif
Findings post as a PR comment, SARIF uploads to GitHub Code Scanning, and the build fails on gated severity. GitLab and Jenkins templates in docs/ci-cd.md.
vs the field
| | ptai | Sn1per | Nuclei | Burp Pro | PentestGPT |
|---|---|---|---|---|---|
| Autonomous phase loop | ✓ | ✓ | ✓ | | |
| Authenticated scanning | ✓ | partial | raw HTTP | ✓ | |
| Exploit chaining | ✓ | partial | | | |
| PoC validation | ✓ | partial | | | |
| Diff and retest | ✓ | | | | |
| CI-native (SARIF + gates) | ✓ | partial | partial | | |
| LLM red team | ✓ | | | | |
| YAML playbooks | ✓ | | templates | | |
| MCP server | ✓ | | | | |
| License | MIT | GPL | MIT | commercial | MIT |
What's inside
- 12 agents across recon, web, AD, cloud, mobile, wireless, browser, exploit chaining, PoC validation, detection, reporting, LLM red team, and social engineering
- 200+ tool wrappers with auto-install: nmap, masscan, nuclei, ffuf, sqlmap, gobuster, wapiti, nikto, dalfox, xsstrike, enum4linux, bloodhound-python, impacket's full suite, trufflehog, gitleaks, kube-hunter, trivy, and more
- 4000+ Nuclei templates integrated for atomic vulnerability detection across web, network, cloud, and CVE-specific checks
- 35+ MCP tools for LLM-driven engagements
- 300+ LLM models supported via the LiteLLM provider (Anthropic, OpenAI, Ollama direct; Azure, OpenRouter, DeepSeek, Groq, Mistral, Together AI, Bedrock, Vertex AI, Cohere via LiteLLM)
- HTTP REST API + WebSocket surface (ptai serve) for non-MCP integrations
- Local web dashboard with live engagement view, findings table, attack chain visualization, SARIF export
- Browser automation agent with screenshot capture, DOM analysis, network capture, security header grading (Playwright-driven)
- Human-In-The-Loop teleoperation (Ctrl+C twice to take over an engagement mid-run)
- MCP client capability to load external MCP servers as tool sources
- Public reproducible benchmark harness in benchmarks/ — your numbers, your code, in git
- 6 output formats: Markdown, HTML, PDF, SARIF 2.1.0, JUnit XML, compliance mappings (OWASP, CWE, CVE, CVSS v3.1)
- 500+ tests with CI on Python 3.11 and 3.12
- MIT licensed, 100% yours
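The SARIF 2.1.0 output listed above has a small required envelope. A hedged sketch of converting findings into it; the input field names (`ruleId`, `message`, `uri`) are illustrative, not ptai's actual findings schema:

```python
def to_sarif(findings: list) -> dict:
    """Wrap findings in a minimal SARIF 2.1.0 log.

    SARIF requires `version`, `runs`, and a tool driver name; each result
    carries a rule id, a severity level, a message, and a location.
    """
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": "ptai"}},
            "results": [
                {
                    "ruleId": f["ruleId"],
                    "level": f.get("level", "warning"),
                    "message": {"text": f["message"]},
                    "locations": [{
                        "physicalLocation": {
                            "artifactLocation": {"uri": f["uri"]},
                        }
                    }],
                }
                for f in findings
            ],
        }],
    }
```

Anything shaped like this uploads cleanly to GitHub Code Scanning, which is what the CI recipe earlier relies on.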
| Agent | Phase | Does |
|---|---|---|
| recon | 1 | Port scan, DNS and subdomain enum, service fingerprinting |
| web | 2 | Authenticated OWASP Testing Guide v4 pass |
| ad | 3 | AD enum, Kerberoasting, BloodHound pathfinding, delegation abuse |
| cloud | 4 | AWS, Azure, GCP IAM, misconfig, K8s RBAC, serverless |
| exploit_chain | 5 | Correlates findings into multi-step attack paths |
| poc_validator | 6 | Non-destructive proof of concept per finding |
| detection | 7 | Sigma, SPL, KQL rules for the blue team |
| report | 8 | Markdown, HTML, PDF, SARIF, JUnit, compliance maps |
| llm_redteam | opt | OWASP LLM Top 10 probes |
| social_engineer | opt | Phishing corpus and pretext generation |
Plus mobile and wireless agents for out-of-band engagements.
Responsible use
ptai is for authorized testing. On startup it loads a scope file. Out-of-scope hosts are refused at tool-invocation time. PoCs are non-destructive by default. Rate limits kick in automatically in stealth mode.
You are responsible for having written authorization before pointing this at anything you don't own. Don't be that person.
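The scope-file refusal described above reduces to a host check before every tool call. A minimal sketch assuming a plain-text pattern file, one host glob per line (the actual scope-file format is not documented here):

```python
import fnmatch
from urllib.parse import urlparse

def load_scope(path: str) -> list:
    """One host pattern per line; '#' comments and blank lines ignored. (Format assumed.)"""
    with open(path) as fh:
        return [ln.strip() for ln in fh
                if ln.strip() and not ln.startswith("#")]

def in_scope(target: str, patterns: list) -> bool:
    """Checked before every tool invocation; out-of-scope hosts are refused."""
    host = urlparse(target).hostname or target
    return any(fnmatch.fnmatch(host, p) for p in patterns)

# in_scope("https://api.staging.acme.com/v1", ["*.staging.acme.com"])
# in_scope("https://prod.acme.com",           ["*.staging.acme.com"])
```

Enforcing the check at tool-invocation time, not just at startup, means a finding that points at an out-of-scope host can never trigger a scan of it.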
The ecosystem
| Repo | What |
|---|---|
| pentest-ai | This repo. The CLI and MCP server. Python product. |
| pentest-ai-agents | Separate companion repo. Standalone Claude Code subagent markdown files. Optional, runs without this CLI. |
Beyond the OSS
Running this on a team and need more? The website has the team dashboard and managed-assessment options.
The OSS tool stays OSS. Free forever.
Community
- Questions, ideas, feedback: GitHub Discussions
- Bug reports: GitHub Issues
- Show and tell: post the wildest finding ptai gave you in Show and tell
Contributing
PRs welcome. Before you submit:
ruff check . && mypy . && pytest -q
See CONTRIBUTING.md for the full flow.
Star history
License
MIT. Do whatever you want with it.
If ptai saved you a Sunday, star the repo. It's the only payment I ask for.