# MCP Observatory
Regression detection for MCP servers. Checks capabilities, invokes tools, detects schema drift, and diffs runs over time. Supports local stdio and remote HTTP/SSE targets.
## Install

```sh
npx @kryptosai/mcp-observatory
```
That's it. Running with no arguments auto-discovers your MCP servers from Claude Code and Claude Desktop configs and checks them all.
To check a specific server, pass the command directly:
```sh
npx @kryptosai/mcp-observatory run -- npx -y @modelcontextprotocol/server-everything
npx @kryptosai/mcp-observatory run --invoke-tools -- npx -y @modelcontextprotocol/server-everything
npx @kryptosai/mcp-observatory check tools -- npx -y @modelcontextprotocol/server-filesystem .
```
Or use a target config file for more options (env vars, metadata, custom timeout):
```json
{
  "targetId": "filesystem-server",
  "adapter": "local-process",
  "command": "npx",
  "args": ["-y", "@modelcontextprotocol/server-filesystem", "."],
  "timeoutMs": 15000
}
```

```sh
npx @kryptosai/mcp-observatory run --target ./target.json
```
## Server Compatibility
Works with ~95% of the MCP server ecosystem across both standard transports:
| Transport | Examples | Adapter |
|---|---|---|
| stdio (most servers) | filesystem, memory, sequential-thinking, context7, brave-search, sentry, notion, stripe, eslint | local-process |
| HTTP/SSE (remote) | Cloudflare, Exa, Tavily | http |
| Docker | All @modelcontextprotocol/server-* images | local-process via `docker run -i` |
Servers needing API keys work via `env` in the target config. Python servers work via `uvx`. See the full compatibility matrix for tested servers, setup examples, and known incompatibilities.
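For example, a target config for a server that needs an API key can carry the credential in the `env` block. The server and key name below are illustrative; substitute your own:

```json
{
  "targetId": "brave-search",
  "adapter": "local-process",
  "command": "npx",
  "args": ["-y", "@modelcontextprotocol/server-brave-search"],
  "env": { "BRAVE_API_KEY": "your-api-key" },
  "timeoutMs": 15000
}
```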
## Working Surface

- `run`: execute checks against one target and persist a run artifact
- `diff`: compare two runs and classify regressions, recoveries, and schema drift
- `report`: turn a saved run artifact into readable terminal, JSON, or Markdown output
- `scan`: auto-discover MCP servers from local config files and check them all (default command)
- `check`: run a single capability check (tools, prompts, resources, or tools-invoke)
## Scan
Auto-discover MCP server configs from Claude Code, Claude Desktop, and project-level config files, then run checks against every discovered server. This is the default command, so running `mcp-observatory` with no arguments runs `scan`:
```sh
mcp-observatory
mcp-observatory scan
mcp-observatory scan --config ~/.claude.json
mcp-observatory scan --invoke-tools
```
Scanned locations (in order):
- `~/.claude.json` (Claude Code)
- `~/Library/Application Support/Claude/claude_desktop_config.json` (Claude Desktop, macOS)
- `%APPDATA%/Claude/claude_desktop_config.json` (Claude Desktop, Windows)
- `.claude.json` (current directory)
- `.mcp.json` (current directory)
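Conceptually, the discovery step reads the server map out of each config file it finds. A minimal sketch, assuming the common Claude config layout with a top-level `mcpServers` object (Observatory's actual parser may handle more shapes and edge cases):

```python
import json
from pathlib import Path

def discover_servers(config_path: str) -> dict:
    """Read an MCP client config and return its server entries.

    Assumes the common Claude config layout: a top-level
    "mcpServers" object mapping server name -> launch spec.
    """
    path = Path(config_path).expanduser()
    if not path.is_file():
        return {}
    try:
        config = json.loads(path.read_text())
    except (OSError, json.JSONDecodeError):
        return {}  # unreadable or malformed configs are skipped, not fatal
    return config.get("mcpServers", {})
```

Missing or malformed files yield an empty map rather than an error, so one broken config cannot abort a scan across all locations.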
## Check
Run a single capability check for faster iteration:
```sh
mcp-observatory check tools --target ./my-server.json
mcp-observatory check prompts --target ./my-server.json
mcp-observatory check resources --target ./my-server.json
mcp-observatory check tools-invoke --target ./my-server.json
```
## Tool Invocation

Go beyond listing: actually call tools and verify they execute. Only safe tools are invoked, meaning those with no required parameters or with the `readOnlyHint` annotation. Arguments are auto-generated from each tool's JSON Schema.
```sh
mcp-observatory scan --invoke-tools
mcp-observatory run --target ./my-server.json --invoke-tools
mcp-observatory check tools-invoke --target ./my-server.json
```
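The safety rule and argument generation can be sketched roughly as follows. Field names follow the MCP tool-listing shape (`inputSchema`, `annotations.readOnlyHint`); the placeholder values are illustrative, not Observatory's exact generator:

```python
def is_safe_to_invoke(tool: dict) -> bool:
    """A tool is safe if it requires no arguments or is marked read-only."""
    required = tool.get("inputSchema", {}).get("required", [])
    read_only = tool.get("annotations", {}).get("readOnlyHint", False)
    return not required or read_only

def generate_args(tool: dict) -> dict:
    """Produce a placeholder value for each declared property,
    keyed by its JSON Schema type."""
    samples = {"string": "example", "number": 0, "integer": 0,
               "boolean": False, "array": [], "object": {}}
    props = tool.get("inputSchema", {}).get("properties", {})
    return {name: samples.get(spec.get("type", "string"), None)
            for name, spec in props.items()}
```

A tool with required parameters is skipped unless it advertises `readOnlyHint`, so invocation cannot trigger destructive side effects on a well-annotated server.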
## Schema Drift
When diffing two runs, schema changes are detected automatically. Added/removed required fields, property changes, and type changes are surfaced alongside status regressions:
```sh
mcp-observatory diff --base run-a.json --head run-b.json
```
Example output:
```
Schema Drift:
- search (tools): added required field 'limit', changed 'query' type from 'number' to 'string'
- old-tool (tools): removed
```
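As a rough illustration of the comparison (not Observatory's internal diff algorithm), required-field and type drift between two versions of a tool's input schema could be computed like this:

```python
def schema_drift(base: dict, head: dict) -> list[str]:
    """Report added/removed required fields and per-property type
    changes between two JSON Schemas."""
    findings = []
    base_req = set(base.get("required", []))
    head_req = set(head.get("required", []))
    for field in sorted(head_req - base_req):
        findings.append(f"added required field '{field}'")
    for field in sorted(base_req - head_req):
        findings.append(f"removed required field '{field}'")
    # Compare types only for properties present in both versions;
    # wholly added/removed tools are classified separately.
    base_props = base.get("properties", {})
    head_props = head.get("properties", {})
    for name in sorted(base_props.keys() & head_props.keys()):
        old_t = base_props[name].get("type")
        new_t = head_props[name].get("type")
        if old_t != new_t:
            findings.append(f"changed '{name}' type from '{old_t}' to '{new_t}'")
    return findings
```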
## Watch
Re-run checks on an interval and output only when something changes:
```sh
mcp-observatory run --target ./my-server.json --watch
mcp-observatory run --target ./my-server.json --watch --interval 60
```
Press Ctrl+C to stop.
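Conceptually, watch mode is a change-detection poll: re-run the checks, fingerprint the result, and report only when the fingerprint differs from the previous run. A minimal sketch with a hypothetical `run_checks` callable (Observatory's real loop persists run artifacts rather than yielding results):

```python
import hashlib
import json
import time

def watch(run_checks, interval_s: float = 30.0, max_iterations=None):
    """Poll run_checks() and yield its result only when it changes.

    run_checks is a hypothetical callable returning a JSON-serializable
    result. max_iterations exists for testing; the CLI loops until Ctrl+C.
    """
    last_fingerprint = None
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        result = run_checks()
        fingerprint = hashlib.sha256(
            json.dumps(result, sort_keys=True).encode()).hexdigest()
        if fingerprint != last_fingerprint:
            yield result  # first run, or something changed
            last_fingerprint = fingerprint
        iterations += 1
        time.sleep(interval_s)
```

Hashing a canonical serialization (`sort_keys=True`) keeps the comparison stable when key order varies between runs.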
## HTTP / SSE Targets
In addition to local-process stdio, you can check remote MCP servers over HTTP (Streamable HTTP and SSE):
```json
{
  "targetId": "my-remote-server",
  "adapter": "http",
  "url": "http://localhost:3000/mcp",
  "authToken": "optional-bearer-token",
  "timeoutMs": 15000
}
```

```sh
mcp-observatory run --target ./remote-target.json
mcp-observatory run --target ./remote-target.json --invoke-tools
```
## HTML Reports
Generate a self-contained HTML report from any run or diff artifact:
```sh
mcp-observatory report --run ./run-artifact.json --format html --output report.html
mcp-observatory diff --base run-a.json --head run-b.json --format html --output diff.html
```
Open the file in any browser. No server is required, and the report is shareable via Slack, email, or GitHub comments.
## Limitations
- Servers requiring interactive OAuth (e.g., Google Drive) need pre-authentication before Observatory can connect
- Custom WebSocket transports (e.g., BrowserTools MCP) are not supported
- A few servers time out or close before init — see known issues and compatibility
## Repo-Local Validation
If you do want the repo-local path, use the fixture flow:
```sh
npm install
npm run cli -- run --target tests/fixtures/sample-target-config.json
npm run cli -- diff --base tests/fixtures/sample-run-a.json --head tests/fixtures/sample-run-b.json
npm run cli -- report --run tests/fixtures/sample-run-b.json --format markdown --output examples/results/sample-report.md
```
This path exists to validate the repo itself. It is not the only way to use the tool.
## Artifact Contract
Every artifact is versioned and intentionally boring:
- `artifactType`: `run` or `diff`
- `schemaVersion`: currently `1.0.0`
- `gate`: `pass` or `fail`
Compatibility rules:
- additive changes stay within `1.x`
- breaking artifact changes require a major schema bump
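The compatibility rule above is plain semver-major gating. A downstream consumer could defend against future breaking artifacts with a check like this (a sketch; `artifactType` and `schemaVersion` are the artifact fields named above, and the function name is illustrative):

```python
SUPPORTED_MAJOR = 1

def accept_artifact(artifact: dict) -> bool:
    """Accept run/diff artifacts whose schemaVersion shares our major.

    Additive 1.x changes pass; a hypothetical 2.0.0 artifact is rejected.
    """
    if artifact.get("artifactType") not in ("run", "diff"):
        return False
    version = artifact.get("schemaVersion", "")
    major, _, _ = version.partition(".")
    return major.isdigit() and int(major) == SUPPORTED_MAJOR
```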
Published schemas:
- `schemas/run-artifact.schema.json`
- `schemas/diff-artifact.schema.json`
Validate checked-in artifacts locally:
```sh
npm run validate:artifacts
```
## Contributing
See CONTRIBUTING.md for the contribution bar and the kinds of work likely to be declined.
The fastest way to contribute something credible is to add evidence:
- a real passing target with a distinct capability shape
- a clearer report surface
- a cleaner startup diagnosis
## Known Issues
See docs/known-issues.md for the difference between unsupported and failed, and docs/compatibility.md for the full compatibility matrix including servers that don't work and why.