thruk-mcp

License: MIT

Model Context Protocol (MCP) server for Thruk — the unified web frontend for Naemon, Nagios, Icinga and Shinken.

Expose Thruk's REST API to MCP-compatible clients (Claude Desktop, Dust, LibreChat, OpenWebUI...) so that an LLM can query hosts/services, schedule downtimes, acknowledge problems, force rechecks and more in natural language.

Features

Read: hosts, services, hostgroups, servicegroups, downtimes, comments, sites, aggregated stats, current problems
Write: schedule/delete downtimes, acknowledge & remove acks, force rechecks
Escape hatch: thruk_query tool to call any Thruk REST endpoint
Multi-backend support (Thruk federated sites): pass backends="prod,dr" to any tool
Transports: stdio (default) or Streamable-HTTP (--listen <port>, endpoint /mcp)
Async httpx client with proper error handling and TLS verification
Tested with pytest + respx, linted with ruff, packaged with hatchling

Quick start

1. Configure

cp .env.example .env
$EDITOR .env   # set THRUK_BASE_URL and THRUK_API_KEY

An API key can be created from the Thruk user profile page (requires api_keys_enabled in thruk_local.conf) or via the REST API itself.
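As a rough illustration of how these two settings are used, the sketch below builds (but does not send) an authenticated request with the Python standard library. The X-Thruk-Auth-Key header name comes from the environment-variable table in this README; the /r/hosts path, the helper name build_request and the example values are assumptions for illustration, not part of thruk-mcp.

```python
import urllib.request


def build_request(base_url: str, api_key: str, path: str = "/r/hosts") -> urllib.request.Request:
    """Build an authenticated GET request without sending it (hypothetical helper)."""
    # Strip a trailing slash so the joined URL has exactly one separator.
    url = base_url.rstrip("/") + path
    return urllib.request.Request(
        url,
        headers={"X-Thruk-Auth-Key": api_key},  # header name taken from this README
        method="GET",
    )


req = build_request("https://monitor.example.com/thruk/", "xxxxxxxx")
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would perform the call; that step is left out so the sketch runs without a Thruk instance.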

2a. Run with Docker

docker compose up -d
# MCP Streamable-HTTP endpoint: http://localhost:8001/mcp

2b. Run locally

pip install thruk-mcp        # or: pipx install thruk-mcp

# stdio mode (for Claude Desktop, LibreChat, etc.)
thruk-mcp

# Streamable-HTTP mode — endpoint http://localhost:8001/mcp
thruk-mcp --listen 8001
# equivalently: thruk-mcp --transport streamable-http --listen 8001

# Behind a load balancer / multiple replicas, drop per-session state
# (no sticky routing required):
thruk-mcp --listen 8001 --stateless --json-response

# Multi-tenant: each request brings its own Thruk credentials via headers
# (no fixed THRUK_API_KEY at boot). Requires --stateless; serve over TLS.
thruk-mcp --listen 8001 --stateless --header-auth

For local development of the project itself, see CONTRIBUTING.md.

3. Wire it to an MCP client

Claude Desktop (~/.config/Claude/claude_desktop_config.json or macOS equivalent):

{
  "mcpServers": {
    "thruk": {
      "command": "thruk-mcp",
      "env": {
        "THRUK_BASE_URL": "https://monitor.example.com/thruk",
        "THRUK_API_KEY": "xxxxxxxx"
      }
    }
  }
}

4. Use with the Docker MCP Gateway

The image at ghcr.io/k9fr4n/thruk-mcp:latest defaults to stdio transport, so it can be spawned natively by the gateway.

Option A — Private local catalog

# 1. Create your private catalog
docker mcp catalog create thruk-private

# 2. Register this server (catalog/server.yaml ships with the repo)
docker mcp catalog add thruk-private thruk-mcp ./catalog/server.yaml

# 3. Configure credentials & enable
docker mcp secret set thruk-mcp.api_key=YOUR_KEY
docker mcp config write thruk-mcp.base_url=https://monitor.example.com/thruk
docker mcp server enable thruk-mcp

# 4. Run the gateway with your catalog
docker mcp gateway run --catalog thruk-private

Then point any MCP client (Claude Desktop, VS Code, Cursor, ...) at the gateway, following the Docker MCP Gateway documentation.

Option B — Submit upstream

catalog/server.yaml, catalog/tools.json and catalog/readme.md follow the docker/mcp-registry schema and can be submitted to the official Docker MCP Catalog via PR.

What's exposed

65 MCP Tools

Read (state): thruk_list_hosts, thruk_get_host, thruk_list_services, thruk_get_service, thruk_list_hostgroups, thruk_list_servicegroups, thruk_list_contacts, thruk_get_contact, thruk_problems, thruk_stats, thruk_totals (compact 16-field host+service totals, faster than thruk_stats), thruk_sites.

Read (history & comments): thruk_list_logs, thruk_list_alerts, thruk_list_notifications, thruk_notification_summary (notifications grouped by contact/host/service/state/command), thruk_recent_events, thruk_list_comments, thruk_list_downtimes, thruk_get_downtime, thruk_state_at (reconstruct the state of the monitored fleet at a past instant from /logs, as a post-mortem snapshot), thruk_state_diff (what changed between two past instants t1 → t2, replayed from /logs).

Read (noise & flap analysis): thruk_top_noisy_hosts (hosts ranked by alert count over a window), thruk_top_noisy_services (services ranked by alert count), thruk_flap_summary (hosts/services ranked by state transition count).

Read (problem intelligence): thruk_oldest_problems (unhandled problems sorted by age, oldest first), thruk_unacked_critical (CRITICAL/DOWN not acknowledged for > N minutes), thruk_stale_acks (acknowledgements older than N days, i.e. forgotten problems), thruk_problem_counts (flat aggregate of unhealthy-state counts, filterable by hostgroup, custom vars or any structured filter; replaces the former thruk_problems_by_hostgroup), thruk_stale_checks (surface checks that stopped running, the dangerous "false green"), thruk_backend_health (per-site supervision-backend health: latency, replication lag, blind spots), thruk_worker_health (distinguish a real outage from a mod-gearman supervision blind spot).

Read (analytics): thruk_alert_heatmap (alert counts bucketed by time, useful for spotting recurring patterns), thruk_notification_heatmap (notification counts bucketed by time, to spot mail/paging storms), thruk_concurrent_failures (windows where multiple hosts failed simultaneously), thruk_recurring_problems (hosts/services generating repeated alerts over a window), thruk_root_cause (collapse a DOWN/UNREACHABLE storm into its root cause(s) via parent topology), thruk_unreachable_vs_down (split a host outage window into DOWN cause vs UNREACHABLE consequence).

Read (availability / SLA): thruk_host_availability (uptime % for a single host: time_up_percent, time_down_percent, time_unreachable_percent and scheduled equivalents), thruk_service_availability (ok/warning/critical/unknown % for a single service), thruk_hostgroup_availability (availability for all hosts or services in a hostgroup, sorted worst-first; type = hosts | services | both), thruk_hostgroup_availability_summary (one aggregated rollup instead of one row per host: time-weighted availability_percent, worst/best, below_threshold count, state distribution; ideal for incident/SLA reports on large groups). All accept since/until (Thruk relative or ISO) or a timeperiod shortcut (lastmonth, thismonth, last24hours, lastweek, …).
Also in this group: thruk_reliability_report (per host/service reliability metrics, namely MTTR / MTBF / incident counts, derived from the log over a window) and thruk_incident_timeline (ordered event chronology, the post-mortem sequence of events, for a host, service or hostgroup: every state change, notification, downtime, flap and acknowledgement in time order, plus an incident/MTTR summary; a scoping filter is required).

Read (performance data): thruk_get_perfdata (fetch and parse performance data for a single host or service), thruk_perfdata_snapshot (parsed perfdata for every service matching a filter, in one call), thruk_perfdata_near_threshold (metrics within within_percent % of breaching their warn/crit range, an early-warning signal before an alert fires).

Write (downtime management): thruk_schedule_downtime (host/service), thruk_schedule_host_services_downtime (all services of a host), thruk_schedule_propagated_host_downtime (parent+children), thruk_schedule_hostgroup_downtime, thruk_schedule_servicegroup_downtime, thruk_delete_downtime, thruk_delete_active_downtimes, thruk_delete_downtimes_by_filter.

Write (problem handling): thruk_acknowledge, thruk_bulk_acknowledge (acknowledge multiple hosts/services in one call), thruk_remove_acknowledgement, thruk_recheck, thruk_add_comment, thruk_delete_comment, thruk_checks (enable/disable active checks for a host or service), thruk_notifications (enable/disable host or service notifications, with optional cascade to all services of a host).

Escape hatches: thruk_query (raw call to any REST endpoint), thruk_run_background_query (long-running endpoint via Thruk's ?background=1 mechanism with automatic job polling).

All list-style tools share a consistent limit / offset / sort / columns contract. By default they return a tight subset of columns (~10 fields per row) to keep LLM token consumption low. Pass columns="" to opt out and receive every column the Thruk row contains.
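To make the contract concrete, here is a minimal sketch of how a list-style tool might translate these arguments into REST query parameters. The helper name list_params and the default column tuple are hypothetical; only the limit / offset / sort / columns names and the columns="" opt-out come from the paragraph above.

```python
from urllib.parse import urlencode

# Hypothetical default subset; the real per-tool defaults are not listed in this README.
DEFAULT_HOST_COLUMNS = ("name", "state", "plugin_output", "last_check")


def list_params(limit=100, offset=0, sort=None, columns=None):
    """Translate the shared list contract into a query-parameter dict."""
    params = {"limit": limit, "offset": offset}
    if sort:
        params["sort"] = sort
    if columns is None:
        # No explicit choice: fall back to the compact default subset.
        params["columns"] = ",".join(DEFAULT_HOST_COLUMNS)
    elif columns != "":
        # Explicit CSV of columns requested by the caller.
        params["columns"] = columns
    # columns == "" adds no column filter, so every column is returned.
    return params


print(urlencode(list_params(limit=20, sort="name")))
print(urlencode(list_params(limit=20, columns="")))
```

Using None for "not specified" and the empty string for "all columns" keeps the opt-out explicit, which matches the columns="" convention described above.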

5 MCP Resources

URI templates that MCP clients with a resource browser (Claude Desktop, VS Code, ...) can "open" like files:

URI	Content
`thruk://hosts/{name}`	Full host JSON
`thruk://services/{host}/{service}`	Full service JSON
`thruk://hostgroups/{name}`	Host group config + members
`thruk://problems`	Current unhandled problems (hosts + services)
`thruk://stats`	Aggregated host/service stats (cached)

8 MCP Prompts

Pre-canned workflows the user can invoke as a slash-command in the MCP client UI:

Prompt	Arguments	Purpose
`investigate_alert`	`host`, optional `service`	7-step incident triage
`schedule_maintenance`	`target`, `duration_minutes`, `kind`	Safe downtime workflow with confirmation
`diagnose_flapping`	`host`, `service`	Root-cause a flapping service (uses `thruk_flap_summary`)
`daily_health_report`	optional `hostgroup`	Morning read-only health digest (totals, unacked, stale, oldest, noisiest)
`incident_triage`	optional `hostgroup`	Major-incident triage: blast radius, common cause, prioritised actions
`capacity_review`	optional `hostgroup`, `within_percent`	Saturation review of metrics nearing their warn/crit thresholds
`sla_report`	`target`, `kind`, `timeperiod`	Availability / SLA report with downtime breakdown and 99.9% verdict
`noise_review`	optional `since`	Alert-fatigue hygiene: noisiest, flapping, recurring, heatmap clustering

Robustness

Connection retries: httpx.AsyncHTTPTransport(retries=3) handles DNS failures, connection refusals and TLS handshake failures.
HTTP retries with backoff: 5xx and 429 responses are retried up to 3 times with exponential backoff + jitter (cap 5 s).
Opt-in TTL cache: slow-moving endpoints (/sites, /processinfo, /hosts/stats, /services/stats, /contacts, /timeperiods, ...) are cached in-process for 15 s. Any tool can request caching via cache_ttl= on the underlying client. This absorbs the burst of identical calls an LLM agent typically issues across a multi-tool turn.
Pagination helper: ThrukClient.get_all() is an async generator that iterates pages of 500 rows up to a configurable hard limit (default 50 000), so internal callers can scan entire backends without manual offset math.
Long-running queries: the thruk_run_background_query tool wraps Thruk's ?background=1 flow and polls /thruk/jobs/<id>/output until the job completes (5 min default timeout).
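Two of the mechanisms above, capped exponential backoff with jitter and a short-lived TTL cache, can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's code: the names backoff_delay and TTLCache, the 0.5 s base delay and the "full jitter" strategy are hypothetical, while the 5 s cap and the 15 s TTL come from the list above.

```python
import random
import time


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 5.0) -> float:
    """Delay before retry number `attempt` (0-based): exponential, jittered, capped."""
    exponential = min(cap, base * (2 ** attempt))
    # "Full jitter" (an assumed strategy): pick uniformly in [0, exponential]
    # so that many clients retrying at once do not synchronise.
    return random.uniform(0.0, exponential)


class TTLCache:
    """Tiny in-process cache whose entries expire `ttl` seconds after being set."""

    def __init__(self, ttl: float = 15.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            # Expired: drop the entry and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (self.clock() + self.ttl, value)


cache = TTLCache(ttl=15.0)
cache.set("/sites", [{"name": "prod"}])
print(cache.get("/sites"))
print([round(backoff_delay(n), 2) for n in range(3)])
```

A monotonic clock is used for expiry so that wall-clock adjustments cannot extend or shorten an entry's lifetime.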

Environment variables

Connection

Variable	Default	Description
`THRUK_BASE_URL`	`http://localhost/thruk`	Thruk URL (no trailing slash)
`THRUK_API_KEY`	(required)	`X-Thruk-Auth-Key` header
`THRUK_AUTH_USER`		Impersonation user (superuser key only)
`THRUK_VERIFY_SSL`	`true`	Set `false` for self-signed certs
`THRUK_TIMEOUT`	`30`	HTTP timeout in seconds
`THRUK_DEFAULT_BACKENDS`		CSV of default backend names (federated Thruk)
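As an illustration of how these variables and their defaults fit together, the following sketch loads them into a small settings object. The ThrukSettings class, the load_settings function and the exact boolean parsing are assumptions made for this example; the variable names and defaults are taken from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class ThrukSettings:
    base_url: str
    api_key: str
    auth_user: Optional[str]
    verify_ssl: bool
    timeout: float
    default_backends: Tuple[str, ...]


def load_settings(env: Optional[Mapping[str, str]] = None) -> ThrukSettings:
    """Read connection settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    api_key = env.get("THRUK_API_KEY", "")
    if not api_key:
        raise ValueError("THRUK_API_KEY is required")
    # CSV of backend names; blank items are ignored.
    backends = tuple(
        b.strip() for b in env.get("THRUK_DEFAULT_BACKENDS", "").split(",") if b.strip()
    )
    return ThrukSettings(
        # The table asks for no trailing slash, so normalise it away.
        base_url=env.get("THRUK_BASE_URL", "http://localhost/thruk").rstrip("/"),
        api_key=api_key,
        auth_user=env.get("THRUK_AUTH_USER") or None,
        # Assumed parsing: anything other than "false" keeps verification on.
        verify_ssl=env.get("THRUK_VERIFY_SSL", "true").strip().lower() != "false",
        timeout=float(env.get("THRUK_TIMEOUT", "30")),
        default_backends=backends,
    )


settings = load_settings({"THRUK_API_KEY": "xxxxxxxx", "THRUK_DEFAULT_BACKENDS": "prod,dr"})
print(settings.base_url, settings.default_backends)
```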

Security / multi-tenant (v0.6)

Variable	Default	Description
`THRUK_READ_ONLY`	`false`	Strip every write tool (ack, downtime, recheck, ...)
`THRUK_ENABLED_TOOLS`		Allowlist of tool names. CSV with fnmatch wildcards. Empty = all
`THRUK_AUDIT_LOG`	`true`	Emit one JSON audit line on stderr per write tool invocation
`THRUK_MAX_CONCURRENT`	`0`	Cap of concurrent in-flight HTTP requests. 0 = unlimited
`THRUK_HTTP_HEADER_AUTH`	`false`	Streamable-HTTP multi-tenant: take credentials from per-request headers (= `--header-auth`)
`MCP_HTTP_TOKEN`		Bearer token gating the `/mcp` endpoint (transport level). HTTP serving fails closed unless this or the opt-out is set
`MCP_HTTP_ALLOW_UNAUTHENTICATED`	`false`	Opt out of the bearer requirement (proxy-fronted deploys). Leaves `/mcp` open — TLS + your own auth layer only
`MCP_HTTP_ALLOWED_HOSTS`	`localhost,127.0.0.1,[::1]`	CSV `Host` header allowlist (anti-DNS-rebinding) via `TrustedHostMiddleware`
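The interaction between THRUK_READ_ONLY and THRUK_ENABLED_TOOLS can be pictured with a short filter built on the standard fnmatch module. The TOOLS registry and the exposed_tools function below are hypothetical stand-ins; only the semantics (write tools stripped in read-only mode, CSV of fnmatch patterns, empty value meaning "all") follow the table above.

```python
from fnmatch import fnmatchcase

# Hypothetical registry: tool name -> True if the tool mutates monitoring state.
TOOLS = {
    "thruk_list_hosts": False,
    "thruk_list_services": False,
    "thruk_problems": False,
    "thruk_stats": False,
    "thruk_acknowledge": True,
    "thruk_recheck": True,
}


def exposed_tools(tools, read_only=False, enabled_csv=""):
    """Return the sorted tool names left after both filters are applied."""
    patterns = [p.strip() for p in enabled_csv.split(",") if p.strip()]
    selected = []
    for name, is_write in tools.items():
        if read_only and is_write:
            continue  # read-only mode strips every write tool
        if patterns and not any(fnmatchcase(name, p) for p in patterns):
            continue  # a non-empty allowlist must match the tool name
        selected.append(name)
    return sorted(selected)


print(exposed_tools(TOOLS, enabled_csv="thruk_list_*,thruk_problems,thruk_stats"))
print(exposed_tools(TOOLS, read_only=True))
```

In this sketch the two filters are independent, so an allowlist entry cannot re-enable a write tool while read-only mode is on.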

Security

Read-only mode: set THRUK_READ_ONLY=true to remove every write tool (thruk_acknowledge, thruk_schedule_*_downtime, thruk_recheck, thruk_delete_*, thruk_run_background_query) from the MCP server. The LLM cannot mutate monitoring state. Use this for general-purpose agents that should only observe.
Tool allowlist: THRUK_ENABLED_TOOLS=thruk_list_*,thruk_problems,thruk_stats restricts the exposed surface to the listed tools (fnmatch wildcards supported). Useful when fronting multiple LLM clients with the same gateway but different scopes.
Audit log: every write tool invocation emits one JSON line on thruk_mcp.audit (stderr by default):
```
{"ts":"2026-05-17T22:00:00+00:00","tool":"thruk_acknowledge","user":"alice",
 "args":{"host":"srv01","comment":"investigating"},"target":"srv01","status":"ok"}
```
Disable with THRUK_AUDIT_LOG=false. Sensitive keys (api_key, password, token) are redacted as *** before logging.
Rate limit: THRUK_MAX_CONCURRENT=8 caps in-flight HTTP requests with an asyncio.Semaphore. Combined with the v0.3 TTL cache, this protects the Thruk core from an LLM that loops on tools or chains them aggressively.
Transport-level HTTP auth: gate the Streamable-HTTP /mcp endpoint itself, independently of the Thruk credentials a request carries (no effect on stdio). Set MCP_HTTP_TOKEN=<secret> to require an Authorization: Bearer <token> header (constant-time compare; 401 + WWW-Authenticate: Bearer otherwise). HTTP serving fails closed: --listen / --transport streamable-http refuses to start unless MCP_HTTP_TOKEN is set or MCP_HTTP_ALLOW_UNAUTHENTICATED=true is given (explicit opt-out for proxy-fronted deploys). MCP_HTTP_ALLOWED_HOSTS enforces a Host allowlist (anti-DNS-rebinding, defaults to loopback). The chain is TrustedHost → Bearer → HeaderAuth → /mcp, so the bearer gate composes with the header-auth multi-tenant mode below.
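The audit-log redaction and the constant-time bearer check described above could look roughly like the sketch below. The function names redact, audit_line and check_bearer are hypothetical, and redaction is applied to top-level keys only in this simplified version; hmac.compare_digest is the standard-library primitive for constant-time comparison.

```python
import hmac
import json

SENSITIVE_KEYS = {"api_key", "password", "token"}


def redact(args: dict) -> dict:
    """Replace the values of sensitive top-level keys with '***'."""
    return {k: ("***" if k in SENSITIVE_KEYS else v) for k, v in args.items()}


def audit_line(tool, user, args, target, status, ts):
    """Serialise one audit record as a single compact JSON line."""
    record = {
        "ts": ts,
        "tool": tool,
        "user": user,
        "args": redact(args),
        "target": target,
        "status": status,
    }
    return json.dumps(record, separators=(",", ":"))


def check_bearer(authorization, expected_token):
    """Return (status, extra_headers) for an Authorization header value."""
    prefix = "Bearer "
    supplied = ""
    if authorization and authorization.startswith(prefix):
        supplied = authorization[len(prefix):]
    # compare_digest avoids leaking the match position through timing.
    if supplied and hmac.compare_digest(supplied.encode(), expected_token.encode()):
        return 200, {}
    return 401, {"WWW-Authenticate": "Bearer"}


print(audit_line("thruk_acknowledge", "alice",
                 {"host": "srv01", "api_key": "secret"},
                 "srv01", "ok", "2026-05-17T22:00:00+00:00"))
print(check_bearer("Bearer wrong", "s3cret"))
```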

Header-auth multi-tenant: run thruk-mcp --listen 8001 --stateless --header-auth (or THRUK_HTTP_HEADER_AUTH=1) to serve many users from one process, each with their own Thruk credentials supplied per request via headers:

Header	Maps to	Required
`X-Thruk-Auth-Key`	`api_key`	yes (else `401`)
`X-Thruk-Base-Url`	`base_url`	no (falls back to `THRUK_BASE_URL`)
`X-Thruk-Auth-User`	`auth_user`	no
`X-Thruk-Backends`	`default_backends` (CSV)	no

The server boots without THRUK_API_KEY. Only credential/endpoint fields come from headers; THRUK_READ_ONLY, THRUK_ENABLED_TOOLS and THRUK_AUDIT_LOG remain server-owned, so a tenant cannot grant itself write access or silence the audit log (which still attributes each call to the tenant's auth_user). Per-tenant HTTP clients are pooled in a bounded LRU cache. The API key travels in a header, so serve only over TLS (terminate TLS in front, or behind a trusted reverse proxy). Requires --stateless.
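The header mapping in the table above can be sketched as a pure function that turns request headers into per-tenant credentials. The names MissingApiKey and tenant_credentials are hypothetical; the header names, the fallback to THRUK_BASE_URL and the "missing key means 401" rule are taken from the table.

```python
class MissingApiKey(Exception):
    """Raised when X-Thruk-Auth-Key is absent; a server would answer 401."""


def tenant_credentials(headers: dict, default_base_url: str) -> dict:
    """Map per-request headers to credential fields (server-owned flags untouched)."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    h = {k.lower(): v.strip() for k, v in headers.items()}
    api_key = h.get("x-thruk-auth-key", "")
    if not api_key:
        raise MissingApiKey("X-Thruk-Auth-Key header is required")
    backends = [b.strip() for b in h.get("x-thruk-backends", "").split(",") if b.strip()]
    return {
        "api_key": api_key,
        # Optional header: fall back to the server's THRUK_BASE_URL.
        "base_url": h.get("x-thruk-base-url") or default_base_url,
        "auth_user": h.get("x-thruk-auth-user") or None,
        "default_backends": backends,
    }


creds = tenant_credentials(
    {"X-Thruk-Auth-Key": "tenant-key", "X-Thruk-Backends": "prod,dr"},
    default_base_url="https://monitor.example.com/thruk",
)
print(creds["base_url"], creds["default_backends"])
```

Because the function returns only credential and endpoint fields, settings such as read-only mode stay under the server's control, in line with the paragraph above.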

Development

pip install -e ".[dev]"
pre-commit install                              # one-time setup of git hooks

ruff check src tests && ruff format src tests   # lint + format
mypy src                                        # type-check
pytest -v --cov=thruk_mcp --cov-fail-under=80   # tests with coverage gate

Conventions:

Conventional Commits (feat:, fix:, chore:, docs:, refactor:, test:).
No direct push to main: branch → PR → squash merge.
Any new tool must come with a respx-mocked unit test in tests/test_tools.py; regenerate catalog/tools.json (Docker MCP Registry contract) with python scripts/gen_tools_json.py. The file is generated from the live registry, not hand-edited, and CI enforces it via --check.
CI gate: ruff, ruff format --check, mypy, pytest with 80 % coverage minimum.

References

Thruk REST API: https://www.thruk.org/documentation/rest.html
Thruk REST commands: https://www.thruk.org/documentation/rest_commands.html
MCP spec: https://spec.modelcontextprotocol.io/
Inspired by: https://github.com/lausser/omd-mcp (initial proof-of-concept)

Project docs

CHANGELOG.md: what changed in each release.
UPGRADING.md: per-version migration notes.
SUPPORT.md: supported Python / Thruk / MCP-client versions, security policy, release cadence.
CONTRIBUTING.md: dev setup, PR conventions, tool / env-var contribution checklists.

License

MIT — see LICENSE.
