databricks-mcp

Safe, read-only SQL analytics for AI agents — over MCP. Point an agent at a SQL warehouseand let it explore, profile, and query data without any risk of mutating it.

License: MIT

What it is

databricks-mcp is a Model Context Protocol server that givesan AI agent safe, read-only analytics access to a SQL warehouse. It exposes five typed tools —list_tables, describe_table, sample_rows, run_sql, and profile_table — and routes everyquery through an AST-based safety guard that enforces read-only, single-statement, and row-capguarantees.

Two backends ship in the box:

DuckDB (default) — runs fully offline against a bundled synthetic logistics warehouse(shipments, carriers, lanes). Zero setup, ~30 seconds to first query.
Databricks SQL Warehouse — connect to a real warehouse with a few environment variables.

About the name: the project is named for its Databricks backend, but it runs completelyoffline on DuckDB out of the box — you don't need a Databricks account to try it.

30-second quickstart

uvx databricks-mcp           # runs on the bundled DuckDB logistics sample data

That's it — the server starts on stdio with the sample warehouse loaded and waits for an MCPclient.

Claude Desktop config

Add this to your Claude Desktop MCP configuration (claude_desktop_config.json):

{
  "mcpServers": {
    "databricks-mcp": { "command": "uvx", "args": ["databricks-mcp"] }
  }
}

Restart Claude Desktop and the five tools become available to the assistant.

Connecting a real Databricks SQL Warehouse

Set the backend to databricks and provide your warehouse credentials via environment variables:

export DB_BACKEND=databricks
export DATABRICKS_SERVER_HOSTNAME=...     # e.g. dbc-xxxxxxxx-xxxx.cloud.databricks.com
export DATABRICKS_HTTP_PATH=...           # e.g. /sql/1.0/warehouses/abc123
export DATABRICKS_TOKEN=...               # a Databricks personal access token

Optionally cap the maximum rows any single query may return (default 1000):

export MAX_ROWS=500

Secrets are only ever read from the environment and are never logged.

Tools

Tool	Input	Output
`list_tables`	—	Table names and column counts.
`describe_table`	`table`	Columns, types, and row count.
`sample_rows`	`table`, `n` (default 10, max 100)	Preview rows from the table.
`run_sql`	`query`	Guarded read-only result rows (row-capped).
`profile_table`	`table`	Per-column null fraction, distinct count, and min/max.

All inputs and outputs are typed with pydantic models, so the agent receives clean JSON schemas.

Safety / guardrails

Every query passed to run_sql — and every statement the other tools generate internally —goes through safety.py, which validates against the parsed sqlglotAST rather than fragile string matching:

Parse or reject. Anything that fails to parse is rejected with a structured error.
Single statement only. Multi-statement input is rejected, blocking stacked-query injection.
Read-only only. Only SELECT and CTE (WITH) queries are allowed. AnyINSERT/UPDATE/DELETE/DROP/ALTER/CREATE/GRANT/COPY/CALL/PRAGMA/ATTACH is rejected.
System-table block. References to information_schema, pg_catalog, system, and similarcatalogs are denied — agents introspect schema through list_tables/describe_table instead.
Filesystem-function block. Read-only SELECTs can still call table functions likeread_csv, read_parquet, read_text, and glob to read local files. The guard walks the ASTand denies these, so an agent can't exfiltrate the host filesystem (e.g.SELECT * FROM read_text('/etc/passwd')).
Auto-LIMIT. A LIMIT (default 1000, configurable via MAX_ROWS) is injected when absent,so an agent can never pull unbounded data.

Identifier arguments (table) are additionally checked against the known-table list before theyare ever interpolated into SQL, preventing identifier injection.

Every one of these rules is backed by a passing test — seetests/test_safety.py (read-only allowlist, multi-statement, unparseable,system-table, filesystem-function, and auto-LIMIT cases) and tests/test_duckdb_backend.py(unknown-table rejection, row-cap truncation). The README makes no guardrail claim that isn'tproven by the suite.

Recorded transcript

An agent exploring the bundled logistics warehouse:

> list_tables
[
  {"name": "carriers",  "column_count": 4},
  {"name": "lanes",     "column_count": 4},
  {"name": "shipments", "column_count": 7}
]

> run_sql: SELECT c.mode,
                  count(*)                                        AS shipments,
                  round(100.0 * avg(s.delivered_on_time::INT), 1) AS on_time_pct
           FROM shipments s
           JOIN carriers c ON s.carrier_id = c.carrier_id
           GROUP BY c.mode
           ORDER BY shipments DESC

columns: ["mode", "shipments", "on_time_pct"]
rows:
  ["LTL",        1250, 88.0]
  ["Intermodal", 1250, 88.0]
  ["FTL",        1250, 88.0]
  ["Parcel",     1250, 88.0]
truncated: false

A DDL attempt is refused before it ever reaches the warehouse:

> run_sql: DROP TABLE shipments
SQLValidationError: Only read-only SELECT queries are allowed.

Development

uv venv && source .venv/bin/activate
uv pip install -e ".[dev]"
pytest
ruff check .

The DuckDB sample data is regenerated deterministically with:

python sample_data/generate.py

License

MIT — see LICENSE.

databricks-mcp

databricks-mcp

What it is

30-second quickstart

Claude Desktop config

Connecting a real Databricks SQL Warehouse

Tools

Safety / guardrails

Recorded transcript

Development

License

MCP Server · Populars

🦞 OpenClaw — Personal AI Assistant

MarkItDown-MCP

MarkItDown

Awesome MCP Servers

mcp-server-sentry: A Sentry MCP server

MCP Server · New

velane

Jean Memory

HitKeep

Prometheus MCP Server

EdgeOne Makers MCP