ys1173

duckdb-iceberg-mcp

Community ys1173
Updated

MCP server for querying Apache Iceberg tables via AWS Glue using DuckDB

duckdb-iceberg-mcp

An MCP server that lets AI assistants query Apache Iceberg tables on S3 via AWS Glue Data Catalog. DuckDB is the embedded query engine — Apache Arrow columnar format, vectorized execution, direct S3 reads with no data movement.

Built on the official Python MCP SDK.

This MCP server is complementary to telemetry-iceberg-adaptor, which ingests telemetry data into Apache Iceberg. Use that project to write data and this project to query it through MCP-enabled AI clients.

Architecture

                                              +----------------------+
                                              | MCP Clients          |
                                              | - OpenAI Codex       |
                                              | - Claude Desktop     |
                                              | - OpenCode           |
                                              | - LibreChat          |
                                              +----------------------+
                                                        |
                                                        | MCP Protocol (tools/list · tools/call)
                                                        v
            +--------------------------------------------------------------------------------------------+
            | duckdb-iceberg-mcp                                                                         |
            |                                                                                            |
            | +----------------------------------------------------------------------------------------+ |
            | | MCP Protocol Layer                                                                     | |
            | | stdio · Streamable HTTP · SSE                                                          | |
            | | JWT auth · write guard · row/char limits                                               | |
            | +----------------------------------------------------------------------------------------+ |
            |                                          <-->                                              |
            | +----------------------------------------------------------------------------------------+ |
            | | ⚡ DuckDB                                                                               | |
            | | Apache Arrow columnar engine                                                           | |
            | | Vectorized execution · Direct S3 reads                                                 | |
            | | httpfs · iceberg · aws extensions                                                      | |
            | +----------------------------------------------------------------------------------------+ |
            +--------------------------------------------------------------------------------------------+
                          |                                                  |
                          | httpfs extension                                 | boto3
                          | columnar Parquet reads                           | metadata · schema
                          |                                                  | Iceberg manifest resolution
                          v                                                  v
            +--------------------------------------------------------------------------------------------+
            | AWS                                                                                        |
            | +-------------- Amazon S3 --------------+  +------------ AWS Glue Data Catalog ----------+ |
            | | Apache Iceberg tables                 |  | Databases · Tables · Schema                 | |
            | | Parquet data files                    |  | Iceberg metadata                            | |
            | +---------------------------------------+  +---------------------------------------------+ |
            +--------------------------------------------------------------------------------------------+

Features

  • Query Iceberg tables on S3 via AWS Glue Data Catalog
  • Three transports: stdio, Streamable HTTP, SSE
  • Easy mode — single-tenant, optional static API key, no IdP required
  • Full mode — JWT/JWKS token validation (Auth0, Cognito, Keycloak, Okta, …)
  • Writes disabled by default; only available in full mode with explicit opt-in
  • Configurable row and character limits to prevent runaway responses

Requirements

  • Python 3.12+
  • AWS credentials with access to Glue Data Catalog and S3

Installation

git clone <repo>
cd duckdb-iceberg-mcp
python -m venv .venv
source .venv/bin/activate
pip install -e .

Quick Start

stdio (local AI client)

Copy the example config and fill in your AWS details:

cp config/easy.env.example .env
# .env
MCP_MODE=easy
MCP_TRANSPORT=stdio
CATALOG_TYPE=glue
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=...

Run directly:

duckdb-iceberg-mcp

Or configure in your MCP client (Claude Desktop, OpenAI Codex):

{
  "mcpServers": {
    "duckdb-iceberg-mcp": {
      "command": "/path/to/.venv/bin/duckdb-iceberg-mcp",
      "env": {
        "MCP_MODE": "easy",
        "CATALOG_TYPE": "glue",
        "AWS_REGION": "us-east-1",
        "AWS_ACCESS_KEY_ID": "AKIA...",
        "AWS_SECRET_ACCESS_KEY": "..."
      }
    }
  }
}

Streamable HTTP (network clients, e.g. LibreChat in Docker)

MCP_MODE=easy
MCP_TRANSPORT=http
MCP_HOST=0.0.0.0
MCP_PORT=8766
MCP_ALLOWED_HOSTS=host.docker.internal:*,localhost:*
CATALOG_TYPE=glue
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=...
env $(grep -v '^#' .env | grep -v '^$' | xargs) .venv/bin/duckdb-iceberg-mcp

MCP client URL: http://localhost:8766/mcp

LibreChat (librechat.yaml):

mcpServers:
  duckdb-iceberg-mcp:
    type: streamable-http
    url: 'http://host.docker.internal:8766/mcp'
    timeout: 60000
    initTimeout: 20000

OpenAI Codex: add via the Codex UI — Streamable HTTP, URL http://localhost:8766/mcp.

AWS Authentication

Configure credentials using one of these methods (key/secret takes priority if both are set):

Method Env vars
Explicit credentials AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY
Named profile AWS_PROFILE=my-profile
Default chain Set neither — falls back to env vars, ~/.aws/credentials, instance role

MCP Tools

list_tables(database?)

Lists Glue catalog tables. Optionally filter by database name.

describe_table(table_name)

Returns column names, types, and partition keys. Use database.table format.

glue_table(table_name)

Registers a Glue Iceberg table as a queryable DuckDB view. Required for tables where data files live outside the Glue-registered table root (a common layout with shared S3 prefixes).

glue_table('mydb.mytable')
→ Registered view 'mydb__mytable'. Query with: SELECT * FROM mydb__mytable

query_lakehouse(sql)

Executes a SQL query against registered views or direct S3 paths (read_parquet(), iceberg_scan()).

Configuration Reference

Variable Default Description
MCP_MODE easy easy or full
MCP_TRANSPORT stdio stdio, http (Streamable HTTP), sse
MCP_HOST 127.0.0.1 Bind address for HTTP/SSE
MCP_PORT 8000 Port for HTTP/SSE
MCP_ALLOWED_HOSTS (empty) Comma-separated allowed Host headers (e.g. host.docker.internal:*,localhost:*). Empty = SDK default
MCP_API_KEY (empty) Static bearer token for easy mode HTTP. Empty = no auth
JWKS_URL (required in full mode) JWKS endpoint for JWT validation
JWT_AUDIENCE duckdb-iceberg-mcp Expected aud claim in JWTs
CATALOG_TYPE glue Only glue supported
AWS_REGION us-east-1 AWS region
AWS_PROFILE (empty) Named AWS profile
AWS_ACCESS_KEY_ID (empty) AWS access key
AWS_SECRET_ACCESS_KEY (empty) AWS secret key
WRITE_MODE disabled disabled or enabled. Always disabled in easy mode
MAX_ROWS 250 Maximum rows returned per query
MAX_CHARS 40000 Maximum characters in a query response

Full Mode (JWT Auth)

Full mode validates a JWT bearer token on every request. All authenticated users share one DuckDB connection — per-user session isolation is deferred to a future release.

MCP_MODE=full
MCP_TRANSPORT=http
MCP_HOST=0.0.0.0
MCP_PORT=8766
JWKS_URL=https://your-idp.example.com/.well-known/jwks.json
JWT_AUDIENCE=duckdb-iceberg-mcp
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=...

The client passes a JWT as Authorization: Bearer <token>. The server validates it against the JWKS endpoint. Any IdP that issues standard JWTs works (Auth0, AWS Cognito, Keycloak, Okta).

See config/full.env.example for a full template.

Smoke Test

Run a quick end-to-end check against your real Glue catalog:

cp config/easy.env.example .env  # fill in AWS credentials
.venv/bin/python scripts/smoke_test.py

Run Tests

pip install -e ".[dev]"
pytest

MCP Server · Populars

MCP Server · New

    butterbase-ai

    butterbase

    Open-source backend-as-a-service. Postgres, auth, storage, functions, AI gateway, MCP.

    Community butterbase-ai
    GoPlusSecurity

    GoPlus AgentGuard

    Security guard for AI agents — blocks malicious skills, prevents data leaks, protects secrets. 24 detection rules, runtime action evaluation, trust registry.

    Community GoPlusSecurity
    respawn-llc

    tool-filter-mcp

    MCP proxy server that filters tools from upstream MCP servers via regex-based deny list

    Community respawn-llc
    Kaelio

    ktx-ai-data-agents-context

    ktx is an executable context layer for data and analytics agents 🐙 Allow Claude Code, Codex, and any AI agent to query data accurately through MCP with skills, memory and a semantic layer

    Community Kaelio
    bergside

    TypeUI - Build better UI with AI

    Build better UI with Codex, Claude, Cursor and other AI tools

    Community bergside