mcp-federated-data

Federated MCP server that joins relational metadata (MySQL) with timeseries values (InfluxDB) behind a single, LLM-friendly entity layer.

License: MIT Version Node MCP

Languages: English · 中文

The problem

When an LLM needs to answer a question like:

"show me last-7-day temperature trends for active sensors in zone A"

it has to traverse two stores:

Relational metadata (what sensors exist, where, what kind, what units) — MySQL
Timeseries values (the readings over time) — InfluxDB

Plugging in two independent MCP servers forces the LLM to:

Query MySQL → find matching entities → extract their ids
Push those ids into an InfluxDB query
Stitch the two result sets in its own context

LLMs reliably stumble at step 3 — especially when the join key is composite (e.g. an InfluxDB tag is the concatenation of two relational fields).

This server collapses all three steps into one tool call.

What the LLM sees

Five tools, no SQL or Flux in the prompt:

Tool	Purpose
`list_entities`	Filter business entities by their fields
`get_entity`	Look up a single entity by primary key
`list_related`	Walk a configured relation between entities
`get_entity_timeseries`	Federated — metadata + timeseries in one call
`compare_timeseries`	Compare 2–20 specific entities over the same window

Every tool ships with rich JSON-Schema descriptions so the LLM picks the right arguments without prompt tricks.

How it differs from naive two-server setups

Scenario	Two independent MCP servers	`mcp-federated-data`
"trends for sensors in zone A"	LLM: query MySQL → extract ids → push into Flux → join in its head	one `get_entity_timeseries` call
Composite tag like `{deviceId}.{signalId}`	LLM composes strings in-prompt — error-prone	`composer: "{deviceId}.{signalId}"` in YAML, server does it
Metadata ↔ timeseries alignment	LLM does the join, frequently mis-pairs	server joins by configured key
Data-volume blow-up	unprotected	enforced max entities + auto-downsampling + per-entity point cap
Business semantics for the LLM	raw `CREATE TABLE` only	YAML fields with `description`
Audit trail	none	structured per-call audit log

Quickstart

git clone https://github.com/baller-coder/mcp-federated-data.git
cd mcp-federated-data
pnpm install

# Sample environment (MySQL + InfluxDB in Docker, with seeded data)
docker compose -f examples/industrial-monitoring/docker-compose.yml up -d
pnpm seed

# Start the MCP server over stdio
pnpm dev -- --config examples/industrial-monitoring/config.yaml

Connect from any MCP client (Claude Desktop, Cursor, mcp-inspector).

Try it

// list active sites
{
  "name": "list_entities",
  "arguments": {
    "entity": "site",
    "filters": [{ "field": "status", "op": "eq", "value": "active" }]
  }
}

// list sensors attached to site 1
{
  "name": "list_related",
  "arguments": {
    "source_entity": "site",
    "source_id": 1,
    "target_entity": "sensor"
  }
}

// federated query — metadata + timeseries in ONE call
{
  "name": "get_entity_timeseries",
  "arguments": {
    "entity": "sensor",
    "filters": [
      { "field": "site_id", "op": "eq", "value": 1 },
      { "field": "kind",    "op": "eq", "value": "temperature" }
    ],
    "time_range":  { "start": "-7d" },
    "aggregation": { "window": "1h", "fn": "mean" }
  }
}

// compare 3 specific sensors over the same window
{
  "name": "compare_timeseries",
  "arguments": {
    "entity": "sensor",
    "ids":    [101, 201, 301],
    "time_range":  { "start": "-24h" },
    "aggregation": { "window": "10m", "fn": "mean" }
  }
}

Configuration

The whole server behavior is defined in a single YAML file. Three sections.

1. Datasources

datasources:
  - name: business
    type: mysql
    host: localhost
    port: 3306
    database: my_db
    username: readonly_user
    password: secret

  - name: timeseries
    type: influxdb
    url: http://localhost:8086
    token: my-token
    org: my_org
    bucket: my_bucket

2. Entities

Each entity binds to a relational table or view, with optional relations and an optional timeseries binding.

entities:
  - name: site
    description: Physical monitoring location.
    source:
      datasource: business
      table: sites
      primary_key: id
      fields:
        - { name: id,     type: number }
        - { name: name,   type: string, description: Display name }
        - { name: region, type: string }
        - { name: status, type: string, description: "active / inactive / maintenance" }

  - name: sensor
    description: A sensor attached to a site.
    source:
      datasource: business
      table: sensors
      primary_key: id
      fields:
        - { name: id,      type: number }
        - { name: site_id, type: number }
        - { name: name,    type: string }
        - { name: kind,    type: string, description: "temperature / humidity / voltage / ..." }
        - { name: unit,    type: string }
    relations:
      - target: site
        type: many-to-one
        local_key: site_id
        foreign_key: id
    timeseries:
      datasource: timeseries
      measurement: sensor_data
      value_field: value
      join_key:
        local: id
        remote_tag: sensor_id

3. Defaults (guard rails)

defaults:
  max_entities_per_query: 50
  max_points_per_entity: 500
  query_timeout_ms: 15000

Composite join keys (v0.2 highlight)

When the InfluxDB tag value is a composite of multiple relational fields — common in IoT / industrial systems where a tag like 400001240.438000066 encodes {deviceId}.{signalId} — declare it like this:

timeseries:
  measurement: sensor_data
  value_field: value
  join_key:
    local: [device_id, signal_id]
    remote_tag: signal_id
    composer: "{device_id}.{signal_id}"

The server:

Pulls metadata from MySQL (the composer's local fields are auto-included in SELECT).
Composes each row's tag value using the template.
Pushes the composed list into the InfluxDB tag filter.
Joins results back by the same template.

Single-field bindings (the v0.1 form) keep working unchanged — the server treats them as a one-element composite, so all paths stay uniform.

How it works — the 50-line core

Every federated tool follows the same three steps:

Fetch metadata — relational query against the business store, with safe-identifier checks and parameterised WHERE. Allowed filter fields are restricted to those declared in the entity config.
Fetch timeseries — pull join-key values from step 1, push them into a tag filter against the timeseries store, with optional aggregation and per-call point cap.
Merge — group timeseries points by the remote tag value, then stitch each metadata row with its sorted series.

No SQL parser. No cross-store query planner. By design.

Architecture

┌──────────────────────────────────┐
│  MCP client (Claude / Cursor)    │
└────────────┬─────────────────────┘
             │ stdio  (JSON-RPC)
             ▼
┌──────────────────────────────────┐
│       mcp-federated-data         │
│  ┌────────────────────────────┐  │
│  │  Tools  (5 tools)          │  │
│  ├────────────────────────────┤  │
│  │  Entity registry           │  │
│  │  Join-key normalizer       │  │
│  │  Composer engine           │  │
│  │  Guards (limits/timeout)   │  │
│  │  Audit logger              │  │
│  ├────────────────────────────┤  │
│  │  Datasource adapters       │  │
│  └─────────┬────────┬─────────┘  │
└────────────┼────────┼────────────┘
             ▼        ▼
        ┌──────┐ ┌──────────┐
        │MySQL │ │InfluxDB  │
        └──────┘ └──────────┘

Where this fits

mcp-federated-data is schema-driven, not domain-specific. It applies anywhere business metadata lives in a relational store and observed values live in a timeseries store:

IoT device telemetry
Industrial process monitoring
Asset performance management
Building automation
Energy / power monitoring
Environmental sensor networks
Network device monitoring

If your stack is MySQL + InfluxDB and you want LLMs to reason over it — this server is for you.

Comparison with related projects

	mcp-federated-data	mcp-server-mysql	mcp-server-influxdb	Wren AI / Vanna
Cross-store join	✅	❌	❌	partial (own engine)
Composite tag composer	✅	n/a	❌	❌
LLM-safe SQL guards	✅	varies	n/a	✅
Schema as semantic layer	✅	❌	❌	✅
Configuration	YAML	env / args	env / args	dedicated DSL
Scope	narrow but deep	thin wrapper	thin wrapper	full BI platform

Contributing

Issues and PRs welcome. Before opening one:

For new data-source adapters — open an issue first so we can align on the interface.
For new tools — match the existing JSON-Schema + audit-log pattern.
Public APIs must keep backward compatibility within minor versions.

Acknowledgments

Model Context Protocol — the standard this server speaks
@modelcontextprotocol/sdk — TypeScript reference SDK
Cube.dev / dbt — prior art on the "semantic layer" idea
InfluxDB and MySQL communities

License

MIT

mcp-federated-data