baller-coder

mcp-federated-data

Community baller-coder
Updated

**Federated MCP server** that joins relational metadata (MySQL) with timeseries values (InfluxDB) behind a single, LLM-friendly entity layer.

mcp-federated-data

Federated MCP server that joins relational metadata (MySQL) with timeseries values (InfluxDB) behind a single, LLM-friendly entity layer.

License: MITVersionNodeMCP

Languages: English · 中文

The problem

When an LLM needs to answer a question like:

"show me last-7-day temperature trends for active sensors in zone A"

it has to traverse two stores:

  • Relational metadata (what sensors exist, where, what kind, what units) — MySQL
  • Timeseries values (the readings over time) — InfluxDB

Plugging in two independent MCP servers forces the LLM to:

  1. Query MySQL → find matching entities → extract their ids
  2. Push those ids into an InfluxDB query
  3. Stitch the two result sets in its own context

LLMs reliably stumble at step 3 — especially when the join key is composite (e.g. an InfluxDB tag is the concatenation of two relational fields).

This server collapses all three steps into one tool call.

What the LLM sees

Five tools, no SQL or Flux in the prompt:

Tool Purpose
list_entities Filter business entities by their fields
get_entity Look up a single entity by primary key
list_related Walk a configured relation between entities
get_entity_timeseries Federated — metadata + timeseries in one call
compare_timeseries Compare 2–20 specific entities over the same window

Every tool ships with rich JSON-Schema descriptions so the LLM picks the right arguments without prompt tricks.

How it differs from naive two-server setups

Scenario Two independent MCP servers mcp-federated-data
"trends for sensors in zone A" LLM: query MySQL → extract ids → push into Flux → join in its head one get_entity_timeseries call
Composite tag like {deviceId}.{signalId} LLM composes strings in-prompt — error-prone composer: "{deviceId}.{signalId}" in YAML, server does it
Metadata ↔ timeseries alignment LLM does the join, frequently mis-pairs server joins by configured key
Data-volume blow-up unprotected enforced max entities + auto-downsampling + per-entity point cap
Business semantics for the LLM raw CREATE TABLE only YAML fields with description
Audit trail none structured per-call audit log

Quickstart

git clone https://github.com/baller-coder/mcp-federated-data.git
cd mcp-federated-data
pnpm install

# Sample environment (MySQL + InfluxDB in Docker, with seeded data)
docker compose -f examples/industrial-monitoring/docker-compose.yml up -d
pnpm seed

# Start the MCP server over stdio
pnpm dev -- --config examples/industrial-monitoring/config.yaml

Connect from any MCP client (Claude Desktop, Cursor, mcp-inspector).

Try it

// list active sites
{
  "name": "list_entities",
  "arguments": {
    "entity": "site",
    "filters": [{ "field": "status", "op": "eq", "value": "active" }]
  }
}

// list sensors attached to site 1
{
  "name": "list_related",
  "arguments": {
    "source_entity": "site",
    "source_id": 1,
    "target_entity": "sensor"
  }
}

// federated query — metadata + timeseries in ONE call
{
  "name": "get_entity_timeseries",
  "arguments": {
    "entity": "sensor",
    "filters": [
      { "field": "site_id", "op": "eq", "value": 1 },
      { "field": "kind",    "op": "eq", "value": "temperature" }
    ],
    "time_range":  { "start": "-7d" },
    "aggregation": { "window": "1h", "fn": "mean" }
  }
}

// compare 3 specific sensors over the same window
{
  "name": "compare_timeseries",
  "arguments": {
    "entity": "sensor",
    "ids":    [101, 201, 301],
    "time_range":  { "start": "-24h" },
    "aggregation": { "window": "10m", "fn": "mean" }
  }
}

Configuration

The whole server behavior is defined in a single YAML file. Three sections.

1. Datasources

datasources:
  - name: business
    type: mysql
    host: localhost
    port: 3306
    database: my_db
    username: readonly_user
    password: secret

  - name: timeseries
    type: influxdb
    url: http://localhost:8086
    token: my-token
    org: my_org
    bucket: my_bucket

2. Entities

Each entity binds to a relational table or view, with optional relations and an optional timeseries binding.

entities:
  - name: site
    description: Physical monitoring location.
    source:
      datasource: business
      table: sites
      primary_key: id
      fields:
        - { name: id,     type: number }
        - { name: name,   type: string, description: Display name }
        - { name: region, type: string }
        - { name: status, type: string, description: "active / inactive / maintenance" }

  - name: sensor
    description: A sensor attached to a site.
    source:
      datasource: business
      table: sensors
      primary_key: id
      fields:
        - { name: id,      type: number }
        - { name: site_id, type: number }
        - { name: name,    type: string }
        - { name: kind,    type: string, description: "temperature / humidity / voltage / ..." }
        - { name: unit,    type: string }
    relations:
      - target: site
        type: many-to-one
        local_key: site_id
        foreign_key: id
    timeseries:
      datasource: timeseries
      measurement: sensor_data
      value_field: value
      join_key:
        local: id
        remote_tag: sensor_id

3. Defaults (guard rails)

defaults:
  max_entities_per_query: 50
  max_points_per_entity: 500
  query_timeout_ms: 15000

Composite join keys (v0.2 highlight)

When the InfluxDB tag value is a composite of multiple relational fields — common in IoT / industrial systems where a tag like 400001240.438000066 encodes {deviceId}.{signalId} — declare it like this:

timeseries:
  measurement: sensor_data
  value_field: value
  join_key:
    local: [device_id, signal_id]
    remote_tag: signal_id
    composer: "{device_id}.{signal_id}"

The server:

  1. Pulls metadata from MySQL (the composer's local fields are auto-included in SELECT).
  2. Composes each row's tag value using the template.
  3. Pushes the composed list into the InfluxDB tag filter.
  4. Joins results back by the same template.

Single-field bindings (the v0.1 form) keep working unchanged — the server treats them as a one-element composite, so all paths stay uniform.

How it works — the 50-line core

Every federated tool follows the same three steps:

  1. Fetch metadata — relational query against the business store, with safe-identifier checks and parameterised WHERE. Allowed filter fields are restricted to those declared in the entity config.
  2. Fetch timeseries — pull join-key values from step 1, push them into a tag filter against the timeseries store, with optional aggregation and per-call point cap.
  3. Merge — group timeseries points by the remote tag value, then stitch each metadata row with its sorted series.

No SQL parser. No cross-store query planner. By design.

Architecture

┌──────────────────────────────────┐
│  MCP client (Claude / Cursor)    │
└────────────┬─────────────────────┘
             │ stdio  (JSON-RPC)
             ▼
┌──────────────────────────────────┐
│       mcp-federated-data         │
│  ┌────────────────────────────┐  │
│  │  Tools  (5 tools)          │  │
│  ├────────────────────────────┤  │
│  │  Entity registry           │  │
│  │  Join-key normalizer       │  │
│  │  Composer engine           │  │
│  │  Guards (limits/timeout)   │  │
│  │  Audit logger              │  │
│  ├────────────────────────────┤  │
│  │  Datasource adapters       │  │
│  └─────────┬────────┬─────────┘  │
└────────────┼────────┼────────────┘
             ▼        ▼
        ┌──────┐ ┌──────────┐
        │MySQL │ │InfluxDB  │
        └──────┘ └──────────┘

Where this fits

mcp-federated-data is schema-driven, not domain-specific. It applies anywhere business metadata lives in a relational store and observed values live in a timeseries store:

  • IoT device telemetry
  • Industrial process monitoring
  • Asset performance management
  • Building automation
  • Energy / power monitoring
  • Environmental sensor networks
  • Network device monitoring

If your stack is MySQL + InfluxDB and you want LLMs to reason over it — this server is for you.

Comparison with related projects

mcp-federated-data mcp-server-mysql mcp-server-influxdb Wren AI / Vanna
Cross-store join partial (own engine)
Composite tag composer n/a
LLM-safe SQL guards varies n/a
Schema as semantic layer
Configuration YAML env / args env / args dedicated DSL
Scope narrow but deep thin wrapper thin wrapper full BI platform

Contributing

Issues and PRs welcome. Before opening one:

  • For new data-source adapters — open an issue first so we can align on the interface.
  • For new tools — match the existing JSON-Schema + audit-log pattern.
  • Public APIs must keep backward compatibility within minor versions.

Acknowledgments

License

MIT

MCP Server · Populars

MCP Server · New