Observability MCP Server

FastMCP 2.14.1-powered observability server for monitoring MCP ecosystems

An observability server built on FastMCP 2.14.1. It combines OpenTelemetry tracing and metrics with persistent storage to monitor MCP server ecosystems in production.

🚀 Features

FastMCP 2.14.1 Integration

✅ OpenTelemetry Integration - Distributed tracing and metrics collection
✅ Enhanced Storage Backend - Persistent metrics and historical data
✅ Production-Ready - Built for high-performance monitoring

Comprehensive Monitoring

🔍 Real-time Health Checks - Monitor MCP server availability and response times
📊 Performance Metrics - CPU, memory, disk, and network monitoring
🔗 Distributed Tracing - Track interactions across MCP server ecosystems
🚨 Intelligent Alerting - Anomaly detection and automated alerts
📈 Performance Reports - Automated analysis and optimization recommendations

Advanced Analytics

🔬 Usage Pattern Analysis - Understand how MCP servers are being used
📉 Trend Detection - Identify performance trends and bottlenecks
🎯 Optimization Insights - Data-driven recommendations for improvement
📤 Multi-Format Export - Prometheus, OpenTelemetry, and JSON export

🛠️ Installation

Prerequisites

Python 3.11+
FastMCP 2.14.1+ (automatically installed)

Install from Source

git clone https://github.com/sandraschi/observability-mcp
cd observability-mcp
pip install -e .

Docker Installation

docker build -t observability-mcp .
docker run -p 9090:9090 observability-mcp

🚀 Quick Start

1. Start the Server

# Using the CLI
observability-mcp run

# Or directly with Python
python -m observability_mcp.server

2. Verify Installation

# Check server health
observability-mcp health

# View available metrics
observability-mcp metrics

3. Configure Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "observability": {
      "command": "observability-mcp",
      "args": ["run"]
    }
  }
}

📊 Available Tools

🔍 Health Monitoring

monitor_server_health - Real-time health checks with OpenTelemetry metrics
monitor_system_resources - Comprehensive system resource monitoring

📈 Performance Analysis

collect_performance_metrics - CPU, memory, disk, and network metrics
generate_performance_reports - Automated performance analysis and recommendations
analyze_mcp_interactions - Usage pattern analysis and optimization insights

🚨 Alerting & Anomaly Detection

alert_on_anomalies - Intelligent anomaly detection and alerting
trace_mcp_calls - Distributed tracing for MCP server interactions
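
The source does not say which detection method alert_on_anomalies uses. As one common baseline, and not the server's confirmed algorithm, a z-score check flags points that sit far from the mean of a series. The function name zscore_anomalies and the default threshold of 3.0 are assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def zscore_anomalies(values: Sequence[float], threshold: float = 3.0) -> List[int]:
    """Return the indexes of values more than `threshold` standard deviations from the mean.

    Hypothetical helper: a plain z-score test, used here only as an example
    of what an anomaly check can look like.
    """
    if len(values) < 2:
        return []
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # a perfectly flat series has no outliers
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


if __name__ == "__main__":
    latencies_ms = [10.0] * 20 + [100.0]
    print(zscore_anomalies(latencies_ms))
```

A z-score test assumes the series is roughly stationary; for metrics with daily cycles, a rolling window or seasonal baseline is usually a better fit.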

📤 Data Export

export_metrics - Export metrics in Prometheus, OpenTelemetry, or JSON formats
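
As a rough illustration of two of the three export formats, the sketch below renders the same samples as Prometheus text exposition lines and as JSON. The Sample tuple shape, the function names, and the JSON layout are assumptions, not the server's actual output schema. OpenTelemetry (OTLP) export is left out.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical in-memory shape: (metric name, labels, value)
Sample = Tuple[str, Dict[str, str], float]


def _escape(value: str) -> str:
    """Escape backslash, double quote, and newline as the Prometheus text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_prometheus(samples: List[Sample]) -> str:
    """Render samples as Prometheus text exposition lines."""
    lines = []
    for name, labels, value in samples:
        if labels:
            body = ",".join(f'{k}="{_escape(v)}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{body}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def to_json(samples: List[Sample]) -> str:
    """Render samples as a JSON array (layout is an assumption)."""
    return json.dumps(
        [{"name": n, "labels": l, "value": v} for n, l, v in samples], indent=2
    )


if __name__ == "__main__":
    data = [("mcp_cpu_usage_percent", {}, 45.2)]
    print(to_prometheus(data), end="")
    print(to_json(data))
```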

🔧 Configuration

Environment Variables

# Prometheus metrics server port
PROMETHEUS_PORT=9090

# OpenTelemetry service name
OTEL_SERVICE_NAME=observability-mcp

# OTLP exporter endpoint (optional)
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317

# Metrics retention period (days)
METRICS_RETENTION_DAYS=30
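
For readers wiring these variables into their own tooling, here is a minimal sketch of reading and validating them in Python. The ObservabilityConfig class and load_config function are hypothetical, and treating the example values above as defaults is an assumption; the server's own loading code may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ObservabilityConfig:
    """Hypothetical container for the four documented settings."""
    prometheus_port: int
    otel_service_name: str
    otlp_endpoint: Optional[str]
    metrics_retention_days: int


def load_config(env: Mapping[str, str] = os.environ) -> ObservabilityConfig:
    """Read the documented variables, falling back to the example values (an assumption)."""
    port = int(env.get("PROMETHEUS_PORT", "9090"))
    if not 1 <= port <= 65535:
        raise ValueError(f"PROMETHEUS_PORT out of range: {port}")
    retention = int(env.get("METRICS_RETENTION_DAYS", "30"))
    if retention < 1:
        raise ValueError(f"METRICS_RETENTION_DAYS must be positive: {retention}")
    return ObservabilityConfig(
        prometheus_port=port,
        otel_service_name=env.get("OTEL_SERVICE_NAME", "observability-mcp"),
        # The OTLP endpoint is optional, so an unset or empty value becomes None.
        otlp_endpoint=env.get("OTEL_EXPORTER_OTLP_ENDPOINT") or None,
        metrics_retention_days=retention,
    )


if __name__ == "__main__":
    print(load_config())
```

Passing a plain dict instead of os.environ makes the function easy to exercise in tests.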

Alert Configuration

The server comes with pre-configured alerts for common issues:

CPU Usage > 90% (Warning)
Memory Usage > 1GB (Error)
Error Rate > 5% (Error)

Alerts are stored persistently and can be customized through the MCP tools.
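
The three pre-configured thresholds can be pictured as simple rules checked against a metrics sample. The sketch below is illustrative only: AlertRule, evaluate, and the metric keys (cpu_percent, memory_mb, error_rate_percent) are assumed names, and 1GB is taken to mean 1024 MB.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AlertRule:
    name: str
    metric: str       # hypothetical key in the metrics sample
    threshold: float
    severity: str


# Mirrors the three documented defaults; 1GB is assumed to be 1024 MB.
DEFAULT_RULES: Tuple[AlertRule, ...] = (
    AlertRule("CPU Usage", "cpu_percent", 90.0, "warning"),
    AlertRule("Memory Usage", "memory_mb", 1024.0, "error"),
    AlertRule("Error Rate", "error_rate_percent", 5.0, "error"),
)


def evaluate(sample: Dict[str, float],
             rules: Tuple[AlertRule, ...] = DEFAULT_RULES) -> List[str]:
    """Return one message per rule whose metric is strictly above its threshold."""
    fired = []
    for rule in rules:
        value = sample.get(rule.metric)
        if value is not None and value > rule.threshold:
            fired.append(
                f"{rule.severity.upper()}: {rule.name} = {value} "
                f"(threshold {rule.threshold})"
            )
    return fired


if __name__ == "__main__":
    for message in evaluate({"cpu_percent": 95.0, "memory_mb": 512.0}):
        print(message)
```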

📈 Monitoring Dashboard

Prometheus Metrics

Access metrics at: http://localhost:9090/metrics

Available metrics:

# Health checks
mcp_health_checks_total{status="healthy|degraded|unhealthy", service="..."} 1

# Performance metrics
mcp_performance_metrics_collected{service="..."} 1

# System resources
mcp_cpu_usage_percent{} 45.2
mcp_memory_usage_mb{} 1024.5

# Traces and alerts
mcp_traces_created{service="...", operation="..."} 1
mcp_alerts_triggered{type="active|anomaly"} 1
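
To inspect these metrics without a full Prometheus setup, the text returned by the /metrics endpoint can be parsed directly. The following is a minimal sketch of a parser for simple sample lines; parse_metrics is a hypothetical helper, and it skips comment lines and anything it cannot read rather than implementing the full exposition format.

```python
import re
from typing import Dict, Iterator, Tuple

# name, optional {labels}, value, optional integer timestamp
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text: str) -> Iterator[Tuple[str, Dict[str, str], float]]:
    """Yield (name, labels, value) for each sample line in Prometheus text format."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE/comment lines
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, raw_labels, raw_value = match.groups()
        try:
            value = float(raw_value)
        except ValueError:
            continue  # not a numeric sample
        # Label values are returned as written, with escape sequences left in place.
        yield name, dict(_LABEL.findall(raw_labels or "")), value


if __name__ == "__main__":
    sample = 'mcp_cpu_usage_percent{} 45.2\nmcp_memory_usage_mb{} 1024.5\n'
    for row in parse_metrics(sample):
        print(row)
```

The input text can come from any HTTP client pointed at http://localhost:9090/metrics, for example urllib.request.urlopen from the standard library.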

Integration with Grafana

1. Add Prometheus as a data source in Grafana
2. Import the provided dashboard JSON
3. Visualize your MCP ecosystem's health and performance

🏗️ Architecture

FastMCP 2.14.1 Features Leveraged

OpenTelemetry Integration

Distributed Tracing: Track requests across multiple MCP servers
Metrics Collection: Structured performance data collection
Context Propagation: Maintain context across service boundaries

Enhanced Persistent Storage

Historical Data: Store metrics and traces for trend analysis
Cross-Session Persistence: Data survives server restarts
Efficient Storage: Optimized for time-series data

Production Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   MCP Servers   │───▶│ Observability    │───▶│  Prometheus     │
│   (Monitored)   │    │   MCP Server     │    │   Metrics       │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                │                        │
                                ▼                        ▼
                       ┌──────────────────┐    ┌─────────────────┐
                       │  Persistent      │    │   Grafana       │
                       │   Storage        │    │   Dashboard     │
                       └──────────────────┘    └─────────────────┘

📚 Usage Examples

The snippets below use await, so they must run inside an async function or another async context.

Health Monitoring

# Check MCP server health
result = await monitor_server_health(
    service_url="http://localhost:8000/health",
    timeout_seconds=5.0
)
print(f"Status: {result['health_check']['status']}")

Performance Analysis

# Collect system metrics
metrics = await collect_performance_metrics(service_name="my-mcp-server")
print(f"CPU: {metrics['metrics']['cpu_percent']}%")
print(f"Memory: {metrics['metrics']['memory_mb']} MB")

Distributed Tracing

# Record a trace
trace = await trace_mcp_calls(
    operation_name="process_document",
    service_name="ocr-mcp",
    duration_ms=150.5,
    attributes={"file_size": "2.3MB", "format": "PDF"}
)

Generate Reports

# Create performance report
report = await generate_performance_reports(
    service_name="web-mcp",
    days=7
)
print("Performance Summary:", report['summary'])
print("Recommendations:", report['recommendations'])

🔧 Development

Running Tests

# Install development dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run with coverage
pytest --cov=observability_mcp --cov-report=html

Code Quality

# Format code
black src/

# Lint code
ruff check src/

# Type checking
mypy src/

Docker Development

# Build development image
docker build -t observability-mcp:dev -f Dockerfile.dev .

# Run with hot reload
docker run -p 9090:9090 -v "$(pwd)":/app observability-mcp:dev

📊 Performance Benchmarks

FastMCP 2.14.1 Benefits

OpenTelemetry Overhead: <1ms per trace
Storage Performance: 1000+ metrics/second
Memory Usage: 50MB baseline + 10MB per monitored service
Concurrent Monitoring: 100+ services simultaneously
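
The memory figure above is a linear estimate, so it can be worked through for a given deployment size. The helper below is a hypothetical back-of-the-envelope calculation based only on the two stated numbers; real usage will depend on workload.

```python
BASELINE_MB = 50      # stated baseline
PER_SERVICE_MB = 10   # stated cost per monitored service


def estimated_memory_mb(monitored_services: int) -> int:
    """Estimate memory use as baseline plus a fixed cost per monitored service."""
    if monitored_services < 0:
        raise ValueError("monitored_services must be non-negative")
    return BASELINE_MB + PER_SERVICE_MB * monitored_services


if __name__ == "__main__":
    for count in (1, 10, 100):
        print(f"{count} services: {estimated_memory_mb(count)} MB")
```

At 100 services the estimate is 50 + 10 × 100 = 1050 MB, roughly 1GB.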

Recommended Hardware

CPU: 2+ cores for metrics processing
RAM: 2GB minimum, 4GB recommended
Storage: 10GB for metrics history (30-day retention)

🚨 Troubleshooting

Common Issues

Server Won't Start

# Check Python version
python --version  # Should be 3.11+

# Check FastMCP installation
pip show fastmcp  # Should be 2.14.1+

# Check dependencies
pip check

Metrics Not Appearing

# Check Prometheus endpoint
curl http://localhost:9090/metrics

# Verify OpenTelemetry configuration
observability-mcp metrics

High Memory Usage

Reduce METRICS_RETENTION_DAYS
Implement metric aggregation
Monitor with monitor_system_resources
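
One way to implement metric aggregation is to downsample raw points into fixed-width time buckets and keep only the average per bucket. This is a minimal sketch under that assumption; downsample and its (timestamp, value) input shape are hypothetical and not part of the server's API.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def downsample(points: Iterable[Tuple[float, float]],
               bucket_seconds: int = 60) -> List[Tuple[int, float]]:
    """Average (timestamp, value) points into buckets of `bucket_seconds` width.

    Each output pair is (bucket start time, mean value), sorted by time.
    """
    if bucket_seconds <= 0:
        raise ValueError("bucket_seconds must be positive")
    buckets: Dict[int, List[float]] = defaultdict(list)
    for timestamp, value in points:
        bucket_start = int(timestamp // bucket_seconds) * bucket_seconds
        buckets[bucket_start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


if __name__ == "__main__":
    raw = [(0, 10.0), (30, 20.0), (61, 40.0)]
    print(downsample(raw, bucket_seconds=60))
```

Averaging hides short spikes, so keeping a per-bucket maximum alongside the mean may be worth the extra storage for latency-style metrics.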

Storage Issues

Check available disk space
Clean old metrics (this permanently deletes stored history): rm -rf ~/.observability-mcp/metrics/*
Restart server to recreate storage

🤝 Contributing

Development Setup

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Submit a pull request

Code Standards

FastMCP 2.14.1+: Use latest features and patterns
OpenTelemetry: Follow OTEL best practices
Async First: All operations should be async
Type Hints: Full type coverage required
Documentation: Comprehensive docstrings

Testing Strategy

Unit Tests: Core functionality
Integration Tests: MCP server interactions
Performance Tests: Benchmarking and load testing
Chaos Tests: Failure scenario testing

📄 License

MIT License - see LICENSE file for details.

🙏 Acknowledgments

FastMCP Team - For the amazing 2.14.1 framework with OpenTelemetry integration
OpenTelemetry Community - For the observability standards and tools
Prometheus Team - For the metrics collection and alerting system

🔗 Related Projects

FastMCP - The framework this server is built on
OpenTelemetry Python - Observability instrumentation
Prometheus - Metrics collection and alerting
Grafana - Visualization and dashboards

Built with ❤️ using FastMCP 2.14.1 and OpenTelemetry