sandraschi

Observability MCP Server

Community sandraschi
Updated

Observability MCP Server

FastMCP 2.14.1-powered observability server for monitoring MCP ecosystems

FastMCPOpenTelemetryPrometheusGitHub

A comprehensive observability server built on FastMCP 2.14.1 that leverages OpenTelemetry integration, persistent storage, and advanced monitoring capabilities to provide production-grade observability for MCP server ecosystems.

๐Ÿš€ Features

FastMCP 2.14.1 Integration

  • โœ… OpenTelemetry Integration - Distributed tracing and metrics collection
  • โœ… Enhanced Storage Backend - Persistent metrics and historical data
  • โœ… Production-Ready - Built for high-performance monitoring

Comprehensive Monitoring

  • ๐Ÿ” Real-time Health Checks - Monitor MCP server availability and response times
  • ๐Ÿ“Š Performance Metrics - CPU, memory, disk, and network monitoring
  • ๐Ÿ”— Distributed Tracing - Track interactions across MCP server ecosystems
  • ๐Ÿšจ Intelligent Alerting - Anomaly detection and automated alerts
  • ๐Ÿ“ˆ Performance Reports - Automated analysis and optimization recommendations

Advanced Analytics

  • ๐Ÿ”ฌ Usage Pattern Analysis - Understand how MCP servers are being used
  • ๐Ÿ“‰ Trend Detection - Identify performance trends and bottlenecks
  • ๐ŸŽฏ Optimization Insights - Data-driven recommendations for improvement
  • ๐Ÿ“ค Multi-Format Export - Prometheus, OpenTelemetry, and JSON export

๐Ÿ› ๏ธ Installation

Prerequisites

  • Python 3.11+
  • FastMCP 2.14.1+ (automatically installed)

Install from Source

git clone https://github.com/sandraschi/observability-mcp
cd observability-mcp
pip install -e .

Docker Installation

docker build -t observability-mcp .
docker run -p 9090:9090 observability-mcp

๐Ÿš€ Quick Start

1. Start the Server

# Using the CLI
observability-mcp run

# Or directly with Python
python -m observability_mcp.server

2. Verify Installation

# Check server health
observability-mcp health

# View available metrics
observability-mcp metrics

3. Configure Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "observability": {
      "command": "observability-mcp",
      "args": ["run"]
    }
  }
}

๐Ÿ“Š Available Tools

๐Ÿ” Health Monitoring

  • monitor_server_health - Real-time health checks with OpenTelemetry metrics
  • monitor_system_resources - Comprehensive system resource monitoring

๐Ÿ“ˆ Performance Analysis

  • collect_performance_metrics - CPU, memory, disk, and network metrics
  • generate_performance_reports - Automated performance analysis and recommendations
  • analyze_mcp_interactions - Usage pattern analysis and optimization insights

๐Ÿšจ Alerting & Anomaly Detection

  • alert_on_anomalies - Intelligent anomaly detection and alerting
  • trace_mcp_calls - Distributed tracing for MCP server interactions

๐Ÿ“ค Data Export

  • export_metrics - Export metrics in Prometheus, OpenTelemetry, or JSON formats

๐Ÿ”ง Configuration

Environment Variables

# Prometheus metrics server port
PROMETHEUS_PORT=9090

# OpenTelemetry service name
OTEL_SERVICE_NAME=observability-mcp

# OTLP exporter endpoint (optional)
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317

# Metrics retention period (days)
METRICS_RETENTION_DAYS=30

Alert Configuration

The server comes with pre-configured alerts for common issues:

  • CPU Usage > 90% (Warning)
  • Memory Usage > 1GB (Error)
  • Error Rate > 5% (Error)

Alerts are stored persistently and can be customized through the MCP tools.

๐Ÿ“ˆ Monitoring Dashboard

Prometheus Metrics

Access metrics at: http://localhost:9090/metrics

Available metrics:

# Health checks
mcp_health_checks_total{status="healthy|degraded|unhealthy", service="..."} 1

# Performance metrics
mcp_performance_metrics_collected{service="..."} 1

# System resources
mcp_cpu_usage_percent{} 45.2
mcp_memory_usage_mb{} 1024.5

# Traces and alerts
mcp_traces_created{service="...", operation="..."} 1
mcp_alerts_triggered{type="active|anomaly"} 1

Integration with Grafana

  1. Add Prometheus as a data source in Grafana
  2. Import the provided dashboard JSON
  3. Visualize your MCP ecosystem's health and performance

๐Ÿ—๏ธ Architecture

FastMCP 2.14.1 Features Leveraged

OpenTelemetry Integration
  • Distributed Tracing: Track requests across multiple MCP servers
  • Metrics Collection: Structured performance data collection
  • Context Propagation: Maintain context across service boundaries
Enhanced Persistent Storage
  • Historical Data: Store metrics and traces for trend analysis
  • Cross-Session Persistence: Data survives server restarts
  • Efficient Storage: Optimized for time-series data
Production Architecture
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚   MCP Servers   โ”‚โ”€โ”€โ”€โ–ถโ”‚ Observability    โ”‚โ”€โ”€โ”€โ–ถโ”‚  Prometheus     โ”‚
โ”‚   (Monitored)   โ”‚    โ”‚   MCP Server     โ”‚    โ”‚   Metrics       โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                                โ”‚                        โ”‚
                                โ–ผ                        โ–ผ
                       โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
                       โ”‚  Persistent      โ”‚    โ”‚   Grafana       โ”‚
                       โ”‚   Storage        โ”‚    โ”‚   Dashboard     โ”‚
                       โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

๐Ÿ“š Usage Examples

Health Monitoring

# Check MCP server health
result = await monitor_server_health(
    service_url="http://localhost:8000/health",
    timeout_seconds=5.0
)
print(f"Status: {result['health_check']['status']}")

Performance Analysis

# Collect system metrics
metrics = await collect_performance_metrics(service_name="my-mcp-server")
print(f"CPU: {metrics['metrics']['cpu_percent']}%")
print(f"Memory: {metrics['metrics']['memory_mb']} MB")

Distributed Tracing

# Record a trace
trace = await trace_mcp_calls(
    operation_name="process_document",
    service_name="ocr-mcp",
    duration_ms=150.5,
    attributes={"file_size": "2.3MB", "format": "PDF"}
)

Generate Reports

# Create performance report
report = await generate_performance_reports(
    service_name="web-mcp",
    days=7
)
print("Performance Summary:", report['summary'])
print("Recommendations:", report['recommendations'])

๐Ÿ”ง Development

Running Tests

# Install development dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run with coverage
pytest --cov=observability_mcp --cov-report=html

Code Quality

# Format code
black src/

# Lint code
ruff check src/

# Type checking
mypy src/

Docker Development

# Build development image
docker build -t observability-mcp:dev -f Dockerfile.dev .

# Run with hot reload
docker run -p 9090:9090 -v $(pwd):/app observability-mcp:dev

๐Ÿ“Š Performance Benchmarks

FastMCP 2.14.1 Benefits

  • OpenTelemetry Overhead: <1ms per trace
  • Storage Performance: 1000+ metrics/second
  • Memory Usage: 50MB baseline + 10MB per monitored service
  • Concurrent Monitoring: 100+ services simultaneously

Recommended Hardware

  • CPU: 2+ cores for metrics processing
  • RAM: 2GB minimum, 4GB recommended
  • Storage: 10GB for metrics history (30 days retention)

๐Ÿšจ Troubleshooting

Common Issues

Server Won't Start
# Check Python version
python --version  # Should be 3.11+

# Check FastMCP installation
pip show fastmcp  # Should be 2.14.1+

# Check dependencies
pip check
Metrics Not Appearing
# Check Prometheus endpoint
curl http://localhost:9090/metrics

# Verify OpenTelemetry configuration
observability-mcp metrics
High Memory Usage
  • Reduce METRICS_RETENTION_DAYS
  • Implement metric aggregation
  • Monitor with monitor_system_resources
Storage Issues
  • Check available disk space
  • Clean old metrics: rm -rf ~/.observability-mcp/metrics/*
  • Restart server to recreate storage

๐Ÿค Contributing

Development Setup

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new functionality
  5. Submit a pull request

Code Standards

  • FastMCP 2.14.1+: Use latest features and patterns
  • OpenTelemetry: Follow OTEL best practices
  • Async First: All operations should be async
  • Type Hints: Full type coverage required
  • Documentation: Comprehensive docstrings

Testing Strategy

  • Unit Tests: Core functionality
  • Integration Tests: MCP server interactions
  • Performance Tests: Benchmarking and load testing
  • Chaos Tests: Failure scenario testing

๐Ÿ“„ License

MIT License - see LICENSE file for details.

๐Ÿ™ Acknowledgments

  • FastMCP Team - For the amazing 2.14.1 framework with OpenTelemetry integration
  • OpenTelemetry Community - For the observability standards and tools
  • Prometheus Team - For the metrics collection and alerting system

๐Ÿ”— Related Projects

Built with โค๏ธ using FastMCP 2.14.1 and OpenTelemetry

MCP Server ยท Populars

MCP Server ยท New