AiAgentKarl

LLM Benchmark MCP Server

MCP Server for LLM comparison, benchmarks, and pricing — find the best model for any task

MCP server that gives AI agents access to LLM benchmark data, pricing comparisons, and model recommendations.

Features

  • compare_models — Side-by-side benchmark comparison of LLMs (MMLU, HumanEval, MATH, GPQA, ARC, HellaSwag)
  • get_model_details — Detailed info about a specific model including strengths/weaknesses
  • recommend_model — Get the best model recommendation for your task and budget
  • list_top_models — Top models ranked by category (coding, math, reasoning, chat)
  • get_pricing — Pricing comparison via OpenRouter API
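A minimal sketch of how a recommendation in the spirit of recommend_model could be computed: filter by budget, then rank by the benchmark that matches the task. The model names, scores, prices, the `TASK_BENCHMARK` mapping, and the `recommend` function are placeholders invented for illustration; they are not the server's actual data or logic.

```python
from dataclasses import dataclass

# Hypothetical mapping from task category to the benchmark used for ranking.
TASK_BENCHMARK = {"coding": "HumanEval", "math": "MATH", "reasoning": "GPQA"}


@dataclass
class ModelEntry:
    name: str
    scores: dict          # benchmark name -> score (placeholder values)
    usd_per_mtok: float   # blended price per million tokens (placeholder)


# Placeholder entries; these are NOT real benchmark results or prices.
CATALOG = [
    ModelEntry("model-a", {"HumanEval": 90.0, "MATH": 70.0, "GPQA": 50.0}, 10.0),
    ModelEntry("model-b", {"HumanEval": 80.0, "MATH": 75.0, "GPQA": 45.0}, 1.0),
    ModelEntry("model-c", {"HumanEval": 60.0, "MATH": 40.0, "GPQA": 30.0}, 0.2),
]


def recommend(task, max_usd_per_mtok, catalog=CATALOG):
    """Return the highest-scoring affordable model for the task, or None."""
    benchmark = TASK_BENCHMARK[task]
    affordable = [m for m in catalog if m.usd_per_mtok <= max_usd_per_mtok]
    if not affordable:
        return None
    return max(affordable, key=lambda m: m.scores.get(benchmark, 0.0))
```

With the placeholder catalog, `recommend("coding", 2.0)` picks `model-b`: `model-a` scores higher on the coding benchmark but exceeds the budget.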

Supported Models

GPT-4o, GPT-4o-mini, GPT-4 Turbo, o1, o3-mini, Claude 3.5 Sonnet, Claude 3.5 Haiku, Claude 3 Opus, Gemini 2.0 Flash, Gemini 2.0 Pro, Gemini 1.5 Pro, Llama 3.1 (8B/70B/405B), Llama 3.3 70B, Mistral Large, Mistral Small, Mixtral 8x22B, DeepSeek V3, DeepSeek R1, Qwen 2.5 72B

Installation

pip install llm-benchmark-mcp-server

Usage with Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "llm-benchmark": {
      "command": "benchmark-server"
    }
  }
}

Or via uvx (no install needed):

{
  "mcpServers": {
    "llm-benchmark": {
      "command": "uvx",
      "args": ["llm-benchmark-mcp-server"]
    }
  }
}
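If the config file already lists other servers, the entry can be merged in rather than pasted by hand. The sketch below assumes only the "mcpServers" layout shown above; `add_server` is a hypothetical helper, and the file path is left to the caller because its location depends on the operating system.

```python
import json
from pathlib import Path


def add_server(config_path, name="llm-benchmark", command="uvx",
               args=("llm-benchmark-mcp-server",)):
    """Add or update one entry under "mcpServers", keeping other entries."""
    path = Path(config_path)
    # Start from the existing config if there is one, else from an empty dict.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    entry = {"command": command}
    if args:
        entry["args"] = list(args)
    config.setdefault("mcpServers", {})[name] = entry
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling `add_server(path)` writes the uvx variant; `add_server(path, command="benchmark-server", args=())` writes the pip-installed variant instead.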

Example Queries

  • "Compare GPT-4o vs Claude 3.5 Sonnet vs Gemini 2.0 Pro"
  • "Which model is best for coding on a low budget?"
  • "Show me the top 10 models for math"
  • "What does GPT-4o cost compared to Claude?"
  • "Give me details about DeepSeek R1"
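Behind a query like the first one, an MCP client sends a `tools/call` request to the server. The sketch below builds such a request as a JSON-RPC 2.0 message. The tool name comes from the feature list, but the argument key `models` is an assumption; the real input schema is whatever the server reports via `tools/list`.

```python
import json


def tool_call_request(request_id, tool, arguments):
    """Build an MCP `tools/call` JSON-RPC 2.0 request as a dict."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# The argument key "models" is a guess, not the server's documented schema.
request = tool_call_request(
    1,
    "compare_models",
    {"models": ["GPT-4o", "Claude 3.5 Sonnet", "Gemini 2.0 Pro"]},
)
print(json.dumps(request, indent=2))
```

In normal use the client (for example Claude Desktop) constructs this message itself; the sketch is only meant to show what a natural-language query turns into.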

Data Sources

  • Benchmarks: Hardcoded from official papers and public leaderboards (MMLU, HumanEval, MATH, GPQA, ARC-Challenge, HellaSwag)
  • Pricing: Live data from OpenRouter API
  • Arena Rankings: Chatbot Arena Leaderboard (when available)
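For the live pricing source, a rough sketch of normalizing OpenRouter prices to USD per million tokens. It assumes the public model listing endpoint returns a "data" list whose entries carry per-token prices as strings under "pricing"; that shape, the URL constant, and both function names are assumptions to check against OpenRouter's API reference, and this is not necessarily how the server does it.

```python
import json
import urllib.request
from decimal import Decimal

# Assumed endpoint for OpenRouter's public model listing.
MODELS_URL = "https://openrouter.ai/api/v1/models"


def fetch_models(url=MODELS_URL, timeout=10):
    """Download the model listing (network access required)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def usd_per_million(payload):
    """Map model id -> (input, output) price in USD per million tokens.

    Assumes each entry looks like
    {"id": ..., "pricing": {"prompt": "<usd/token>", "completion": "<usd/token>"}}.
    """
    prices = {}
    for model in payload.get("data", []):
        pricing = model.get("pricing") or {}
        try:
            prompt = Decimal(str(pricing["prompt"])) * 1_000_000
            completion = Decimal(str(pricing["completion"])) * 1_000_000
        except (KeyError, ArithmeticError):
            continue  # skip entries without usable prices
        prices[model["id"]] = (float(prompt), float(completion))
    return prices
```

`usd_per_million(fetch_models())` would give a lookup table for side-by-side price comparison. Parsing with `Decimal` avoids float rounding noise when scaling very small per-token prices.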

More MCP Servers by AiAgentKarl

  • 🔗 Blockchain: Solana
  • 🌍 Data: Weather · Germany · Agriculture · Space · Aviation · EU Companies
  • 🔒 Security: Cybersecurity · Policy Gateway · Audit Trail
  • 🤖 Agent Infra: Memory · Directory · Hub · Reputation
  • 🔬 Research: Academic · LLM Benchmark · Legal

→ Full catalog (40+ servers)

License

MIT
