mcp-research-powerpack

MCP server for deep research — Google, Reddit, scraping, and AI synthesis

MCP server that gives your AI assistant research tools. Google search, Reddit deep-dives, web scraping with LLM extraction, and multi-model deep research — all as MCP tools that chain into each other.

npx mcp-research-powerpack

five tools, zero config to start. each API key you add unlocks more capabilities.

tools

| tool | what it does | requires |
|---|---|---|
| web_search | parallel Google search across 3-100 keywords, CTR-weighted ranking, consensus detection | SERPER_API_KEY |
| search_reddit | same engine but filtered to reddit.com, 10-50 queries in parallel | SERPER_API_KEY |
| get_reddit_post | fetches 2-50 Reddit posts with full comment trees, optional LLM extraction | REDDIT_CLIENT_ID + REDDIT_CLIENT_SECRET |
| scrape_links | scrapes 1-50 URLs with JS rendering fallback, HTML-to-markdown, optional LLM extraction | SCRAPEDO_API_KEY |
| deep_research | sends questions to research-capable models (Grok, Gemini) with web search enabled, supports local file attachments | OPENROUTER_API_KEY |

tools are designed to chain: web_search suggests calling scrape_links, which suggests search_reddit, which suggests get_reddit_post, which suggests deep_research for synthesis.

install

Claude Desktop / Claude Code

add to your MCP config:

{
  "mcpServers": {
    "research-powerpack": {
      "command": "npx",
      "args": ["mcp-research-powerpack"],
      "env": {
        "SERPER_API_KEY": "...",
        "OPENROUTER_API_KEY": "..."
      }
    }
  }
}

from source

git clone https://github.com/yigitkonur/mcp-research-powerpack.git
cd mcp-research-powerpack
pnpm install && pnpm build
pnpm start

HTTP mode

MCP_TRANSPORT=http MCP_PORT=3000 npx mcp-research-powerpack

exposes /mcp (POST/GET/DELETE with session headers) and /health.
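
a quick way to exercise the HTTP transport is a plain fetch against /health, then a standard MCP initialize request against /mcp. the endpoint paths come from this README; the handshake body and session header below follow the generic MCP Streamable HTTP protocol and are not specific to this server.

// smoke test for HTTP mode (assumes the server is already running on localhost:3000)
const base = "http://localhost:3000";

// liveness check
const health = await fetch(`${base}/health`);
console.log("health:", health.status, await health.text());

// open an MCP session with a JSON-RPC initialize request
const init = await fetch(`${base}/mcp`, {
  method: "POST",
  headers: {
    "content-type": "application/json",
    accept: "application/json, text/event-stream",
  },
  body: JSON.stringify({
    jsonrpc: "2.0",
    id: 1,
    method: "initialize",
    params: {
      protocolVersion: "2024-11-05",
      capabilities: {},
      clientInfo: { name: "smoke-test", version: "0.0.0" },
    },
  }),
});
// the session id header is reused on follow-up POST/GET/DELETE requests
console.log("session:", init.headers.get("mcp-session-id"));
console.log(await init.text());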

API keys

each key unlocks a capability. missing keys silently disable their tools — the server never crashes. a minimal sketch of that gating follows the table below.

| variable | enables | free tier |
|---|---|---|
| SERPER_API_KEY | web_search, search_reddit | 2,500 searches/mo at serper.dev |
| REDDIT_CLIENT_ID + REDDIT_CLIENT_SECRET | get_reddit_post | unlimited (reddit.com/prefs/apps, "script" type) |
| SCRAPEDO_API_KEY | scrape_links | 1,000 credits/mo at scrape.do |
| OPENROUTER_API_KEY | deep_research, LLM extraction in scrape/reddit | pay-per-token at openrouter.ai |
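
the sketch below uses only the variable and tool names from this README; the gating code itself is illustrative (the real check lives in config/index.ts and tools/registry.ts).

// map each capability to the presence of its API key
const capabilities = {
  search: Boolean(process.env.SERPER_API_KEY),
  reddit: Boolean(process.env.REDDIT_CLIENT_ID && process.env.REDDIT_CLIENT_SECRET),
  scraping: Boolean(process.env.SCRAPEDO_API_KEY),
  llm: Boolean(process.env.OPENROUTER_API_KEY),
};

const toolCapability: Record<string, keyof typeof capabilities> = {
  web_search: "search",
  search_reddit: "search",
  get_reddit_post: "reddit",
  scrape_links: "scraping",
  deep_research: "llm",
};

// tools whose key is missing are simply not offered -- the server still starts
const enabledTools = Object.keys(toolCapability).filter(
  (name) => capabilities[toolCapability[name]]
);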

configuration

optional tuning via environment variables:

| variable | default | description |
|---|---|---|
| RESEARCH_MODEL | x-ai/grok-4-fast | primary deep research model |
| RESEARCH_FALLBACK_MODEL | google/gemini-2.5-flash | fallback if primary fails |
| LLM_EXTRACTION_MODEL | openai/gpt-oss-120b:nitro | model for scrape/reddit LLM extraction |
| DEFAULT_REASONING_EFFORT | high | research depth (low, medium, high) |
| DEFAULT_MAX_URLS | 100 | max search results per research question (10-200) |
| API_TIMEOUT_MS | 1800000 | request timeout in ms (default 30 min) |
| MCP_TRANSPORT | stdio | stdio or http |
| MCP_PORT | 3000 | port for HTTP mode |
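
read as plain code, the table boils down to the defaults below. the real config module uses lazy Proxy objects (see config/index.ts); this sketch only mirrors the documented variables and defaults.

// defaults copied from the table above; parsing is deliberately simplified
const config = {
  researchModel: process.env.RESEARCH_MODEL ?? "x-ai/grok-4-fast",
  researchFallbackModel: process.env.RESEARCH_FALLBACK_MODEL ?? "google/gemini-2.5-flash",
  llmExtractionModel: process.env.LLM_EXTRACTION_MODEL ?? "openai/gpt-oss-120b:nitro",
  defaultReasoningEffort: process.env.DEFAULT_REASONING_EFFORT ?? "high",
  defaultMaxUrls: Number(process.env.DEFAULT_MAX_URLS ?? 100),
  apiTimeoutMs: Number(process.env.API_TIMEOUT_MS ?? 1_800_000), // 30 minutes
  transport: process.env.MCP_TRANSPORT ?? "stdio",
  port: Number(process.env.MCP_PORT ?? 3000),
};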

how it works

search ranking

results from multiple queries are deduplicated by normalized URL and scored using CTR-weighted position values (position 1 = 100.0, position 10 = 12.56). URLs appearing across multiple queries get a consensus marker; the consensus threshold starts at >= 3 queries and falls back to >= 2, then >= 1.
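
a sketch of that ranking pass. only the position 1 and position 10 weights come from this README; the values in between are illustrative placeholders, and the URL normalization is simplified.

type SerpResult = { url: string; position: number; query: string };

// CTR weight per SERP position (1..10); middle values are assumptions
const CTR_WEIGHTS = [100.0, 49.0, 33.0, 24.0, 19.0, 16.0, 14.5, 13.5, 13.0, 12.56];

function rankResults(results: SerpResult[]) {
  const byUrl = new Map<string, { score: number; queries: Set<string> }>();
  for (const r of results) {
    const key = r.url.replace(/\/+$/, "").toLowerCase(); // simplistic normalized-URL dedupe key
    const entry = byUrl.get(key) ?? { score: 0, queries: new Set<string>() };
    entry.score += CTR_WEIGHTS[Math.min(r.position, 10) - 1];
    entry.queries.add(r.query);
    byUrl.set(key, entry);
  }
  // consensus threshold: try >= 3 shared queries, then fall back to >= 2, then >= 1
  const entries = [...byUrl.entries()];
  const threshold = [3, 2, 1].find((t) => entries.some(([, e]) => e.queries.size >= t)) ?? 1;
  return entries
    .map(([url, e]) => ({ url, score: e.score, consensus: e.queries.size >= threshold }))
    .sort((a, b) => b.score - a.score);
}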

Reddit comment budget

global budget of 1,000 comments, max 200 per post. after the first pass, surplus from posts with fewer comments is redistributed to truncated posts in a second fetch pass.
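
the arithmetic of that two-pass split looks roughly like this, using the 1,000-comment total and 200-per-post cap from the paragraph above; the real redistribution lives in the Reddit client, so treat this as an illustration only.

// pass 1: equal shares capped per post; pass 2: hand leftover budget to truncated posts
function planCommentBudget(commentCounts: number[], total = 1000, perPostCap = 200): number[] {
  const fairShare = Math.floor(total / commentCounts.length);
  const firstPass = commentCounts.map((count) => Math.min(count, fairShare, perPostCap));
  let surplus = total - firstPass.reduce((sum, n) => sum + n, 0);
  return firstPass.map((allotted, i) => {
    const wanted = Math.min(commentCounts[i], perPostCap) - allotted;
    const extra = Math.min(wanted, surplus);
    surplus -= extra;
    return allotted + extra;
  });
}

// 10 posts: small threads leave budget on the table, large threads pick it up in pass 2
planCommentBudget([30, 300, 20, 250, 40, 180, 10, 220, 60, 150]);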

scraping pipeline

three-mode fallback per URL: basic → JS rendering → JS + US geo-targeting. results go through HTML-to-markdown conversion (turndown), then optional LLM extraction with a 100k char input cap and 8,000 token output per URL.
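
the per-URL ladder can be pictured like this; the mode fields ("render", "geoCode") are placeholder names for illustration, not scrape.do's actual query parameters.

// escalate per URL: basic request -> JS rendering -> JS rendering + US geo-targeting
type ScrapeMode = { label: string; render: boolean; geoCode?: string };

const MODES: ScrapeMode[] = [
  { label: "basic", render: false },
  { label: "js", render: true },
  { label: "js+geo", render: true, geoCode: "us" },
];

async function scrapeWithFallback(
  url: string,
  fetchPage: (url: string, mode: ScrapeMode) => Promise<string>
): Promise<string> {
  let lastError: unknown;
  for (const mode of MODES) {
    try {
      return await fetchPage(url, mode); // first mode that succeeds wins
    } catch (err) {
      lastError = err; // escalate to the next, more expensive mode
    }
  }
  throw lastError;
}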

deep research

a 32,000-token budget is divided across questions (1 question = 32k, 10 questions = 3.2k each). Gemini models get google_search tool access; Grok/Perplexity get search_parameters with citations. if the primary model fails, the fallback model is used automatically.
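
roughly, a per-question request could be assembled like this. the budget math and the names google_search / search_parameters come from the paragraph above; the exact nesting of those fields in the OpenRouter payload is an assumption.

// divide the 32,000-token budget evenly and pick search options per model family
function buildResearchRequest(model: string, questionCount: number) {
  const maxTokens = Math.floor(32_000 / questionCount); // 1 question = 32k, 10 = 3.2k each
  const isGemini = model.startsWith("google/");
  return {
    model,
    max_tokens: maxTokens,
    // Gemini gets the google_search tool; Grok/Perplexity get search_parameters with citations
    ...(isGemini
      ? { tools: [{ google_search: {} }] }
      : { search_parameters: { mode: "on", return_citations: true } }),
  };
}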

file attachments

deep_research can read local files and include them as context. files over 600 lines are smart-truncated (first 500 + last 100 lines), with line numbers preserved.
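
a sketch of that truncation rule, assuming a simple "N: " prefix as the line-number format (the actual formatting in services/file-attachment.ts may differ).

// keep the first 500 and last 100 lines of anything longer than 600 lines,
// prefixing each kept line with its original line number
function smartTruncate(text: string, head = 500, tail = 100): string {
  const lines = text.split("\n");
  const numbered = (line: string, i: number) => `${i + 1}: ${line}`;
  if (lines.length <= head + tail) return lines.map(numbered).join("\n");
  return [
    ...lines.slice(0, head).map(numbered),
    `... ${lines.length - head - tail} lines omitted ...`,
    ...lines.slice(-tail).map((line, i) => numbered(line, lines.length - tail + i)),
  ].join("\n");
}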

concurrency

| operation | parallel limit |
|---|---|
| web search keywords | 8 |
| Reddit search queries | 8 |
| Reddit post fetches per batch | 5 (batches of 10) |
| URL scraping per batch | 10 (batches of 30) |
| LLM extraction | 3 |
| deep research questions | 3 |
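
these limits are enforced by a bounded pMap (utils/concurrency.ts). a generic version of that worker-pool pattern is sketched below; the project's implementation may differ in detail.

// run `mapper` over `items` with at most `limit` promises in flight at once
async function pMap<T, R>(items: T[], mapper: (item: T) => Promise<R>, limit: number): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;
  const workers = Array.from({ length: Math.min(limit, items.length) }, async () => {
    while (next < items.length) {
      const index = next++; // claim the next item; safe because the claim happens synchronously
      results[index] = await mapper(items[index]);
    }
  });
  await Promise.all(workers);
  return results;
}

// e.g. keep at most 8 search requests in flight across all keywords (searchSerper is hypothetical)
// const pages = await pMap(keywords, (keyword) => searchSerper(keyword), 8);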

all clients use manual retry with exponential backoff and jitter. the OpenAI SDK's built-in retry is disabled (maxRetries: 0).
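
the retry shape looks roughly like this; the attempt count and base delay are illustrative, since this section doesn't specify the project's actual values.

// retry with exponential backoff plus full jitter
async function withRetry<T>(fn: () => Promise<T>, attempts = 3, baseMs = 500): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= attempts - 1) throw err;
      const backoff = baseMs * 2 ** attempt; // 500ms, 1s, 2s, ...
      const jitter = Math.random() * backoff; // spread retries out to avoid thundering herds
      await new Promise((resolve) => setTimeout(resolve, backoff + jitter));
    }
  }
}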

project structure

src/
  index.ts                — entry point, STDIO + HTTP transport, signal handling
  worker.ts               — Cloudflare Workers entry (Durable Objects)
  config/
    index.ts              — env parsing (lazy Proxy objects), capability detection
    loader.ts             — YAML → Zod → JSON Schema pipeline, cached
    yaml/tools.yaml       — single source of truth for all tool definitions
  schemas/
    deep-research.ts      — Zod validation for research questions + file attachments
    scrape-links.ts       — Zod validation for URLs, timeout, LLM options
    web-search.ts         — Zod validation for keyword arrays
  tools/
    registry.ts           — tool lookup → capability check → validate → execute
    search.ts             — web_search handler
    reddit.ts             — search_reddit + get_reddit_post handlers
    scrape.ts             — scrape_links handler
    research.ts           — deep_research handler
  clients/
    search.ts             — Serper API client
    reddit.ts             — Reddit OAuth + comment fetching
    scraper.ts            — scrape.do client with fallback modes
    research.ts           — OpenRouter client with model-specific handling
  services/
    llm-processor.ts      — shared LLM extraction (singleton OpenAI client)
    markdown-cleaner.ts   — HTML → markdown via turndown
    file-attachment.ts    — local file reading with line ranges
  utils/
    concurrency.ts        — bounded parallel execution (pMap, pMapSettled)
    url-aggregator.ts     — CTR-weighted scoring and consensus detection
    errors.ts             — error classification, fetchWithTimeout
    logger.ts             — MCP logging protocol
    response.ts           — standardized output formatting

deploy

Cloudflare Workers

npx wrangler deploy

uses Durable Objects with SQLite storage. YAML-based tool definitions are replaced with inline definitions in the worker entry since there's no filesystem.

license

MIT
