Spider MCP - Web Search Crawler Service

Community yc9yc

Spider MCP - Web Search and Web Scraping Tool Without APIs

A web search MCP service based on pure crawler technology, built with Node.js.

Features

  • ❌ No Official API Required: Built entirely on crawler technology, with no dependency on third-party official APIs
  • 🔍 Intelligent Search: Supports Bing web and news search
  • 📰 News Search: Built-in news search with time filtering
  • 🕷️ Pure Crawler: Uses Puppeteer for web scraping
  • 🚀 High Performance: Supports batch web scraping
  • 📊 Health Monitoring: Complete health check and metrics monitoring
  • 📝 Structured Logging: Uses Winston for structured logs
  • 🔒 Anti-Detection: Supports User-Agent rotation and other anti-bot measures
  • 🔗 Smart URL Cleaning: Automatically removes promotional parameters while preserving essential information
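
To illustrate what URL cleaning might look like, here is a minimal sketch that strips common tracking parameters with the built-in `URL` class. The function name `cleanUrl` and the parameter list are assumptions made for illustration; the project's actual cleaning rules are not shown in this README.

```javascript
// Hypothetical set of tracking parameters (an assumption, not the project's list).
const TRACKING_PARAMS = new Set(["fbclid", "gclid", "msclkid"]);

// Returns the URL with utm_* and other known tracking parameters removed.
function cleanUrl(rawUrl) {
  const url = new URL(rawUrl);
  // Copy the keys first so the collection is not mutated while iterating.
  for (const key of [...url.searchParams.keys()]) {
    const lower = key.toLowerCase();
    if (lower.startsWith("utm_") || TRACKING_PARAMS.has(lower)) {
      url.searchParams.delete(key);
    }
  }
  return url.toString();
}

console.log(cleanUrl("https://example.com/post?id=42&utm_source=news&fbclid=abc"));
// prints "https://example.com/post?id=42"
```

Parameters that are not on the list, such as `id` above, are left untouched, which is how essential information survives the cleaning step.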

Tech Stack

  • Node.js (>= 18.0.0)
  • Express.js - Web framework
  • Puppeteer - Browser automation
  • Cheerio - HTML parsing
  • Axios - HTTP client
  • Winston - Logging
  • @modelcontextprotocol/sdk - MCP protocol support

Quick Start

1. Install dependencies

npm install

Or use pnpm:

pnpm install

2. Download Puppeteer browser

npx puppeteer browsers install chrome

3. Environment configuration

Copy and configure the environment variables file:

cp .env.example .env

Edit the .env file according to your needs.

4. Start the service

Development mode:

npm run dev

Production mode:

npm start

The service will start at http://localhost:3000.

MCP Tools

web_search

Unified search tool supporting both web and news search:

  • Web Search: searchType: "web"
  • News Search: searchType: "news" with time filtering
  • Note: searchType is a required parameter and must be specified explicitly

Usage Examples:

# Web search
Use web_search tool to search "Node.js tutorial" with searchType set to web, return 10 results

# News search
Use web_search tool to search "tech news" with searchType set to news, return 5 results from past 24 hours
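
Under the hood, an MCP client turns a prompt like the ones above into a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request. Only `searchType` is documented in this README; the argument names `query` and `maxResults` and the helper `buildWebSearchCall` are assumptions for illustration, so check the tool's published input schema for the real names.

```javascript
// Builds a JSON-RPC 2.0 "tools/call" request for the web_search tool.
// `query` and `maxResults` are hypothetical argument names (assumptions).
function buildWebSearchCall(id, { query, searchType, maxResults }) {
  // searchType is required by the tool and must be given explicitly.
  if (searchType !== "web" && searchType !== "news") {
    throw new Error('searchType is required and must be "web" or "news"');
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: "web_search",
      arguments: { query, searchType, maxResults },
    },
  };
}

const request = buildWebSearchCall(1, {
  query: "Node.js tutorial",
  searchType: "web",
  maxResults: 10,
});
console.log(JSON.stringify(request, null, 2));
```

Validating `searchType` before sending mirrors the note above: a call without it is rejected early instead of reaching the server.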

Other Tools

  • get_webpage_content: Get webpage content and convert to specified format
  • get_webpage_source: Get raw HTML source code of webpage
  • batch_webpage_scrape: Batch scrape multiple webpages

MCP Configuration

Chatbox Configuration

Create an mcp-config.json file in Chatbox:

{
  "mcpServers": {
    "spider-mcp": {
      "command": "node",
      "args": ["src/mcp/server.js"],
      "env": {
        "NODE_ENV": "production"
      },
      "description": "Spider MCP - Web search and webpage scraping tools",
      "capabilities": {
        "tools": {}
      }
    }
  }
}

Other MCP Clients

{
  "mcpServers": {
    "spider-mcp": {
      "command": "node",
      "args": ["path/to/spider-mcp/src/mcp/server.js"]
    }
  }
}
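
A client may launch the server from a working directory other than the project root, in which case a relative path in `args` would not resolve. The following sketch generates the configuration with an absolute path instead. The helper `buildMcpConfig` and its `installDir` argument are hypothetical and only illustrate the idea.

```javascript
const path = require("node:path");

// Builds an "mcpServers" entry that points at an absolute server path.
// `installDir` is a hypothetical argument: the directory where spider-mcp was cloned.
function buildMcpConfig(installDir) {
  return {
    mcpServers: {
      "spider-mcp": {
        command: "node",
        args: [path.resolve(installDir, "src", "mcp", "server.js")],
      },
    },
  };
}

// Run from the project root to print a config that can be pasted into a client.
console.log(JSON.stringify(buildMcpConfig("."), null, 2));
```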

Important Notes

  1. Anti-bot Measures: The service uses several techniques to avoid detection, but you must still comply with each site's robots.txt and terms of use
  2. Rate Limiting: Keep request frequency moderate to avoid putting pressure on target websites
  3. Legal Compliance: Ensure that your use of this service complies with local laws and website terms of use
  4. Resource Consumption: Puppeteer launches a Chrome browser, so monitor memory and CPU usage
  5. URL Cleaning: Promotional parameters are removed automatically, which may break links that depend on them
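
One simple way to follow the rate-limiting advice is to process URLs sequentially with a fixed pause between requests. This is a minimal sketch, not the service's own mechanism; `runThrottled` and the `task` callback are hypothetical names, and `task` stands in for whatever scrape call you use.

```javascript
// Resolves after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs `task(url)` for each URL in order, pausing `delayMs` between calls.
// `task` is a hypothetical async function, for example a single-page scrape.
async function runThrottled(urls, task, delayMs = 1000) {
  const results = [];
  for (let i = 0; i < urls.length; i++) {
    if (i > 0) await sleep(delayMs); // no pause before the first request
    results.push(await task(urls[i]));
  }
  return results;
}
```

Because the calls run one at a time, at most one Chrome page is busy at any moment, which also helps with the memory and CPU concern noted above.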

Development

Project Structure

spider-mcp/
├── src/
│   ├── index.js          # Main entry file
│   ├── mcp/
│   │   └── server.js     # MCP server
│   ├── routes/           # Route definitions
│   │   ├── search.js     # Search routes
│   │   └── health.js     # Health check routes
│   ├── services/         # Business logic
│   │   └── searchService.js # Search service
│   └── utils/            # Utility functions
│       └── logger.js     # Logging utility
├── logs/                 # Log files directory
├── tests/                # Test files
├── package.json          # Project configuration
├── .env.example          # Environment variables example
├── mcp-config.json       # MCP configuration example
└── README.md             # Project documentation

License

MIT License

Contributing

Issues and Pull Requests are welcome!
