# Crawl4AI MCP Server

🚀 High-performance MCP Server for Crawl4AI - Enable AI assistants to access web scraping, crawling, and deep research via Model Context Protocol. Faster and more efficient than FireCrawl!
## Overview
This project implements a custom Model Context Protocol (MCP) server that integrates with Crawl4AI, an open-source web scraping and crawling library. The server is deployed as a remote MCP server on Cloudflare Workers, allowing AI assistants like Claude to access Crawl4AI's web scraping capabilities.
## Features
- 📄 **Single Webpage Scraping**: Extract content from individual webpages
- 🔍 **Deep Research**: Conduct comprehensive research across multiple pages
- 🗺️ **URL Discovery**: Map and discover URLs from a starting point
- 🕸️ **Asynchronous Crawling**: Crawl entire websites efficiently
- 📊 **Structured Data Extraction**: Extract specific data using CSS selectors or LLM-based strategies
- 🔒 **Authentication Options**: Secure access via OAuth or API key (Bearer token)
## Project Structure

```
crawl4ai-mcp/
├── src/
│   ├── index.ts             # Main entry point with OAuth provider setup
│   ├── auth-handler.ts      # Authentication handler
│   ├── mcp-server.ts        # MCP server implementation
│   ├── crawl4ai-adapter.ts  # Adapter for Crawl4AI API
│   ├── tool-schemas/        # MCP tool schema definitions
│   │   └── [...].ts         # Tool schemas
│   └── utils/               # Utility functions
├── tests/                   # Test cases
├── wrangler.toml            # Cloudflare Workers configuration
└── package.json             # Node.js dependencies
```
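The `wrangler.toml` referenced above might look roughly like the sketch below. All names and values here are placeholders, not the project's actual configuration:

```toml
# Hypothetical wrangler.toml sketch — entry point and variable names are
# illustrative, not the project's real settings.
name = "crawl4ai-mcp"
main = "src/index.ts"
compatibility_date = "2024-09-01"

[vars]
# Placeholder endpoint for the Crawl4AI backend.
CRAWL4AI_API_URL = "https://crawl4ai.example.com"

# Secrets such as API keys should be set with `wrangler secret put`,
# never committed in this file.
```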
## Development with Claude Code
This project is designed to be developed using Claude Code, with multiple sessions working on different components. Each issue in the repository corresponds to a specific component that can be implemented by a Claude Code session.
## Setup Instructions

### Create a Claude Code Session for an Issue
```bash
# Clone the repository
git clone https://github.com/BjornMelin/crawl4ai-mcp-server.git
cd crawl4ai-mcp-server

# Create a new branch for the issue you want to work on
git checkout -b feature/issue-name

# Start a Claude Code session
claude code
```
### Connect Claude to the Issue
In the Claude Code session, provide context about the issue:
```text
I'm working on issue #X (Title) from the crawl4ai-mcp-server repository. The goal is to implement [feature]. Please help me implement this component following the project architecture and best practices.
```
### Follow Conventional Commits
When making commits, follow the conventional commits format:
```text
feat:  Add new feature
fix:   Fix bug
docs:  Update documentation
chore: Update dependencies
test:  Add tests
```
### Create Pull Requests
After completing an issue:
```bash
# Push your branch
git push origin feature/issue-name

# Create a pull request using the GitHub CLI or web interface
gh pr create --title "feat: Implement feature" --body "Closes #X" --base main
```
## Issues to Implement

### Project Setup and Configuration (Issue #1)
- Initialize Cloudflare Worker project
- Set up TypeScript configuration
- Create package.json and wrangler.toml
### MCP Server and Tool Schemas (Issue #2)
- Implement MCP server with McpAgent
- Define tool schemas for Crawl4AI capabilities
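A tool schema for the single-page scrape capability could be sketched as below. The tool name and parameters are illustrative assumptions, not the project's final schema; only the overall `name`/`description`/`inputSchema` shape follows the MCP tool-definition convention:

```typescript
// Hypothetical tool definition for a single-page scrape.
// Field names in inputSchema are placeholders for illustration.
interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>; // JSON Schema for the tool's arguments
}

export const scrapeTool: ToolDefinition = {
  name: "scrape_webpage",
  description: "Extract content from a single webpage",
  inputSchema: {
    type: "object",
    properties: {
      url: { type: "string", description: "The page to scrape" },
      selector: { type: "string", description: "Optional CSS selector to narrow extraction" },
    },
    required: ["url"],
  },
};
```

Keeping schemas as plain data like this makes them easy to list from the MCP server and to validate against incoming tool calls.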
### Crawl4AI Adapter (Issue #3)
- Create adapter for Crawl4AI operations
- Implement error handling and response formatting
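A minimal adapter sketch, assuming a Crawl4AI-style HTTP backend; the endpoint path (`/scrape`) and response field (`markdown`) are assumptions, not the library's documented API. The response mapping is kept in a pure helper so it can be tested without a network call:

```typescript
// Hypothetical adapter sketch for a Crawl4AI-style backend.
interface AdapterResult {
  ok: boolean;
  content?: string;
  error?: string;
}

// Pure helper: maps an HTTP status and parsed body to a uniform result.
export function normalizeResponse(status: number, body: { markdown?: string }): AdapterResult {
  if (status < 200 || status >= 300) {
    return { ok: false, error: `Crawl4AI request failed with status ${status}` };
  }
  return { ok: true, content: body.markdown ?? "" };
}

// Thin wrapper around fetch (available in Workers and Node 18+).
// The /scrape path is a placeholder for the real backend route.
export async function scrape(baseUrl: string, url: string): Promise<AdapterResult> {
  try {
    const res = await fetch(`${baseUrl}/scrape`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ url }),
    });
    const body = await res.json().catch(() => ({}));
    return normalizeResponse(res.status, body as { markdown?: string });
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```

Returning a uniform `AdapterResult` instead of throwing keeps error handling in one place when the MCP server formats tool responses.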
### Authentication (Issue #4)
- Implement OAuth authentication with workers-oauth-provider
- Add API key authentication using Bearer tokens
- Create login page and token management
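For the API-key path, the Bearer-token extraction step can be sketched as below. How the extracted key is then validated (KV lookup, constant-time comparison, etc.) is up to the real implementation; only the header parsing shown here is standard:

```typescript
// Minimal sketch: pull a Bearer token out of an Authorization header.
// Returns null for a missing or non-Bearer header.
export function extractBearerToken(header: string | null): string | null {
  if (!header) return null;
  const match = /^Bearer\s+(\S+)$/i.exec(header.trim());
  return match ? match[1] : null;
}
```

In a Worker this would typically be called as `extractBearerToken(request.headers.get("Authorization"))` before falling back to the OAuth flow.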
### Main Worker Entry Point (Issue #5)
- Tie everything together with the main entry point
- Configure the OAuth provider and routing
### Utility Functions (Issue #6)
- Implement response formatting and error handling utilities
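These formatting utilities might follow the MCP tool-result shape (`content` as a list of typed parts, with `isError` for failures); the helper names below are illustrative:

```typescript
// Sketch of response-formatting helpers producing the MCP tool-result shape.
interface ToolResult {
  content: Array<{ type: "text"; text: string }>;
  isError?: boolean;
}

// Wrap plain text in a successful tool result.
export function textResult(text: string): ToolResult {
  return { content: [{ type: "text", text }] };
}

// Wrap an error message in a failed tool result.
export function errorResult(message: string): ToolResult {
  return { content: [{ type: "text", text: `Error: ${message}` }], isError: true };
}
```

Centralizing these helpers keeps every tool handler returning the same envelope, so clients never see ad-hoc response shapes.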
### Testing, Deployment, and Documentation (Issue #7)
- Set up testing and deployment workflows
- Create comprehensive documentation
## License
MIT