lmorg

mcp-web-scraper

A package for scraping websites for use in AI tools. Supports LangChainGo and MCP.

This package uses Google Chrome's headless APIs to scrape web pages for AI/LLM agents.

Because it uses Chrome as its default user agent, sites that require JavaScript to render (for example, single-page applications) should also be parsable with this tool.

It can be called from Go via LangChainGo, or run as an MCP server.

MCP Server

First, compile the code with Go:

go build .

Claude Desktop

{
  "mcpServers": {
    "mcp-web-scraper": {
      "command": "/path/to/mcp-web-scraper",
      "args": []
    }
  }
}

Visual Studio Code

{
  "mcp": {
    "servers": {
      "mcp-web-scraper": {
        "command": "/path/to/mcp-web-scraper",
        "args": []
      }
    }
  }
}
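Both configurations tell the client which executable to launch. With MCP's stdio transport, the client starts that command as a subprocess and exchanges newline-delimited JSON-RPC 2.0 messages over its stdin and stdout. The sketch below builds the opening messages a client sends before it can call any tool. The protocol version string and the client name are assumptions chosen for illustration, and the tool names this server advertises are not documented here, so the example stops at `tools/list`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rpcMessage is a minimal JSON-RPC 2.0 envelope. ID is a pointer so that
// notifications (which carry no id) omit the field entirely.
type rpcMessage struct {
	JSONRPC string `json:"jsonrpc"`
	ID      *int   `json:"id,omitempty"`
	Method  string `json:"method"`
	Params  any    `json:"params,omitempty"`
}

// encode marshals one message and appends the newline delimiter used by
// the stdio transport.
func encode(id *int, method string, params any) (string, error) {
	b, err := json.Marshal(rpcMessage{JSONRPC: "2.0", ID: id, Method: method, Params: params})
	if err != nil {
		return "", err
	}
	return string(b) + "\n", nil
}

func intPtr(n int) *int { return &n }

// handshake returns, in order: initialize, the initialized notification,
// and a tools/list request.
func handshake() ([]string, error) {
	steps := []struct {
		id     *int
		method string
		params any
	}{
		{intPtr(1), "initialize", map[string]any{
			"protocolVersion": "2024-11-05", // assumed protocol revision
			"capabilities":    map[string]any{},
			"clientInfo":      map[string]string{"name": "example-client", "version": "0.1.0"},
		}},
		{nil, "notifications/initialized", nil},
		{intPtr(2), "tools/list", nil},
	}
	var out []string
	for _, s := range steps {
		line, err := encode(s.id, s.method, s.params)
		if err != nil {
			return nil, err
		}
		out = append(out, line)
	}
	return out, nil
}

func main() {
	msgs, err := handshake()
	if err != nil {
		panic(err)
	}
	for _, m := range msgs {
		fmt.Print(m) // a real client writes these to the server's stdin
	}
}
```

Each response from the server arrives the same way, one JSON object per line on its stdout.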

LangChainGo tool

Integrating with LangChainGo is straightforward:

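import "github.com/lmorg/mcp-web-scraper/langchain"

func example() {
    // Create the web scraper tool, then add it to your LangChainGo agent's tools.
    scraper := langchain.NewScraper()
    _ = scraper // placeholder: Go rejects unused variables until the tool is wired in
}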

Consult the LangChainGo documentation for details on using tools with its agents and chains.

Fallback Modes

If Google Chrome is not installed

If Google Chrome is not installed, mcp-web-scraper falls back to Go's built-in HTTP client.

This works in most cases; however, sites that require JavaScript to render may return little or no content.
