achetronic

browse-mcp

Community achetronic
Updated

Give your AI the tools to browse Internet

A Model Context Protocol server that gives AI assistants access to the internet.Search, fetch pages, download files. No API key required to get started.

What it does • Getting Started • Tools • Providers • Security • Limitations • Contributing

🎯 What can it do?

This MCP gives your AI assistant real internet access:

  • Search the web using DuckDuckGo (no setup), Tavily or Serper (with API key)
  • Fetch any URL and get the content as clean readable text
  • Download files from the web directly to disk
  • Restrict which URLs each user can access based on their JWT claims

🚀 Getting started

1. Create a config file

Before running anything, create a config.yaml. The transport section defines how the MCP server is exposed.

STDIO — simplest option, no network exposure. Ideal for local use with tools like Claude Desktop or Cursor.

server:
  name: "browse-mcp"
  version: "0.1.0"
  transport:
    type: "stdio"

web:
  default_provider: "duckduckgo"
  download_dir: "/data"  # optional — restricts web_download to this directory

HTTP — exposes the server over the network. Required for multi-user setups, production deployments, or when you need JWT auth and URL policies.

server:
  name: "browse-mcp"
  version: "0.1.0"
  transport:
    type: "http"
    http:
      host: ":8080"

middleware:
  access_logs:
    redacted_headers: ["Authorization"]
  jwt:
    enabled: true
    jwks_uri: "https://your-idp.com/.well-known/jwks.json"
    cache_interval: 5m

web:
  default_provider: "tavily"
  providers:
    tavily:
      api_key: "$TAVILY_API_KEY"

See docs/config-http.yaml for the full example including policies.

2. Run it

Binary — lower overhead, direct access to the host filesystem (useful if you use web_download to save files locally).

go mod tidy
make build
./bin/browse-mcp -config config.yaml

Docker — fully isolated, no host dependencies. The downloaded files go inside the container unless you mount a volume.

docker build -t browse-mcp .
docker run \
  -v $(pwd)/config.yaml:/config/config.yaml \
  -v $(pwd)/downloads:/downloads \
  browse-mcp

🛠️ Available tools

Tool What it does
web_search Search the web — returns title, URL and snippet per result
web_fetch Fetch a URL and return clean readable text (HTML noise removed)
web_download Download a file from a URL and save it to disk

Recommended flow

1. web_search  → find relevant URLs
2. web_fetch   → read the full content of the best results
3. web_download → save files you need to keep

🔍 Search providers

Provider API Key Notes
duckduckgo No Default. Works out of the box. May occasionally rate-limit.
tavily Yes (tavily.com) Built for AI. 1,000 credits/month free.
serper Yes (serper.dev) Scrapes Google. Paid, credit-based.

Switch provider per-request by passing provider to web_search, or set a default in config.

🔐 Security (HTTP mode)

When running in HTTP mode, Browse MCP supports a full security stack:

JWT validation

Validates incoming JWTs against a JWKS endpoint. The JWKS is fetched from jwks_uri and cached. Tokens are always read from the Authorization: Bearer header.

allow_conditions are CEL expressions evaluated against the JWT payload. All must return true. Use these for coarse-grained checks like verifying the issuer. Fine-grained per-tool and per-URL restrictions are configured separately under policies.

middleware:
  jwt:
    enabled: true
    jwks_uri: "https://your-idp.com/.well-known/jwks.json"
    cache_interval: 5m
    allow_conditions:
      - expression: 'payload.iss == "https://your-idp.com"'

Access logs

Logs every request with method, URL, duration and headers. Sensitive headers can be redacted or excluded entirely.

middleware:
  access_logs:
    excluded_headers: ["X-Internal-Token"]
    redacted_headers: ["Authorization"]

Tool policies

Control which tools each group or claim can call. Uses CEL expressions evaluated against the JWT payload.

policies:
  tools:
    - expression: 'payload.groups.exists(g, g == "admins")'
      allowed_tools: ["*"]
    - expression: 'payload.scope.contains("web:read")'
      allowed_tools: ["web_search", "web_fetch"]

Supported patterns: exact match ("web_fetch"), wildcard ("*"), prefix ("web_*").

URL policies

Control which domains each group can access via web_fetch and web_download. CEL expression against JWT payload, domain allowlist with wildcard subdomain support.

policies:
  web:
    - expression: 'payload.groups.exists(g, g == "admins")'
      allowed_domains: ["*"]

    - expression: 'payload.groups.exists(g, g == "developers")'
      allowed_domains:
        - "*.github.com"
        - "docs.k8s.io"
        - "pkg.go.dev"

    - expression: 'payload.scope.contains("web:restricted")'
      allowed_domains:
        - "internal.company.com"
        - "*.internal.company.com"

web_search is not URL-restricted by design — results are snippets, no content is fetched. Restriction applies at fetch/download time.

⚠️ Limitations

Fetch size — The fetcher reads up to 5MB per request. Pages larger than 50KB are saved to a temp file instead of returned inline.

JavaScript — The fetcher doesn't run JS. Pages that render entirely client-side will return little or no content.

DuckDuckGo — Works without a key but may rate-limit under heavy use. Switch to Tavily for production.

Protocols — Only HTTP and HTTPS are supported.

Download directory — By default web_download accepts any path. Set web.download_dir to restrict downloads to a specific directory. Path traversal attempts are rejected.

🤝 Contributing

Humans — Issues, PRs and love are welcome.

AI agents — Read AGENTS.md before touching anything.

📄 License

Apache 2.0

MCP Server · Populars

MCP Server · New