OwlOCR MCP
MCP (Model Context Protocol) server for PDF and image OCR on macOS. Supports two backends:
- OwlOCR CLI - Higher accuracy (recommended)
- Vision Framework - No external dependencies
Features
- PDF OCR - Extract text from PDF files page by page with separators
- Image OCR - Extract text from PNG, JPEG, and other image formats
- Multi-language - Korean + English by default (configurable)
- Dual Backend - Auto-selects OwlOCR if available, falls back to Vision Framework
- Async - Non-blocking execution for MCP clients
Benchmark Results
Tested on a 4-page Korean theological document with Hebrew text:
| Metric | Vision Framework | OwlOCR CLI |
|---|---|---|
| Time | 9.87s | 9.30s |
| Time/Page | 2.47s | 2.33s |
| Word Accuracy | 85.62% | 91.79% |
| Character Accuracy | 94.46% | 95.07% |
Winner: OwlOCR CLI - Slightly faster (9.30s vs 9.87s) and more accurate on this document, most visibly at the word level (91.79% vs 85.62%).
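This README does not say how the two accuracy figures are computed. One common definition, shown here purely as an assumption, is 1 minus the edit distance between the OCR output and a ground-truth text, divided by the length of the ground truth, measured over characters or over whitespace-separated words. The function names below are hypothetical and are not taken from benchmark.py.

```python
from collections.abc import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: insertions, deletions, substitutions all cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # substitute r with h, or match
            ))
        prev = curr
    return prev[-1]


def accuracy(reference: str, ocr_output: str, unit: str = "char") -> float:
    """Accuracy in [0, 1]; unit is "char" or "word" (assumed definitions)."""
    ref = reference.split() if unit == "word" else list(reference)
    hyp = ocr_output.split() if unit == "word" else list(ocr_output)
    if not ref:
        return 1.0 if not hyp else 0.0
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))
```

Under this definition a single misread character costs one character but a whole word, which is consistent with word accuracy being lower than character accuracy in the table above.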
Requirements
- macOS (uses Apple Vision Framework / OwlOCR.app)
- Python 3.11+
- OwlOCR.app (optional, for better accuracy)
Installation
Using uv (recommended)
git clone https://github.com/yourusername/owlocr-mcp.git
cd owlocr-mcp
uv sync
Using pip
git clone https://github.com/yourusername/owlocr-mcp.git
cd owlocr-mcp
pip install -e .
MCP Client Configuration
Claude Desktop
Add to ~/Library/Application Support/Claude/claude_desktop_config.json:
{
"mcpServers": {
"owlocr": {
"command": "uv",
"args": ["run", "--directory", "/path/to/owlocr-mcp", "owlocr-mcp"]
}
}
}
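If the Claude Desktop config file already lists other servers, the owlocr entry has to be merged in rather than pasted over the whole file. A minimal sketch of that merge, using only the standard library; the helper names add_owlocr_server and update_config_file are hypothetical and not part of this project.

```python
import json
from pathlib import Path


def add_owlocr_server(config: dict, project_dir: str) -> dict:
    """Return a copy of config with the owlocr entry added under mcpServers."""
    updated = json.loads(json.dumps(config))  # deep copy via JSON round-trip
    servers = updated.setdefault("mcpServers", {})
    servers["owlocr"] = {
        "command": "uv",
        "args": ["run", "--directory", project_dir, "owlocr-mcp"],
    }
    return updated


def update_config_file(config_path: Path, project_dir: str) -> None:
    """Read the config if it exists, add the entry, and write it back."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    merged = add_owlocr_server(config, project_dir)
    config_path.write_text(json.dumps(merged, indent=2))
```

Entries for other servers are left untouched; only the owlocr key is added or replaced.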
Generic MCP Client
{
"mcpServers": {
"owlocr": {
"command": "/path/to/owlocr-mcp/.venv/bin/python",
"args": ["-m", "owlocr_mcp.server"]
}
}
}
Available Tools
ocr_pdf_to_text
Extract text from a PDF file.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| pdf_path | string | required | Absolute path to the PDF file |
| pages | list[int] | null | Page numbers to process (1-based). If null, all pages |
| dpi | int | 200 | Resolution for rendering. Higher = better quality but slower |
| backend | string | "auto" | "auto", "owlocr", or "vision" |
| languages | list[string] | null | Language codes (Vision only). Default: ["ko-KR", "en-US"] |
Example:
Extract text from /Users/me/document.pdf using OwlOCR
Output:
첫 번째 페이지 내용...
===== Page 2 =====
두 번째 페이지 내용...
--- OCR Complete: 2 page(s) processed using OwlOCR CLI ---
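A caller that needs per-page text can split the returned string on the page separators. This is a sketch based only on the sample output above: it assumes that text before the first separator belongs to page 1 and that the final "--- OCR Complete: ... ---" line is a status footer. The function name split_pages is hypothetical.

```python
import re

PAGE_MARKER = re.compile(r"^===== Page (\d+) =====$", re.MULTILINE)
FOOTER = re.compile(r"^--- OCR Complete:.*---\s*$", re.MULTILINE)


def split_pages(output: str, first_page: int = 1) -> dict[int, str]:
    """Map page number to page text for one ocr_pdf_to_text result."""
    body = FOOTER.sub("", output)
    # With one capturing group, split() returns
    # [text_before_first_marker, number, text, number, text, ...]
    parts = PAGE_MARKER.split(body)
    pages: dict[int, str] = {}
    if parts[0].strip():
        pages[first_page] = parts[0].strip()  # assumed to be the first page
    for number, text in zip(parts[1::2], parts[2::2]):
        pages[int(number)] = text.strip()
    return pages
```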
ocr_image_to_text
Extract text from an image file.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| image_path | string | required | Absolute path to the image file |
| backend | string | "auto" | "auto", "owlocr", or "vision" |
| languages | list[string] | null | Language codes (Vision only) |
check_ocr_backends
Check available OCR backends on the system.
Output:
OCR Backend Status:
Vision Framework: Available (macOS built-in)
OwlOCR CLI: Available (/Applications/OwlOCR.app)
Recommendation: Use backend='owlocr' for best accuracy
Backend Selection
| Backend | Accuracy | Speed | Requirements |
|---|---|---|---|
| owlocr | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | OwlOCR.app installed |
| vision | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | None (macOS built-in) |
| auto | Best available | - | Uses OwlOCR if available |
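The auto rule in the table can be sketched as a small function. This is an illustration of the selection logic as described here, not the server's actual code: the function name choose_backend is hypothetical, and treating the presence of the OwlOCR binary as "available" is an assumption.

```python
from pathlib import Path

# Path of the OwlOCR CLI binary named in this README.
OWLOCR_BINARY = Path("/Applications/OwlOCR.app/Contents/MacOS/OwlOCR")


def choose_backend(requested: str = "auto", owlocr_binary: Path = OWLOCR_BINARY) -> str:
    """Resolve "auto", "owlocr", or "vision" to a concrete backend name."""
    owlocr_available = owlocr_binary.is_file()  # assumed availability check
    if requested == "auto":
        return "owlocr" if owlocr_available else "vision"
    if requested == "owlocr":
        if not owlocr_available:
            raise FileNotFoundError("OwlOCR.app not found")
        return "owlocr"
    if requested == "vision":
        return "vision"
    raise ValueError(f"unknown backend: {requested!r}")
```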
Running the Benchmark
Compare backends on your own PDF:
# Both backends
uv run python benchmark.py /path/to/your.pdf
# With accuracy comparison (requires ground truth)
uv run python benchmark.py /path/to/your.pdf --show-text
# Specific backend only
uv run python benchmark.py /path/to/your.pdf --method owlocr
uv run python benchmark.py /path/to/your.pdf --method vision
Project Structure
owlocr-mcp/
├── src/owlocr_mcp/
│   ├── __init__.py
│   ├── server.py        # MCP server with tools
│   ├── ocr.py           # Vision Framework backend
│   ├── ocr_owlocr.py    # OwlOCR CLI backend
│   └── pdf.py           # PDF processing utilities
├── benchmark.py         # Performance comparison script
├── pyproject.toml
└── README.md
How It Works
OwlOCR Backend
- Render PDF pages to PNG using pypdfium2
- Copy images to OwlOCR sandbox: ~/Library/Containers/JonLuca-DeCaro.OwlOCR/Data/tmp/
- Run CLI: /Applications/OwlOCR.app/Contents/MacOS/OwlOCR --cli --input <file>
- Combine results with page separators
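The copy-and-run steps above can be sketched with the standard library. The sandbox path and the --cli --input flags come from this README; the function names are hypothetical, and the sketch assumes the CLI prints the recognized text to stdout, which this README does not confirm.

```python
import shutil
import subprocess
import uuid
from pathlib import Path

SANDBOX_TMP = Path.home() / "Library/Containers/JonLuca-DeCaro.OwlOCR/Data/tmp"
OWLOCR_BINARY = "/Applications/OwlOCR.app/Contents/MacOS/OwlOCR"


def stage_in_sandbox(image: Path, sandbox: Path = SANDBOX_TMP) -> Path:
    """Copy an image into the sandbox under a unique name and return the copy."""
    staged = sandbox / f"{uuid.uuid4().hex}{image.suffix}"
    shutil.copy(image, staged)
    return staged


def owlocr_command(staged: Path) -> list[str]:
    """Argument list for one CLI run on a staged file."""
    return [OWLOCR_BINARY, "--cli", "--input", str(staged)]


def ocr_page(image: Path, sandbox: Path = SANDBOX_TMP) -> str:
    """Stage one rendered page, run OwlOCR on it, and clean up the copy."""
    staged = stage_in_sandbox(image, sandbox)
    try:
        # Assumption: recognized text arrives on stdout.
        result = subprocess.run(
            owlocr_command(staged), capture_output=True, text=True, check=True
        )
        return result.stdout
    finally:
        staged.unlink(missing_ok=True)
```

Staging under a random name avoids collisions when several pages are processed at once, and the finally block keeps the sandbox directory from filling up.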
Vision Framework Backend
- Render PDF pages to PNG using pypdfium2
- Load as CIImage via PyObjC
- Create VNRecognizeTextRequest with accurate recognition level
- Process with VNImageRequestHandler
- Sort results by position and combine
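The final sorting step can be illustrated without PyObjC. Vision reports bounding boxes in normalized coordinates with the origin at the bottom-left, so reading order means descending y, then ascending x. The sketch below is one plausible way to do that, not necessarily how ocr.py does it; the Observation type, the function name, and the line_tolerance value are all hypothetical.

```python
from typing import NamedTuple


class Observation(NamedTuple):
    text: str
    x: float  # left edge, normalized 0..1
    y: float  # bottom edge, normalized 0..1, origin at the bottom-left


def combine_in_reading_order(
    observations: list[Observation], line_tolerance: float = 0.01
) -> str:
    """Group observations into lines top to bottom, then order each line left to right."""
    ordered = sorted(observations, key=lambda o: -o.y)  # larger y is higher on the page
    lines: list[list[Observation]] = []
    for obs in ordered:
        # Same line if the vertical offset from the line's first item is small.
        if lines and abs(lines[-1][0].y - obs.y) <= line_tolerance:
            lines[-1].append(obs)
        else:
            lines.append([obs])
    return "\n".join(
        " ".join(o.text for o in sorted(line, key=lambda o: o.x)) for line in lines
    )
```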
Troubleshooting
"OwlOCR.app not found"
Install OwlOCR from owlocr.com or use backend="vision".
File picker dialog appears
This happens when OwlOCR can't access files outside its sandbox. The MCP server handles this by copying files to the sandbox temp directory automatically.
Poor accuracy on specific languages
For Vision Framework, specify languages explicitly:
ocr_pdf_to_text(pdf_path, languages=["ja-JP", "en-US"])
Supported language codes include ko-KR, en-US, ja-JP, zh-Hans, and zh-Hant.
License
MIT License - see LICENSE file.
Acknowledgments
- OwlOCR by JonLuca DeCaro
- MCP Python SDK
- Apple Vision Framework