Video MCP Server

Bridge the gap between Claude and video content - An MCP (Model Context Protocol) server that extracts keyframes and transcribes audio from videos, enabling Claude to analyze video files.

Overview

While Claude is multimodal and can analyze images, it cannot directly process video files. This MCP server automatically:

🎬 Extracts keyframes using intelligent scene detection or fixed intervals
🎤 Transcribes audio using OpenAI's Whisper
⚡ Provides structured, timestamped output combining visual and textual information
💾 Caches results for instant follow-up queries
🖼️ Optimized image compression to fit within Claude Code's token limits

Use Cases

📹 Meeting Analysis: Process Teams/Zoom recordings to extract key points and visual content
🎓 Tutorial Understanding: Analyze demo videos and instructional content
🔍 Content Review: Quickly scan through long videos to find specific moments
📊 Presentation Analysis: Extract slides and speaker notes from recorded presentations

Installation

Prerequisites

Python 3.11 or higher
```
python3 --version  # Should be 3.11+
```
FFmpeg (required for video processing)

macOS:
```
brew install ffmpeg
```
Ubuntu/Debian:
```
sudo apt-get update
sudo apt-get install ffmpeg
```
Windows:
- Download from ffmpeg.org
- Add to PATH
Verify FFmpeg installation:
```
ffmpeg -version
```

Install Video MCP Server

Clone repository:

Create a virtual environment (recommended):

python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install the package:
```
pip install -e .
```

Verify installation:

video-mcp --help  # Should show no errors (will start MCP server)

Configuration

MCP Configuration for Claude Code

Add the following to your Claude Code MCP configuration file:

Location: ~/.claude.json

{
  "mcpServers": {
    "video-mcp": {
      "command": "/path/to/vidmcp/venv/bin/video-mcp",
      "args": [],
      "env": {
        "VIDEO_MCP_FRAME_INTERVAL_SECONDS": "10",
        "VIDEO_MCP_WHISPER_MODEL": "base",
        "VIDEO_MCP_MAX_FRAMES_PER_VIDEO": "100",
        "VIDEO_MCP_USE_SCENE_DETECTION": "true",
        "VIDEO_MCP_SCENE_THRESHOLD": "0.3",
        "VIDEO_MCP_SCENE_MAX_INTERVAL_SECONDS": "30",
        "PYTHONUNBUFFERED": "1"
      }
    }
  }
}

Important Notes:

Adjust the command path to match your virtual environment location
Restart Claude Code after modifying ~/.claude.json

Environment Variables

You can customize behavior using environment variables (all optional):

Variable	Default	Description
`VIDEO_MCP_FRAME_INTERVAL_SECONDS`	`10`	Interval between frames (fixed-interval mode)
`VIDEO_MCP_MAX_FRAMES_PER_VIDEO`	`100`	Maximum frames to extract per video
`VIDEO_MCP_USE_SCENE_DETECTION`	`true`	Use scene detection instead of fixed intervals
`VIDEO_MCP_SCENE_THRESHOLD`	`0.3`	Scene detection sensitivity (0.0-1.0, lower=more sensitive)
`VIDEO_MCP_SCENE_MAX_INTERVAL_SECONDS`	`30`	Maximum seconds between frames (ensures static scenes are captured)
`VIDEO_MCP_WHISPER_MODEL`	`base`	Whisper model (tiny, base, small, medium, large)
`VIDEO_MCP_WHISPER_DEVICE`	`cpu`	Device for Whisper (cpu or cuda)
`VIDEO_MCP_CACHE_DIR`	`~/.cache/video-mcp`	Directory for cache storage
`VIDEO_MCP_CACHE_MAX_AGE_DAYS`	`30`	Maximum age of cache entries
`VIDEO_MCP_CACHE_MAX_SIZE_GB`	`10`	Maximum total cache size

Scene Detection

Enabled by default - Intelligently extracts frames at scene changes instead of fixed intervals.

Settings:

VIDEO_MCP_USE_SCENE_DETECTION=true - Enable scene detection
VIDEO_MCP_SCENE_THRESHOLD=0.3 - Sensitivity (0.0-1.0)
- 0.1-0.2: Very sensitive (subtle changes, camera movements)
- 0.3: Default (balanced, most content)
- 0.4-0.5: Conservative (only major scene changes)
VIDEO_MCP_SCENE_MAX_INTERVAL_SECONDS=30 - Maximum seconds between frames
- Ensures static content is captured even if scene doesn't change
- Prevents missing long presentations, talking heads, or static slides
- Set to 0 to disable (pure scene detection only)

Benefits:

Captures key moments automatically
Avoids redundant frames in static scenes
Max interval ensures no content is missed during static periods
Better coverage for dynamic content (movies, vlogs, edited videos)

Example:If someone talks for 2 minutes without moving (no scene change), max interval of 30s ensures you still get frames at 0s, 30s, 60s, 90s, 120s.

When to use fixed intervals:

You need perfectly predictable, evenly-spaced coverage
Content is extremely static throughout

Whisper Model Selection

Choose based on your needs:

Model	Size	Speed	Accuracy	Best For
`tiny`	39M	Fastest	Good	Quick scans, low-end hardware
`base`	74M	Fast	Better	Recommended default
`small`	244M	Medium	Great	High accuracy needs
`medium`	769M	Slow	Excellent	Production quality
`large`	1550M	Slowest	Best	Maximum accuracy

Token Limits and Video Length

Claude Code has a 25,000 token limit for MCP responses. Current settings are optimized to fit within this limit.

Image Token Calculation:

Formula (from Anthropic docs): tokens = (width × height) / 750
For 640x360 frames: (640 × 360) / 750 = ~307 tokens per frame

Token Budget Breakdown:

Frame images: ~307 tokens each
Transcript text: ~1 token per 4 characters
Metadata: ~200 tokens

Capacity at 25K Limit:

Safe limit: ~70 frames (accounting for transcript)
Maximum: ~80 frames (minimal transcript)

Recommended Video Lengths (10-second intervals):

2-5 minutes: 12-30 frames (~4-10K tokens) ✅ Always safe
5-10 minutes: 30-60 frames (~10-20K tokens) ✅ Usually safe
10-12 minutes: 60-72 frames (~20-24K tokens) ⚠️ Approaching limit
12+ minutes: 72+ frames ❌ Will likely truncate

If you hit truncation:

Increase scene threshold: VIDEO_MCP_SCENE_THRESHOLD=0.4
Reduce max frames: VIDEO_MCP_MAX_FRAMES_PER_VIDEO=60
Increase frame interval: VIDEO_MCP_FRAME_INTERVAL_SECONDS=15
Process video in segments

Usage

Basic Usage with Claude Code

Once configured, you can ask Claude to process videos:

Process this video: /.../meeting_recording.mp4

Claude will automatically:

Extract frames every 10 seconds (configurable)
Transcribe the audio
Present a timeline with frames and transcript
Cache results for future queries

Follow-up Queries

Ask questions about the video content:

What were the main points discussed in the meeting?

Show me the slides that were presented

What was said around the 5-minute mark?

These follow-up queries are instant - they use the cached processed video data.

Custom Processing Options

You can customize processing parameters:

Process this video with 5-second intervals: /path/to/video.mp4

Process using the small Whisper model: /path/to/video.mp4

List Cached Videos

List all processed videos

Shows all videos in cache with their settings and cache timestamps.

Tools

The MCP server provides two tools:

1. `process_video`

Process a video file into frames and transcription.

Parameters:

video_path (required): Absolute path to video file
frame_interval (optional): Interval between frames (fixed-interval mode only)
use_scene_detection (optional): Use scene detection (default: true)
scene_threshold (optional): Scene detection sensitivity 0.0-1.0 (default: 0.3)
whisper_model (optional): Whisper model to use
max_frames (optional): Maximum number of frames to extract (default: 100)

Example:

{
  "video_path": "/path/to/video.mp4",
  "frame_interval": 5,
  "whisper_model": "small",
  "max_frames": 100
}

2. `list_processed_videos`

List all videos currently in cache.

Returns: Summary of cached videos with paths, timestamps, and settings.

Resources

The server also provides MCP resources for cached videos:

URI Format: video:///absolute/path/to/video.mp4

Cached videos are automatically exposed as resources that Claude can access.

Cache Management

Cache Location

By default, cached data is stored in:

~/.cache/video-mcp/
├── {cache_key}/
│   ├── metadata.json
│   ├── transcript.json
│   ├── timeline.json
│   └── frames/
│       ├── frame_0000.jpg
│       ├── frame_0001.jpg
│       └── ...
└── cache_index.json

Cache Key Format

Cache keys are generated from:

Video file hash (MD5 of first 1MB + file size)
Processing settings (scene detection/interval, Whisper model)

Fixed Interval Mode:Format: {hash}_{interval}s_{model}Example: abc123def456_10s_base

Scene Detection Mode:Format: {hash}_scene_{threshold}_{model}Example: abc123def456_scene_0.3_base

Manual Cache Management

Check cache size:

du -sh ~/.cache/video-mcp

Clear cache:

rm -rf ~/.cache/video-mcp

View cache index:

cat ~/.cache/video-mcp/cache_index.json | python3 -m json.tool

Automatic Cleanup

The cache automatically cleans up:

Entries older than 30 days (configurable)
When total size exceeds 10GB (configurable)

Supported Formats

MP4 (.mp4)
MOV (.mov)
AVI (.avi)
MKV (.mkv)
WebM (.webm)

Most common video codecs are supported via FFmpeg.

Performance

Processing Time

Approximate times for a 10-minute video (on Apple M1):

Task	Time
Frame extraction (10s intervals)	~10s
Audio transcription (base model)	~60s
Timeline building	~1s
Total	~70s

Cache Benefits

First query: ~70s (processes video)
Follow-up queries: Instant (uses cache)

Troubleshooting

Corporate Environments / JFrog Artifactory

Error: Could not find a version that satisfies the requirement fastmcp or No matching distribution found for fastmcp

Solutions:

Install with PyPI fallback:
```
pip install -e . --extra-index-url https://pypi.org/simple
```
This checks your JFrog repo first, then falls back to public PyPI for missing packages.

Install FastMCP directly from PyPI first:

# Install FastMCP from public PyPI
pip install --index-url https://pypi.org/simple fastmcp

# Then install video-mcp (uses JFrog for other dependencies)
pip install -e .

FFmpeg Not Found

Error: FFmpeg not found. Please install it...

Solution:

# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt-get install ffmpeg

# Verify
ffmpeg -version

Video File Not Found

Error: Video file not found: /path/to/video.mp4

Solution:

Provide absolute path, not relative path
Check file exists: ls -la /path/to/video.mp4
Use tab completion to avoid typos

Unsupported Format

Error: Unsupported video format: .flv

Solution:

Convert to MP4: ffmpeg -i input.flv output.mp4
Supported formats: MP4, MOV, AVI, MKV, WebM

Whisper Model Download Issues

Error: Failed to load Whisper model...

Solution:

Models download automatically on first use
Requires internet connection
Check disk space (models: 39MB - 1.5GB)
Download location: ~/.cache/whisper/

Out of Memory

Error: Processing fails on large videos

Solution:

Reduce max_frames: VIDEO_MCP_MAX_FRAMES_PER_VIDEO=30
Increase frame_interval: VIDEO_MCP_FRAME_INTERVAL_SECONDS=15
Use smaller Whisper model: VIDEO_MCP_WHISPER_MODEL=tiny

Slow Processing

Issue: Transcription takes too long

Solution:

Use smaller Whisper model (tiny or base)
For GPU: Set VIDEO_MCP_WHISPER_DEVICE=cuda (requires CUDA setup)
Process shorter video segments

Development

Project Structure

vidmcp/
├── src/vidmcp/
│   ├── __init__.py          # Package initialization
│   ├── server.py            # MCP server implementation
│   ├── config.py            # Configuration management
│   ├── models.py            # Data models
│   ├── video_processor.py   # FFmpeg integration
│   ├── audio_processor.py   # Whisper integration
│   ├── timeline_builder.py  # Timeline synchronization
│   └── cache.py             # Cache management
├── tests/                   # Test suite
├── pyproject.toml          # Project configuration
└── README.md               # This file

Running Tests

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run with coverage
pytest --cov=vidmcp

Code Formatting

# Format code
black src/vidmcp

# Lint code
ruff check src/vidmcp

Architecture

┌─────────────────────────────────────────┐
│         Claude Code (Client)            │
└───────────────┬─────────────────────────┘
                │ MCP Protocol
┌───────────────▼─────────────────────────┐
│         Video MCP Server                │
│  ┌─────────────────────────────────┐   │
│  │  MCP Interface Layer            │   │
│  │  • process_video tool           │   │
│  │  • list_processed_videos tool   │   │
│  │  • video:// resources           │   │
│  └──────────┬──────────────────────┘   │
│             │                           │
│  ┌──────────▼──────────────────────┐   │
│  │  Processing Pipeline            │   │
│  │  • VideoProcessor (FFmpeg)      │   │
│  │  • AudioProcessor (Whisper)     │   │
│  │  • TimelineBuilder              │   │
│  └──────────┬──────────────────────┘   │
│             │                           │
│  ┌──────────▼──────────────────────┐   │
│  │  Cache Layer                    │   │
│  │  • Hash-based caching           │   │
│  │  • Persistent storage           │   │
│  │  • Auto cleanup                 │   │
│  └─────────────────────────────────┘   │
└─────────────────────────────────────────┘

Future Enhancements

Planned features:

GPU Support: CUDA acceleration for faster Whisper transcription
Video Chunking: Handle very long videos (>1 hour) efficiently
YouTube Support: Download and process YouTube videos directly
Speaker Diarization: Identify different speakers in transcript
Summary Mode: Generate video summaries with fewer frames
Custom Frame Selection: User-specified exact timestamps for frames
Faster Whisper: Use faster-whisper library for 4x speed improvement
Adaptive Scene Detection: Auto-tune threshold based on content

Acknowledgments

MCP Protocol: Anthropic's Model Context Protocol
FFmpeg: Video processing
Whisper: OpenAI's speech recognition
Claude: Anthropic's AI assistant

Made with ❤️ for the Claude Code community

Video MCP Server

Overview

Use Cases

Installation

Prerequisites

Install Video MCP Server

Configuration

MCP Configuration for Claude Code

Environment Variables

Scene Detection

Whisper Model Selection

Token Limits and Video Length

Usage

Basic Usage with Claude Code

Follow-up Queries

Custom Processing Options

List Cached Videos

Tools

1. process_video

2. list_processed_videos

Resources

Cache Management

Cache Location

Cache Key Format

Manual Cache Management

Automatic Cleanup

Supported Formats

Performance

Processing Time

Cache Benefits

Troubleshooting

Corporate Environments / JFrog Artifactory

FFmpeg Not Found

Video File Not Found

Unsupported Format

Whisper Model Download Issues

Out of Memory

Slow Processing

Development

Project Structure

Running Tests

Code Formatting

Architecture

Future Enhancements

Acknowledgments

MCP Server · Populars

MCP Server · New

1. `process_video`

2. `list_processed_videos`