guimatheus92

mcp-video-analyzer

MCP server for video analysis: extracts transcripts, key frames, OCR text, and metadata from video URLs. Supports Loom and direct video files (.mp4, .webm).

No existing video MCP combines transcripts + visual frames + metadata in one tool. This one does.

Quick Start

# One-command install for Claude Code
claude mcp add video-analyzer npx mcp-video-analyzer@latest

Or manually add to your MCP config (Claude Desktop, Cursor, VS Code):

{
  "mcpServers": {
    "video-analyzer": {
      "command": "npx",
      "args": ["mcp-video-analyzer@latest"]
    }
  }
}

Tools

analyze_video — Full video analysis

Extracts everything from a video URL in one call:

> Analyze this video: https://www.loom.com/share/abc123...

Returns:

  • Transcript with timestamps and speakers
  • Key frames extracted via scene-change detection (automatically deduplicated)
  • OCR text extracted from frames (code, error messages, UI text visible on screen)
  • Annotated timeline merging transcript + frames + OCR into a unified "what happened when" view
  • Metadata (title, duration, platform)
  • Comments from viewers
  • Chapters and AI summary (when available)

The AI will automatically call this tool when it sees a video URL — no need to ask.

Options:

  • maxFrames (1-50, default 20) — cap on extracted frames
  • threshold (0.0-1.0, default 0.1) — scene-change sensitivity. Use 0.1 for screencasts/demos, 0.3 for live-action video
  • skipFrames (boolean) — skip frame extraction for transcript-only analysis (not recommended — frames are critical for understanding)
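
As a rough illustration of the ranges and defaults listed above, the options could be normalized like this before analysis starts. `AnalyzeOptions` and `normalizeOptions` are hypothetical names, not the server's actual code, and treating `skipFrames` as `false` by default is an assumption (its default is not stated above).

```typescript
// Hypothetical option shape; field names follow the documented options.
interface AnalyzeOptions {
  maxFrames?: number;
  threshold?: number;
  skipFrames?: boolean;
}

// Clamp a number into [min, max].
function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Apply the documented defaults and ranges.
// Assumption: skipFrames defaults to false.
function normalizeOptions(opts: AnalyzeOptions = {}): Required<AnalyzeOptions> {
  return {
    maxFrames: clamp(Math.round(opts.maxFrames ?? 20), 1, 50),
    threshold: clamp(opts.threshold ?? 0.1, 0, 1),
    skipFrames: opts.skipFrames ?? false,
  };
}

// A live-action video with an over-large frame request:
console.log(normalizeOptions({ maxFrames: 80, threshold: 0.3 }));
// { maxFrames: 50, threshold: 0.3, skipFrames: false }
```

Clamping out-of-range values is one possible design; a real server might reject them with a validation error instead.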

get_frame_at — Single frame at a timestamp

> Show me the frame at 1:23 in this video

The AI reads the transcript, spots a critical moment, and requests the exact frame to see what's on screen.
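
A timestamp such as 1:23 has to become a seek position in seconds at some point. The sketch below shows one way to do that conversion; `parseTimestamp` is a hypothetical helper, and whether the tool itself accepts strings, seconds, or both is not specified here.

```typescript
// Convert "1:23" or "1:02:03" into seconds. Plain numbers ("83") pass through.
function parseTimestamp(input: string): number {
  const parts = input.trim().split(":");
  if (parts.length > 3) {
    throw new Error(`Invalid timestamp: ${input}`);
  }
  let seconds = 0;
  for (const part of parts) {
    const value = Number(part);
    if (part === "" || !Number.isFinite(value) || value < 0) {
      throw new Error(`Invalid timestamp: ${input}`);
    }
    // Each colon shifts the running total up by a factor of 60.
    seconds = seconds * 60 + value;
  }
  return seconds;
}

console.log(parseTimestamp("1:23")); // 83
console.log(parseTimestamp("1:02:03")); // 3723
```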

get_frame_burst — N frames in a time range

> Show me 10 frames between 0:15 and 0:17 of this video

For motion, vibration, animations, or fast scrolling — burst mode captures N frames in a narrow window so the AI can see frame-by-frame changes.
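
To make the idea of N frames in a narrow window concrete, here is a minimal sketch that spreads capture points evenly across a range. `burstTimestamps` is a hypothetical helper, and including both endpoints is an assumption rather than documented behavior.

```typescript
// Return `count` evenly spaced timestamps (in seconds) covering [start, end].
function burstTimestamps(start: number, end: number, count: number): number[] {
  if (!Number.isInteger(count) || count < 1) {
    throw new Error("count must be a positive integer");
  }
  if (end < start) {
    throw new Error("end must not precede start");
  }
  if (count === 1) return [start];
  const step = (end - start) / (count - 1);
  return Array.from({ length: count }, (_, i) => start + i * step);
}

// Five frames between 0:15 and 0:17:
console.log(burstTimestamps(15, 17, 5)); // [ 15, 15.5, 16, 16.5, 17 ]
```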

Supported Platforms

| Platform | Transcript | Metadata | Comments | Frames | Auth |
|---|---|---|---|---|---|
| Loom | Yes | Yes | Yes | Yes | None |
| Direct URL (.mp4, .webm) | No | Duration only | No | Yes | None |
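
Routing a URL to one of the two platforms above might look like the following sketch. The `Platform` type, the `detectPlatform` name, and the exact matching rules (a `/share/` path on loom.com, a `.mp4` or `.webm` path suffix) are assumptions for illustration, not the server's actual detection logic.

```typescript
type Platform = "loom" | "direct";

// Return the platform for a URL, or null when it is not supported.
function detectPlatform(rawUrl: string): Platform | null {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return null; // not a parseable URL
  }
  const host = url.hostname.toLowerCase();
  const isLoomHost = host === "loom.com" || host.endsWith(".loom.com");
  if (isLoomHost && url.pathname.startsWith("/share/")) {
    return "loom";
  }
  // Match on the path only, so query strings do not hide the extension.
  if (/\.(mp4|webm)$/i.test(url.pathname)) {
    return "direct";
  }
  return null;
}

console.log(detectPlatform("https://example.com/clips/demo.webm?token=1")); // direct
```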

Frame Extraction Strategies

Frame extraction uses a two-strategy fallback chain — no single dependency is required:

| Strategy | How it works | Speed | Requirements |
|---|---|---|---|
| yt-dlp + ffmpeg (primary) | Downloads video, extracts frames via scene detection | Fast, precise | yt-dlp (pip install yt-dlp) |
| Browser (fallback) | Opens video in headless Chrome, seeks to timestamps, takes screenshots | Slower, no download needed | Chrome or Chromium installed |

The fallback is automatic — if yt-dlp is not available, the server tries browser-based extraction via puppeteer-core. If neither is available, analysis still returns transcript + metadata + comments, just no frames.
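
The fallback behavior described above can be sketched as an ordered list of strategies, tried until one succeeds. All names here (`FrameStrategy`, `extractWithFallback`, the stub strategies) are hypothetical; the real extractors wrap yt-dlp/ffmpeg and puppeteer-core, which this sketch does not call.

```typescript
interface Frame {
  time: number;
  path: string;
}

// A strategy reports whether it can run, then extracts frames.
interface FrameStrategy {
  name: string;
  isAvailable(): Promise<boolean>;
  extract(url: string): Promise<Frame[]>;
}

interface ExtractionResult {
  frames: Frame[];
  strategy: string | null;
  warnings: string[];
}

// Try each strategy in order; fall through when one is unavailable or fails.
async function extractWithFallback(
  url: string,
  strategies: FrameStrategy[],
): Promise<ExtractionResult> {
  const warnings: string[] = [];
  for (const strategy of strategies) {
    try {
      if (!(await strategy.isAvailable())) {
        warnings.push(`${strategy.name}: not available`);
        continue;
      }
      const frames = await strategy.extract(url);
      return { frames, strategy: strategy.name, warnings };
    } catch (err) {
      warnings.push(`${strategy.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  // Nothing worked: return no frames so transcript and metadata can still be used.
  return { frames: [], strategy: null, warnings };
}

// Stub strategies standing in for the real extractors.
const ytDlpStub: FrameStrategy = {
  name: "yt-dlp + ffmpeg",
  isAvailable: async () => false, // pretend yt-dlp is not installed
  extract: async () => [],
};
const browserStub: FrameStrategy = {
  name: "browser",
  isAvailable: async () => true,
  extract: async () => [{ time: 5, path: "frame-0.jpg" }],
};

extractWithFallback("https://example.com/demo.mp4", [ytDlpStub, browserStub]).then((result) =>
  console.log(result.strategy, result.warnings),
);
```

Collecting a warning per skipped strategy keeps the failure visible to the caller without aborting the rest of the analysis.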

Post-Processing Pipeline

After frame extraction, the pipeline automatically applies:

| Step | What it does | Why |
|---|---|---|
| Frame deduplication | Removes near-identical consecutive frames using perceptual hashing (dHash + Hamming distance) | Screencasts often have long static moments; dedup removes redundant frames, saving tokens |
| OCR | Extracts text visible on screen from each frame (via tesseract.js) | Captures code, error messages, terminal output, and UI text that the transcript doesn't cover |
| Annotated timeline | Merges transcript timestamps + frame timestamps + OCR text into a single chronological view | Gives the AI a unified "what was said, what changed visually, and what text appeared" at each moment |

The OCR step requires tesseract.js (included as a dependency). If it fails to load, analysis continues without OCR — no frames or transcript are lost.
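
For readers unfamiliar with dHash, the deduplication step can be sketched as follows. This is a generic version of the technique, not the project's frame-dedup.ts: it assumes each frame has already been converted to a 9x8 grayscale thumbnail (for example with an image library), and both the helper names and the `maxDistance` value of 5 are illustrative assumptions.

```typescript
const HASH_WIDTH = 8;
const HASH_HEIGHT = 8;

// dHash over a grayscale image already downscaled to 9x8 pixels (row-major).
// Each bit records whether a pixel is brighter than its right-hand neighbour.
function dHash(gray9x8: ArrayLike<number>): number[] {
  const rowStride = HASH_WIDTH + 1;
  if (gray9x8.length !== rowStride * HASH_HEIGHT) {
    throw new Error("expected 72 grayscale pixels (9x8)");
  }
  const bits: number[] = [];
  for (let y = 0; y < HASH_HEIGHT; y++) {
    for (let x = 0; x < HASH_WIDTH; x++) {
      const left = gray9x8[y * rowStride + x];
      const right = gray9x8[y * rowStride + x + 1];
      bits.push(left > right ? 1 : 0);
    }
  }
  return bits; // 64 bits
}

// Number of bit positions where two hashes differ.
function hammingDistance(a: number[], b: number[]): number {
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) distance++;
  }
  return distance;
}

// Keep a frame only if it differs enough from the last kept frame.
function dedupeConsecutive<T extends { hash: number[] }>(frames: T[], maxDistance = 5): T[] {
  const kept: T[] = [];
  for (const frame of frames) {
    const last = kept[kept.length - 1];
    if (!last || hammingDistance(last.hash, frame.hash) > maxDistance) {
      kept.push(frame);
    }
  }
  return kept;
}
```

Comparing against the last kept frame, rather than the immediately preceding one, prevents a slow drift of small changes from slipping through as a long run of near-duplicates.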

Complementary Tools

Chrome DevTools MCP

For live web debugging alongside video analysis, pair this server with the Chrome DevTools MCP:

claude mcp add chrome-devtools npx chrome-devtools-mcp@latest

When to use each:

| Scenario | Tool |
|---|---|
| Bug report recorded as a Loom video | mcp-video-analyzer: extract transcript, frames, and error text from the recording |
| Live debugging a web page | Chrome DevTools MCP: inspect DOM, console, network, take screenshots |
| Video shows UI issue, need to reproduce it | Use both: analyze the video first, then open the page in Chrome DevTools to reproduce |

The two MCPs complement each other: video analyzer understands recorded content, DevTools interacts with live pages.

Sample Output

analyze_video with a Loom URL (frame images omitted here)

{
  "metadata": {
    "platform": "loom",
    "title": "Bug: Cart total not updating",
    "description": "Demonstrating the cart total bug on the checkout page",
    "duration": 154.5,
    "durationFormatted": "2:34",
    "url": "https://www.loom.com/share/abc123..."
  },
  "transcript": [
    { "time": "0:05", "text": "So when I click add to cart..." },
    { "time": "0:12", "text": "The total stays at zero..." },
    { "time": "0:18", "speaker": "Guilherme", "text": "Let me show you the console..." }
  ],
  "comments": [
    { "author": "John", "text": "This also happens on mobile", "time": "0:12" },
    { "author": "Sarah", "text": "Confirmed on iOS Safari too" }
  ],
  "ocrResults": [
    { "time": "0:18", "text": "TypeError: Cannot read property 'total' of undefined", "confidence": 92 }
  ],
  "timeline": [
    { "time": "0:05", "seconds": 5, "transcript": "So when I click add to cart...", "frameIndex": 0 },
    { "time": "0:12", "seconds": 12, "transcript": "The total stays at zero...", "frameIndex": 1 },
    { "time": "0:18", "seconds": 18, "transcript": "Let me show you the console...", "frameIndex": 2, "ocrText": "TypeError: Cannot read property 'total' of undefined" }
  ],
  "frameCount": 3,
  "warnings": []
}
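
The `timeline` array in the sample above could be assembled along these lines. This is a minimal sketch under assumptions: the input types and the `buildTimeline` helper are hypothetical, and events are grouped by whole second, which is a simplification of whatever alignment the real annotated-timeline step performs.

```typescript
interface TranscriptEntry { seconds: number; text: string; }
interface FrameEntry { seconds: number; index: number; }
interface OcrEntry { seconds: number; text: string; }

interface TimelineEntry {
  time: string;
  seconds: number;
  transcript?: string;
  frameIndex?: number;
  ocrText?: string;
}

// Format seconds as m:ss, matching the "0:05" style in the sample.
function formatTime(totalSeconds: number): string {
  const whole = Math.floor(totalSeconds);
  const minutes = Math.floor(whole / 60);
  return `${minutes}:${String(whole % 60).padStart(2, "0")}`;
}

// Merge the three sources into one chronological list, keyed by whole second.
// Simplification: a later event in the same second overwrites an earlier one.
function buildTimeline(
  transcript: TranscriptEntry[],
  frames: FrameEntry[],
  ocr: OcrEntry[],
): TimelineEntry[] {
  const bySecond = new Map<number, TimelineEntry>();
  const entryAt = (seconds: number): TimelineEntry => {
    const key = Math.floor(seconds);
    let entry = bySecond.get(key);
    if (!entry) {
      entry = { time: formatTime(key), seconds: key };
      bySecond.set(key, entry);
    }
    return entry;
  };
  for (const t of transcript) entryAt(t.seconds).transcript = t.text;
  for (const f of frames) entryAt(f.seconds).frameIndex = f.index;
  for (const o of ocr) entryAt(o.seconds).ocrText = o.text;
  return Array.from(bySecond.values()).sort((a, b) => a.seconds - b.seconds);
}

const timeline = buildTimeline(
  [
    { seconds: 5, text: "So when I click add to cart..." },
    { seconds: 12, text: "The total stays at zero..." },
    { seconds: 18, text: "Let me show you the console..." },
  ],
  [{ seconds: 5, index: 0 }, { seconds: 12, index: 1 }, { seconds: 18, index: 2 }],
  [{ seconds: 18, text: "TypeError: Cannot read property 'total' of undefined" }],
);
console.log(JSON.stringify(timeline, null, 2));
```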

get_frame_at with a direct video URL

Returns the frame as an inline image that the AI can see and analyze.

Development

# Install dependencies
npm install

# Run all checks (format, lint, typecheck, knip, tests)
npm run check

# Build
npm run build

# Run E2E tests (requires network)
npm run test:e2e

# Open MCP Inspector for manual testing
npm run inspect

Architecture

src/
├── index.ts                    # Entry point (shebang + stdio)
├── server.ts                   # FastMCP server + tool registration
├── tools/                      # MCP tool definitions
│   ├── analyze-video.ts
│   ├── get-frame-at.ts
│   └── get-frame-burst.ts
├── adapters/                   # Platform-specific logic
│   ├── adapter.interface.ts    # IVideoAdapter interface + registry
│   ├── loom.adapter.ts         # Loom: authless GraphQL
│   └── direct.adapter.ts       # Direct URL: any mp4/webm link
├── processors/                 # Shared processing
│   ├── frame-extractor.ts      # ffmpeg scene detection + extraction
│   ├── browser-frame-extractor.ts # Headless Chrome fallback for frames
│   ├── image-optimizer.ts      # sharp resize/compress
│   ├── frame-dedup.ts          # Perceptual dedup (dHash + Hamming distance)
│   ├── frame-ocr.ts            # OCR text extraction (tesseract.js)
│   └── annotated-timeline.ts   # Unified timeline (transcript + frames + OCR)
├── utils/
│   ├── url-detector.ts         # Platform detection from URL
│   ├── vtt-parser.ts           # WebVTT → transcript entries
│   └── temp-files.ts           # Temp directory management
└── types.ts                    # Shared TypeScript interfaces
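
As an example of what a utility like vtt-parser.ts has to do, here is a small WebVTT-to-transcript sketch. It is not the project's implementation: `TranscriptCue` and `parseVtt` are hypothetical names, and the parser handles only the common subset of WebVTT (cue timings, multi-line text, and `<v Name>` voice tags).

```typescript
interface TranscriptCue {
  start: number; // cue start in seconds
  text: string;
  speaker?: string;
}

// Parse "hh:mm:ss.mmm" or "mm:ss.mmm" into seconds.
function parseVttTime(stamp: string): number {
  return stamp.split(":").reduce((total, part) => total * 60 + Number(part), 0);
}

function parseVtt(vtt: string): TranscriptCue[] {
  const cues: TranscriptCue[] = [];
  // Normalize line endings, then split into blank-line-separated blocks.
  const blocks = vtt.replace(/\r\n?/g, "\n").split(/\n{2,}/);
  for (const block of blocks) {
    const lines = block.split("\n");
    const timingIndex = lines.findIndex((line) => line.includes("-->"));
    if (timingIndex === -1) continue; // header, NOTE, or STYLE block
    const start = parseVttTime(lines[timingIndex].split("-->")[0].trim());
    let text = lines.slice(timingIndex + 1).join(" ").trim();
    const voice = /^<v\s+([^>]+)>/.exec(text);
    text = text.replace(/<[^>]+>/g, "").trim(); // strip voice and formatting tags
    if (!text) continue;
    cues.push(voice ? { start, text, speaker: voice[1].trim() } : { start, text });
  }
  return cues;
}

const sampleVtt = [
  "WEBVTT",
  "",
  "00:00:05.000 --> 00:00:08.000",
  "So when I click add to cart...",
  "",
  "00:00:18.000 --> 00:00:21.000",
  "<v Guilherme>Let me show you the console...",
].join("\n");

console.log(parseVtt(sampleVtt));
```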

License

MIT
