# little-librarian

A local MCP server that indexes .epub files and exposes semantic search tools, backed by pplx-embed-context-v1 (late chunking) and Qdrant.

```
MCP client (Claude Desktop, Claude Code, …)
     │
     ▼  tool calls via MCP
 server.py
  pplx-embed-context-v1-0.6b  +  Qdrant (local)
```

## Files

| File | Role |
| --- | --- |
| `server.py` | MCP server: epub ingestion, embedding, search, Qdrant storage |
| `code_librarian.py` | Separate MCP server for code (AST-based chunking) |
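AST-based chunking means splitting source files along syntactic boundaries (functions, classes) instead of fixed character windows. A minimal sketch of the idea using Python's standard `ast` module — `chunk_functions` is an illustrative name, not the actual implementation in `code_librarian.py`:

```python
import ast

def chunk_functions(source: str) -> list[dict]:
    """Split Python source into one chunk per top-level function or class."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append({
                "name": node.name,
                # recover the exact source text for this node
                "text": ast.get_source_segment(source, node),
            })
    return chunks
```

Each chunk then stays a self-contained, syntactically complete unit, which tends to embed better than an arbitrary text window cut mid-function.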

## Why pplx-embed-context-v1

Uses late chunking: all chunks from a chapter go through a single forward pass, so each chunk embedding captures full-document context without needing a doc-prefix at inference time. Scores 81.96 nDCG@10 on ConTEB.
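The core of late chunking is the pooling order: run the encoder once over the whole chapter to get contextual token embeddings, then pool per-chunk spans afterwards. A sketch of that pooling step, with random vectors standing in for the model's token embeddings:

```python
import numpy as np

def late_chunk(token_embs: np.ndarray, spans: list[tuple[int, int]]) -> np.ndarray:
    """Pool contextual token embeddings into one vector per chunk span.

    token_embs: (num_tokens, dim) output of ONE forward pass over the whole
    chapter, so every token vector already carries document-wide context.
    spans: (start, end) token indices delimiting each chunk.
    """
    return np.stack([token_embs[s:e].mean(axis=0) for s, e in spans])

# one forward pass over a 100-token chapter (simulated here with random vectors)
tokens = np.random.rand(100, 64)
chunk_vecs = late_chunk(tokens, [(0, 40), (40, 100)])
```

Contrast with naive chunking, which would encode each 40/60-token slice in isolation and lose any cross-chunk references.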

## Quick start

```shell
# 1. install
pip install -e .

# 2. ingest your library (runs embedding, then exits)
HF_HUB_OFFLINE=0 python server.py --index ./library

# 3. start the MCP server
python server.py

# optional: preload the model at startup
python server.py --preload
```

## MCP tools

| Tool | Description |
| --- | --- |
| `search(query, top_k)` | Semantic search; returns top-k passages with scores |
| `search_groups(query, group_by, limit, group_size)` | Search grouped by "book" or "chapter"; best for cross-volume research |
| `get_passage(book, chapter, max_chars)` | Retrieve full text for a book/chapter |
| `list_books()` | List all indexed books with chapter counts |
| `collection_stats()` | Qdrant collection info (point count, vector size) |
| `library_stats()` | Full content breakdown: books, chapters, chunks per book, average chunk length |
| `get_device()` | Show which device (CPU/GPU) is used for embeddings |
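Under the hood, `search` amounts to ranking stored chunk vectors by cosine similarity to the embedded query (delegated to Qdrant in the real server). A self-contained sketch of that ranking step — the `points` structure and names here are illustrative, not the server's actual internals:

```python
from math import sqrt

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def search(query_vec: list[float], points: list[dict], top_k: int = 3) -> list[dict]:
    """Score every stored point against the query and return the top-k payloads."""
    scored = [
        {"score": cosine(query_vec, p["vector"]), **p["payload"]}
        for p in points
    ]
    return sorted(scored, key=lambda r: r["score"], reverse=True)[:top_k]
```

`search_groups` applies the same ranking but deduplicates by a payload field (book or chapter) before truncating, which is why it suits cross-volume research.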

## Claude Desktop config

```json
{
  "mcpServers": {
    "little-librarian": {
      "command": "python",
      "args": ["/path/to/server.py"]
    }
  }
}
```

## Hardware guidance

| Setup | Min VRAM |
| --- | --- |
| CPU only | 0 GB |
| GPU (pplx-embed-0.6b) | ~2 GB |
