# little-librarian
A local MCP server that indexes `.epub` files and exposes semantic search tools, backed by `pplx-embed-context-v1` (late chunking) and Qdrant.
```
MCP client (Claude Desktop, Claude Code, …)
        │
        ▼  tool calls via MCP
    server.py
        │
        ▼
pplx-embed-context-v1-0.6b + Qdrant (local)
```
## Files

| File | Role |
|---|---|
| `server.py` | MCP server — epub ingestion, embedding, search, Qdrant storage |
| `code_librarian.py` | Separate MCP server for code (AST-based chunking) |
## Why pplx-embed-context-v1

Uses late chunking: all chunks from a chapter go through a single forward pass, so each chunk embedding captures full document context without needing a doc-prefix at inference time. Scores 81.96 nDCG@10 on ConTEB.
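A minimal sketch of the late-chunking idea, in plain Python with no model: the token embeddings here stand in for the contextualized output of a single transformer forward pass over the whole chapter, and the function names are illustrative, not this project's API.

```python
def mean_pool(token_embeddings, start, end):
    """Average the token vectors in [start, end) into one chunk embedding."""
    dim = len(token_embeddings[0])
    out = [0.0] * dim
    for vec in token_embeddings[start:end]:
        for i, x in enumerate(vec):
            out[i] += x
    return [x / (end - start) for x in out]

def late_chunk(token_embeddings, chunk_spans):
    """Late chunking: the whole chapter is encoded once, producing
    contextualized token embeddings; only *afterwards* are the tokens
    pooled per chunk, so every chunk vector reflects full-document
    context (unlike embedding each chunk in isolation)."""
    return [mean_pool(token_embeddings, s, e) for s, e in chunk_spans]

# Toy example: six "tokens" with 2-dim embeddings, split into two chunks.
tokens = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [2.0, 2.0], [4.0, 0.0]]
chunks = late_chunk(tokens, [(0, 2), (2, 6)])
# → [[1.0, 0.0], [1.5, 1.0]]
```

The key contrast with naive chunking is only *where* pooling happens: after the shared forward pass, not before it.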
## Quick start

```bash
# 1. install
pip install -e .

# 2. ingest your library (runs embedding, then exits)
HF_HUB_OFFLINE=0 python server.py --index ./library

# 3. start the MCP server
python server.py

# optional: preload the model at startup
python server.py --preload
```
## MCP tools

| Tool | Description |
|---|---|
| `search(query, top_k)` | Semantic search; returns the top-k passages with scores |
| `search_groups(query, group_by, limit, group_size)` | Search grouped by "book" or "chapter" — best for cross-volume research |
| `get_passage(book, chapter, max_chars)` | Retrieve the full text of a book/chapter |
| `list_books()` | List all indexed books with chapter counts |
| `collection_stats()` | Qdrant collection info (point count, vector size) |
| `library_stats()` | Full content breakdown: books, chapters, chunks per book, average chunk length |
| `get_device()` | Show which device (CPU/GPU) is used for embeddings |
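Over the wire, an MCP tool call is a JSON-RPC `tools/call` request; a `search` invocation from a client would look roughly like this (argument values are illustrative):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "search",
    "arguments": { "query": "stoic views on grief", "top_k": 5 }
  }
}
```

MCP clients such as Claude Desktop construct these requests for you; you only see the tool names and arguments.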
## Claude Desktop config

```json
{
  "mcpServers": {
    "little-librarian": {
      "command": "python",
      "args": ["/path/to/server.py"]
    }
  }
}
```
## Hardware guidance

| Setup | Min VRAM |
|---|---|
| CPU only | 0 GB |
| GPU (pplx-embed-0.6b) | ~2 GB |
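The CPU/GPU choice reported by `get_device()` maps roughly onto a check like the following — a sketch assuming PyTorch is the embedding backend (the function name is illustrative); it falls back to CPU when `torch` is absent or sees no GPU:

```python
def pick_device() -> str:
    """Prefer CUDA when PyTorch can see a GPU; otherwise run on CPU."""
    try:
        import torch  # optional dependency; CPU-only installs still work
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass
    return "cpu"

print(pick_device())
```

With roughly 2 GB of free VRAM the 0.6B embedding model fits on the GPU; anything less, and CPU inference is the safe default.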