LFS MCP Documentation Assistant
This project is a local MCP server for following Linux From Scratch (LFS) documentation step by step. It imports HTML documentation into a local SQLite database, indexes it with SQLite FTS5, and exposes MCP tools that keep users on the earliest incomplete checklist item unless they explicitly ask to look ahead.
The server is documentation-only. It never executes shell commands, never runs build steps, and does not perform destructive operations such as chroot, mount, partitioning, package compilation, or filesystem modification. Commands from the LFS book are returned as text only.
Quickstart
This creates a local virtual environment, installs the project, initializes a demo SQLite database with fixture docs, checks the environment, and shows the current LFS step. The last two commands start the web dashboard and the MCP server; run whichever you need.
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -e ".[dev]"
python3 -m lfs_mcp --db "$PWD/demo-lfs.db" init
python3 -m lfs_mcp --db "$PWD/demo-lfs.db" doctor
python3 -m lfs_mcp --db "$PWD/demo-lfs.db" current
python3 -m lfs_mcp --db "$PWD/demo-lfs.db" web
python3 -m lfs_mcp --db "$PWD/demo-lfs.db" server
The project is local and single-user by default. Progress is stored in the selected SQLite database file, so multiple users should use separate database files unless profile or user isolation is added later.
What MCP means here
MCP (Model Context Protocol) lets an AI client call local tools exposed by this server. In this project, the tools provide safe access to imported LFS documentation: list versions, select a version, get the current step, mark documentation sections completed, and search the local docs.
Architecture
The pipeline is intentionally simple for a college assignment and local demo:
- HTML docs are fetched from an LFS URL or read from a local fixture directory.
- The parser follows table-of-contents links under the provided documentation base URL, preserves section order, extracts titles, chapters, clean text, markdown-like content, command blocks, package names, and source URLs.
- SQLite stores documentation versions, sections, progress, and settings.
- SQLite FTS5 indexes title/content/package fields for local full-text search.
- MCP tools expose sequential guidance and search.
SQLite FTS5 is used instead of a vector database in the first version because it is local, deterministic, easy to test offline, and has no hosted service dependency. The schema keeps metadata_json columns and stable (version_id, section_id) identities so embeddings or vector search can be added later without replacing the storage model.
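A rough sketch of that storage model, using Python's standard sqlite3 module; the table and column names here are illustrative assumptions, not the project's exact schema:

import sqlite3

conn = sqlite3.connect("demo-lfs.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS sections (
    version_id    TEXT NOT NULL,      -- e.g. "sample-v1-systemd"
    section_id    TEXT NOT NULL,      -- e.g. "gcc-pass1"
    "order"       INTEGER NOT NULL,   -- preserved book order
    title         TEXT NOT NULL,
    content       TEXT NOT NULL,
    packages      TEXT,               -- extracted package names
    metadata_json TEXT,               -- room for embeddings later
    PRIMARY KEY (version_id, section_id)
);
CREATE VIRTUAL TABLE IF NOT EXISTS sections_fts
    USING fts5(title, content, packages);
""")
conn.execute(
    "INSERT INTO sections_fts (title, content, packages) VALUES (?, ?, ?)",
    ("5.3 GCC Pass 1", "Build the first GCC cross compiler pass.", "gcc"),
)
for (title,) in conn.execute(
    "SELECT title FROM sections_fts WHERE sections_fts MATCH ?", ("gcc",)
):
    print(title)

Keeping the metadata_json column unused for now costs nothing, but it means a later vector-search phase can attach embeddings per (version_id, section_id) without a schema migration.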
Install
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -e ".[dev]"
On systems where python points to Python 3, you can substitute python for python3 in these commands.
The default database path is:
~/.local/share/lfs-mcp/lfs_docs.db
You can override it with --db /path/to/lfs.db or the LFS_MCP_DB environment variable.
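For example, either form points the CLI at the same database file:

python3 -m lfs_mcp --db /path/to/lfs.db current
LFS_MCP_DB=/path/to/lfs.db python3 -m lfs_mcp current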
Import the initial LFS docs
python3 -m lfs_mcp import \
--url https://www.linuxfromscratch.org/lfs/view/13.0-systemd-rc1/ \
--version-id 13.0-systemd-rc1 \
--display-name "Linux From Scratch 13.0 systemd rc1"
The importer also supports local fixture directories for offline tests and demos:
python3 -m lfs_mcp import \
--url tests/fixtures/lfs_sample_v1 \
--version-id sample-v1-systemd \
--display-name "Sample LFS v1 systemd"
Use --force only when you intentionally want to overwrite an already imported version.
The importer uses a clear local importer User-Agent for HTTP requests, applies a request timeout, and ignores table-of-contents links that point outside the provided documentation base URL or fixture directory.
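A minimal sketch of that scoping check, assuming a hypothetical helper name; the real importer may differ in detail:

import urllib.parse

def is_in_scope(base_url: str, link: str) -> bool:
    # Resolve the TOC link against the base and keep it only if it
    # stays under the documentation base URL.
    resolved = urllib.parse.urljoin(base_url, link)
    return resolved.startswith(base_url)

base = "https://www.linuxfromscratch.org/lfs/view/13.0-systemd-rc1/"
print(is_in_scope(base, "chapter05/gcc-pass1.html"))  # True: stays in scope
print(is_in_scope(base, "https://example.com/"))      # False: ignored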
Initialize and check a local demo
The init command creates or opens the configured SQLite database, verifies SQLite and FTS5 support, imports fixture docs by default when no source URL is provided, sets the imported version active, and prints the next useful commands.
python3 -m lfs_mcp --db "$PWD/demo-lfs.db" init
Defaults:
fixture: tests/fixtures/lfs_sample_v1
version id: sample-v1-systemd
display name: Sample LFS v1 systemd
You can initialize from online LFS docs instead:
python3 -m lfs_mcp --db "$PWD/lfs_docs.db" init \
--source-url https://www.linuxfromscratch.org/lfs/view/13.0-systemd-rc1/ \
--version-id 13.0-systemd-rc1 \
--display-name "Linux From Scratch 13.0 systemd rc1"
Use --force to overwrite an existing imported version intentionally.
The doctor command prints human-readable diagnostics for Python, SQLite, FTS5, the resolved database path, required tables, imported versions, active version, current step loading, and MCP server importability.
python3 -m lfs_mcp --db "$PWD/demo-lfs.db" doctor
Run the MCP server
python3 -m lfs_mcp server
With a custom database:
python3 -m lfs_mcp --db ./demo-lfs.db server
Run the local web dashboard
The web command starts a local FastAPI server with a static HTML/CSS/JavaScript dashboard. It uses the same SQLite database and service logic as the CLI and MCP tools.
python3 -m lfs_mcp --db "$PWD/demo-lfs.db" web
Default local URL:
http://127.0.0.1:8787
You can choose a different host or port:
python3 -m lfs_mcp --db "$PWD/demo-lfs.db" web --host 127.0.0.1 --port 8787
The web dashboard includes:
- light, chat-first workspace with database, active version, provider status, and Settings link in the top header
- compact left column for current progress, checklist, search, and saved sections
- primary center chat workspace for current-step or selected-section questions
- wide right-side referenced documentation panel for local source/provenance
- next-step preview
- ordered checklist
- SQLite FTS5 search
- search result progress labels
- refresh, mark-current-complete, and reset-progress buttons
- separate AI provider settings modal for bring-your-own-key chat
- minimal contextual Gemini, Vercel AI Gateway, or Local mock chat that uses current-step or selected search-result context
- referenced documentation panel showing the local LFS section used as chat context
- local section notes, bookmarks, needs-review flags, blocked flags, and a saved-sections panel
Phase 3 AI settings include:
- provider/model/base URL configuration
- supported providers: OpenAI, OpenRouter, Groq, Google Gemini, Vercel AI Gateway, Local mock, and Ollama
- optional API key entry for providers that require one
- Ollama default base URL: http://127.0.0.1:11434
- Gemini default base URL: https://generativelanguage.googleapis.com/v1beta
- Gemini recommended default chat model: gemini-2.0-flash-lite
- Vercel AI Gateway default base URL: https://ai-gateway.vercel.sh/v1
- Vercel AI Gateway default model: openai/gpt-5.4
- Local mock default model: local-mock
- lightweight configuration testing
- local backend storage status and warnings
Phase 4 chat includes:
- a local /api/chat endpoint
- Gemini chat support using the configured provider/model/base URL
- Vercel AI Gateway chat support using OpenAI-compatible chat completions
- Local mock chat support for development and UI testing without an API key
- x-goog-api-key header-based Gemini authentication and Authorization: Bearer <key> Vercel AI Gateway authentication, handled by the backend only
- current-step and next-step preview context by default
- structured referenced_sections metadata so the UI can show local documentation without parsing assistant prose
- concise safety instructions that tell the model commands are documentation/manual commands only
Phase 5 search-to-chat context includes:
- an Ask about this section action on search results
- POST /api/chat support for section_ids, with selected sections loaded from the active local LFS version
- selected-section excerpts in the provider prompt, not full imported documentation
- referenced_sections entries labeled as selected_section so the Referenced documentation panel shows the exact local source used by the answer
- duplicate-section de-duplication and a small selected-section limit to keep prompts scoped
Phase 6 local section notes include:
- a section_notes SQLite table keyed by (version_id, section_id) (see the sketch after this list)
- current-step note editing with Bookmark, Needs review, and Blocked flags
- local note metadata on section fetches, current-step responses, search results, and chat referenced_sections
- a Saved sections panel with filters for bookmarked, needs-review, and blocked sections
- note persistence across progress resets and same-version re-imports when section ids remain stable
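A minimal sketch of that keying and upsert behavior with Python's sqlite3; only (version_id, section_id) and the documented note fields come from this project, the rest is illustrative:

import sqlite3

conn = sqlite3.connect("demo-lfs.db")
conn.execute("""
CREATE TABLE IF NOT EXISTS section_notes (
    version_id   TEXT NOT NULL,
    section_id   TEXT NOT NULL,
    note_text    TEXT,
    bookmarked   INTEGER NOT NULL DEFAULT 0,
    needs_review INTEGER NOT NULL DEFAULT 0,
    blocked      INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (version_id, section_id)
)""")
# Saving again for the same (version_id, section_id) updates in place,
# which is why notes survive progress resets and same-version re-imports.
conn.execute("""
INSERT INTO section_notes (version_id, section_id, note_text, bookmarked)
VALUES (?, ?, ?, ?)
ON CONFLICT (version_id, section_id) DO UPDATE SET
    note_text = excluded.note_text,
    bookmarked = excluded.bookmarked
""", ("sample-v1-systemd", "gcc-pass1", "Re-check the target triplet first.", 1))
conn.commit()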
Phase 4 intentionally does not include:
- autonomous agent tool-calling
- chat support for OpenAI, OpenRouter, Groq, or Ollama yet
- sending the full LFS documentation database to AI providers
- LFS shell command execution
- multi-user profile isolation
The web dashboard remains local-first and single-user by default. Progress is stored in the selected SQLite database file. Multiple users should use separate database files unless profile or user isolation is added later.
Section notes are local-only personal data stored in the selected SQLite database. Notes are attached to a documentation section by version_id and section_id; they do not modify imported documentation content. Resetting progress does not remove notes, and re-importing the same version with the same section ids preserves them.
AI API keys are user-provided and handled by the local backend. OpenAI, OpenRouter, Groq, Google Gemini, and Vercel AI Gateway require an API key. Gemini keys are sent to the lightweight model-list validation endpoint with the x-goog-api-key HTTP header rather than a query string. Vercel AI Gateway keys are sent from the backend with an Authorization: Bearer header. Raw keys are never returned by API responses, never printed by doctor, and must not be hardcoded into frontend code.
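A minimal sketch of that header-based Gemini validation call, using only the Python standard library (error handling omitted):

import json
import urllib.request

def list_gemini_models(api_key: str) -> dict:
    # The key travels in the x-goog-api-key header, never in the URL.
    req = urllib.request.Request(
        "https://generativelanguage.googleapis.com/v1beta/models",
        headers={"x-goog-api-key": api_key},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)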
Key storage prefers Python keyring when it is available and usable. If keyring is unavailable, the app falls back to an explicitly marked local plaintext value in the selected SQLite database settings table. The web settings panel and doctor command show a warning when this plaintext fallback is in use. Protect the database file accordingly.
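A minimal sketch of that preference order, assuming the keyring package's standard set_password API; the service name and settings schema are illustrative:

import sqlite3

def save_api_key(conn: sqlite3.Connection, raw_key: str) -> str:
    try:
        import keyring
        keyring.set_password("lfs-mcp", "provider-api-key", raw_key)
        return "keyring"
    except Exception:
        # Explicitly marked plaintext fallback in the local settings table;
        # doctor and the web settings panel warn when this path is active.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS settings (key TEXT PRIMARY KEY, value TEXT)"
        )
        conn.execute(
            "INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)",
            ("provider_api_key_plaintext", raw_key),
        )
        conn.commit()
        return "plaintext-fallback"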
Gemini model listing and Gemini text generation are separate provider capabilities. GET https://generativelanguage.googleapis.com/v1beta/models can succeed while generateContent still fails. If /api/chat returns HTTP 429 with RESOURCE_EXHAUSTED or a quota limit of 0, the configured key/model/endpoint may be valid but the Google project/account has no usable generateContent quota. Check Google AI Studio quota/billing, wait for quota to reset, choose another available model, or use the Local mock provider for development.
The Local mock provider is for local development and UI testing only. It requires no API key, makes no network calls, and returns deterministic local responses built from the same current-step context that /api/chat sends to external providers. It echoes the user message safely, includes active version/current step/next step metadata when available, and clearly labels itself as a local mock response. It is not a real AI answer.
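A minimal sketch of a deterministic mock in that spirit; the function name and fields are illustrative:

def local_mock_reply(message: str, current_step_title: str | None = None) -> str:
    # Deterministic, offline, and clearly labeled: never a real AI answer.
    lines = ["[local mock response - not a real AI answer]"]
    if current_step_title:
        lines.append(f"Current step: {current_step_title}")
    lines.append(f"You asked: {message}")
    return "\n".join(lines)

print(local_mock_reply("What should I build next?", "5.3 GCC-15.2.0 - Pass 1"))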
Vercel AI Gateway uses the OpenAI-compatible endpoint at https://ai-gateway.vercel.sh/v1. The default model is openai/gpt-5.4, and user-entered model names from the Vercel AI Gateway model list are preserved. See https://vercel.com/ai-gateway/models for available model identifiers. Connection testing uses the lightweight /models endpoint when available; chat uses {base_url}/chat/completions.
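A minimal OpenAI-compatible request sketch for the gateway, using the documented base URL and default model (standard library only; streaming and error handling omitted):

import json
import urllib.request

def gateway_chat(api_key: str, user_message: str) -> str:
    payload = {
        "model": "openai/gpt-5.4",  # documented default; other gateway ids work
        "messages": [{"role": "user", "content": user_message}],
    }
    req = urllib.request.Request(
        "https://ai-gateway.vercel.sh/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # sent from the backend only
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]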
To open the settings UI:
- Start the web dashboard with python3 -m lfs_mcp --db "$PWD/demo-lfs.db" web.
- Open http://127.0.0.1:8787.
- Use the Settings button in the top header.
- Choose a provider, model, and base URL.
- Enter an API key only for providers that require one.
- Save settings or run the lightweight connection test.
To use the chat UI:
- Start the web dashboard with python3 -m lfs_mcp --db "$PWD/demo-lfs.db" web.
- Open http://127.0.0.1:8787.
- Configure Vercel AI Gateway or Gemini in Settings with a user-provided API key, or choose Local mock for no-key local testing.
- Use the center chat workspace to ask about the current step, or run a search in the left column and choose Ask about this section on a result.
Chat sends only a small documentation context package to external providers: active version, current step metadata/content excerpt, selected search-result section excerpts, command blocks as documentation text, and next-step preview when current-step context is requested. It does not send the full SQLite database, all imported sections, or local personal notes. The app never executes LFS commands.
After a chat response, the right-side Referenced documentation panel shows the local source context. This panel is populated from backend-provided referenced_sections metadata, not by parsing the assistant's natural-language response. It displays the local imported LFS section used as current-step or selected-section context and may show local note flags for the section. If a section payload is truncated for the UI, the dashboard can fetch the full local section through GET /api/docs/sections/{section_id}?version_id=....
The current test behavior is intentionally limited: Ollama checks the local /api/tags endpoint when reachable; Gemini and Vercel AI Gateway check configured model-list endpoints; Local mock returns success immediately without network; other API-key providers validate local key presence/format only. Automated provider and chat tests use mocked HTTP behavior and do not require real Gemini or Vercel keys or real network access.
Web API endpoints
The local dashboard exposes these HTTP endpoints:
| Method | Path | Purpose |
|---|---|---|
| GET | /api/health | Report SQLite, FTS5, database path, active version, and safety flags. |
| GET | /api/versions | List imported LFS documentation versions. |
| GET | /api/current-step | Return the first incomplete section for the active version. |
| GET | /api/checklist | Return the active version checklist with progress status. |
| GET | /api/search?q=binutils | Search the active version with SQLite FTS5. |
| GET | /api/docs/sections/{section_id} | Fetch one local documentation section, optionally scoped with ?version_id=.... |
| GET | /api/section-notes/{section_id} | Fetch the local note and flags for one section, optionally scoped with ?version_id=.... |
| PATCH | /api/section-notes/{section_id} | Upsert a local section note using note_text, bookmarked, needs_review, blocked, and optional version_id. |
| GET | /api/section-notes | List saved local section notes for the active version, with optional bookmarked, needs_review, and blocked filters. |
| POST | /api/complete-current-step | Mark the current step completed. |
| POST | /api/complete-step | Mark a requested step completed using {"section_id": "gcc-pass1", "force": false}. |
| POST | /api/reset-progress | Reset progress for the active version. |
| GET | /api/settings | Return local settings and future AI configuration status. |
| GET | /api/ai-settings | Return provider, model, base URL, key presence, storage status, and warnings without returning raw keys. |
| POST | /api/ai-settings | Save provider/model/base URL and optionally save a raw API key through the local backend. |
| POST | /api/ai-settings/test | Run lightweight local validation for the saved AI settings. |
| POST | /api/chat | Send a minimal contextual chat request to Gemini, Vercel AI Gateway, or Local mock. Accepts {"message": "...", "include_current_step": true, "section_ids": ["..."]}. See the example below the table. |
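For example, a minimal local call to the chat endpoint with the documented payload shape, assuming the default dashboard address:

import json
import urllib.request

payload = {
    "message": "What does the current step ask me to do?",
    "include_current_step": True,
    "section_ids": [],
}
req = urllib.request.Request(
    "http://127.0.0.1:8787/api/chat",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req, timeout=60) as resp:
    reply = json.load(resp)
print(reply.get("referenced_sections", []))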
MCP client configuration example
For an MCP-compatible client that accepts a JSON server configuration, use a stdio command similar to:
{
"mcpServers": {
"lfs-docs": {
"command": "python3",
"args": ["-m", "lfs_mcp", "server"],
"env": {
"LFS_MCP_DB": "/absolute/path/to/lfs_docs.db"
}
}
}
}
Practical Linux client options
Claude Desktop is not required for this project and is not currently a practical Linux path for it. The options below work with Linux-friendly MCP clients or with the built-in CLI fallback. Before using any MCP client, install the project and import at least one documentation version:
python3 -m pip install -e ".[dev]"
python3 -m lfs_mcp --db "$PWD/demo-lfs.db" import \
--url tests/fixtures/lfs_sample_v1 \
--version-id sample-v1-systemd \
--display-name "Sample LFS v1 systemd" \
--force
For real LFS docs, replace the fixture import with:
python3 -m lfs_mcp --db "$PWD/lfs_docs.db" import \
--url https://www.linuxfromscratch.org/lfs/view/13.0-systemd-rc1/ \
--version-id 13.0-systemd-rc1 \
--display-name "Linux From Scratch 13.0 systemd rc1"
Use an absolute database path in GUI client configs because those clients may start from a different working directory.
MCP Inspector
MCP Inspector is the easiest Linux-friendly way to verify that this server exposes tools correctly. It is a local developer UI; it starts this server as a child process over stdio.
Run directly from the repository:
npx @modelcontextprotocol/inspector \
-e LFS_MCP_DB="$PWD/demo-lfs.db" \
-- python3 -m lfs_mcp server
Alternative config-file form, saved anywhere such as ./mcp-inspector-lfs.json:
{
"mcpServers": {
"lfs-docs": {
"command": "python3",
"args": ["-m", "lfs_mcp", "server"],
"env": {
"LFS_MCP_DB": "/absolute/path/to/lfs-mcp/demo-lfs.db"
}
}
}
}
Start it with:
npx @modelcontextprotocol/inspector --config ./mcp-inspector-lfs.json --server lfs-docs
Open the printed Inspector URL, such as http://localhost:6274/?MCP_PROXY_AUTH_TOKEN=generated-token, including the generated token if shown. To verify tools are visible, connect to the server and open the Tools view. You should see tools such as get_current_step, search_lfs_docs, and mark_step_completed.
Example tool call in Inspector:
{
"tool": "get_current_step",
"arguments": {}
}
Cursor
Cursor supports local MCP servers through mcp.json.
Config locations on Linux:
- Project-local: <project-root>/.cursor/mcp.json
- User-global: ~/.cursor/mcp.json
Example config:
{
"mcpServers": {
"lfs-docs": {
"command": "python3",
"args": ["-m", "lfs_mcp", "server"],
"env": {
"LFS_MCP_DB": "/absolute/path/to/lfs-mcp/demo-lfs.db"
}
}
}
}
Cursor starts the stdio server from this config when the MCP server is enabled. To verify tools are visible, open Cursor settings or the MCP section, confirm lfs-docs is enabled, and check the available tool list in Agent/chat mode.
Example prompt:
Use the lfs-docs MCP server and call get_current_step. What is my current Linux From Scratch step?
VS Code GitHub Copilot Chat
VS Code with GitHub Copilot Chat supports MCP servers in Agent mode. This requires a Copilot-enabled VS Code setup with MCP support enabled by your account or organization policy.
Config locations on Linux:
- Workspace: <project-root>/.vscode/mcp.json
- User profile: run MCP: Open User Configuration from the Command Palette
Example .vscode/mcp.json:
{
"servers": {
"lfs-docs": {
"type": "stdio",
"command": "python3",
"args": ["-m", "lfs_mcp", "server"],
"env": {
"LFS_MCP_DB": "/absolute/path/to/lfs-mcp/demo-lfs.db"
}
}
}
}
Start the server by opening .vscode/mcp.json and selecting the Start CodeLens above the server, or run MCP: List Servers from the Command Palette and start lfs-docs. To verify tools are visible, open Copilot Chat, switch to Agent mode, select the tools icon or Configure Tools, and confirm the LFS tools are listed.
Example prompt:
Use the lfs-docs MCP tool get_current_step and tell me only the current LFS step.
GitHub Copilot CLI (not the retired gh copilot)
The retired GitHub CLI extension gh copilot has been replaced by the newer copilot CLI. If your GitHub Copilot CLI version supports MCP, configure it like this.
Interactive setup:
copilot
Then run:
/mcp add
Choose STDIO or Local, use lfs-docs as the server name, and enter this command:
python3 -m lfs_mcp server
Set environment variables to:
{
"LFS_MCP_DB": "/absolute/path/to/lfs-mcp/demo-lfs.db"
}
Manual config location:
~/.copilot/mcp-config.json
Example config:
{
"mcpServers": {
"lfs-docs": {
"type": "local",
"command": "python3",
"args": ["-m", "lfs_mcp", "server"],
"env": {
"LFS_MCP_DB": "/absolute/path/to/lfs-mcp/demo-lfs.db"
},
"tools": ["*"]
}
}
}
To verify tools are visible inside Copilot CLI:
/mcp show
/mcp show lfs-docs
Example prompt:
Use the lfs-docs MCP server and its get_current_step tool. What should I do now in Linux From Scratch?
Fallback local CLI demo
If no AI MCP client is available, you can still demo the same behavior through the local CLI. This does not use MCP, but it exercises the same service logic and SQLite database.
Import fixture docs offline:
python3 -m lfs_mcp --db "$PWD/demo-lfs.db" import \
--url tests/fixtures/lfs_sample_v1 \
--version-id sample-v1-systemd \
--display-name "Sample LFS v1 systemd" \
--force
Run the CLI equivalent of get_current_step:
python3 -m lfs_mcp --db "$PWD/demo-lfs.db" current
Verify the JSON output contains a current_step object with the first incomplete section and a next_step_preview.
CLI examples
List imported versions:
python3 -m lfs_mcp list-versions
Example output:
[
{
"version_id": "13.0-systemd-rc1",
"display_name": "Linux From Scratch 13.0 systemd rc1",
"source_url": "https://www.linuxfromscratch.org/lfs/view/13.0-systemd-rc1/",
"variant": "systemd",
"imported_at": "2026-04-26T23:00:00+00:00",
"progress_exists": 0
}
]
Set the active version:
python3 -m lfs_mcp set-active --version-id 13.0-systemd-rc1
Get the current step:
python3 -m lfs_mcp current
Mark a step completed:
python3 -m lfs_mcp complete introduction
Search the active version:
python3 -m lfs_mcp search "binutils pass 1"
Search across all imported versions:
python3 -m lfs_mcp search "systemd" --version-id all
Reset progress for the active version:
python3 -m lfs_mcp reset-progress
Reset progress for all versions:
python3 -m lfs_mcp reset-progress --all-versions
MCP tools
The server exposes these tools:
| Tool | Purpose |
|---|---|
| list_lfs_versions() | Return imported versions and whether progress exists. |
| get_active_lfs_version() | Return the selected/default version or a clear setup message. |
| set_active_lfs_version(version_id) | Validate and select an imported version. |
| import_lfs_docs(source_url, version_id, display_name=None, force=False) | Fetch/parse/import/index docs. |
| get_build_checklist(version_id=None) | Return the ordered checklist for one version. |
| get_current_step(version_id=None) | Return only the earliest incomplete step plus a small next-step preview. |
| mark_step_completed(section_id, version_id=None, force=False) | Complete a step, preventing accidental jumps unless forced. |
| get_step(section_id, version_id=None) | Return a section and warn if it is ahead of progress. |
| search_lfs_docs(query, version_id=None) | Search with SQLite FTS5; use version_id="all" for all versions. |
| get_package_steps(package_name, version_id=None) | Return ordered package-related sections and warn on multiple passes. |
| reset_progress(version_id=None, all_versions=False) | Reset selected-version progress by default or all progress explicitly. |
Example tool call outputs
get_current_step():
{
"version_id": "sample-v1-systemd",
"current_step": {
"order": 1,
"section_id": "introduction",
"title": "1.1 Introduction",
"chapter": "Chapter 1",
"source_url": "file:///home/user/lfs-mcp/tests/fixtures/lfs_sample_v1/chapter01/introduction.html",
"summary": "Start at the beginning. Read the LFS systemd book overview before preparing tools.",
"content": "# 1.1 Introduction\n\nStart at the beginning. Read the LFS systemd book overview before preparing tools.",
"command_blocks": []
},
"next_step_preview": {
"order": 2,
"section_id": "prepare",
"chapter": "Chapter 2",
"title": "2.1 Preparing the Host",
"status": "pending",
"preview": "Check host requirements and create a safe learning plan."
}
}
mark_step_completed("gcc-pass1") before earlier steps are complete:
{
"completed": false,
"warning": "This section is ahead of earlier pending checklist items. It was not marked completed.",
"earlier_pending_sections": [
{"section_id": "introduction", "order": 1, "status": "pending"},
{"section_id": "prepare", "order": 2, "status": "pending"}
]
}
search_lfs_docs("gcc"):
{
"query": "gcc",
"scope": "sample-v1-systemd",
"results": [
{
"version_id": "sample-v1-systemd",
"display_name": "Sample LFS v1 systemd",
"section_id": "gcc-pass1",
"title": "5.3 GCC-15.2.0 - Pass 1",
"chapter": "Chapter 5",
"source_url": "file:///home/user/lfs-mcp/tests/fixtures/lfs_sample_v1/chapter05/gcc-pass1.html",
"snippet": "Build the first [GCC] cross compiler pass after Binutils.",
"relation_to_current_step": "ahead_of_current_step"
}
]
}
Safety and sequential guidance
The ordered checklist is the source of truth. The current step is always the first incomplete section for the selected version. Normal flow does not recommend future steps. Explicit lookup and search are allowed, but future results are labeled ahead_of_current_step.
Progress is isolated per LFS version. The same section_id may exist in multiple versions, and completion in one version does not affect another because the logical identity is (version_id, section_id).
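A minimal sketch of that selection rule; the sections/progress tables and column names are illustrative, but the (version_id, section_id) identity and the pending/completed statuses match the documented behavior:

import sqlite3

conn = sqlite3.connect("demo-lfs.db")
# The current step is the lowest-ordered section of the selected version
# that is not yet completed; other versions' progress never affects it.
row = conn.execute("""
    SELECT s.section_id, s.title
    FROM sections s
    LEFT JOIN progress p
      ON p.version_id = s.version_id AND p.section_id = s.section_id
    WHERE s.version_id = ?
      AND COALESCE(p.status, 'pending') != 'completed'
    ORDER BY s."order"
    LIMIT 1
""", ("sample-v1-systemd",)).fetchone()
print(row)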
Tests
Tests use the bundled fixture docs and do not download the full LFS book:
python3 -m pytest