LFS MCP Documentation Assistant
This project is a local MCP server for following Linux From Scratch (LFS) documentation step by step. It imports HTML documentation into a local SQLite database, indexes it with SQLite FTS5, and exposes MCP tools that keep users on the earliest incomplete checklist item unless they explicitly ask to look ahead.
The server is documentation-only. It never executes shell commands, never runs build steps, and does not perform destructive operations such as chroot, mount, partitioning, package compilation, or filesystem modification. Commands from the LFS book are returned as text only.
Quickstart
This creates a local virtual environment, installs the project, initializes a demo SQLite database with fixture docs, checks the environment, and shows the current LFS step. The last two commands start the web dashboard and the MCP server; run whichever you need.
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -e ".[dev]"
python3 -m lfs_mcp --db "$PWD/demo-lfs.db" init
python3 -m lfs_mcp --db "$PWD/demo-lfs.db" doctor
python3 -m lfs_mcp --db "$PWD/demo-lfs.db" current
python3 -m lfs_mcp --db "$PWD/demo-lfs.db" web
python3 -m lfs_mcp --db "$PWD/demo-lfs.db" server
The project is local and single-user by default. Progress is stored in the selected SQLite database file, so multiple users should use separate database files unless profile or user isolation is added later.
What MCP means here
MCP (Model Context Protocol) lets an AI client call local tools exposed by this server. In this project, the tools provide safe access to imported LFS documentation: list versions, select a version, get the current step, mark documentation sections completed, and search the local docs.
Architecture
The pipeline is intentionally simple for a college assignment and local demo:
- HTML docs are fetched from an LFS URL or read from a local fixture directory.
- The parser follows table-of-contents links under the provided documentation base URL, preserves section order, extracts titles, chapters, clean text, markdown-like content, command blocks, package names, and source URLs.
- SQLite stores documentation versions, sections, progress, and settings.
- SQLite FTS5 indexes title/content/package fields for local full-text search.
- MCP tools expose sequential guidance and search.
SQLite FTS5 is used instead of a vector database in the first version because it is local, deterministic, easy to test offline, and has no hosted service dependency. The schema keeps metadata_json columns and stable (version_id, section_id) identities so embeddings or vector search can be added later without replacing the storage model.
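A rough sketch of that storage model, using Python's standard sqlite3 module; the table and column names here are illustrative assumptions, not the project's exact schema:

import sqlite3

conn = sqlite3.connect("demo-lfs.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS sections (
    version_id    TEXT NOT NULL,      -- e.g. "sample-v1-systemd"
    section_id    TEXT NOT NULL,      -- e.g. "gcc-pass1"
    "order"       INTEGER NOT NULL,   -- preserved book order
    title         TEXT NOT NULL,
    content       TEXT NOT NULL,
    packages      TEXT,               -- extracted package names
    metadata_json TEXT,               -- room for embeddings later
    PRIMARY KEY (version_id, section_id)
);
CREATE VIRTUAL TABLE IF NOT EXISTS sections_fts
    USING fts5(title, content, packages);
""")
conn.execute(
    "INSERT INTO sections_fts (title, content, packages) VALUES (?, ?, ?)",
    ("5.3 GCC Pass 1", "Build the first GCC cross compiler pass.", "gcc"),
)
for (title,) in conn.execute(
    "SELECT title FROM sections_fts WHERE sections_fts MATCH ?", ("gcc",)
):
    print(title)

Keeping the metadata_json column unused for now costs nothing, but it means a later vector-search phase can attach embeddings per (version_id, section_id) without a schema migration.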
Install
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -e ".[dev]"
On systems where python points to Python 3, you can substitute python for python3 in these commands.
The default database path is:
~/.local/share/lfs-mcp/lfs_docs.db
You can override it with --db /path/to/lfs.db or the LFS_MCP_DB environment variable.
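For example, either form points the CLI at the same database file:

python3 -m lfs_mcp --db /path/to/lfs.db current
LFS_MCP_DB=/path/to/lfs.db python3 -m lfs_mcp current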
Import the initial LFS docs
python3 -m lfs_mcp import \
--url https://www.linuxfromscratch.org/lfs/view/13.0-systemd-rc1/ \
--version-id 13.0-systemd-rc1 \
--display-name "Linux From Scratch 13.0 systemd rc1"
The importer also supports local fixture directories for offline tests and demos:
python3 -m lfs_mcp import \
--url tests/fixtures/lfs_sample_v1 \
--version-id sample-v1-systemd \
--display-name "Sample LFS v1 systemd"
Use --force only when you intentionally want to overwrite an already imported version.
The importer uses a clear local importer User-Agent for HTTP requests, applies a request timeout, and ignores table-of-contents links that point outside the provided documentation base URL or fixture directory.
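A minimal sketch of that scoping check, assuming a hypothetical helper name; the real importer may differ in detail:

import urllib.parse

def is_in_scope(base_url: str, link: str) -> bool:
    # Resolve the TOC link against the base and keep it only if it
    # stays under the documentation base URL.
    resolved = urllib.parse.urljoin(base_url, link)
    return resolved.startswith(base_url)

base = "https://www.linuxfromscratch.org/lfs/view/13.0-systemd-rc1/"
print(is_in_scope(base, "chapter05/gcc-pass1.html"))  # True: stays in scope
print(is_in_scope(base, "https://example.com/"))      # False: ignored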
Initialize and check a local demo
The init command creates or opens the configured SQLite database, verifies SQLite and FTS5 support, imports fixture docs by default when no source URL is provided, sets the imported version active, and prints the next useful commands.
python3 -m lfs_mcp --db "$PWD/demo-lfs.db" init
Defaults:
fixture: tests/fixtures/lfs_sample_v1
version id: sample-v1-systemd
display name: Sample LFS v1 systemd
You can initialize from online LFS docs instead:
python3 -m lfs_mcp --db "$PWD/lfs_docs.db" init \
--source-url https://www.linuxfromscratch.org/lfs/view/13.0-systemd-rc1/ \
--version-id 13.0-systemd-rc1 \
--display-name "Linux From Scratch 13.0 systemd rc1"
Use --force to overwrite an existing imported version intentionally.
The doctor command prints human-readable diagnostics for Python, SQLite, FTS5, the resolved database path, required tables, imported versions, active version, current step loading, and MCP server importability.
python3 -m lfs_mcp --db "$PWD/demo-lfs.db" doctor
Run the MCP server
python3 -m lfs_mcp server
With a custom database:
python3 -m lfs_mcp --db ./demo-lfs.db server
Run the local web dashboard
The web command starts a local FastAPI server with a static HTML/CSS/JavaScript dashboard. It uses the same SQLite database and service logic as the CLI and MCP tools.
python3 -m lfs_mcp --db "$PWD/demo-lfs.db" web
Default local URL:
http://127.0.0.1:8787
You can choose a different host or port:
python3 -m lfs_mcp --db "$PWD/demo-lfs.db" web --host 127.0.0.1 --port 8787
The web dashboard includes:
- light, chat-first workspace with database, active version, provider status, and Settings link in the top header
- compact left column for current progress, checklist, search, and saved sections
- primary center chat workspace for current-step or selected-section questions
- wide right-side referenced documentation panel for local source/provenance
- next-step preview
- ordered checklist
- SQLite FTS5 search
- search result progress labels
- refresh, mark-current-complete, and reset-progress buttons
- separate AI provider settings modal for bring-your-own-key chat
- minimal contextual Gemini, Vercel AI Gateway, or Local mock chat that uses current-step or selected search-result context
- referenced documentation panel showing the local LFS section used as chat context
- local section notes, bookmarks, needs-review flags, blocked flags, and a saved-sections panel
Phase 3 AI settings include:
- provider/model/base URL configuration
- supported providers: OpenAI, OpenRouter, Groq, Google Gemini, Vercel AI Gateway, Local mock, and Ollama
- optional API key entry for providers that require one
- Ollama default base URL: http://127.0.0.1:11434
- Gemini default base URL: https://generativelanguage.googleapis.com/v1beta
- Gemini recommended default chat model: gemini-2.0-flash-lite
- Vercel AI Gateway default base URL: https://ai-gateway.vercel.sh/v1
- Vercel AI Gateway default model: openai/gpt-5.4
- Local mock default model: local-mock
- lightweight configuration testing
- local backend storage status and warnings
Phase 4 chat includes:
- a local /api/chat endpoint
- Gemini chat support using the configured provider/model/base URL
- Vercel AI Gateway chat support using OpenAI-compatible chat completions
- Local mock chat support for development and UI testing without an API key
- x-goog-api-key header-based Gemini authentication and Authorization: Bearer <key> Vercel AI Gateway authentication, handled by the backend only
- current-step and next-step preview context by default
- structured referenced_sections metadata so the UI can show local documentation without parsing assistant prose
- concise safety instructions that tell the model commands are documentation/manual commands only
Phase 5 search-to-chat context includes:
- an Ask about this section action on search results
- POST /api/chat support for section_ids, with selected sections loaded from the active local LFS version
- selected-section excerpts in the provider prompt, not full imported documentation
- referenced_sections entries labeled as selected_section so the Referenced documentation panel shows the exact local source used by the answer
- duplicate-section de-duplication and a small selected-section limit to keep prompts scoped
Phase 6 local section notes include:
- a section_notes SQLite table keyed by (version_id, section_id) (see the sketch after this list)
- current-step note editing with Bookmark, Needs review, and Blocked flags
- local note metadata on section fetches, current-step responses, search results, and chat referenced_sections
- a Saved sections panel with filters for bookmarked, needs-review, and blocked sections
- note persistence across progress resets and same-version re-imports when section ids remain stable
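A minimal sketch of that keying and upsert behavior with Python's sqlite3; only (version_id, section_id) and the documented note fields come from this project, the rest is illustrative:

import sqlite3

conn = sqlite3.connect("demo-lfs.db")
conn.execute("""
CREATE TABLE IF NOT EXISTS section_notes (
    version_id   TEXT NOT NULL,
    section_id   TEXT NOT NULL,
    note_text    TEXT,
    bookmarked   INTEGER NOT NULL DEFAULT 0,
    needs_review INTEGER NOT NULL DEFAULT 0,
    blocked      INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (version_id, section_id)
)""")
# Saving again for the same (version_id, section_id) updates in place,
# which is why notes survive progress resets and same-version re-imports.
conn.execute("""
INSERT INTO section_notes (version_id, section_id, note_text, bookmarked)
VALUES (?, ?, ?, ?)
ON CONFLICT (version_id, section_id) DO UPDATE SET
    note_text = excluded.note_text,
    bookmarked = excluded.bookmarked
""", ("sample-v1-systemd", "gcc-pass1", "Re-check the target triplet first.", 1))
conn.commit()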
Phase 4 intentionally does not include:
- autonomous agent tool-calling
- chat support for OpenAI, OpenRouter, Groq, or Ollama yet
- sending the full LFS documentation database to AI providers
- LFS shell command execution
- multi-user profile isolation
The web dashboard remains local-first and single-user by default. Progress is stored in the selected SQLite database file. Multiple users should use separate database files unless profile or user isolation is added later.
Section notes are local-only personal data stored in the selected SQLite database. Notes are attached to a documentation section by version_id and section_id; they do not modify imported documentation content. Resetting progress does not remove notes, and re-importing the same version with the same section ids preserves them.
AI API keys are user-provided and handled by the local backend. OpenAI, OpenRouter, Groq, Google Gemini, and Vercel AI Gateway require an API key. Gemini keys are sent to the lightweight model-list validation endpoint with the x-goog-api-key HTTP header rather than a query string. Vercel AI Gateway keys are sent from the backend with an Authorization: Bearer header. Raw keys are never returned by API responses, never printed by doctor, and must not be hardcoded into frontend code.
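A minimal sketch of that header-based Gemini validation call, using only the Python standard library (error handling omitted):

import json
import urllib.request

def list_gemini_models(api_key: str) -> dict:
    # The key travels in the x-goog-api-key header, never in the URL.
    req = urllib.request.Request(
        "https://generativelanguage.googleapis.com/v1beta/models",
        headers={"x-goog-api-key": api_key},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)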
Key storage prefers Python keyring when it is available and usable. If keyring is unavailable, the app falls back to an explicitly marked local plaintext value in the selected SQLite database settings table. The web settings panel and doctor command show a warning when this plaintext fallback is in use. Protect the database file accordingly.
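A minimal sketch of that preference order, assuming the keyring package's standard set_password API; the service name and settings schema are illustrative:

import sqlite3

def save_api_key(conn: sqlite3.Connection, raw_key: str) -> str:
    try:
        import keyring
        keyring.set_password("lfs-mcp", "provider-api-key", raw_key)
        return "keyring"
    except Exception:
        # Explicitly marked plaintext fallback in the local settings table;
        # doctor and the web settings panel warn when this path is active.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS settings (key TEXT PRIMARY KEY, value TEXT)"
        )
        conn.execute(
            "INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)",
            ("provider_api_key_plaintext", raw_key),
        )
        conn.commit()
        return "plaintext-fallback"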
Gemini model listing and Gemini text generation are separate provider capabilities. GET https://generativelanguage.googleapis.com/v1beta/models can succeed while generateContent still fails. If /api/chat returns HTTP 429 with RESOURCE_EXHAUSTED or a quota limit of 0, the configured key/model/endpoint may be valid but the Google project/account has no usable generateContent quota. Check Google AI Studio quota/billing, wait for quota to reset, choose another available model, or use the Local mock provider for development.
The Local mock provider is for local development and UI testing only. It requires no API key, makes no network calls, and returns deterministic local responses built from the same current-step context that /api/chat sends to external providers. It echoes the user message safely, includes active version/current step/next step metadata when available, and clearly labels itself as a local mock response. It is not a real AI answer.
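A minimal sketch of a deterministic mock in that spirit; the function name and fields are illustrative:

def local_mock_reply(message: str, current_step_title: str | None = None) -> str:
    # Deterministic, offline, and clearly labeled: never a real AI answer.
    lines = ["[local mock response - not a real AI answer]"]
    if current_step_title:
        lines.append(f"Current step: {current_step_title}")
    lines.append(f"You asked: {message}")
    return "\n".join(lines)

print(local_mock_reply("What should I build next?", "5.3 GCC-15.2.0 - Pass 1"))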
Vercel AI Gateway uses the OpenAI-compatible endpoint at https://ai-gateway.vercel.sh/v1. The default model is openai/gpt-5.4, and user-entered model names from the Vercel AI Gateway model list are preserved. See https://vercel.com/ai-gateway/models for available model identifiers. Connection testing uses the lightweight /models endpoint when available; chat uses {base_url}/chat/completions.
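A minimal OpenAI-compatible request sketch for the gateway, using the documented base URL and default model (standard library only; streaming and error handling omitted):

import json
import urllib.request

def gateway_chat(api_key: str, user_message: str) -> str:
    payload = {
        "model": "openai/gpt-5.4",  # documented default; other gateway ids work
        "messages": [{"role": "user", "content": user_message}],
    }
    req = urllib.request.Request(
        "https://ai-gateway.vercel.sh/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # sent from the backend only
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]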
To open the settings UI:
- Start the web dashboard with python3 -m lfs_mcp --db "$PWD/demo-lfs.db" web.
- Open http://127.0.0.1:8787.
- Use the Settings button in the top header.
- Choose a provider, model, and base URL.
- Enter an API key only for providers that require one.
- Save settings or run the lightweight connection test.
To use the chat UI:
- Start the web dashboard with python3 -m lfs_mcp --db "$PWD/demo-lfs.db" web.
- Open http://127.0.0.1:8787.
- Configure Vercel AI Gateway or Gemini in Settings with a user-provided API key, or choose Local mock for no-key local testing.
- Use the center chat workspace to ask about the current step, or run a search in the left column and choose Ask about this section on a result.
Chat sends only a small documentation context package to external providers: active version, current step metadata/content excerpt, selected search-result section excerpts, command blocks as documentation text, and next-step preview when current-step context is requested. It does not send the full SQLite database, all imported sections, or local personal notes. The app never executes LFS commands.
After a chat response, the right-side Referenced documentation panel shows the local source context. This panel is populated from backend-provided referenced_sections metadata, not by parsing the assistant's natural-language response. It displays the local imported LFS section used as current-step or selected-section context and may show local note flags for the section. If a section payload is truncated for the UI, the dashboard can fetch the full local section through GET /api/docs/sections/{section_id}?version_id=....
The current test behavior is intentionally limited: Ollama checks the local /api/tags endpoint when reachable; Gemini and Vercel AI Gateway check configured model-list endpoints; Local mock returns success immediately without network; other API-key providers validate local key presence/format only. Automated provider and chat tests use mocked HTTP behavior and do not require real Gemini or Vercel keys or real network access.
Web API endpoints
The local dashboard exposes these HTTP endpoints:
| Method | Path | Purpose |
|---|---|---|
| GET | /api/health | Report SQLite, FTS5, database path, active version, and safety flags. |
| GET | /api/versions | List imported LFS documentation versions. |
| GET | /api/current-step | Return the first incomplete section for the active version. |
| GET | /api/checklist | Return the active version checklist with progress status. |
| GET | /api/search?q=binutils | Search the active version with SQLite FTS5. |
| GET | /api/docs/sections/{section_id} | Fetch one local documentation section, optionally scoped with ?version_id=.... |
| GET | /api/section-notes/{section_id} | Fetch the local note and flags for one section, optionally scoped with ?version_id=.... |
| PATCH | /api/section-notes/{section_id} | Upsert a local section note using note_text, bookmarked, needs_review, blocked, and optional version_id. |
| GET | /api/section-notes | List saved local section notes for the active version, with optional bookmarked, needs_review, and blocked filters. |
| POST | /api/complete-current-step | Mark the current step completed. |
| POST | /api/complete-step | Mark a requested step completed using {"section_id": "gcc-pass1", "force": false}. |
| POST | /api/reset-progress | Reset progress for the active version. |
| GET | /api/settings | Return local settings and future AI configuration status. |
| GET | /api/ai-settings | Return provider, model, base URL, key presence, storage status, and warnings without returning raw keys. |
| POST | /api/ai-settings | Save provider/model/base URL and optionally save a raw API key through the local backend. |
| POST | /api/ai-settings/test | Run lightweight local validation for the saved AI settings. |
| POST | /api/chat | Send a minimal contextual chat request to Gemini, Vercel AI Gateway, or Local mock. Accepts {"message": "...", "include_current_step": true, "section_ids": ["..."]}. See the example below the table. |
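For example, a minimal local call to the chat endpoint with the documented payload shape, assuming the default dashboard address:

import json
import urllib.request

payload = {
    "message": "What does the current step ask me to do?",
    "include_current_step": True,
    "section_ids": [],
}
req = urllib.request.Request(
    "http://127.0.0.1:8787/api/chat",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req, timeout=60) as resp:
    reply = json.load(resp)
print(reply.get("referenced_sections", []))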
MCP client configuration example
For an MCP-compatible client that accepts a JSON server configuration, use a stdio command similar to:
{
"mcpServers": {
"lfs-docs": {
"command": "python3",
"args": ["-m", "lfs_mcp", "server"],
"env": {
"LFS_MCP_DB": "/absolute/path/to/lfs_docs.db"
}
}
}
}
Practical Linux client options
Claude Desktop is not required for this project and is not currently a practical Linux path for it. The options below work with Linux-friendly MCP clients or with the built-in CLI fallback. Before using any MCP client, install the project and import at least one documentation version:
python3 -m pip install -e ".[dev]"
python3 -m lfs_mcp --db "$PWD/demo-lfs.db" import \
--url tests/fixtures/lfs_sample_v1 \
--version-id sample-v1-systemd \
--display-name "Sample LFS v1 systemd" \
--force
For real LFS docs, replace the fixture import with:
python3 -m lfs_mcp --db "$PWD/lfs_docs.db" import \
--url https://www.linuxfromscratch.org/lfs/view/13.0-systemd-rc1/ \
--version-id 13.0-systemd-rc1 \
--display-name "Linux From Scratch 13.0 systemd rc1"
Use an absolute database path in GUI client configs because those clients may start from a different working directory.
MCP Inspector
MCP Inspector is the easiest Linux-friendly way to verify that this server exposes tools correctly. It is a local developer UI; it starts this server as a child process over stdio.
Run directly from the repository:
npx @modelcontextprotocol/inspector \
-e LFS_MCP_DB="$PWD/demo-lfs.db" \
-- python3 -m lfs_mcp server
Alternative config-file form, saved anywhere such as ./mcp-inspector-lfs.json:
{
"mcpServers": {
"lfs-docs": {
"command": "python3",
"args": ["-m", "lfs_mcp", "server"],
"env": {
"LFS_MCP_DB": "/absolute/path/to/lfs-mcp/demo-lfs.db"
}
}
}
}
Start it with:
npx @modelcontextprotocol/inspector --config ./mcp-inspector-lfs.json --server lfs-docs
Open the printed Inspector URL, such as http://localhost:6274/?MCP_PROXY_AUTH_TOKEN=generated-token, including the generated token if shown. To verify tools are visible, connect to the server and open the Tools view. You should see tools such as get_current_step, search_lfs_docs, and mark_step_completed.
Example tool call in Inspector:
{
"tool": "get_current_step",
"arguments": {}
}
Cursor
Cursor supports local MCP servers through mcp.json.
Config locations on Linux:
- Project-local: <project-root>/.cursor/mcp.json
- User-global: ~/.cursor/mcp.json
Example config:
{
"mcpServers": {
"lfs-docs": {
"command": "python3",
"args": ["-m", "lfs_mcp", "server"],
"env": {
"LFS_MCP_DB": "/absolute/path/to/lfs-mcp/demo-lfs.db"
}
}
}
}
Cursor starts the stdio server from this config when the MCP server is enabled. To verify tools are visible, open Cursor settings or the MCP section, confirm lfs-docs is enabled, and check the available tool list in Agent/chat mode.
Example prompt:
Use the lfs-docs MCP server and call get_current_step. What is my current Linux From Scratch step?
VS Code GitHub Copilot Chat
VS Code with GitHub Copilot Chat supports MCP servers in Agent mode. This requires a Copilot-enabled VS Code setup with MCP support enabled by your account or organization policy.
Config locations on Linux:
- Workspace: <project-root>/.vscode/mcp.json
- User profile: run MCP: Open User Configuration from the Command Palette
Example .vscode/mcp.json:
{
"servers": {
"lfs-docs": {
"type": "stdio",
"command": "python3",
"args": ["-m", "lfs_mcp", "server"],
"env": {
"LFS_MCP_DB": "/absolute/path/to/lfs-mcp/demo-lfs.db"
}
}
}
}
Start the server by opening .vscode/mcp.json and selecting the Start CodeLens above the server, or run MCP: List Servers from the Command Palette and start lfs-docs. To verify tools are visible, open Copilot Chat, switch to Agent mode, select the tools icon or Configure Tools, and confirm the LFS tools are listed.
Example prompt:
Use the lfs-docs MCP tool get_current_step and tell me only the current LFS step.
GitHub Copilot CLI (not the retired gh copilot)
The retired GitHub CLI extension gh copilot has been replaced by the newer copilot CLI. If your GitHub Copilot CLI version supports MCP, configure it like this.
Interactive setup:
copilot
Then run:
/mcp add
Choose STDIO or Local, use lfs-docs as the server name, and enter this command:
python3 -m lfs_mcp server
Set environment variables to:
{
"LFS_MCP_DB": "/absolute/path/to/lfs-mcp/demo-lfs.db"
}
Manual config location:
~/.copilot/mcp-config.json
Example config:
{
"mcpServers": {
"lfs-docs": {
"type": "local",
"command": "python3",
"args": ["-m", "lfs_mcp", "server"],
"env": {
"LFS_MCP_DB": "/absolute/path/to/lfs-mcp/demo-lfs.db"
},
"tools": ["*"]
}
}
}
To verify tools are visible inside Copilot CLI:
/mcp show
/mcp show lfs-docs
Example prompt:
Use the lfs-docs MCP server and its get_current_step tool. What should I do now in Linux From Scratch?
Fallback local CLI demo
If no AI MCP client is available, you can still demo the same behavior through the local CLI. This does not use MCP, but it exercises the same service logic and SQLite database.
Import fixture docs offline:
python3 -m lfs_mcp --db "$PWD/demo-lfs.db" import \
--url tests/fixtures/lfs_sample_v1 \
--version-id sample-v1-systemd \
--display-name "Sample LFS v1 systemd" \
--force
Run the CLI equivalent of get_current_step:
python3 -m lfs_mcp --db "$PWD/demo-lfs.db" current
Verify the JSON output contains a current_step object with the first incomplete section and a next_step_preview.
CLI examples
List imported versions:
python3 -m lfs_mcp list-versions
Example output:
[
{
"version_id": "13.0-systemd-rc1",
"display_name": "Linux From Scratch 13.0 systemd rc1",
"source_url": "https://www.linuxfromscratch.org/lfs/view/13.0-systemd-rc1/",
"variant": "systemd",
"imported_at": "2026-04-26T23:00:00+00:00",
"progress_exists": 0
}
]
Set the active version:
python3 -m lfs_mcp set-active --version-id 13.0-systemd-rc1
Get the current step:
python3 -m lfs_mcp current
Mark a step completed:
python3 -m lfs_mcp complete introduction
Search the active version:
python3 -m lfs_mcp search "binutils pass 1"
Search across all imported versions:
python3 -m lfs_mcp search "systemd" --version-id all
Reset progress for the active version:
python3 -m lfs_mcp reset-progress
Reset progress for all versions:
python3 -m lfs_mcp reset-progress --all-versions
MCP tools
The server exposes these tools:
| Tool | Purpose |
|---|---|
| list_lfs_versions() | Return imported versions and whether progress exists. |
| get_active_lfs_version() | Return the selected/default version or a clear setup message. |
| set_active_lfs_version(version_id) | Validate and select an imported version. |
| import_lfs_docs(source_url, version_id, display_name=None, force=False) | Fetch/parse/import/index docs. |
| get_build_checklist(version_id=None) | Return the ordered checklist for one version. |
| get_current_step(version_id=None) | Return only the earliest incomplete step plus a small next-step preview. |
| mark_step_completed(section_id, version_id=None, force=False) | Complete a step, preventing accidental jumps unless forced. |
| get_step(section_id, version_id=None) | Return a section and warn if it is ahead of progress. |
| search_lfs_docs(query, version_id=None) | Search with SQLite FTS5; use version_id="all" for all versions. |
| get_package_steps(package_name, version_id=None) | Return ordered package-related sections and warn on multiple passes. |
| reset_progress(version_id=None, all_versions=False) | Reset selected-version progress by default or all progress explicitly. |
Example tool call outputs
get_current_step():
{
"version_id": "sample-v1-systemd",
"current_step": {
"order": 1,
"section_id": "introduction",
"title": "1.1 Introduction",
"chapter": "Chapter 1",
"source_url": "file:///home/user/lfs-mcp/tests/fixtures/lfs_sample_v1/chapter01/introduction.html",
"summary": "Start at the beginning. Read the LFS systemd book overview before preparing tools.",
"content": "# 1.1 Introduction\n\nStart at the beginning. Read the LFS systemd book overview before preparing tools.",
"command_blocks": []
},
"next_step_preview": {
"order": 2,
"section_id": "prepare",
"chapter": "Chapter 2",
"title": "2.1 Preparing the Host",
"status": "pending",
"preview": "Check host requirements and create a safe learning plan."
}
}
mark_step_completed("gcc-pass1") before earlier steps are complete:
{
"completed": false,
"warning": "This section is ahead of earlier pending checklist items. It was not marked completed.",
"earlier_pending_sections": [
{"section_id": "introduction", "order": 1, "status": "pending"},
{"section_id": "prepare", "order": 2, "status": "pending"}
]
}
search_lfs_docs("gcc"):
{
"query": "gcc",
"scope": "sample-v1-systemd",
"results": [
{
"version_id": "sample-v1-systemd",
"display_name": "Sample LFS v1 systemd",
"section_id": "gcc-pass1",
"title": "5.3 GCC-15.2.0 - Pass 1",
"chapter": "Chapter 5",
"source_url": "file:///home/user/lfs-mcp/tests/fixtures/lfs_sample_v1/chapter05/gcc-pass1.html",
"snippet": "Build the first [GCC] cross compiler pass after Binutils.",
"relation_to_current_step": "ahead_of_current_step"
}
]
}
Safety and sequential guidance
The ordered checklist is the source of truth. The current step is always the first incomplete section for the selected version. Normal flow does not recommend future steps. Explicit lookup and search are allowed, but future results are labeled ahead_of_current_step.
Progress is isolated per LFS version. The same section_id may exist in multiple versions, and completion in one version does not affect another because the logical identity is (version_id, section_id).
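A minimal sketch of that selection rule; the sections/progress tables and column names are illustrative, but the (version_id, section_id) identity and the pending/completed statuses match the documented behavior:

import sqlite3

conn = sqlite3.connect("demo-lfs.db")
# The current step is the lowest-ordered section of the selected version
# that is not yet completed; other versions' progress never affects it.
row = conn.execute("""
    SELECT s.section_id, s.title
    FROM sections s
    LEFT JOIN progress p
      ON p.version_id = s.version_id AND p.section_id = s.section_id
    WHERE s.version_id = ?
      AND COALESCE(p.status, 'pending') != 'completed'
    ORDER BY s."order"
    LIMIT 1
""", ("sample-v1-systemd",)).fetchone()
print(row)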
Tests
Tests use the bundled fixture docs and do not download the full LFS book:
python3 -m pytest