martin-papy

QDrant Loader


Enterprise-ready vector database toolkit for building searchable knowledge bases from multiple data sources. Supports multi-project management, automatic ingestion from Confluence/JIRA/Git, intelligent file conversion (PDF/Office/images), and semantic search. Includes MCP server for seamless AI assistant integration.


📝 Changelog v1.0.2 - Latest improvements and bug fixes

A comprehensive toolkit for loading data into the Qdrant vector database, with advanced MCP server support for AI-powered development workflows.

🎯 What is QDrant Loader?

QDrant Loader is a data ingestion and retrieval system that collects content from multiple sources, processes and vectorizes it, then provides intelligent search capabilities through a Model Context Protocol (MCP) server for AI development tools.

Perfect for:

  • 🤖 AI-powered development with Cursor, Windsurf, and other MCP-compatible tools
  • 📚 Knowledge base creation from technical documentation
  • 🔍 Intelligent code assistance with contextual information
  • 🏢 Enterprise content integration from multiple data sources

📦 Packages

This monorepo contains three complementary packages:

🔄 QDrant Loader

Data ingestion and processing engine

Collects and vectorizes content from multiple sources into a Qdrant vector database.

Key Features:

  • Multi-source connectors: Git, Confluence (Cloud & Data Center), JIRA (Cloud & Data Center), Public Docs, Local Files
  • File conversion: PDF, Office docs (Word, Excel, PowerPoint), images, audio, EPUB, ZIP, and more using MarkItDown
  • Smart chunking: Modular chunking strategies with intelligent document processing and hierarchical context
  • Incremental updates: Change detection and efficient synchronization
  • Multi-project support: Organize sources into projects with shared collections
  • Provider-agnostic LLM: OpenAI, Azure OpenAI, Ollama, and custom endpoints with unified configuration
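Incremental updates can be pictured as comparing content fingerprints between ingestion runs. The sketch below is a minimal illustration. It assumes each document has a stable id and is fingerprinted with SHA-256. `content_hash` and `detect_changes` are hypothetical helpers, not part of QDrant Loader's API, and the tool's real change detection may work differently.

```python
import hashlib
from typing import Dict, Iterable, Tuple


def content_hash(text: str) -> str:
    """Return a stable SHA-256 fingerprint of a document's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def detect_changes(
    previous: Dict[str, str],
    current_docs: Iterable[Tuple[str, str]],
) -> Dict[str, list]:
    """Compare hashes stored at the last run with freshly fetched documents.

    previous: document id -> hash recorded at the last ingestion (hypothetical state).
    current_docs: (document id, content) pairs fetched from a source now.
    Returns the ids that need to be added, re-embedded, or deleted.
    """
    current = {doc_id: content_hash(text) for doc_id, text in current_docs}
    added = sorted(set(current) - set(previous))
    deleted = sorted(set(previous) - set(current))
    # Only documents whose content actually changed need new embeddings.
    updated = sorted(
        doc_id
        for doc_id in set(current) & set(previous)
        if current[doc_id] != previous[doc_id]
    )
    return {"added": added, "updated": updated, "deleted": deleted}


state = {"a.md": content_hash("old"), "b.md": content_hash("same")}
print(detect_changes(state, [("b.md", "same"), ("a.md", "new"), ("c.md", "x")]))
# {'added': ['c.md'], 'updated': ['a.md'], 'deleted': []}
```

Unchanged documents (`b.md` here) are skipped entirely. This is what makes re-syncing a large source cheap.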

⚙️ QDrant Loader Core

Core library and LLM abstraction layer

Provides the foundational components and provider-agnostic LLM interface used by other packages.

Key Features:

  • LLM Provider Abstraction: Unified interface for OpenAI, Azure OpenAI, Ollama, and custom endpoints
  • Configuration Management: Centralized settings and validation for LLM providers
  • Rate Limiting: Built-in rate limiting and request management
  • Error Handling: Robust error handling and retry mechanisms
  • Logging: Structured logging with configurable levels
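The provider abstraction and retry behavior described above can be sketched as a small interface plus a backoff loop. This is a minimal illustration, not QDrant Loader Core's actual API. `EmbeddingProvider`, `embed_with_retry` and the `FlakyProvider` test double are hypothetical names, and the sketch assumes only connection and timeout errors are worth retrying.

```python
import time
from typing import Callable, List, Protocol, Tuple, Type


class EmbeddingProvider(Protocol):
    """Hypothetical common interface: every backend (OpenAI, Ollama, ...) exposes embed()."""

    def embed(self, texts: List[str]) -> List[List[float]]:
        ...


def embed_with_retry(
    provider: EmbeddingProvider,
    texts: List[str],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> List[List[float]]:
    """Call provider.embed(), retrying transient errors with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return provider.embed(texts)
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Back off 0.5s, 1s, 2s, ... (base_delay doubled after each failure).
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable: the loop always returns or raises")


class FlakyProvider:
    """Test double that raises ConnectionError a fixed number of times, then succeeds."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    def embed(self, texts: List[str]) -> List[List[float]]:
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("temporary outage")
        # Fake one-dimensional "embeddings": just the text length.
        return [[float(len(text))] for text in texts]


delays: List[float] = []
provider = FlakyProvider(failures=2)
print(embed_with_retry(provider, ["hi", "hello"], sleep=delays.append))  # [[2.0], [5.0]]
print(delays)  # [0.5, 1.0]
```

Calling code depends only on the `embed()` method. Swapping OpenAI for Ollama or a custom endpoint would therefore change the provider object, not the ingestion logic.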

🔌 QDrant Loader MCP Server

AI development integration layer

Model Context Protocol server providing search capabilities to AI development tools.

Key Features:

  • MCP protocol revision 2025-06-18: Compliant with the latest protocol revision, with dual transport support (stdio + HTTP)
  • Advanced search tools: Semantic search, hierarchy-aware search, attachment discovery, and conflict detection
  • Cross-document intelligence: Document similarity, clustering, relationship analysis, and knowledge graphs
  • Streaming capabilities: Server-Sent Events (SSE) for real-time search results
  • Production-ready: HTTP transport with security, session management, and health checks
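At its core, semantic search ranks stored chunk embeddings by their similarity to the query's embedding. Qdrant performs this ranking server-side over real embeddings and supports cosine distance among other metrics. Which metric QDrant Loader configures is not stated here. The pure-Python sketch below only illustrates the idea with toy 3-dimensional vectors; `top_k` and the sample index are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], index: Dict[str, Sequence[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Rank stored chunk vectors by similarity to the query vector."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings"; real ones come from the configured embedding model.
index = {
    "auth.md": [0.9, 0.1, 0.0],
    "deploy.md": [0.0, 0.2, 0.9],
    "errors.md": [0.5, 0.5, 0.1],
}
print([doc_id for doc_id, _ in top_k([1.0, 0.0, 0.0], index, k=2)])  # ['auth.md', 'errors.md']
```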

🚀 Quick Start

Installation

# Install both packages
pip install qdrant-loader qdrant-loader-mcp-server

# Or install individually
pip install qdrant-loader          # Data ingestion only
pip install qdrant-loader-mcp-server  # MCP server only
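After installing, you can confirm which of the two distributions are present from Python itself. This minimal sketch uses only the standard library's `importlib.metadata` (Python 3.8+). `installed_versions` is a hypothetical helper, and the distribution names are the ones used in the pip commands above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Distribution names taken from the pip commands above.
PACKAGES = ("qdrant-loader", "qdrant-loader-mcp-server")


def installed_versions(packages: Iterable[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    result: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```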

5-Minute Setup

  1. Create a workspace

    mkdir my-workspace && cd my-workspace
    
  2. Initialize workspace with templates

    qdrant-loader init --workspace .
    
  3. Configure your environment (edit .env)

    # Qdrant connection
    QDRANT_URL=http://localhost:6333
    QDRANT_COLLECTION_NAME=my_docs
    
    # LLM provider (new unified configuration)
    OPENAI_API_KEY=your_openai_key
    LLM_PROVIDER=openai
    LLM_BASE_URL=https://api.openai.com/v1
    LLM_EMBEDDING_MODEL=text-embedding-3-small
    LLM_CHAT_MODEL=gpt-4o-mini
    
  4. Configure data sources (edit config.yaml)

    global:
      qdrant:
        url: "http://localhost:6333"
        collection_name: "my_docs"
      llm:
        provider: "openai"
        base_url: "https://api.openai.com/v1"
        api_key: "${OPENAI_API_KEY}"
        models:
          embeddings: "text-embedding-3-small"
          chat: "gpt-4o-mini"
        embeddings:
          vector_size: 1536
    
    projects:
      my-project:
        project_id: "my-project"
        sources:
          git:
            docs-repo:
              base_url: "https://github.com/your-org/your-repo.git"
              branch: "main"
              file_types: ["*.md", "*.rst"]
    
  5. Load your data

    qdrant-loader ingest --workspace .
    
  6. Start the MCP server

    mcp-qdrant-loader --env /path/to/your/.env
    

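Steps 3 and 4 are linked in two ways. First, `config.yaml` refers to secrets such as `${OPENAI_API_KEY}` that are defined in `.env`. Second, `vector_size` must equal the length of the vectors the embedding model returns, otherwise Qdrant rejects them. The sketch below illustrates both checks under stated assumptions:

- `parse_env`, `expand` and `check_vector_size` are hypothetical helpers, not QDrant Loader functions.
- The simplified `.env` parser ignores quoting rules that real dotenv libraries handle.
- The dimension table lists only the default output sizes of three OpenAI models. The text-embedding-3 models can also return shortened vectors on request.

```python
import re
from typing import Dict, Mapping

# Default output dimensions of some OpenAI embedding models.
OPENAI_EMBEDDING_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}

# Matches ${NAME} placeholders such as "${OPENAI_API_KEY}".
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip()
    return env


def expand(value: str, env: Mapping[str, str]) -> str:
    """Replace every ${NAME} with env[NAME]; fail loudly if NAME is undefined."""

    def substitute(match) -> str:
        name = match.group(1)
        if name not in env:
            raise KeyError(f"undefined variable: {name}")
        return env[name]

    return _PLACEHOLDER.sub(substitute, value)


def check_vector_size(model: str, vector_size: int) -> None:
    """Raise if vector_size disagrees with a known default model dimension."""
    expected = OPENAI_EMBEDDING_DIMS.get(model)
    if expected is not None and expected != vector_size:
        raise ValueError(f"{model} returns {expected}-dim vectors, config says {vector_size}")


env = parse_env("""
# LLM provider
OPENAI_API_KEY=sk-example
LLM_EMBEDDING_MODEL=text-embedding-3-small
""")
check_vector_size(env["LLM_EMBEDDING_MODEL"], 1536)  # matches, so no error
print(expand("${OPENAI_API_KEY}", env))  # sk-example
```

Failing loudly on an undefined placeholder surfaces a missing secret at startup instead of as an authentication error in the middle of an ingestion run.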
🔧 MCP-Compatible IDE Setup

QDrant Loader works with any IDE/tool that supports MCP, including Cursor, Windsurf, and Claude Desktop.

Minimal MCP server entry (adapt path/format to your tool):

{
  "mcpServers": {
    "qdrant-loader": {
      "command": "/path/to/venv/bin/mcp-qdrant-loader",
      "env": {
        "QDRANT_URL": "http://localhost:6333",
        "QDRANT_COLLECTION_NAME": "my_docs",
        "OPENAI_API_KEY": "your_key"
      }
    }
  }
}

Alternatively, use a configuration file (recommended for complex setups):

{
  "mcpServers": {
    "qdrant-loader": {
      "command": "/path/to/venv/bin/mcp-qdrant-loader",
      "args": [
        "--config",
        "/path/to/your/config.yaml",
        "--env",
        "/path/to/your/.env"
      ]
    }
  }
}

For tool-specific setup instructions and the exact config format, see:

  • MCP Setup and Integration - Full guide
  • Cursor Setup
  • Windsurf Setup
  • Claude Desktop Setup

Example queries in AI tools:

  • "Find documentation about authentication in our API"
  • "Show me examples of error handling patterns"
  • "What are the deployment requirements for this service?"
  • "Find all attachments related to database schema"
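When the assistant decides to search, the MCP client sends the server a JSON-RPC 2.0 `tools/call` request. The message shape below follows the MCP specification. The tool name `search` and its `query`/`limit` arguments are hypothetical placeholders. A `tools/list` request returns the tools and input schemas that QDrant Loader's server actually exposes.

```python
import json
from itertools import count

# JSON-RPC requires a unique id per request so responses can be matched.
_request_ids = count(1)


def tools_call_request(tool: str, arguments: dict) -> str:
    """Build an MCP 'tools/call' message as a single-line JSON-RPC 2.0 string."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # json.dumps without indent emits no newlines, matching stdio framing.
    return json.dumps(message)


# Hypothetical tool name and arguments; check "tools/list" for the real ones.
print(tools_call_request("search", {"query": "authentication in our API", "limit": 5}))
```

Over the stdio transport, each message travels as one line of JSON. This is why the sketch serializes without indentation.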

📚 Documentation

Getting Started

  • Getting Started - Quick start and core concepts
  • Installation Guide - Complete setup instructions
  • Quick Start - Step-by-step tutorial
  • Core Concepts - Understand the core architecture: workspace model, projects and sources, ingestion pipeline, and MCP search flow

User Guides

  • User Guides - Detailed usage instructions
  • Configuration - Complete configuration reference
  • Data Sources - Git, Confluence, JIRA setup
  • File Conversion - File processing capabilities
  • MCP Server - AI tool integration

🛠️ Developer Resources

  • Developer hub - Developer guides for architecture, testing, deployment, and contribution workflows.
  • Architecture - System design overview
  • Testing - Testing guide and best practices

🆘 Support

🤝 Contributing

We welcome contributions! See our Contributing Guide for:

  • Development environment setup
  • Code style and standards
  • Pull request process

Quick Development Setup

# Clone and setup
git clone https://github.com/martin-papy/qdrant-loader.git
cd qdrant-loader

# Sync workspace environment (recommended)
uv sync --all-packages --all-extras

# Add a new dependency during development
uv add fastapi
uv sync

📄 License

This project is licensed under the GNU GPLv3 - see the LICENSE file for details.

Ready to get started? Check out our Quick Start Guide or browse the complete documentation.
