FastMCP Document Analyzer
A comprehensive document analysis server built with the modern FastMCP framework
Table of Contents
- Features
- Quick Start
- Installation
- Usage
- Available Tools
- Sample Data
- Project Structure
- API Reference
- Testing
- Documentation
- Contributing
Features
Document Analysis
- Sentiment Analysis: VADER + TextBlob dual-engine sentiment classification
- Keyword Extraction: TF-IDF and frequency-based keyword identification
- Readability Scoring: Multiple metrics (Flesch, Flesch-Kincaid, ARI)
- Text Statistics: Word count, sentences, paragraphs, and more
Document Management
- Persistent Storage: JSON-based document collection with metadata
- Smart Search: TF-IDF semantic similarity search
- Tag System: Category and tag-based organization
- Collection Insights: Comprehensive statistics and analytics
FastMCP Advantages
- Simple Setup: 90% less boilerplate than standard MCP
- Type Safety: Full type validation with Pydantic
- Modern API: Decorator-based tool definitions
- Multi-Transport: STDIO, HTTP, and SSE support
Quick Start
1. Clone and Setup
git clone <repository-url>
cd document-analyzer
python -m venv venv
source venv/Scripts/activate # Windows (Git Bash)
# source venv/bin/activate # macOS/Linux
2. Install Dependencies
pip install -r requirements.txt
3. Initialize NLTK Data
python -c "import nltk; nltk.download('punkt'); nltk.download('vader_lexicon'); nltk.download('stopwords'); nltk.download('punkt_tab')"
4. Run the Server
python fastmcp_document_analyzer.py
5. Test Everything
python test_fastmcp_analyzer.py
Installation
System Requirements
- Python 3.10 or higher (FastMCP 2.x does not support older versions)
- 500MB free disk space
- Internet connection (for initial NLTK data download)
Dependencies
fastmcp>=2.3.0 # Modern MCP framework
textblob>=0.17.1 # Sentiment analysis
nltk>=3.8.1 # Natural language processing
textstat>=0.7.3 # Readability metrics
scikit-learn>=1.3.0 # Machine learning utilities
numpy>=1.24.0 # Numerical computing
pandas>=2.0.0 # Data manipulation
python-dateutil>=2.8.2 # Date handling
Optional: Virtual Environment
# Create virtual environment
python -m venv venv
# Activate (Windows)
venv\Scripts\activate
# Activate (macOS/Linux)
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
Usage
Starting the Server
Default (STDIO Transport)
python fastmcp_document_analyzer.py
HTTP Transport (for web services)
python fastmcp_document_analyzer.py --transport http --port 9000
With Custom Host
python fastmcp_document_analyzer.py --transport http --host 0.0.0.0 --port 8080
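The command-line flags shown above could be parsed along the following lines. This is only a sketch: the function name `parse_args` and the default values (`stdio`, `127.0.0.1`, `8000`) are assumptions for illustration and are not taken from the server's source.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the transport flags used in the commands above (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="FastMCP Document Analyzer")
    parser.add_argument("--transport", choices=["stdio", "http", "sse"], default="stdio")
    parser.add_argument("--host", default="127.0.0.1")
    parser.add_argument("--port", type=int, default=8000)
    return parser.parse_args(argv)


# Equivalent to: python fastmcp_document_analyzer.py --transport http --port 9000
args = parse_args(["--transport", "http", "--port", "9000"])
print(args.transport, args.host, args.port)
```

The parsed values would then be handed to whatever call starts the server; host and port only matter for the network transports.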
Basic Usage Examples
# Analyze a document
result = analyze_document("doc_001")
print(f"Sentiment: {result['sentiment_analysis']['overall_sentiment']}")
# Extract keywords
keywords = extract_keywords("Artificial intelligence is transforming healthcare", 5)
print([kw['keyword'] for kw in keywords])
# Search documents
results = search_documents("machine learning", 3)
print(f"Found {len(results)} relevant documents")
# Get collection statistics
stats = get_collection_stats()
print(f"Total documents: {stats['total_documents']}")
Available Tools
Core Analysis Tools
Tool | Description | Example |
---|---|---|
analyze_document | Complete document analysis | analyze_document("doc_001") |
get_sentiment | Sentiment analysis | get_sentiment("I love this!") |
extract_keywords | Keyword extraction | extract_keywords(text, 10) |
calculate_readability | Readability metrics | calculate_readability(text) |
Document Management Tools
Tool | Description | Example |
---|---|---|
add_document | Add new document | add_document("id", "title", "content") |
get_document | Retrieve document | get_document("doc_001") |
delete_document | Delete document | delete_document("old_doc") |
list_documents | List all documents | list_documents("Technology") |
Search and Discovery Tools
Tool | Description | Example |
---|---|---|
search_documents | Semantic search | search_documents("AI", 5) |
search_by_tags | Tag-based search | search_by_tags(["AI", "tech"]) |
get_collection_stats | Collection statistics | get_collection_stats() |
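As a rough illustration of tag-based search, the sketch below filters a list of document dictionaries by tag. The function name `search_by_tags_sketch`, the case-insensitive comparison, and the "match any tag" rule are all assumptions; the real tool may require every tag to match or behave differently in other ways.

```python
from typing import Any, Dict, List


def search_by_tags_sketch(documents: List[Dict[str, Any]], tags: List[str]) -> List[Dict[str, Any]]:
    """Return documents that share at least one tag with `tags` (case-insensitive)."""
    wanted = {t.lower() for t in tags}
    return [
        doc for doc in documents
        if wanted & {t.lower() for t in doc.get("tags", [])}
    ]


docs = [
    {"id": "doc_001", "tags": ["AI", "technology", "future", "ethics"]},
    {"id": "doc_x", "tags": ["art"]},  # hypothetical second document
]
matches = search_by_tags_sketch(docs, ["AI", "tech"])
print([d["id"] for d in matches])
```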
Sample Data
The server comes pre-loaded with 16 diverse documents covering:
Category | Documents | Topics |
---|---|---|
Technology | 4 | AI, Quantum Computing, Privacy, Blockchain |
Science | 3 | Space Exploration, Healthcare, Ocean Conservation |
Environment | 2 | Climate Change, Sustainable Agriculture |
Society | 3 | Remote Work, Mental Health, Transportation |
Business | 2 | Economics, Digital Privacy |
Culture | 2 | Art History, Wellness |
Sample Document Structure
{
"id": "doc_001",
"title": "The Future of Artificial Intelligence",
"content": "Artificial intelligence is rapidly transforming...",
"author": "Dr. Sarah Chen",
"category": "Technology",
"tags": ["AI", "technology", "future", "ethics"],
"language": "en",
"created_at": "2024-01-15T10:30:00"
}
Project Structure
document-analyzer/
├── analyzer/                        # Core analysis engine
│   ├── __init__.py
│   └── document_analyzer.py         # Sentiment, keywords, readability
├── storage/                         # Document storage system
│   ├── __init__.py
│   └── document_storage.py          # JSON storage, search, management
├── data/                            # Sample data
│   ├── __init__.py
│   └── sample_documents.py          # 16 sample documents
├── fastmcp_document_analyzer.py     # Main FastMCP server
├── test_fastmcp_analyzer.py         # Comprehensive test suite
├── requirements.txt                 # Python dependencies
├── documents.json                   # Persistent document storage
├── README.md                        # This documentation
├── FASTMCP_COMPARISON.md            # FastMCP vs Standard MCP
├── .gitignore                       # Git ignore patterns
└── venv/                            # Virtual environment (optional)
API Reference
Document Analysis
analyze_document(document_id: str) -> Dict[str, Any]
Performs comprehensive analysis of a document.
Parameters:
- document_id (str): Unique document identifier
Returns:
{
"document_id": "doc_001",
"title": "Document Title",
"sentiment_analysis": {
"overall_sentiment": "positive",
"confidence": 0.85,
"vader_scores": {...},
"textblob_scores": {...}
},
"keywords": [
{"keyword": "artificial", "frequency": 5, "relevance_score": 2.3}
],
"readability": {
"flesch_reading_ease": 45.2,
"reading_level": "Difficult",
"grade_level": "Grade 12"
},
"basic_statistics": {
"word_count": 119,
"sentence_count": 8,
"paragraph_count": 1
}
}
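The `basic_statistics` block in the response above could be produced by something like the following. This is a minimal sketch using regular-expression heuristics; the function name and the exact rules for what counts as a word, sentence, or paragraph are assumptions, and the project may rely on NLTK tokenizers instead.

```python
import re
from typing import Dict


def basic_statistics(text: str) -> Dict[str, int]:
    """Count words, sentences, and paragraphs with simple regex heuristics."""
    words = re.findall(r"[A-Za-z0-9']+", text)
    # A sentence ends at '.', '!' or '?'; a run such as "?!" counts once.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Paragraphs are separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    return {
        "word_count": len(words),
        "sentence_count": len(sentences),
        "paragraph_count": len(paragraphs),
    }


sample = "AI is changing medicine. Is it safe?\n\nMany people think so!"
stats = basic_statistics(sample)
print(stats)
```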
get_sentiment(text: str) -> Dict[str, Any]
Analyzes sentiment of any text.
Parameters:
- text (str): Text to analyze
Returns:
{
"overall_sentiment": "positive",
"confidence": 0.85,
"vader_scores": {
"compound": 0.7269,
"positive": 0.294,
"negative": 0.0,
"neutral": 0.706
},
"textblob_scores": {
"polarity": 0.5,
"subjectivity": 0.6
}
}
Document Management
add_document(...) -> Dict[str, str]
Adds a new document to the collection.
Parameters:
- id (str): Unique document ID
- title (str): Document title
- content (str): Document content
- author (str, optional): Author name
- category (str, optional): Document category
- tags (List[str], optional): Tags list
- language (str, optional): Language code
Returns:
{
"status": "success",
"message": "Document 'my_doc' added successfully",
"document_count": 17
}
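To make the storage behaviour concrete, here is a small sketch of a JSON-backed store that returns a response shaped like the one above. The class name `JsonDocumentStore`, the duplicate-ID check, and the on-disk layout (one object keyed by document ID) are hypothetical and may differ from the project's `document_storage.py`.

```python
import json
import os
import tempfile
from typing import Any, Dict


class JsonDocumentStore:
    """Hypothetical JSON-backed store; not the project's actual storage class."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.documents: Dict[str, Dict[str, Any]] = {}
        # Reload previously saved documents so data survives restarts.
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as fh:
                self.documents = json.load(fh)

    def _save(self) -> None:
        with open(self.path, "w", encoding="utf-8") as fh:
            json.dump(self.documents, fh, indent=2)

    def add_document(self, doc_id: str, title: str, content: str, **metadata: Any) -> Dict[str, Any]:
        # Rejecting duplicate IDs is an assumed policy.
        if doc_id in self.documents:
            return {"status": "error", "message": f"Document '{doc_id}' already exists"}
        self.documents[doc_id] = {"id": doc_id, "title": title, "content": content, **metadata}
        self._save()
        return {
            "status": "success",
            "message": f"Document '{doc_id}' added successfully",
            "document_count": len(self.documents),
        }


path = os.path.join(tempfile.mkdtemp(), "documents.json")
store = JsonDocumentStore(path)
result = store.add_document("my_doc", "My Title", "Some content", tags=["demo"])
print(result["message"])
```

Writing the whole file on every change keeps the sketch simple; a larger collection would probably want incremental or atomic writes.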
Search and Discovery
search_documents(query: str, limit: int = 10) -> List[Dict[str, Any]]
Performs semantic search across documents.
Parameters:
- query (str): Search query
- limit (int): Maximum results
Returns:
[
{
"id": "doc_001",
"title": "AI Document",
"similarity_score": 0.8542,
"content_preview": "First 200 characters...",
"tags": ["AI", "technology"]
}
]
Testing
Run All Tests
python test_fastmcp_analyzer.py
Test Categories
- Server Initialization: FastMCP server setup
- Sentiment Analysis: VADER and TextBlob integration
- Keyword Extraction: TF-IDF and frequency analysis
- Readability Calculation: Multiple readability metrics
- Document Analysis: Full document processing
- Document Search: Semantic similarity search
- Collection Statistics: Analytics and insights
- Document Management: CRUD operations
- Tag Search: Tag-based filtering
Expected Test Output
=== Testing FastMCP Document Analyzer ===
✓ FastMCP server module imported successfully
✓ Server initialized successfully
✓ Sentiment analysis working
✓ Keyword extraction working
✓ Readability calculation working
✓ Document analysis working
✓ Document search working
✓ Collection statistics working
✓ Document listing working
✓ Document addition and deletion working
✓ Tag search working
=== All FastMCP tests completed successfully! ===
Documentation
Additional Resources
- FastMCP Documentation
- MCP Protocol Specification
- FASTMCP_COMPARISON.md - FastMCP vs Standard MCP
Key Concepts
Sentiment Analysis
Uses a dual-engine approach:
- VADER: Rule-based, excellent for social media text
- TextBlob: Lexicon-based by default (its pattern analyzer), good for general text
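One way the two engines' scores might be merged into a single label is sketched below. The averaging rule and the function name `combine_sentiment` are assumptions, not the project's confirmed logic; the ±0.05 cut-offs are the ones conventionally used with VADER's compound score. In practice the inputs would come from NLTK's `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` and `TextBlob(text).sentiment.polarity`, both of which range from -1 to 1.

```python
from typing import Dict, Union


def combine_sentiment(vader_compound: float, textblob_polarity: float) -> Dict[str, Union[str, float]]:
    """Average two scores in [-1, 1] and map the result to a label (assumed rule)."""
    combined = (vader_compound + textblob_polarity) / 2
    if combined >= 0.05:
        label = "positive"
    elif combined <= -0.05:
        label = "negative"
    else:
        label = "neutral"
    return {"overall_sentiment": label, "combined_score": combined}


# Scores taken from the get_sentiment example response in the API Reference.
print(combine_sentiment(0.7269, 0.5))
```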
Keyword Extraction
Combines multiple approaches:
- TF-IDF: Term frequency-inverse document frequency
- Frequency Analysis: Simple word frequency counting
- Relevance Scoring: Weighted combination of both methods
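The combination described above can be sketched in plain Python as follows. The equal 0.5/0.5 weighting, the smoothed IDF formula, the tiny stopword list, and the name `extract_keywords_sketch` are illustrative assumptions; the project itself depends on scikit-learn and NLTK and may weight the two signals differently.

```python
import math
import re
from collections import Counter
from typing import Any, Dict, List

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "the", "to"}


def tokenize(text: str) -> List[str]:
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def extract_keywords_sketch(text: str, corpus: List[str], limit: int = 5) -> List[Dict[str, Any]]:
    """Rank words in `text` by an equal-weight mix of term frequency and TF-IDF."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    doc_sets = [set(tokenize(doc)) for doc in corpus]
    n_docs = len(corpus)
    scored = []
    for word, freq in counts.items():
        tf = freq / len(tokens)
        df = sum(1 for terms in doc_sets if word in terms)
        idf = math.log((1 + n_docs) / (1 + df)) + 1  # smoothed inverse document frequency
        relevance = 0.5 * tf + 0.5 * tf * idf
        scored.append({"keyword": word, "frequency": freq, "relevance_score": round(relevance, 4)})
    # Highest score first; ties are broken alphabetically for stable output.
    scored.sort(key=lambda k: (-k["relevance_score"], k["keyword"]))
    return scored[:limit]


corpus = [
    "Artificial intelligence is transforming healthcare",
    "Remote work is transforming offices",
    "Healthcare costs are rising",
]
keywords = extract_keywords_sketch(corpus[0], corpus, limit=2)
print([kw["keyword"] for kw in keywords])
```

Words that appear in fewer documents of the corpus receive a higher IDF, so they outrank equally frequent words that are common across the collection.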
Readability Metrics
Provides multiple readability scores:
- Flesch Reading Ease: 0-100 scale (higher = easier)
- Flesch-Kincaid Grade: US grade level
- ARI: Automated Readability Index
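The three scores follow standard published formulas, which the sketch below applies directly. The syllable counter is a crude vowel-group heuristic and the function names are assumptions; the project lists `textstat` as a dependency, which handles syllables far more carefully, so results here will only approximate its output.

```python
import re
from typing import Dict


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups and drop a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability_scores(text: str) -> Dict[str, float]:
    words = re.findall(r"[A-Za-z]+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return {"flesch_reading_ease": 0.0, "flesch_kincaid_grade": 0.0, "ari": 0.0}
    wps = len(words) / len(sentences)                         # words per sentence
    spw = sum(count_syllables(w) for w in words) / len(words)  # syllables per word
    cpw = sum(len(w) for w in words) / len(words)              # letters per word
    return {
        "flesch_reading_ease": round(206.835 - 1.015 * wps - 84.6 * spw, 2),
        "flesch_kincaid_grade": round(0.39 * wps + 11.8 * spw - 15.59, 2),
        "ari": round(4.71 * cpw + 0.5 * wps - 21.43, 2),
    }


easy = readability_scores("The cat sat on the mat. The dog ran to me.")
hard = readability_scores(
    "Comprehensive institutional reorganization necessitates "
    "extraordinary administrative coordination."
)
print(easy)
print(hard)
```

Short sentences of one-syllable words score high on Flesch Reading Ease and low on the two grade-level metrics, while long polysyllabic sentences do the opposite.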
Document Search
Uses TF-IDF vectorization with cosine similarity:
- Converts documents to numerical vectors
- Calculates similarity between query and documents
- Returns ranked results with similarity scores
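The three steps above can be sketched without external libraries as shown here. The function names, the smoothed IDF formula, and the choice to fit the vocabulary on the documents plus the query are simplifying assumptions; the project presumably uses scikit-learn's `TfidfVectorizer` and `cosine_similarity`, whose normalization details differ.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(texts: List[str]) -> List[Dict[str, float]]:
    """Build one sparse TF-IDF vector (term -> weight) per text."""
    tokenized = [tokenize(t) for t in texts]
    n = len(texts)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (c / len(tokens)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, c in counts.items()
        })
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: str, documents: Dict[str, str], limit: int = 10) -> List[Tuple[str, float]]:
    """Rank documents by cosine similarity to the query, dropping zero scores."""
    ids = list(documents)
    vectors = tfidf_vectors([documents[i] for i in ids] + [query])
    query_vec = vectors[-1]
    scored = [(doc_id, round(cosine(query_vec, vec), 4)) for doc_id, vec in zip(ids, vectors)]
    ranked = sorted((s for s in scored if s[1] > 0), key=lambda s: -s[1])
    return ranked[:limit]


docs = {
    "doc_001": "Machine learning improves medical diagnosis",
    "doc_002": "Ocean conservation protects coral reefs",
    "doc_003": "Deep learning is a branch of machine learning",
}
hits = search("machine learning", docs, limit=3)
print(hits)
```

Because scoring depends only on shared terms, a query matches documents that use the same words rather than documents that merely discuss a related idea.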
Contributing
Development Setup
# Clone repository
git clone <repository-url>
cd document-analyzer
# Create development environment
python -m venv venv
source venv/Scripts/activate # Windows (Git Bash); on macOS/Linux use venv/bin/activate
pip install -r requirements.txt
# Run tests
python test_fastmcp_analyzer.py
Adding New Tools
FastMCP makes it easy to add new tools:
from typing import Any, Dict

# Assumes `mcp` is the FastMCP server instance created in the main server file.
@mcp.tool
def my_new_tool(param: str) -> Dict[str, Any]:
    """
    🔧 Description of what this tool does.

    Args:
        param: Parameter description

    Returns:
        Return value description
    """
    # Implementation here
    return {"result": "success"}
Code Style
- Use type hints for all functions
- Add comprehensive docstrings
- Include error handling
- Follow PEP 8 style guidelines
- Add emoji icons for better readability
Testing New Features
- Add your tool to the main server file
- Create test cases in the test file
- Run the test suite to ensure everything works
- Update documentation as needed
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- FastMCP Team for the excellent framework
- NLTK Team for natural language processing tools
- TextBlob Team for sentiment analysis capabilities
- Scikit-learn Team for machine learning utilities
Made with ❤️ using FastMCP
Ready to analyze documents? Start with:
python fastmcp_document_analyzer.py