Scholar Agent
中文
General-purpose LLMs are often inaccurate and outdated in specialized domains. Scholar Agent combines online research + local knowledge accumulation into a sustainable knowledge flywheel, making your AI smarter in your domain over time. It also builds a human-readable knowledge base for quick learning. Integrates seamlessly with Claude Code and VS Code Copilot via MCP.
What It Does
Your question
│
▼
Online research (AI agent search + SearXNG + academic APIs)
│
▼
Structured synthesis (with citations, confidence, uncertainty)
│
▼
Local accumulation (Markdown knowledge cards + BM25 index)
│
▼
Next question: AI checks local first ── hit? ──► use directly, fast & accurate
│ miss
▼
Research again → accumulate → reindex ──► knowledge base keeps growing
Each round compounds. Knowledge cards have full lifecycle management: draft → reviewed → trusted → stale → deprecated.
Academic Research Pipeline
Scholar Agent includes a comprehensive academic paper research pipeline:
- Paper Search — Search papers from arXiv, DBLP, and Semantic Scholar. Filter by top conferences (CVPR, ICCV, ECCV, ICLR, AAAI, NeurIPS, ICML, ACL, EMNLP, MICCAI)
- Smart Scoring — Four-dimensional scoring engine (relevance, recency, popularity, quality) ranks papers by your research interests
- Deep Analysis Notes — Auto-generate 20+ section Obsidian-style markdown notes with
<!-- LLM: -->placeholders for AI-assisted completion - Figure Extraction — Extract images from arXiv source archives and PDFs (via PyMuPDF)
- Daily Recommendations — Automated daily paper search, scoring, deduplication, and recommendation note generation
- Paper → Knowledge Card — Convert paper analyses into knowledge cards that feed back into the knowledge flywheel
- Keyword Auto-Linking — Scan notes for technical terms and create
[[wiki-links]]automatically
Quick Start
Embed into an existing project
cd my-project && git clone https://github.com/zfy465914233/scholar-agent.git
bash scholar-agent/setup.sh
# Restart Claude Code to activate
This will create the directory structure, copy config templates, install skills, and build the knowledge index.
Use as a standalone project
# Clone and install
git clone https://github.com/zfy465914233/scholar-agent.git
cd scholar-agent
pip install -r requirements.txt
# Build the knowledge index
python scripts/local_index.py --output indexes/local/index.json
# (Optional) Start SearXNG for web research
docker compose up -d
MCP configs are pre-configured:
- Claude Code:
.mcp.jsonis ready.cdinto the project and start Claude Code. - VS Code Copilot:
.vscode/mcp.jsonis ready. Open the project, enable agent mode.
MCP Tools
Core Tools (always available)
| Tool | Description |
|---|---|
query_knowledge |
Search local knowledge base |
save_research |
Save structured research results as a knowledge card |
list_knowledge |
Browse all knowledge cards |
capture_answer |
Quick-capture a Q&A pair as a draft card |
ingest_source |
Ingest a URL or raw text into the knowledge base |
build_graph |
Generate an interactive knowledge graph (vis.js) |
Academic Tools (set SCHOLAR_ACADEMIC=1 to enable)
| Tool | Description |
|---|---|
search_papers |
Search arXiv + Semantic Scholar with 4-dim scoring |
search_conf_papers |
Search conference papers via DBLP + S2 enrichment |
analyze_paper |
Generate deep-analysis markdown notes (20+ sections) |
extract_paper_images |
Extract figures from arXiv source / PDF |
paper_to_card |
Convert paper analysis into a knowledge card |
daily_recommend |
Daily paper recommendation workflow |
link_paper_keywords |
Auto-link keywords as [[wikilinks]] in notes |
Configuration
.scholar.json
The .scholar.json file configures knowledge paths and academic research settings. See .scholar.example.json for a full example with comments.
Key sections:
knowledge_dir— Path to knowledge cards directoryindex_path— Path to BM25 search indexacademic.research_interests— Your research domains, keywords, and arXiv categoriesacademic.scoring— Paper scoring weights and dimensions
Environment Variables
Copy .env.example to .env and configure:
| Variable | Required | Description |
|---|---|---|
SCHOLAR_ACADEMIC |
No | Set to 1 to enable academic tools |
S2_API_KEY |
No | Semantic Scholar API key (get one free) |
LLM_API_KEY |
No | LLM API key for advanced synthesis pipeline |
SEARXNG_BASE_URL |
No | SearXNG URL for web research (default: http://localhost:8080) |
Project Structure
scholar-agent/
├── mcp_server.py # MCP server (13 tools)
├── setup_mcp.py # Embed into existing projects
├── pyproject.toml # Package configuration
├── docker-compose.yml # SearXNG
├── .scholar.json # Project & academic configuration
├── schemas/ # Answer + evidence JSON schemas
├── scripts/
│ ├── academic/ # Academic research modules
│ │ ├── arxiv_search.py # arXiv + Semantic Scholar search
│ │ ├── conf_search.py # Conference paper search (DBLP)
│ │ ├── paper_analyzer.py # Deep-analysis note generation
│ │ ├── scoring.py # 4-dim paper scoring engine
│ │ ├── image_extractor.py # Figure extraction from PDFs
│ │ ├── note_linker.py # Wiki-link discovery + keyword linking
│ │ └── daily_workflow.py # Daily recommendation pipeline
│ ├── scholar_config.py # Configuration reader
│ ├── local_index.py # BM25 index builder
│ ├── local_retrieve.py # Knowledge retrieval
│ ├── close_knowledge_loop.py # Knowledge card builder
│ └── ... # Research, synthesis, governance, graph
├── knowledge/ # Knowledge cards (gitignored, user-generated)
├── indexes/ # Generated indexes (gitignored)
└── tests/ # 247 tests
More Features
- Multi-perspective research — Parallel research from 5 perspectives (academic, technical, applied, contrarian, historical)
- Obsidian compatible — Standard Markdown + YAML frontmatter +
[[wiki-links]] - Knowledge governance CLI — Validate frontmatter, detect orphaned cards, find duplicates, manage lifecycle
- Provider fault tolerance — Each search source fails independently; falls back to local retrieval when offline
Testing
python -m pytest tests/ -v
247 tests, ~13s. No external services needed.
License
MIT — see LICENSE.