
Code Index
Local-first code indexer with MCP integration for AI-powered code understanding and search
Modular, extensible local-first code indexer designed to enhance Claude Code and other LLMs with deep code understanding capabilities. Built on the Model Context Protocol (MCP) for seamless integration with AI assistants.
Current completion: 100% (Production-ready with comprehensive validation completed)
System complexity: 5/5 (High - 136k lines, semantic search, distributed architecture)
Production ready: Yes - All systems validated, sub-100ms query performance, complete documentation
.indexes/ (relative to MCP server)
The Code-Index-MCP follows a modular, plugin-based architecture designed for extensibility and performance:
🌐 System Context (Level 1)
📦 Container Architecture (Level 2)
┌─────────────────┐ ┌──────────────┐ ┌─────────────┐
│ API Gateway │────▶│ Dispatcher │────▶│ Plugins │
│ (FastAPI) │ │ │ │ (Language) │
└─────────────────┘ └──────────────┘ └─────────────┘
│ │ │
▼ ▼ ▼
┌─────────────────┐ ┌──────────────┐ ┌─────────────┐
│ Local Index │ │ File Watcher │ │ Embedding │
│ (SQLite+FTS5) │ │ (Watchdog) │ │ Service │
└─────────────────┘ └──────────────┘ └─────────────┘
🔧 Component Details (Level 3)
The project follows a clean, organized structure. See docs/PROJECT_STRUCTURE.md for detailed layout.
Key directories:
- mcp_server/ - Core MCP server implementation
- scripts/ - Development and utility scripts
- tests/ - Comprehensive test suite with fixtures
- docs/ - Documentation and guides
- architecture/ - System design and diagrams
- docker/ - Docker configurations and compose files
- data/ - Database files and indexes
- logs/ - Application and test logs
- reports/ - Generated performance reports and analysis
- analysis_archive/ - Historical analysis and archived research
Production-Ready Features:
Language Categories:
Category | Languages | Features |
---|---|---|
Dedicated Plugins | Python, JavaScript, TypeScript, C, C++, Dart, HTML/CSS | Enhanced analysis, framework support |
Systems Languages | Go, Rust, C, C++, Zig, Nim, D, V | Memory safety, performance analysis |
JVM Languages | Java, Kotlin, Scala, Clojure | Package analysis, build tool integration |
Web Technologies | JavaScript, TypeScript, HTML, CSS, SCSS, PHP | Framework detection, bundler support |
Scripting Languages | Python, Ruby, Perl, Lua, R, Julia | Dynamic typing, REPL integration |
Functional Languages | Haskell, Elixir, Erlang, F#, OCaml | Pattern matching, type inference |
Mobile Development | Swift, Kotlin, Dart, Objective-C | Platform-specific APIs |
Infrastructure | Dockerfile, Bash, PowerShell, Makefile, CMake | Build automation, CI/CD |
Data Formats | JSON, YAML, TOML, XML, GraphQL, SQL | Schema validation, query optimization |
Documentation | Markdown, LaTeX, reStructuredText | Cross-references, formatting |
Implementation Status: Production-Ready - All languages supported via the enhanced dispatcher with:
# Auto-configures MCP for your environment
./scripts/setup-mcp-json.sh
# Or interactive mode
./scripts/setup-mcp-json.sh --interactive
This automatically detects your environment and creates the appropriate .mcp.json configuration.
# Install MCP Index with Docker
curl -sSL https://raw.githubusercontent.com/Code-Index-MCP/main/scripts/install-mcp-docker.sh | bash
# Index your current directory
docker run -it -v $(pwd):/workspace ghcr.io/code-index-mcp/mcp-index:minimal

# Set your API key (get one at https://voyageai.com)
export VOYAGE_AI_API_KEY=your-key
# Run with semantic search
docker run -it -v $(pwd):/workspace -e VOYAGE_AI_API_KEY ghcr.io/code-index-mcp/mcp-index:standard

# PowerShell
.\scripts\setup-mcp-json.ps1
# Or manually with Docker Desktop
docker run -it -v ${PWD}:/workspace ghcr.io/code-index-mcp/mcp-index:minimal

# Install Docker Desktop or use Homebrew
brew install --cask docker
# Run setup
./scripts/setup-mcp-json.sh

# Install Docker (no Desktop needed)
curl -fsSL https://get.docker.com | sh
# Run setup
./scripts/setup-mcp-json.sh

# With Docker Desktop integration
./scripts/setup-mcp-json.sh  # Auto-detects WSL+Docker
# Without Docker Desktop
cp .mcp.json.templates/native.json .mcp.json
pip install -e .

# For VS Code/Cursor dev containers
# Option 1: Use native Python (already in container)
cp .mcp.json.templates/native.json .mcp.json
# Option 2: Use Docker sidecar (avoids dependency conflicts)
docker-compose -f docker/compose/development/docker-compose.mcp-sidecar.yml up -d
cp .mcp.json.templates/docker-sidecar.json .mcp.json
The setup script creates the appropriate .mcp.json for your environment. Manual examples:
{
  "mcpServers": {
    "code-index-native": {
      "command": "python",
      "args": ["scripts/cli/mcp_server_cli.py"],
      "cwd": "${workspace}"
    }
  }
}

{
  "mcpServers": {
    "code-index-docker": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "-v", "${workspace}:/workspace",
        "ghcr.io/code-index-mcp/mcp-index:minimal"
      ]
    }
  }
}
Feature | Minimal | Standard | Full | Cost |
---|---|---|---|---|
Code Search | ✅ | ✅ | ✅ | Free |
48 Languages | ✅ | ✅ | ✅ | Free |
Semantic Search | ❌ | ✅ | ✅ | ~$0.05/1M tokens |
GitHub Sync | ❌ | ✅ | ✅ | Free |
Monitoring | ❌ | ❌ | ✅ | Free |
Download a pre-built index from our releases to get started immediately:
# Download latest release
python scripts/download-release.py --latest
# Or download a specific version
python scripts/download-release.py --tag v2024.01.15
Clone the repository
git clone https://github.com/yourusername/Code-Index-MCP.git
cd Code-Index-MCP
Install dependencies
# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
# Install requirements
pip install -r requirements.txt
Build the index (or download pre-built)
# Build index for current directory with full support (SQL + Semantic)
python scripts/index_repositories.py --mode full
# Or use specific modes:
# SQL-only (fast, no API key needed):
python scripts/index_repositories.py --mode sql
# Semantic-only (requires VOYAGE_AI_API_KEY):
python scripts/index_repositories.py --mode semantic
# Or download from GitHub artifacts (if available)
python scripts/index-artifact-download-v2.py --latest
Start the server
# Start the MCP server
uvicorn mcp_server.gateway:app --reload --host 0.0.0.0 --port 8000
Test the API
# Check server status
curl http://localhost:8000/status
# Search for code
curl -X POST http://localhost:8000/search \
  -H "Content-Type: application/json" \
  -d '{"query": "def parse"}'
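The search call above can also be issued from Python's standard library. This is a minimal sketch, assuming the server is listening on localhost:8000 as started in the previous step; the helper names `build_search_request` and `search` are illustrative and not part of the project.

```python
import json
import urllib.request

# Assumption: the server was started with --port 8000 as shown above.
BASE_URL = "http://localhost:8000"


def build_search_request(query: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the same POST /search request as the curl example."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query: str, base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """Send the request and decode the JSON reply (needs a running server)."""
    request = build_search_request(query, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it keeps the payload easy to inspect without a live server.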
Create a .env file for configuration:
# Optional: Voyage AI for semantic search
VOYAGE_AI_API_KEY=your_api_key_here
# Server settings
MCP_SERVER_HOST=0.0.0.0
MCP_SERVER_PORT=8000
MCP_LOG_LEVEL=INFO
# Workspace settings
MCP_WORKSPACE_ROOT=.
MCP_MAX_FILE_SIZE=10485760  # 10MB
# GitHub Artifact Sync (privacy settings)
MCP_ARTIFACT_SYNC=false  # Set to true to enable
AUTO_UPLOAD=false        # Auto-upload on changes
AUTO_DOWNLOAD=true       # Auto-download on clone
Control how your code index is shared:
// .mcp-index.json
{
  "github_artifacts": {
    "enabled": false,        // Disable sync entirely
    "auto_upload": false,    // Manual upload only
    "auto_download": true,   // Still get team indexes
    "exclude_patterns": [    // Additional exclusions
      "internal/*",
      "proprietary/*"
    ]
  }
}
Privacy Features:
The system includes multiple reranking strategies to improve search relevance:
# Configure reranking in your searches
from mcp_server.indexer.reranker import RerankConfig, TFIDFReranker

config = RerankConfig(
    enabled=True,
    reranker=TFIDFReranker(),  # Or CohereReranker(), CrossEncoderReranker()
    top_k=20
)

# Search with reranking
results = await search_engine.search(query, rerank_config=config)
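To make the idea of reranking concrete, here is a small, self-contained TF-IDF reranker. It is a sketch of the general technique only, not the project's `TFIDFReranker`; the function name `tfidf_rerank`, the tokenizer, and the smoothed IDF formula are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter
from typing import List


def _tokens(text: str) -> List[str]:
    """Lowercase identifier-like tokens (an illustrative tokenizer)."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def tfidf_rerank(query: str, candidates: List[str], top_k: int = 20) -> List[str]:
    """Reorder candidates by summed TF-IDF of the query terms, best first."""
    docs = [Counter(_tokens(text)) for text in candidates]
    n_docs = len(docs)
    terms = set(_tokens(query))
    scored = []
    for position, counts in enumerate(docs):
        length = sum(counts.values()) or 1
        score = 0.0
        for term in terms:
            doc_freq = sum(1 for doc in docs if term in doc)
            if doc_freq == 0:
                continue
            # Smoothed inverse document frequency.
            idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1.0
            score += (counts[term] / length) * idf
        scored.append((score, position))
    # Higher score first; ties keep the original retrieval order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [candidates[position] for _, position in scored[:top_k]]
```

Candidates that never mention a query term score zero and sink to the bottom, while `top_k` caps how many results are returned.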
Available Rerankers:
Prevent accidental sharing of sensitive files:
# Analyze current index for security issues
python scripts/utilities/analyze_gitignore_security.py
# Create secure index export (filters gitignored files)
python scripts/utilities/secure_index_export.py
# The secure export will:
# - Exclude all gitignored files
# - Remove sensitive patterns (*.env, *.key, etc.)
# - Create audit logs of excluded files
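The pattern-based part of that filtering can be sketched in a few lines. This is an illustration only, assuming glob-style matching on the two patterns named above (`*.env`, `*.key`); the function names are hypothetical, the "etc." is not expanded, and gitignore handling is out of scope here.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Iterable, List, Sequence, Tuple

# Only the two patterns named in the text; the real list may be longer.
SENSITIVE_PATTERNS = ("*.env", "*.key")


def is_sensitive(path: str, patterns: Sequence[str] = SENSITIVE_PATTERNS) -> bool:
    """True if the file name matches any sensitive glob pattern."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


def split_for_export(
    paths: Iterable[str], patterns: Sequence[str] = SENSITIVE_PATTERNS
) -> Tuple[List[str], List[str]]:
    """Return (kept, excluded); the excluded list can feed an audit log."""
    kept: List[str] = []
    excluded: List[str] = []
    for path in paths:
        (excluded if is_sensitive(path, patterns) else kept).append(path)
    return kept, excluded
```

Returning the excluded paths alongside the kept ones mirrors the audit-log step described above.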
Combines traditional full-text search with semantic search:
# The system automatically uses hybrid search when available
# Configure weights in settings:
HYBRID_SEARCH_BM25_WEIGHT=0.3
HYBRID_SEARCH_SEMANTIC_WEIGHT=0.5
HYBRID_SEARCH_FUZZY_WEIGHT=0.2
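One plausible way such weights combine per-backend scores is a weighted average. The sketch below assumes each backend score has already been normalised to the range 0 to 1; how the project actually normalises and merges scores is not documented here, and the function names are hypothetical.

```python
import os
from typing import Dict, Mapping

# Defaults mirror the settings shown above.
DEFAULT_WEIGHTS = {"bm25": 0.3, "semantic": 0.5, "fuzzy": 0.2}


def load_weights(env: Mapping[str, str] = os.environ) -> Dict[str, float]:
    """Read HYBRID_SEARCH_<NAME>_WEIGHT values, falling back to the defaults."""
    return {
        name: float(env.get(f"HYBRID_SEARCH_{name.upper()}_WEIGHT", default))
        for name, default in DEFAULT_WEIGHTS.items()
    }


def hybrid_score(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted average of backend scores; a missing backend counts as 0."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weight * scores.get(name, 0.0) for name, weight in weights.items()) / total
```

Dividing by the weight total keeps the result in the same 0 to 1 range even if the configured weights do not sum to exactly 1.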
The enhanced dispatcher includes timeout protection and automatic fallback:
from mcp_server.dispatcher.dispatcher_enhanced import EnhancedDispatcher
from mcp_server.storage.sqlite_store import SQLiteStore

store = SQLiteStore(".indexes/YOUR_REPO_ID/current.db")
dispatcher = EnhancedDispatcher(
    sqlite_store=store,
    semantic_search_enabled=True,  # Enable if Qdrant available
    lazy_load=True,                # Load plugins on-demand
    use_plugin_factory=True        # Use dynamic plugin loading
)

# Search with automatic optimization
results = list(dispatcher.search("your query", limit=10))
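The timeout-and-fallback behaviour can be illustrated independently of the dispatcher. This is a generic sketch using `concurrent.futures`, not the `EnhancedDispatcher` implementation; `search_with_fallback` and the two stand-in search functions are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable, List


def search_with_fallback(
    primary: Callable[[str], List[str]],
    fallback: Callable[[str], List[str]],
    query: str,
    timeout_s: float = 5.0,
) -> List[str]:
    """Run primary(query); if it times out or fails, use fallback(query)."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(primary, query)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        return fallback(query)  # primary path was too slow
    except Exception:
        return fallback(query)  # primary path raised
    finally:
        # Do not block on a slow primary; its thread finishes in the background.
        executor.shutdown(wait=False)


def slow_plugin_search(query: str) -> List[str]:
    """Stand-in for a slow plugin-based search."""
    time.sleep(0.3)
    return [f"plugin:{query}"]


def bm25_search(query: str) -> List[str]:
    """Stand-in for a fast BM25-only search."""
    return [f"bm25:{query}"]
```

Calling `search_with_fallback(slow_plugin_search, bm25_search, "parse", timeout_s=0.05)` returns the BM25 result because the primary path exceeds the timeout.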
For maximum performance with BM25-only search:
from mcp_server.dispatcher.simple_dispatcher import create_simple_dispatcher

# Ultra-fast BM25 search without plugin overhead
dispatcher = create_simple_dispatcher(".indexes/YOUR_REPO_ID/current.db")
results = list(dispatcher.search("your query", limit=10))
Configure dispatcher behavior via environment variables:
# Dispatcher settings
MCP_DISPATCHER_TIMEOUT=5          # Plugin loading timeout (seconds)
MCP_USE_SIMPLE_DISPATCHER=false   # Use simple dispatcher
MCP_PLUGIN_LAZY_LOAD=true         # Load plugins on-demand
# Performance tuning
MCP_BM25_BYPASS_ENABLED=true      # Enable direct BM25 bypass
MCP_MAX_PLUGIN_MEMORY=1024        # Max memory for plugins (MB)
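A typed view of these variables makes the defaults explicit. The sketch below assumes the defaults are the values shown above and that boolean flags are spelled `true`/`false`; `DispatcherSettings` and `load_dispatcher_settings` are hypothetical names, not project APIs.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Interpret common truthy spellings; anything else is False."""
    return env.get(name, str(default)).strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class DispatcherSettings:
    timeout_s: float
    use_simple_dispatcher: bool
    plugin_lazy_load: bool
    bm25_bypass_enabled: bool
    max_plugin_memory_mb: int


def load_dispatcher_settings(env: Mapping[str, str] = os.environ) -> DispatcherSettings:
    """Read the MCP_* dispatcher variables, using the documented values as defaults."""
    return DispatcherSettings(
        timeout_s=float(env.get("MCP_DISPATCHER_TIMEOUT", "5")),
        use_simple_dispatcher=_flag(env, "MCP_USE_SIMPLE_DISPATCHER", False),
        plugin_lazy_load=_flag(env, "MCP_PLUGIN_LAZY_LOAD", True),
        bm25_bypass_enabled=_flag(env, "MCP_BM25_BYPASS_ENABLED", True),
        max_plugin_memory_mb=int(env.get("MCP_MAX_PLUGIN_MEMORY", "1024")),
    )
```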
All indexes are now stored centrally at .indexes/ (relative to the MCP project) for better organization and to prevent accidental commits:
.indexes/
├── {repo_hash}/ # Unique hash for each repository
│ ├── main_abc123.db # Index for main branch at commit abc123
│ ├── main_abc123.metadata.json
│ └── current.db -> main_abc123.db # Symlink to active index
├── qdrant/ # Semantic search embeddings
│ └── main.qdrant/ # Centralized Qdrant database
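The layout above implies a simple path convention. The helpers below are a sketch of that convention only: the function names are hypothetical, and how `{repo_hash}` is computed is not specified here, so it is taken as an input.

```python
from pathlib import Path

# Assumption: the index root is the .indexes/ directory shown above.
INDEXES_ROOT = Path(".indexes")


def versioned_index_name(branch: str, commit: str) -> str:
    """File name for one branch/commit snapshot, e.g. main_abc123.db."""
    return f"{branch}_{commit}.db"


def current_index_path(repo_hash: str, root: Path = INDEXES_ROOT) -> Path:
    """Path of the 'current.db' symlink that points at the active index."""
    return root / repo_hash / "current.db"


def metadata_path(repo_hash: str, branch: str, commit: str, root: Path = INDEXES_ROOT) -> Path:
    """Path of the metadata file stored next to a snapshot."""
    return root / repo_hash / f"{branch}_{commit}.metadata.json"
```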
Benefits:
Migration: For existing repositories with local indexes:
python scripts/move_indexes_to_central.py
This project uses GitHub Actions Artifacts for efficient index sharing, eliminating reindexing time while keeping the repository lean.
# First time setup - pull latest indexes
python scripts/cli/mcp_cli.py artifact pull --latest
# After making changes - rebuild locally
python scripts/cli/mcp_cli.py index rebuild
# Share your indexes with the team
python scripts/cli/mcp_cli.py artifact push
# Check sync status
python scripts/cli/mcp_cli.py artifact sync
# Optional: Install git hooks for automatic sync
mcp-index hooks install
# Now indexes upload automatically on git push
# and download automatically on git pull
Enable portable index management in any repository with zero GitHub compute costs:
# One-line install
curl -sSL https://raw.githubusercontent.com/yourusername/mcp-index-kit/main/install.sh | bash
# Or via npm
npm install -g mcp-index-kit
mcp-index init
Zero-Cost Architecture:
Portable Design:
Usage:
# Initialize in your repo
cd your-repo
mcp-index init
# Build index locally
mcp-index build
# Push to GitHub Artifacts
mcp-index push
# Pull latest index
mcp-index pull
# Auto sync
mcp-index sync
To enable semantic search capabilities, you need a Voyage AI API key. Get one from https://www.voyageai.com/.
Method 1: Claude Code Configuration (Recommended)
Create or edit .mcp.json in your project root:
{
  "mcpServers": {
    "code-index-mcp": {
      "command": "uvicorn",
      "args": ["mcp_server.gateway:app", "--host", "0.0.0.0", "--port", "8000"],
      "env": {
        "VOYAGE_AI_API_KEY": "your-voyage-ai-api-key-here",
        "SEMANTIC_SEARCH_ENABLED": "true"
      }
    }
  }
}
Method 2: Claude Code CLI
claude mcp add code-index-mcp -e VOYAGE_AI_API_KEY=your_key -e SEMANTIC_SEARCH_ENABLED=true -- uvicorn mcp_server.gateway:app
Method 3: Environment Variables
export VOYAGE_AI_API_KEY=your_key
export SEMANTIC_SEARCH_ENABLED=true
Method 4: .env File
Create a .env file in your project root:
VOYAGE_AI_API_KEY=your_key
SEMANTIC_SEARCH_ENABLED=true
Check Configuration
Verify your semantic search setup:
python scripts/cli/mcp_cli.py index check-semantic
Edit .mcp-index.json in your repository:
{
  "enabled": true,
  "auto_download": true,
  "artifact_retention_days": 30,
  "github_artifacts": {
    "enabled": true,
    "max_size_mb": 100
  }
}
See mcp-index-kit for full documentation
python scripts/cli/mcp_cli.py artifact info 12345
#### Index Management
```bash
# Check index status
python scripts/cli/mcp_cli.py index status
# Check compatibility
python scripts/cli/mcp_cli.py index check-compatibility
# Rebuild indexes locally
python scripts/cli/mcp_cli.py index rebuild
# Create backup
python scripts/cli/mcp_cli.py index backup my_backup
# Restore from backup
python scripts/cli/mcp_cli.py index restore my_backup
```
Clone Repository
git clone https://github.com/yourusername/Code-Index-MCP.git
cd Code-Index-MCP
Get Latest Indexes
python scripts/cli/mcp_cli.py artifact pull --latest
Make Your Changes
Share Updates
# Your indexes are already updated locally
python scripts/cli/mcp_cli.py artifact push
The system tracks embedding model versions to ensure compatibility:
voyage-code-3 (1024 dimensions)
If you use a different embedding model, the system will detect incompatibility and rebuild locally with your configuration.
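A compatibility check of this kind boils down to comparing the model name and vector size recorded with an index against the local configuration. The sketch below assumes the metadata is JSON with keys `embedding_model` and `embedding_dimensions`; those key names and the helper names are hypothetical, not the project's actual schema.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingSpec:
    model: str
    dimensions: int


# The model and dimension count named in the text above.
LOCAL_SPEC = EmbeddingSpec(model="voyage-code-3", dimensions=1024)


def spec_from_metadata(metadata_json: str) -> EmbeddingSpec:
    """Parse the (assumed) metadata keys into an EmbeddingSpec."""
    meta = json.loads(metadata_json)
    return EmbeddingSpec(meta["embedding_model"], int(meta["embedding_dimensions"]))


def needs_rebuild(index_spec: EmbeddingSpec, local_spec: EmbeddingSpec = LOCAL_SPEC) -> bool:
    """A downloaded index is reusable only if model and dimensions both match."""
    return index_spec != local_spec
```

Comparing dimensions as well as the model name guards against vectors that cannot be stored in the same collection.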
Create plugin structure
mkdir -p mcp_server/plugins/my_language_plugin
cd mcp_server/plugins/my_language_plugin
touch __init__.py plugin.py
Implement the plugin interface
from typing import Dict, List

from mcp_server.plugin_base import PluginBase


class MyLanguagePlugin(PluginBase):
    def __init__(self):
        self.tree_sitter_language = "my_language"

    def index(self, file_path: str) -> Dict:
        # Parse and index the file
        pass

    def getDefinition(self, symbol: str, context: Dict) -> Dict:
        # Find symbol definition
        pass

    def getReferences(self, symbol: str, context: Dict) -> List[Dict]:
        # Find symbol references
        pass
Register the plugin
# In dispatcher.py
from .plugins.my_language_plugin import MyLanguagePlugin

self.plugins['my_language'] = MyLanguagePlugin()
# Run all tests
pytest
# Run specific test
pytest test_python_plugin.py
# Run with coverage
pytest --cov=mcp_server --cov-report=html
# View C4 architecture diagrams
docker run --rm -p 8080:8080 \
  -v "$(pwd)/architecture":/usr/local/structurizr \
  structurizr/lite
# Open http://localhost:8080 in your browser
GET /symbol
Get symbol definition
GET /symbol?symbol_name=parseFile&file_path=/path/to/file.py
Query parameters:
- symbol_name (required): Name of the symbol to find
- file_path (optional): Specific file to search in
GET /search
Search for code patterns
GET /search?query=async+def.*parse&file_extensions=.py,.js
Query parameters:
- query (required): Search pattern (regex supported)
- file_extensions (optional): Comma-separated list of extensions
All API responses follow a consistent JSON structure:
Success Response:
{
  "status": "success",
  "data": { ... },
  "timestamp": "2024-01-01T00:00:00Z"
}
Error Response:
{
  "status": "error",
  "error": "Error message",
  "code": "ERROR_CODE",
  "timestamp": "2024-01-01T00:00:00Z"
}
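A client can rely on this envelope to separate payloads from failures. The helper below is a minimal sketch based only on the two shapes shown above; `ApiError` and `unwrap` are hypothetical names, not part of the project.

```python
import json
from typing import Any


class ApiError(Exception):
    """Raised when the envelope's status is not 'success'."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def unwrap(payload: str) -> Any:
    """Return the 'data' field of a success envelope, or raise ApiError."""
    body = json.loads(payload)
    if body.get("status") == "success":
        return body.get("data")
    raise ApiError(body.get("code", "UNKNOWN"), body.get("error", ""))
```

Callers can then branch on `exc.code` instead of parsing error strings.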
The project includes multiple Docker configurations for different environments:
Development (Default):
# Uses docker-compose.yml + Dockerfile
docker-compose up -d
# - SQLite database
# - Uvicorn development server
# - Volume mounts for code changes
# - Debug logging enabled
Production:
# Uses docker-compose.production.yml + Dockerfile.production
docker-compose -f docker-compose.production.yml up -d
# - PostgreSQL database
# - Gunicorn + Uvicorn workers
# - Multi-stage optimized builds
# - Security hardening (non-root user)
# - Production logging
Enhanced Development:
# Uses both compose files with development overrides
docker-compose -f docker-compose.yml -f docker-compose.dev.yml up -d
# - Development base + enhanced debugging
# - Source code volume mounting
# - Read-write code access
Important: By default, docker-compose restart uses the DEVELOPMENT configuration:
- docker-compose restart → Uses docker-compose.yml (Development)
- docker-compose -f docker-compose.production.yml restart → Uses Production
For production environments, we provide:
k8s/ directory
See our Deployment Guide for detailed instructions including:
For quick setup, download pre-built indexes from our GitHub releases:
# List available releases
python scripts/download-release.py --list
# Download latest release
python scripts/download-release.py --latest
# Download specific version
python scripts/download-release.py --tag v2024.01.15 --output ./my-index
Maintainers can create new releases with pre-built indexes:
# Create a new release (as draft)
python scripts/create-release.py --version 1.0.0
# Create and publish immediately
python scripts/create-release.py --version 1.0.0 --publish
The project includes Git hooks for automatic index synchronization:
Install hooks with: mcp-index hooks install
We welcome contributions! Please see our Contributing Guide for details.
git checkout -b feature/amazing-feature
Operation | Performance Target | Current Status |
---|---|---|
Symbol Lookup | <100ms (p95) | ✅ Achieved - All queries < 100ms |
Code Search | <500ms (p95) | ✅ Achieved - BM25 search < 50ms |
File Indexing | 10K files/min | ✅ Achieved - 152K files indexed |
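The p95 targets in this table can be checked with a small timing harness. This is a generic sketch using the standard library, not the project's benchmark code; `measure_ms` and `p95` are hypothetical helper names.

```python
import statistics
import time
from typing import Callable, List, Sequence


def measure_ms(fn: Callable[..., object], *args: object, repeats: int = 50) -> List[float]:
    """Call fn(*args) repeatedly and return each wall-clock latency in ms."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def p95(samples_ms: Sequence[float]) -> float:
    """95th percentile: the 95th of the 99 cut points for n=100."""
    return statistics.quantiles(samples_ms, n=100, method="inclusive")[94]


def meets_target(samples_ms: Sequence[float], target_ms: float) -> bool:
    """True if the p95 latency is under the target, e.g. 100 ms for symbol lookup."""
    return p95(samples_ms) < target_ms
```

A percentile is used rather than the mean so that a few slow outliers are visible without dominating the result.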
The system follows C4 model architecture patterns:
For detailed architectural documentation, see the architecture/ directory.
See ROADMAP.md for detailed development plans and current progress.
Current Status: 100% Complete - Production Ready
Recent Achievements (June 2025):
Performance optimization features are implemented and available:
- INDEXING_BATCH_SIZE environment variable
- INDEXING_MAX_FILE_SIZE environment variable
- INDEXING_MAX_WORKERS
This project is licensed under the MIT License - see the LICENSE file for details.
Built with ❤️ for the developer community