Crawl4ai
A general-purpose content extraction and AI analysis server built on crawl4ai (STDIO transport).
⚠️ Important: This is an unofficial MCP server implementation for the excellent crawl4ai library.
Not affiliated with the original crawl4ai project.
A comprehensive Model Context Protocol (MCP) server that wraps the powerful crawl4ai library with advanced AI capabilities. Extract and analyze content from any source: web pages, PDFs, Office documents, YouTube videos, and more. Features intelligent summarization to dramatically reduce token usage while preserving key information.
Install system dependencies for Playwright:
Ubuntu 24.04 LTS (manual setup required):
```bash
# Manual setup required due to the t64 library transition
sudo apt update && sudo apt install -y \
  libnss3 libatk-bridge2.0-0 libxss1 libasound2t64 \
  libgbm1 libgtk-3-0t64 libxshmfence-dev libxrandr2 \
  libxcomposite1 libxcursor1 libxdamage1 libxi6 \
  fonts-noto-color-emoji fonts-unifont python3-venv python3-pip

python3 -m venv venv && source venv/bin/activate
pip install playwright==1.55.0 && playwright install chromium
sudo playwright install-deps
```
Other Linux/macOS:
```bash
sudo bash scripts/prepare_for_uvx_playwright.sh
```
Windows (as Administrator):
```powershell
scripts/prepare_for_uvx_playwright.ps1
```
UVX (Recommended - Easiest):
```bash
# After the system preparation above - that's it!
uvx --from git+https://github.com/walksoda/crawl-mcp crawl-mcp
```
Docker (Production-Ready):
```bash
# Clone the repository
git clone https://github.com/walksoda/crawl-mcp
cd crawl-mcp

# Build and run with Docker Compose (STDIO mode)
docker-compose up --build

# Or build and run HTTP mode on port 8000
docker-compose --profile http up --build crawl4ai-mcp-http

# Or build manually
docker build -t crawl4ai-mcp .
docker run -it crawl4ai-mcp
```
Docker Features:
UVX Installation:
Add to your claude_desktop_config.json:
{ "mcpServers": { "crawl-mcp": { "transport": "stdio", "command": "uvx", "args": [ "--from", "git+https://github.com/walksoda/crawl-mcp", "crawl-mcp" ], "env": { "CRAWL4AI_LANG": "en" } } } }
Docker HTTP Mode:
{ "mcpServers": { "crawl-mcp": { "transport": "http", "baseUrl": "http://localhost:8000" } } }
For Japanese interface:
"env": { "CRAWL4AI_LANG": "ja" }
| Topic | Description |
|---|---|
| Installation Guide | Complete installation instructions for all platforms |
| API Reference | Full tool documentation and usage examples |
| Configuration Examples | Platform-specific setup configurations |
| HTTP Integration | HTTP API access and integration methods |
| Advanced Usage | Power user techniques and workflows |
| Development Guide | Contributing and development setup |
- crawl_url - Single page crawling with JavaScript support
- deep_crawl_site - Multi-page site mapping and exploration
- crawl_url_with_fallback - Robust crawling with retry strategies
- batch_crawl - Process multiple URLs simultaneously
- intelligent_extract - Semantic content extraction with custom instructions
- auto_summarize - LLM-based summarization for large content
- extract_entities - Pattern-based entity extraction (emails, phones, URLs, etc.)
- process_file - Convert PDFs, Office docs, ZIP archives to markdown
- extract_youtube_transcript - Multi-language transcript extraction
- batch_extract_youtube_transcripts - Process multiple videos
- search_google - Genre-filtered Google search with metadata
- search_and_crawl - Combined search and content extraction
- batch_search_google - Multiple search queries with analysis
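As a rough sketch of how a client invokes one of these tools, the example below uses the official MCP Python SDK to launch the server over STDIO and call crawl_url. The url argument is an assumption, and wait_for_js is taken from the troubleshooting notes further down; consult the API Reference for the exact schemas.

```python
# Sketch: call crawl_url over STDIO with the MCP Python SDK.
# Argument names are illustrative assumptions, not the documented schema.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server = StdioServerParameters(
    command="uvx",
    args=["--from", "git+https://github.com/walksoda/crawl-mcp", "crawl-mcp"],
    env={"CRAWL4AI_LANG": "en"},
)

async def main() -> None:
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "crawl_url",
                arguments={
                    "url": "https://example.com",  # assumed parameter name
                    "wait_for_js": True,           # option mentioned in Troubleshooting below
                },
            )
            print(result)

asyncio.run(main())
```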
Content Research:
search_and_crawl → intelligent_extract → structured analysis
Documentation Mining:
deep_crawl_site → batch processing → comprehensive extraction
Media Analysis:
extract_youtube_transcript → auto_summarize → insight generation
Competitive Intelligence:
batch_crawl → extract_entities → comparative analysis
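As one possible shape for the Content Research chain above, the sketch below reuses a ClientSession opened as in the earlier STDIO example. Every argument name here is an assumption made for illustration; the real tool schemas are in the API Reference.

```python
# Hypothetical sketch of the Content Research workflow.
# Argument names and result handling are assumptions, not documented schemas.
async def content_research(session, query: str):
    # Step 1: search Google and crawl the top results for the query.
    crawled = await session.call_tool(
        "search_and_crawl",
        arguments={"query": query},  # assumed parameter name
    )
    # Step 2: semantic extraction with a custom instruction over one result.
    extracted = await session.call_tool(
        "intelligent_extract",
        arguments={
            "url": "https://example.com",                          # placeholder URL
            "instruction": "List the key claims and data points",  # assumed parameter name
        },
    )
    return crawled, extracted
```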
Installation Issues:
- Run the get_system_diagnostics tool

Performance Issues:
- Set wait_for_js: true for JavaScript-heavy sites
- Use auto_summarize for large content

Configuration Issues:
- Check claude_desktop_config.json

This project is an unofficial wrapper around the crawl4ai library. Please refer to the original crawl4ai license for the underlying functionality.
See our Development Guide for contribution guidelines and development setup instructions.