Doctor
HTTP-SSE web crawling and indexing MCP server with vector search and hierarchical site maps
A tool for discovering, crawling, and indexing websites, exposed as an MCP server so that LLM agents can reason and generate code with better, more up-to-date information.
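Doctor provides a complete stack for crawling websites, indexing their content for vector search, and browsing the results as hierarchical site maps.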
export OPENAI_API_KEY=your-openai-key
docker compose up
- Use the /fetch_url endpoint to start a crawl job by providing a URL
- Check /job_progress to see the current job status
- Connect to http://localhost:9111/mcp as an MCP server

API endpoints:
- POST /fetch_url: Start crawling a URL
- GET /search_docs: Search indexed documents
- GET /job_progress: Check crawl job progress
- GET /list_doc_pages: List indexed pages
- GET /get_doc_page: Get full text of a page

The Maps feature provides a hierarchical view of crawled websites, making it easy to navigate and explore the structure of indexed sites.
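To make the idea of a hierarchical view concrete, here is a minimal sketch of how flat page records with parent links could be assembled into a tree and printed as an indented outline. This is not Doctor's implementation: the `Page` record, its field names, and the `build_tree`, `render`, and `siblings` helpers are all hypothetical and exist only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Page:
    # Hypothetical record; field names are illustrative, not Doctor's schema.
    page_id: str
    title: str
    parent_id: Optional[str] = None
    children: list[Page] = field(default_factory=list)


def build_tree(pages: list[Page]) -> list[Page]:
    """Attach each page to its parent and return the pages without a known parent."""
    by_id = {p.page_id: p for p in pages}
    roots = []
    for page in pages:
        parent = by_id.get(page.parent_id) if page.parent_id else None
        if parent is None:
            roots.append(page)  # no known parent: treat as a site root
        else:
            parent.children.append(page)
    return roots


def render(node: Page, depth: int = 0) -> list[str]:
    """Return an indented outline of the subtree rooted at node."""
    lines = ["  " * depth + node.title]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


def siblings(page: Page, pages: list[Page]) -> list[Page]:
    """Return the other pages that share this page's parent."""
    return [
        p for p in pages
        if p.parent_id == page.parent_id and p.page_id != page.page_id
    ]


# Made-up sample data: one root page with two children and one grandchild.
pages = [
    Page("1", "Docs home"),
    Page("2", "Guide", parent_id="1"),
    Page("3", "API", parent_id="1"),
    Page("4", "Install", parent_id="2"),
]
roots = build_tree(pages)
outline = render(roots[0])
print("\n".join(outline))
```

Storing only a parent reference per page keeps the records flat, and the parent, sibling, and child views can all be derived from that single link.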
Endpoints:
- GET /map: View an index of all crawled sites
- GET /map/site/{root_page_id}: View the hierarchical tree structure of a specific site
- GET /map/page/{page_id}: View a specific page with navigation (parent, siblings, children)
- GET /map/page/{page_id}/raw: Get the raw markdown content of a page

Features:
Usage Example:
- Crawl a website using the /fetch_url endpoint
- Visit /map to see all crawled sites

Ensure that your Docker Compose stack is up, then add the following to your Cursor or VS Code MCP servers configuration:
"doctor": { "type": "sse", "url": "http://localhost:9111/mcp" }
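If you already have other MCP servers configured, the entry above has to be merged into the existing file rather than pasted over it. The sketch below shows one way to do that merge in Python. The top-level key name (`mcpServers` here) is an assumption that differs between editors, and the `add_doctor_server` helper and the `other` server entry are hypothetical; check your editor's documentation for the exact file location and layout.

```python
import json

# The entry from this README.
DOCTOR_ENTRY = {"type": "sse", "url": "http://localhost:9111/mcp"}


def add_doctor_server(config: dict, servers_key: str = "mcpServers") -> dict:
    """Return a copy of config with the "doctor" entry added under servers_key.

    servers_key is an assumption: editors name this top-level key differently.
    """
    merged = dict(config)
    servers = dict(merged.get(servers_key, {}))  # copy so the input is not mutated
    servers["doctor"] = dict(DOCTOR_ENTRY)
    merged[servers_key] = servers
    return merged


# Hypothetical existing configuration with one unrelated server.
existing = {"mcpServers": {"other": {"type": "sse", "url": "http://localhost:9000/mcp"}}}
updated = add_doctor_server(existing)
print(json.dumps(updated, indent=2))
```

The function copies the nested dictionary before adding to it, so any existing entries are preserved and the original configuration object is left untouched.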
To run all tests:
# Run all tests with coverage report
pytest
To run specific test categories:
# Run only unit tests
pytest -m unit
# Run only async tests
pytest -m async_test
# Run tests for a specific component
pytest tests/lib/test_crawler.py
The project is configured to generate coverage reports automatically:
# Run tests with detailed coverage report
pytest --cov=src --cov-report=term-missing
- tests/conftest.py: Common fixtures for all tests
- tests/lib/: Tests for library components
  - test_crawler.py: Tests for the crawler module
  - test_crawler_enhanced.py: Tests for enhanced crawler with hierarchy tracking
  - test_chunker.py: Tests for the chunker module
  - test_embedder.py: Tests for the embedder module
  - test_database.py: Tests for the unified Database class
  - test_database_hierarchy.py: Tests for database hierarchy operations
- tests/common/: Tests for common modules
- tests/services/: Tests for service layer
  - test_map_service.py: Tests for the map service
- tests/api/: Tests for API endpoints
  - test_map_api.py: Tests for map API endpoints
- tests/integration/: Integration tests
  - test_processor_enhanced.py: Tests for enhanced processor with hierarchy

The project is configured with pre-commit hooks that run automatically before each commit:
- ruff check --fix: Lints code and automatically fixes issues
- ruff format: Formats code according to project style

To set up pre-commit hooks:
# Install pre-commit
uv pip install pre-commit
# Install the git hooks
pre-commit install
You can run the pre-commit hooks manually on all files:
# Run all pre-commit hooks
pre-commit run --all-files
Or on staged files only:
# Run on staged files
pre-commit run
This project is licensed under the MIT License - see the LICENSE.md file for details.