Website Downloader
Simple MCP server for downloading documentation websites and preparing them for RAG indexing.
Fork and clone the repository, then cd into it.
uv venv
.venv/Scripts/activate
pip install -e .
Put this in your claude_desktop_config.json with your own paths:
"mcp-windows-website-downloader": { "command": "uv", "args": [ "--directory", "F:/GithubRepos/mcp-windows-website-downloader", "run", "mcp-windows-website-downloader", "--library", "F:/GithubRepos/mcp-windows-website-downloader/website_library" ] },

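Start the server:
python -m mcp_windows_website_downloader.server --library docs_library
Download a site by calling the download tool from an MCP client:
result = await server.call_tool("download", { "url": "https://docs.example.com" })
Each downloaded site is stored in the library directory with this layout: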
docs_library/
  domain_name/
    index.html
    about.html
    docs/
      getting-started.html
      ...
    assets/
      css/
      js/
      images/
      fonts/
    rag_index.json
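To make the layout above concrete, here is a rough sketch of how a page URL could map to a file under the library directory. The function local_path_for and its rules (index.html for directory-style URLs, an .html suffix for extension-less pages, the host name as the domain folder) are assumptions for illustration. The actual logic in core.py and utils.py may differ.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def local_path_for(url: str, library: str = "docs_library") -> PurePosixPath:
    """Map a page URL to a file path under library/<domain>/ (illustrative only)."""
    parsed = urlparse(url)
    path = parsed.path
    if path == "" or path.endswith("/"):
        # Treat directory-style URLs as their index page.
        path += "index.html"
    elif "." not in path.rsplit("/", 1)[-1]:
        # Give extension-less pages an .html suffix.
        path += ".html"
    return PurePosixPath(library) / parsed.netloc / path.lstrip("/")


if __name__ == "__main__":
    print(local_path_for("https://docs.example.com"))
    print(local_path_for("https://docs.example.com/docs/getting-started"))
```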
The server follows standard MCP architecture:
src/
  mcp_windows_website_downloader/
    __init__.py
    server.py    # MCP server implementation
    core.py      # Core downloader functionality
    utils.py     # Helper utilities
server.py: Main MCP server implementation that handles tool registration and requests
core.py: Core website downloading functionality with proper asset handling
utils.py: Helper utilities for file handling and URL processing
Single Responsibility
Clean Structure
Robust Operation
The rag_index.json file contains:
{ "url": "https://docs.example.com", "domain": "docs.example.com", "pages": 42, "path": "/path/to/site" }
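As an illustration of the four fields shown above, the sketch below writes a rag_index.json for a site directory. The function write_rag_index is hypothetical, and counting the saved .html files to get "pages" is an assumption. The server may compute the value differently.

```python
import json
import tempfile
from pathlib import Path
from urllib.parse import urlparse


def write_rag_index(site_dir: Path, url: str) -> dict:
    """Write rag_index.json with the url, domain, pages and path fields."""
    index = {
        "url": url,
        "domain": urlparse(url).netloc,
        # Assumption: "pages" counts the .html files saved under the site directory.
        "pages": sum(1 for _ in site_dir.rglob("*.html")),
        "path": str(site_dir),
    }
    index_file = site_dir / "rag_index.json"
    index_file.write_text(json.dumps(index, indent=2), encoding="utf-8")
    return index


if __name__ == "__main__":
    # Build a tiny fake site in a temporary directory and index it.
    with tempfile.TemporaryDirectory() as tmp:
        site = Path(tmp) / "docs.example.com"
        (site / "docs").mkdir(parents=True)
        (site / "index.html").write_text("<html></html>", encoding="utf-8")
        (site / "docs" / "getting-started.html").write_text("<html></html>", encoding="utf-8")
        print(write_rag_index(site, "https://docs.example.com"))
```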
MIT License - See LICENSE file
The server handles common issues:
Error responses follow the format:
{ "status": "error", "error": "Detailed error message" }
Success responses:
{ "status": "success", "path": "/path/to/downloaded/site", "pages": 42 }
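A client can branch on the "status" field of these responses. The following is a minimal sketch, assuming the tool result reaches the client as a JSON string in one of the two shapes above. DownloadError and parse_download_response are illustrative names, not part of the server.

```python
import json


class DownloadError(Exception):
    """Raised when the tool reports status "error"."""


def parse_download_response(raw: str) -> tuple[str, int]:
    """Return (path, pages) for a success response; raise DownloadError otherwise."""
    data = json.loads(raw)
    if data.get("status") == "success":
        return data["path"], data["pages"]
    # Any other status is treated as a failure in this sketch.
    raise DownloadError(data.get("error", "unknown error"))


if __name__ == "__main__":
    ok = '{"status": "success", "path": "/path/to/downloaded/site", "pages": 42}'
    path, pages = parse_download_response(ok)
    print(path, pages)
```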