Website Downloader
Simple MCP server (STDIO transport) for downloading documentation websites and preparing them for RAG indexing.
Fork and clone the repository, then cd into it.
uv venv
./.venv/Scripts/activate
uv pip install -e .
Put this in your claude_desktop_config.json with your own paths:
"mcp-windows-website-downloader": {
  "command": "uv",
  "args": [
    "--directory", "F:/GithubRepos/mcp-windows-website-downloader",
    "run", "mcp-windows-website-downloader",
    "--library", "F:/GithubRepos/mcp-windows-website-downloader/website_library"
  ]
},
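If you would rather not hand-edit the file, the entry can be merged in with a few lines of Python. This is a minimal sketch, assuming the config keeps its servers under a top-level "mcpServers" key; the helper name add_server_entry is hypothetical and not part of this project, and the paths are the example ones shown above.

```python
import json
from pathlib import Path


def add_server_entry(config_path, repo_dir):
    """Merge the downloader entry into a Claude Desktop config file.

    Hypothetical helper for illustration; not shipped with this project.
    """
    path = Path(config_path)
    # Start from the existing config if there is one, else an empty object.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    # Assumption: servers live under the top-level "mcpServers" key.
    servers = config.setdefault("mcpServers", {})
    servers["mcp-windows-website-downloader"] = {
        "command": "uv",
        "args": [
            "--directory", repo_dir,
            "run", "mcp-windows-website-downloader",
            "--library", f"{repo_dir}/website_library",
        ],
    }
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

Other servers already listed in the file are left untouched, because only the one key is overwritten.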
python -m mcp_windows_website_downloader.server --library docs_library
result = await server.call_tool("download", { "url": "https://docs.example.com" })
docs_library/
  domain_name/
    index.html
    about.html
    docs/
      getting-started.html
      ...
    assets/
      css/
      js/
      images/
      fonts/
    rag_index.json
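Given the layout above, a RAG pipeline typically starts by enumerating the downloaded pages. The following is a rough sketch, assuming each site sits in a folder named after its domain directly under the library directory; list_pages is a hypothetical helper, not something this server exposes.

```python
from pathlib import Path


def list_pages(library_dir, domain):
    """Return the HTML files of one downloaded site as sorted relative paths."""
    site_root = Path(library_dir) / domain
    return sorted(
        # as_posix() keeps forward slashes, including on Windows.
        p.relative_to(site_root).as_posix()
        for p in site_root.rglob("*.html")
    )
```

For example, list_pages("docs_library", "docs.example.com") would yield entries such as "index.html" and "docs/getting-started.html", skipping everything under assets/ that is not an HTML file.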
The server follows standard MCP architecture:
src/
  mcp_windows_website_downloader/
    __init__.py
    server.py   # MCP server implementation
    core.py     # Core downloader functionality
    utils.py    # Helper utilities
server.py: Main MCP server implementation that handles tool registration and requests
core.py: Core website downloading functionality with proper asset handling
utils.py: Helper utilities for file handling and URL processing
Single Responsibility
Clean Structure
Robust Operation
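To make the "URL processing" role concrete, here is one way a page URL could be mapped to a file path inside the library. This is an illustrative sketch only, not the code in utils.py; the function name url_to_relative_path and the choice to append index.html or .html are assumptions.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def url_to_relative_path(url):
    """Map a page URL to a relative file path of the form domain/page.html."""
    parsed = urlparse(url)
    path = parsed.path
    # Treat directory-style URLs ("", "/" or "/docs/") as index pages.
    if not path or path.endswith("/"):
        path += "index.html"
    rel = PurePosixPath(path.lstrip("/"))
    # Give extension-less pages an .html suffix so they open locally.
    if not rel.suffix:
        rel = rel.with_suffix(".html")
    return f"{parsed.netloc}/{rel.as_posix()}"
```

A real implementation would also need to handle query strings and path segments that contain dots (such as version numbers), which this sketch does not attempt.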
The rag_index.json file contains:
{ "url": "https://docs.example.com", "domain": "docs.example.com", "pages": 42, "path": "/path/to/site" }
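An indexing script can read this file before touching the pages. The snippet below is a small sketch that loads the index and checks for the four keys shown above; load_rag_index is a hypothetical name, and the check assumes those four keys are always present.

```python
import json
from pathlib import Path

# Keys taken from the example above; treated here as required (an assumption).
EXPECTED_KEYS = {"url", "domain", "pages", "path"}


def load_rag_index(index_path):
    """Load rag_index.json and fail early if an expected key is absent."""
    data = json.loads(Path(index_path).read_text(encoding="utf-8"))
    missing = EXPECTED_KEYS - data.keys()
    if missing:
        raise ValueError(f"rag_index.json is missing keys: {sorted(missing)}")
    return data
```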
MIT License - See LICENSE file
The server handles common issues:
Error responses follow the format:
{ "status": "error", "error": "Detailed error message" }
Success responses:
{ "status": "success", "path": "/path/to/downloaded/site", "pages": 42 }
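On the client side, the two response shapes can be handled in one place. This is a minimal sketch, assuming the tool result arrives as a JSON string or an already-decoded dict; DownloadError and parse_download_response are hypothetical names introduced for illustration.

```python
import json


class DownloadError(Exception):
    """Raised when the response reports status "error"."""


def parse_download_response(raw):
    """Return (path, pages) on success, raise DownloadError on failure."""
    response = json.loads(raw) if isinstance(raw, str) else raw
    status = response.get("status")
    if status == "success":
        return response["path"], response["pages"]
    if status == "error":
        raise DownloadError(response.get("error", "unknown error"))
    # Anything else does not match either documented format.
    raise ValueError(f"unexpected status: {status!r}")
```

Raising an exception for the error case keeps the calling code from mistaking a failed download for an empty one.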