# MD Web Crawler
A Python-based [MCP](https://modelcontextprotocol.io/introduction) web crawler for extracting and saving website content.
## Installation

```bash
git clone https://github.com/yourusername/webcrawler.git
cd webcrawler
pip install -r requirements.txt
export OUTPUT_PATH=./output  # Set your preferred output directory
```
Crawled content is saved in markdown format in the specified output directory.
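As a rough sketch of how the output location might be resolved from `OUTPUT_PATH` (illustrative only; `resolve_output_path` is a hypothetical name, not necessarily what `server.py` does):

```python
import os
from pathlib import Path

def resolve_output_path(filename: str) -> Path:
    """Join a requested filename onto OUTPUT_PATH (defaulting to ./output)."""
    base = Path(os.environ.get("OUTPUT_PATH", "./output"))
    base.mkdir(parents=True, exist_ok=True)  # ensure the directory exists
    return base / filename
```

With `OUTPUT_PATH=/Users/user/Webcrawl`, a tool call saving `example.md` would then write to `/Users/user/Webcrawl/example.md`.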
## Configuration

The server can be configured through environment variables:

- `OUTPUT_PATH`: Default output directory for saved files
- `MAX_CONCURRENT_REQUESTS`: Maximum parallel requests (default: 5)
- `REQUEST_TIMEOUT`: Request timeout in seconds (default: 30)

## Install with FastMCP
```bash
fastmcp install server.py
```
Or use custom settings to run with FastMCP directly:
"Crawl Server": {
"command": "fastmcp",
"args": [
"run",
"/Users/mm22/Dev_Projekte/servers-main/src/Webcrawler/server.py"
],
"env": {
"OUTPUT_PATH": "/Users/user/Webcrawl"
}
## Development

```bash
fastmcp dev server.py --with-editable .
```
The [MCP Inspector](https://modelcontextprotocol.io/docs/tools/inspector) is helpful for debugging.
## Usage

```bash
# Extract a single page and save it as markdown
mcp call extract_content --url "https://example.com" --output_path "example.md"

# Scan linked pages, then build an index from the results
mcp call scan_linked_content --url "https://example.com" | \
  mcp call create_index --content_map - --output_path "index.md"
```
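Conceptually, the scan-then-index pipeline does something like the following (a sketch only; the function names mirror the tool names, but the data shapes and internals are assumptions, not the server's actual implementation):

```python
def scan_linked_content(base_url: str) -> dict[str, str]:
    """Map each discovered link to its page title (stubbed with fixed data here)."""
    # A real implementation would fetch base_url and parse its links.
    return {
        f"{base_url}/about": "About",
        f"{base_url}/docs": "Documentation",
    }

def create_index(content_map: dict[str, str]) -> str:
    """Render a markdown index from a url -> title map."""
    lines = ["# Index", ""]
    for url, title in sorted(content_map.items()):
        lines.append(f"- [{title}]({url})")
    return "\n".join(lines)
```

Piping the first tool's output into `create_index --content_map -` corresponds to passing the `content_map` dictionary between the two steps.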
## Contributing

1. Create a feature branch: `git checkout -b feature/AmazingFeature`
2. Commit your changes: `git commit -m 'Add some AmazingFeature'`
3. Push to the branch: `git push origin feature/AmazingFeature`

## License

Distributed under the MIT License. See `LICENSE` for more information.