# Web Crawler

MCP server implementation for web crawling with configurable depth and concurrent requests.
## Installation

Clone the repository:

```bash
git clone https://github.com/jitsmaster/web-crawler-mcp.git
cd web-crawler-mcp
```
Install dependencies:

```bash
npm install
```
Build the project:

```bash
npm run build
```
## Configuration

Create a `.env` file with the following environment variables:

```env
CRAWL_LINKS=false
MAX_DEPTH=3
REQUEST_DELAY=1000
TIMEOUT=5000
MAX_CONCURRENT=5
```
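At startup the server reads these variables from the environment, falling back to the defaults documented below. A minimal sketch of such parsing — the `loadConfig` helper and its structure are illustrative assumptions, not the project's actual source:

```typescript
// Hedged sketch: reading the documented environment variables with their
// documented defaults. Names match the README; the parsing logic itself
// is an assumption about how the server behaves.
interface CrawlerConfig {
  crawlLinks: boolean;
  maxDepth: number;
  requestDelay: number;
  timeout: number;
  maxConcurrent: number;
}

function loadConfig(env: Record<string, string | undefined>): CrawlerConfig {
  // Parse a numeric variable, falling back when unset or malformed.
  const num = (v: string | undefined, fallback: number): number => {
    const n = v === undefined || v === "" ? NaN : Number(v);
    return Number.isFinite(n) ? n : fallback;
  };
  return {
    crawlLinks: env.CRAWL_LINKS === "true", // default: false
    maxDepth: num(env.MAX_DEPTH, 3),
    requestDelay: num(env.REQUEST_DELAY, 1000),
    timeout: num(env.TIMEOUT, 5000),
    maxConcurrent: num(env.MAX_CONCURRENT, 5),
  };
}
```

In the server process this would be invoked as `loadConfig(process.env)`; passing the environment in explicitly keeps the helper easy to test.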
Start the MCP server:

```bash
npm start
```
Add the following to your MCP settings file:

```json
{
  "mcpServers": {
    "web-crawler": {
      "command": "node",
      "args": ["/path/to/web-crawler/build/index.js"],
      "env": {
        "CRAWL_LINKS": "false",
        "MAX_DEPTH": "3",
        "REQUEST_DELAY": "1000",
        "TIMEOUT": "5000",
        "MAX_CONCURRENT": "5"
      }
    }
  }
}
```
## Usage

The server provides a `crawl` tool that can be accessed through MCP. Example usage:

```json
{
  "url": "https://example.com",
  "depth": 1
}
```
| Environment Variable | Default | Description |
|---|---|---|
| `CRAWL_LINKS` | `false` | Whether to follow links |
| `MAX_DEPTH` | `3` | Maximum crawl depth |
| `REQUEST_DELAY` | `1000` | Delay between requests (ms) |
| `TIMEOUT` | `5000` | Request timeout (ms) |
| `MAX_CONCURRENT` | `5` | Maximum concurrent requests |
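To illustrate how these parameters interact, here is a hedged sketch of a depth-limited, concurrency-limited crawl loop. The page fetcher is injected so the control flow can be shown without real HTTP; the actual server's implementation may differ, and the `crawl` function here is illustrative, not the project's source:

```typescript
// Returns the outbound links found on a page. Injected so the crawl
// logic can be exercised without network access.
type FetchPage = (url: string) => Promise<string[]>;

interface CrawlOptions {
  crawlLinks: boolean;   // CRAWL_LINKS: follow discovered links
  maxDepth: number;      // MAX_DEPTH: how many levels to descend
  maxConcurrent: number; // MAX_CONCURRENT: requests in flight per batch
  requestDelay: number;  // REQUEST_DELAY: ms pause between batches
}

async function crawl(
  start: string,
  fetchPage: FetchPage,
  opts: CrawlOptions,
): Promise<string[]> {
  const visited = new Set<string>([start]);
  let frontier = [start];
  for (let depth = 0; depth < opts.maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    // Process the frontier in batches of at most maxConcurrent requests.
    for (let i = 0; i < frontier.length; i += opts.maxConcurrent) {
      const batch = frontier.slice(i, i + opts.maxConcurrent);
      const results = await Promise.all(batch.map((u) => fetchPage(u)));
      if (opts.crawlLinks) {
        for (const links of results) {
          for (const link of links) {
            if (!visited.has(link)) {
              visited.add(link);
              next.push(link); // queue unseen links for the next depth
            }
          }
        }
      }
      // Throttle between batches when more work remains at this depth.
      if (opts.requestDelay > 0 && i + opts.maxConcurrent < frontier.length) {
        await new Promise((r) => setTimeout(r, opts.requestDelay));
      }
    }
    frontier = next;
  }
  return [...visited];
}
```

With `crawlLinks: false` the loop fetches only the starting URL; raising `maxDepth` lets it descend further through discovered links, while `maxConcurrent` caps how many fetches run at once.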