
Smart Fetch
Advanced web content fetching and conversion tool (STDIO)
A powerful Model Context Protocol (MCP) server that intelligently fetches and processes web content with nested URL crawling capabilities. Transform any documentation site or web resource into clean, structured markdown files perfect for AI consumption and analysis.
(<main>, <article>, .content)
fetch_website_nested
Comprehensive web crawling with nested URL processing.
Parameters:
url (required): Starting URL to crawl
maxDepth (optional, default: 2): Maximum crawl depth
maxPages (optional, default: 50): Maximum pages to process
sameDomainOnly (optional, default: true): Restrict to same domain
excludePatterns (optional): Array of regex patterns to exclude
includePatterns (optional): Array of regex patterns to include
timeout (optional, default: 10000): Request timeout in milliseconds
fetch_website_single
Simple single-page content extraction.
Parameters:
url (required): URL to fetch
timeout (optional, default: 10000): Request timeout in milliseconds
To install Better Fetch for Claude Desktop automatically via Smithery:
npx -y @smithery/cli install @flutterninja9/better-fetch --client claude
git clone https://github.com/yourusername/better-fetch.git
cd better-fetch
npm install

npm run build

# Quick test
npm run dev

# Or run comprehensive tests
node test-mcp.js
Add to your claude_desktop_config.json:
{
  "mcpServers": {
    "better-fetch": {
      "command": "node",
      "args": ["/absolute/path/to/better-fetch/dist/server.js"],
      "env": {
        "NODE_ENV": "production"
      }
    }
  }
}
{
  "better-fetch": {
    "command": "node",
    "args": ["/Users/yourusername/better-fetch/dist/server.js"]
  }
}
{
  "name": "better-fetch",
  "command": "node",
  "args": ["/path/to/better-fetch/dist/server.js"],
  "stdio": true
}
Fetch all the web contents from this Flutter Shadcn UI documentation site:
https://flutter-shadcn-ui.mariuti.com/
Use nested fetching with a maximum depth of 3 levels and process up to 100 pages.
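As a rough illustration of how a depth limit and a page limit bound a nested crawl like the one requested above, the sketch below walks a small in-memory link graph breadth-first. It is not the server's actual implementation: the `crawl` function, the `LinkGraph` type, and the sample paths are hypothetical, and only the `maxDepth` and `maxPages` names come from the documented parameters. Treating the start page as depth 0 is also an assumption.

```typescript
// Hypothetical in-memory link graph: page path -> paths it links to.
// Paths stand in for full URLs so the example needs no network access.
type LinkGraph = Record<string, string[]>;

interface CrawlLimits {
  maxDepth: number; // documented default: 2
  maxPages: number; // documented default: 50
}

// Breadth-first walk that stops following links at maxDepth
// and stops entirely once maxPages pages have been collected.
// Assumption: the start page counts as depth 0.
function crawl(graph: LinkGraph, start: string, limits: CrawlLimits): string[] {
  const visited = new Set<string>([start]);
  const order: string[] = [];
  const queue: Array<{ url: string; depth: number }> = [{ url: start, depth: 0 }];

  while (queue.length > 0 && order.length < limits.maxPages) {
    const { url, depth } = queue.shift()!;
    order.push(url);
    if (depth >= limits.maxDepth) continue; // do not follow links any deeper
    for (const next of graph[url] ?? []) {
      if (!visited.has(next)) {
        visited.add(next);
        queue.push({ url: next, depth: depth + 1 });
      }
    }
  }
  return order;
}

const site: LinkGraph = {
  "/": ["/a", "/b"],
  "/a": ["/a/1"],
  "/b": ["/"],
  "/a/1": ["/a/1/x"],
};

// "/a/1/x" sits at depth 3, so a depth limit of 2 leaves it out.
console.log(crawl(site, "/", { maxDepth: 2, maxPages: 100 }));
```

Raising `maxDepth` to 3 would also reach `/a/1/x`, while lowering `maxPages` cuts the walk short regardless of depth.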
Fetch content from the React documentation but exclude any URLs containing 'api' or 'reference' and only process pages containing 'tutorial' or 'guide':
URL: https://react.dev
Max Depth: 2
Exclude Patterns: ["/api/", "/reference/"]
Include Patterns: ["/tutorial/", "/guide/"]
Max Pages: 30
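One plausible reading of how these filters combine is sketched below. The `shouldCrawl` helper and the `FilterOptions` type are hypothetical, and the precedence shown (the domain check first, then exclude patterns, then at least one include pattern must match) is an assumption rather than documented behavior.

```typescript
interface FilterOptions {
  sameDomainOnly: boolean;
  excludePatterns: string[];
  includePatterns: string[];
}

// Hypothetical helper: decide whether a discovered link should be crawled.
function shouldCrawl(link: string, startUrl: string, opts: FilterOptions): boolean {
  const target = new URL(link, startUrl); // resolves relative links against the start URL
  if (opts.sameDomainOnly && target.hostname !== new URL(startUrl).hostname) {
    return false;
  }
  const matches = (pattern: string): boolean => new RegExp(pattern).test(target.href);
  if (opts.excludePatterns.some(matches)) return false;
  // Assumption: an empty include list means "include everything".
  return opts.includePatterns.length === 0 || opts.includePatterns.some(matches);
}

const opts: FilterOptions = {
  sameDomainOnly: true,
  excludePatterns: ["/api/", "/reference/"],
  includePatterns: ["/tutorial/", "/guide/"],
};

console.log(shouldCrawl("/guide/state", "https://react.dev", opts)); // true
console.log(shouldCrawl("/reference/react", "https://react.dev", opts)); // false
console.log(shouldCrawl("https://example.com/guide/", "https://react.dev", opts)); // false
```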
Extract the content from this specific page only:
https://nextjs.org/docs/getting-started/installation
Use single page mode to avoid crawling related links.
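At the protocol level, a prompt like this would presumably be turned into an MCP `tools/call` request by the client. The sketch below builds such a JSON-RPC message for `fetch_website_single`. The `buildToolCall` helper and the request `id` are illustrative; the tool name and argument names are the documented ones.

```typescript
// Arguments documented for fetch_website_single.
interface SinglePageArgs {
  url: string;
  timeout?: number; // milliseconds; documented default is 10000
}

// Hypothetical helper that wraps tool arguments in a JSON-RPC 2.0 envelope.
function buildToolCall<T extends object>(id: number, toolName: string, args: T) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

const singleArgs: SinglePageArgs = {
  url: "https://nextjs.org/docs/getting-started/installation",
};
const request = buildToolCall(1, "fetch_website_single", singleArgs);

// MCP stdio transports exchange one JSON message per line.
console.log(JSON.stringify(request));
```

A real client would first complete the MCP initialization handshake before sending a request like this one.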
The server generates comprehensive markdown files with the following structure:
# Site Name Documentation

*Scraped from: https://example.com*
*Generated on: 2024-01-15T10:30:00.000Z*

## Table of Contents

- [Getting Started](#getting-started)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [API Reference](#api-reference)
- [Core Functions](#core-functions)

---

## Getting Started

*Source: [https://example.com/getting-started](https://example.com/getting-started)*

[Clean markdown content here...]

---

## Installation

*Source: [https://example.com/installation](https://example.com/installation)*

[Installation instructions in markdown...]
For a complete example, refer to output.md, which demonstrates the server's output when processing a real documentation site.
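A document with this shape could be assembled roughly as follows. This is a sketch under assumptions, not the server's code: the `Page` type, `slugify`, and `buildDocument` are hypothetical names, and the anchor format simply mirrors the sample above.

```typescript
interface Page {
  title: string;
  url: string;
  markdown: string;
}

// Lowercase the heading and join words with hyphens,
// e.g. "Quick Start" -> "quick-start".
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-|-$/g, "");
}

// Build the header, table of contents, and one section per page.
function buildDocument(siteName: string, baseUrl: string, pages: Page[], generatedAt: Date): string {
  const toc = pages.map((p) => `- [${p.title}](#${slugify(p.title)})`);
  const sections = pages.map(
    (p) => `## ${p.title}\n\n*Source: [${p.url}](${p.url})*\n\n${p.markdown}`
  );
  return [
    `# ${siteName} Documentation`,
    "",
    `*Scraped from: ${baseUrl}*`,
    `*Generated on: ${generatedAt.toISOString()}*`,
    "",
    "## Table of Contents",
    "",
    ...toc,
    "",
    "---",
    "",
    sections.join("\n\n---\n\n"),
  ].join("\n");
}

const output = buildDocument(
  "Site Name",
  "https://example.com",
  [
    { title: "Getting Started", url: "https://example.com/getting-started", markdown: "Welcome." },
    { title: "Installation", url: "https://example.com/installation", markdown: "Run the installer." },
  ],
  new Date("2024-01-15T10:30:00.000Z")
);
console.log(output);
```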
better-fetch/
├── src/
│ └── server.ts # Main server implementation
├── dist/ # Compiled JavaScript
├── test-mcp.js # Testing utilities
├── output.md # Sample output file
├── package.json
├── tsconfig.json
└── README.md
npm run dev    # Run in development mode with hot reload
npm run build  # Compile TypeScript to JavaScript
npm run start  # Run the compiled server
npm run clean  # Clean dist directory
npm test       # Run test suite
# Interactive testing
node interactive-test.js

# Automated test suite
node test-mcp.js

# Manual JSON-RPC testing
echo '{"jsonrpc":"2.0","id":1,"method":"tools/list"}' | node dist/server.js
Set maxPages limits for large sites
Use includePatterns to focus on relevant content
Use sameDomainOnly to avoid external link crawling
Adjust timeout based on target site response times
We welcome contributions! Please see our Contributing Guide for details.
git checkout -b feature/amazing-feature
git commit -m 'Add amazing feature'
git push origin feature/amazing-feature
This project is licensed under the MIT License - see the LICENSE file for details.
Made with ❤️ for the AI and developer community