Prysm Web Scraper
High-precision web scraping for AI assistants (STDIO)
The Prysm MCP (Model Context Protocol) Server enables AI assistants like Claude and others to scrape web content with high accuracy and flexibility.
    # Recommended: install the LLM-optimized version
    npm install -g @pinkpixel/prysm-mcp

    # Or install the standard version
    npm install -g prysm-mcp

    # Or clone and build
    git clone https://github.com/pinkpixel-dev/prysm-mcp.git
    cd prysm-mcp
    npm install
    npm run build
We provide detailed integration guides for popular MCP-compatible applications:
There are multiple ways to set up Prysm MCP Server:
Create an mcp.json file in the appropriate location, as described in the guides above.
    {
      "mcpServers": {
        "prysm-scraper": {
          "description": "Prysm web scraper with custom output directories",
          "command": "npx",
          "args": [
            "-y",
            "@pinkpixel/prysm-mcp"
          ],
          "env": {
            "PRYSM_OUTPUT_DIR": "${workspaceFolder}/scrape_results",
            "PRYSM_IMAGE_OUTPUT_DIR": "${workspaceFolder}/scrape_results/images"
          }
        }
      }
    }
The server provides the following tools:
scrapeFocused
Fast web scraping optimized for speed (fewer scrolls, main content only).
Please scrape https://example.com using the focused mode
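As a rough illustration of what sits behind a prompt like this, an MCP client typically turns it into a JSON-RPC tools/call request addressed to the server. The TypeScript sketch below builds such a request. The buildToolCall helper and the id value are hypothetical, the argument names are taken from the parameter list in this section, and the exact request a given client sends may differ.

```typescript
// Shape of an MCP "tools/call" request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: wraps a tool name and its arguments in a request object.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Focused scrape of example.com; maxScrolls and scrollDelay restate the documented defaults.
const request = buildToolCall(1, "scrapeFocused", {
  url: "https://example.com",
  maxScrolls: 5,
  scrollDelay: 1000,
});

console.log(JSON.stringify(request, null, 2));
```

In normal use the AI assistant constructs this request for you; the sketch is only meant to show how the documented parameters map onto a tool call.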
Available Parameters:
url (required): URL to scrape
maxScrolls (optional): Maximum number of scroll attempts (default: 5)
scrollDelay (optional): Delay between scrolls in ms (default: 1000)
scrapeImages (optional): Whether to include images in results
downloadImages (optional): Whether to download images locally
maxImages (optional): Maximum images to extract
output (optional): Output directory for downloaded images

scrapeBalanced
Balanced web scraping approach with good coverage and reasonable speed.
Please scrape https://example.com using the balanced mode
Available Parameters:
Same parameters as scrapeFocused, with different defaults:
maxScrolls default: 10
scrollDelay default: 2000
timeout: parameter to limit total scraping time (default: 30000 ms)

scrapeDeep
Maximum extraction web scraping (slower but thorough).
Please scrape https://example.com using the deep mode with maximum scrolls
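The three modes differ mainly in their defaults, so it can help to see them side by side. The sketch below collects the defaults stated in this section into one table and estimates how long each mode could spend waiting between scrolls. The effectiveOptions and maxScrollWaitMs helpers are hypothetical, and the estimate assumes the server waits scrollDelay after every scroll, which this document does not confirm.

```typescript
// Defaults as stated in this document; values not listed for a mode are left undefined.
type Mode = "scrapeFocused" | "scrapeBalanced" | "scrapeDeep";

interface ScrollOptions {
  maxScrolls: number;
  scrollDelay: number; // milliseconds
  timeout?: number;    // milliseconds, documented for scrapeBalanced only
  maxImages?: number;  // default documented for scrapeDeep only
}

const MODE_DEFAULTS: Record<Mode, ScrollOptions> = {
  scrapeFocused: { maxScrolls: 5, scrollDelay: 1000 },
  scrapeBalanced: { maxScrolls: 10, scrollDelay: 2000, timeout: 30000 },
  scrapeDeep: { maxScrolls: 20, scrollDelay: 3000, maxImages: 100 },
};

// Hypothetical helper: caller-supplied overrides win over the mode's defaults.
function effectiveOptions(mode: Mode, overrides: Partial<ScrollOptions> = {}): ScrollOptions {
  return { ...MODE_DEFAULTS[mode], ...overrides };
}

// Rough estimate of time spent waiting between scrolls (assumption: one delay per scroll).
function maxScrollWaitMs(options: ScrollOptions): number {
  return options.maxScrolls * options.scrollDelay;
}

console.log(maxScrollWaitMs(effectiveOptions("scrapeDeep")));                    // 60000
console.log(maxScrollWaitMs(effectiveOptions("scrapeDeep", { maxScrolls: 5 }))); // 15000
```

Under that assumption, deep mode can wait up to a minute on scrolling alone, which is why focused mode is the better fit when only the main content is needed.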
Available Parameters:
Same parameters as scrapeFocused, with different defaults:
maxScrolls default: 20
scrollDelay default: 3000
maxImages default: 100

formatResult
Format scraped data into different structured formats (markdown, HTML, JSON).
Format the scraped data as markdown
Available Parameters:
data (required): The scraped data to format
format (required): Output format - "markdown", "html", or "json"
includeImages (optional): Whether to include images in output (default: true)
output (optional): File path to save the formatted result

You can also save formatted results to a file by specifying an output path:
Format the scraped data as markdown and save it to "my-results/output.md"
By default, formatted results are saved to ~/prysm-mcp/output/. You can customize this in two ways:
    # Linux/macOS
    export PRYSM_OUTPUT_DIR="/path/to/custom/directory"
    export PRYSM_IMAGE_OUTPUT_DIR="/path/to/custom/image/directory"

    # Windows (Command Prompt)
    set PRYSM_OUTPUT_DIR=C:\path\to\custom\directory
    set PRYSM_IMAGE_OUTPUT_DIR=C:\path\to\custom\image\directory

    # Windows (PowerShell)
    $env:PRYSM_OUTPUT_DIR="C:\path\to\custom\directory"
    $env:PRYSM_IMAGE_OUTPUT_DIR="C:\path\to\custom\image\directory"
Alternatively, give an absolute path in the request itself.
For general results:
Format the scraped data as markdown and save it to "/absolute/path/to/file.md"
For image downloads when scraping:
Please scrape https://example.com and download images to "/absolute/path/to/images"
In your MCP configuration file (for example .cursor/mcp.json), you can set these environment variables:

    {
      "mcpServers": {
        "prysm-scraper": {
          "command": "npx",
          "args": ["-y", "@pinkpixel/prysm-mcp"],
          "env": {
            "PRYSM_OUTPUT_DIR": "${workspaceFolder}/scrape_results",
            "PRYSM_IMAGE_OUTPUT_DIR": "${workspaceFolder}/scrape_results/images"
          }
        }
      }
    }
If PRYSM_IMAGE_OUTPUT_DIR is not specified, it defaults to a subfolder named images inside PRYSM_OUTPUT_DIR.
If you provide only a relative path or filename, it will be saved relative to the configured output directory.
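The rules above can be sketched as a small resolver. This is an illustration under stated assumptions, not the server's actual code: the resolveDirs, resolveOutputPath, and joinPath helpers are hypothetical, only POSIX-style paths are handled, and treating an absolute path as "use exactly as given" is inferred from the examples in this section.

```typescript
// Environment variables relevant to output locations (names from this document).
interface PrysmEnv {
  PRYSM_OUTPUT_DIR?: string;
  PRYSM_IMAGE_OUTPUT_DIR?: string;
}

// Join two POSIX-style path segments with exactly one slash between them.
function joinPath(base: string, rest: string): string {
  return base.replace(/\/+$/, "") + "/" + rest.replace(/^\/+/, "");
}

// Resolve both output directories; fallbackDir stands in for the default location.
// The image directory falls back to an "images" subfolder of the output directory.
function resolveDirs(env: PrysmEnv, fallbackDir: string): { output: string; images: string } {
  const output = env.PRYSM_OUTPUT_DIR ?? fallbackDir;
  const images = env.PRYSM_IMAGE_OUTPUT_DIR ?? joinPath(output, "images");
  return { output, images };
}

// Assumption: absolute paths are kept as given; anything else goes under the output directory.
function resolveOutputPath(requested: string, outputDir: string): string {
  return requested.charAt(0) === "/" ? requested : joinPath(outputDir, requested);
}

// "~/prysm-mcp/output" is used as a literal placeholder here; the tilde is not expanded.
const dirs = resolveDirs({ PRYSM_OUTPUT_DIR: "/data/scrapes" }, "~/prysm-mcp/output");
console.log(dirs.images);                                          // /data/scrapes/images
console.log(resolveOutputPath("subfolder/file.md", dirs.output));  // /data/scrapes/subfolder/file.md
console.log(resolveOutputPath("/home/user/file.md", dirs.output)); // /home/user/file.md
```

Passing the environment in as a plain object, rather than reading it globally, keeps the sketch easy to test with different configurations.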
The formatResult tool handles paths in the following ways:
Absolute path: used as given (/home/user/file.md)
Relative path with a subfolder: saved relative to the configured output directory (subfolder/file.md)
Filename only: saved in the configured output directory (output.md)

    # Install dependencies
    npm install

    # Build the project
    npm run build

    # Run the server locally
    node bin/prysm-mcp

    # Debug MCP communication
    DEBUG=mcp:* node bin/prysm-mcp

    # Set custom output directories
    PRYSM_OUTPUT_DIR=./my-output PRYSM_IMAGE_OUTPUT_DIR=./my-output/images node bin/prysm-mcp
You can run the server directly with npx without installing:
    # Run with default settings
    npx @pinkpixel/prysm-mcp

    # Run with custom output directories
    PRYSM_OUTPUT_DIR=./my-output PRYSM_IMAGE_OUTPUT_DIR=./my-output/images npx @pinkpixel/prysm-mcp
MIT
Developed by Pink Pixel
Powered by the Model Context Protocol and Puppeteer