AI Cursor Scraping Assistant
Tool leveraging Cursor AI and MCP to generate web scrapers for various websites.
A powerful tool that leverages Cursor AI and MCP (Model Context Protocol) to easily generate web scrapers for various types of websites. This project helps you quickly analyze websites and generate proper Scrapy or Camoufox scrapers with minimal effort.
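This project contains two main components:
1. MCP server (MCPfiles/): tools that help Cursor AI analyze web pages and generate XPath selectors.
2. Cursor rules (cursor-rules/): rules that teach Cursor AI how to analyze websites and create different types of scrapers.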
Clone this repository to your local machine:
git clone https://github.com/TheWebScrapingClub/AI-Cursor-Scraping-Assistant.git
cd AI-Cursor-Scraping-Assistant
Install the required dependencies:
pip install mcp camoufox scrapy
If you plan to use Camoufox, you'll need to fetch its browser binary:
python -m camoufox fetch
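Before moving on, you can confirm that the three packages are importable from the active Python environment. This is a minimal standard-library sketch; it assumes the import names mcp, camoufox and scrapy match the pip package names above, and it does not check whether the Camoufox browser binary was fetched.

```python
import importlib.util

# Import names assumed to match the packages installed with pip above.
REQUIRED_MODULES = ("mcp", "camoufox", "scrapy")


def check_dependencies(modules=REQUIRED_MODULES):
    """Return a mapping of module name -> True if the module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    missing = [name for name, ok in check_dependencies().items() if not ok]
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies found.")
```

If anything is reported missing, re-run the pip install command in the same environment you will use to start the MCP server.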
The MCP server provides tools that help Cursor AI analyze web pages and generate XPath selectors. To start the MCP server:
Navigate to the MCPfiles directory:
cd MCPfiles
Update the CAMOUFOX_FILE_PATH in xpath_server.py to point to your local Camoufox_template.py file.
Start the MCP server:
python xpath_server.py
In Cursor, connect to the MCP server by configuring it in the settings or using the MCP panel.
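If you prefer a config file to the settings UI, Cursor can read MCP server definitions from an mcp.json file containing an mcpServers object (for example .cursor/mcp.json inside a project). The sketch below adds an entry for xpath_server.py to such a file, assuming the server uses the stdio transport; the helper name add_mcp_server and the server name you pass are hypothetical, so check Cursor's MCP documentation for the file location and fields your version expects.

```python
import json
from pathlib import Path


def add_mcp_server(config_path, name, script_path, python="python"):
    """Add (or replace) a stdio MCP server entry in a Cursor-style mcp.json.

    Entries already present in the file are preserved.
    """
    config_path = Path(config_path)
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    servers = config.setdefault("mcpServers", {})
    # For a stdio server, the client starts this command itself and
    # communicates with it over stdin/stdout.
    servers[name] = {"command": python, "args": [str(script_path)]}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

For example, add_mcp_server(".cursor/mcp.json", "xpath-server", "/absolute/path/to/MCPfiles/xpath_server.py") registers the server under a placeholder name; pass python="/path/to/venv/bin/python" if the dependencies live in a virtual environment.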
The cursor-rules directory contains rules that teach Cursor AI how to analyze websites and create different types of scrapers. These rules are automatically loaded when you open the project in Cursor.
The cursor-rules directory contains a set of MDC (Markdown Configuration) files that guide Cursor's behavior when creating web scrapers:
prerequisites.mdc
This rule handles initial setup tasks before creating any scrapers:
Gets the current working directory (pwd).
website-analysis.mdc
This comprehensive rule guides Cursor through website analysis:
scrapy-step-by-step-process.mdc
This rule provides the execution flow for creating scrapers:
scrapy.mdc
This extensive rule contains Scrapy best practices:
scraper-models.mdc
This rule defines the different types of scrapers that can be created:
Here's how to use the AI-Cursor-Scraping-Assistant:
Write an e-commerce PLP scraper for the website gucci.com
Cursor will then:
You can request different types of scrapers:
For example:
Write an e-commerce PDP scraper for nike.com
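The XPath selectors in these scrapers come from the MCP server's tools for analyzing web pages. The server's actual implementation is not described here, so the following is only a hypothetical standard-library sketch of the underlying idea: parse a page's HTML and return an absolute, position-indexed XPath for every element whose text contains a given string. The names XPathFinder and find_xpaths are illustrative, and selectors in a real scraper would usually prefer stable attributes (classes, data attributes) over positions.

```python
from html.parser import HTMLParser

# HTML elements that never have a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class XPathFinder(HTMLParser):
    """Record an absolute XPath for each element whose text contains `needle`."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.stack = []           # (tag, index) for each currently open element
        self.child_counts = [{}]  # per open element: tag -> children seen so far
        self.matches = []

    def handle_starttag(self, tag, attrs):
        counts = self.child_counts[-1]
        counts[tag] = counts.get(tag, 0) + 1
        if tag in VOID_TAGS:
            return  # void elements hold no text, so they are only counted
        self.stack.append((tag, counts[tag]))
        self.child_counts.append({})

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                del self.child_counts[i + 1:]
                break

    def handle_data(self, data):
        if self.stack and self.needle in data:
            path = "".join(f"/{tag}[{idx}]" for tag, idx in self.stack)
            if path not in self.matches:
                self.matches.append(path)


def find_xpaths(html, needle):
    """Return absolute XPaths of elements whose text contains `needle`."""
    finder = XPathFinder(needle)
    finder.feed(html)
    finder.close()
    return finder.matches
```

For instance, on a page where a price such as "$120" sits in the first span of the second div inside body, find_xpaths(html, "$120") returns ["/html[1]/body[1]/div[2]/span[1]"], a starting point that a generated scraper would then typically rewrite into an attribute-based selector.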
The project includes a Camoufox template for creating stealth scrapers that can bypass certain anti-bot measures. The MCP tools help you:
You can extend the functionality by adding new scraper types to the cursor-rules files. Because each concern lives in its own rule file, you can customize one part of the workflow without rewriting the others.
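One way to add a scraper type is to create a new rule file alongside the existing ones. Cursor's .mdc rule format typically starts with a small YAML front-matter block (fields such as description, globs and alwaysApply) followed by Markdown instructions; compare with the files already in cursor-rules before relying on these fields. The helper below is a hypothetical scaffolding sketch, not part of this project.

```python
import json
from pathlib import Path


def create_rule(rules_dir, name, description, body, always_apply=False):
    """Write <rules_dir>/<name>.mdc with a front-matter header and a Markdown body.

    Refuses to overwrite an existing rule file.
    """
    path = Path(rules_dir) / f"{name}.mdc"
    if path.exists():
        raise FileExistsError(path)
    header = (
        "---\n"
        # json.dumps yields a double-quoted string, which is also valid YAML,
        # so descriptions containing ":" do not break the front matter.
        f"description: {json.dumps(description)}\n"
        "globs:\n"
        f"alwaysApply: {'true' if always_apply else 'false'}\n"
        "---\n"
    )
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(header + body.strip() + "\n", encoding="utf-8")
    return path
```

For example, create_rule("cursor-rules", "real-estate-listing", "Scraper for real-estate listing pages", "# Real-estate listing scraper\n\n...") would create a placeholder rule that you then fill in, following the structure of scraper-models.mdc.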
AI-Cursor-Scraping-Assistant/
├── MCPfiles/
│ ├── xpath_server.py # MCP server with web scraping tools
│ └── Camoufox_template.py # Template for Camoufox scrapers
├── cursor-rules/
│ ├── website-analysis.mdc # Rules for analyzing websites
│ ├── scrapy.mdc # Best practices for Scrapy
│ ├── scrapy-step-by-step-process.mdc # Guide for creating scrapers
│ ├── scraper-models.mdc # Templates for different scraper types
│ └── prerequisites.mdc # Setup requirements
└── README.md
The following features are planned for future development:
This project is based on articles from The Web Scraping Club:
For more information on web scraping techniques and best practices, visit The Web Scraping Club.
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.