Pandas-MCP Server

License: MIT · Python 3.8+ · Code Style: Black

🚀 A powerful tool for AI-powered data analysis - through the MCP protocol, it enables LLMs to safely and efficiently execute pandas code and generate visualizations


If you find this project helpful, please consider giving it a ⭐️ star!


A comprehensive Model Context Protocol (MCP) server that enables LLMs to execute pandas code through a standardized workflow for data analysis and visualization.

✨ Key Features

  • 🔒 Secure Execution Environment - Sandboxed code execution prevents malicious operations and protects system security
  • 📊 Intelligent Data Analysis - Automatically extracts file metadata, understands data structure, and provides intelligent analysis suggestions
  • 🎨 Interactive Visualizations - One-click generation of various interactive charts with real-time parameter adjustment
  • 🧠 Memory Optimization - Intelligent memory management supports large file processing with automatic data type optimization
  • 🔧 Easy Integration - Simple configuration for seamless integration with AI assistants like Claude Desktop
  • 📝 CLI Support - Provides command-line interface for convenient testing and development

🎯 MCP Server Overview

The Pandas-MCP Server is designed as a Model Context Protocol (MCP) server that provides LLMs with powerful data processing capabilities. MCP is a standardized protocol that allows AI models to interact with external tools and services in a secure, structured way.

🛠️ Installation

Prerequisites

  • Python 3.8+
  • pip package manager
  • Git (for cloning the repository)

Step 1: Clone the Repository

```bash
git clone https://github.com/marlonluo2018/pandas-mcp-server.git
cd pandas-mcp-server
```

Step 2: Install Dependencies

```bash
pip install -r requirements.txt
```

Step 3: Verify Installation

```bash
# Test the CLI interface
python cli.py

# Or test the MCP server directly
python server.py
```

Dependencies

  • pandas>=2.0.0 - Data manipulation and analysis
  • fastmcp>=1.0.0 - MCP server framework
  • chardet>=5.0.0 - Character encoding detection
  • psutil - System monitoring for memory optimization

Claude Desktop Configuration

Add this configuration to your Claude Desktop settings:

```json
{
  "mcpServers": {
    "pandas-server": {
      "type": "stdio",
      "command": "python",
      "args": ["/path/to/your/pandas-mcp-server/server.py"]
    }
  }
}
```

Note: Replace /path/to/your/pandas-mcp-server/server.py with the actual path where you cloned the repository.

Example paths:

  • Windows: "C:\\Users\\YourName\\pandas-mcp-server\\server.py"
  • macOS/Linux: "/home/username/pandas-mcp-server/server.py"

Configuration File Location

  • Windows: %APPDATA%\Claude\claude_desktop_config.json
  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Linux: ~/.config/Claude/claude_desktop_config.json

Verification

After configuration, restart Claude Desktop. The server should appear in the MCP tools list with four available tools:

  • read_metadata_tool - File analysis
  • interpret_column_data - Column value interpretation
  • run_pandas_code_tool - Code execution
  • generate_chartjs_tool - Chart generation

🔄 Workflow

The pandas MCP server follows a structured workflow for data analysis and visualization:

Step 1: Read File Metadata

LLM calls read_metadata_tool to understand the file structure:

  • Extract file type, size, encoding, and column information
  • Get data types, sample values, and statistical summaries
  • Receive data quality warnings and suggested operations
  • Understand the dataset structure before processing

Step 2: Interpret Column Values (Optional)

LLM calls interpret_column_data to understand specific columns:

  • Extract all unique values from important columns
  • Identify patterns in categorical data
  • Understand the meaning behind codes or abbreviations
  • Key Purpose: Complements the metadata by providing a deep understanding of column values, which helps the LLM generate more accurate and effective pandas code in the next step, especially when working with multiple CSV files

When to Use interpret_column_data:

  • High Value: Categorical fields with limited unique values (Region, Status, Category)
  • High Value: Code fields that need interpretation (StatusCode "A", "B", "C")
  • High Value: Fields with abbreviations or cryptic values
  • Low Value: ID fields (usually unique values with no patterns)
  • Low Value: Email fields (typically unique identifiers)
  • Low Value: Numeric percentage fields (already self-explanatory)
  • Conditional: Time fields (useful for non-standard formats or categorical time)

Step 3: Execute Pandas Operations

LLM calls run_pandas_code_tool based on metadata and column analysis:

  • Formulate pandas operations using the understood file structure
  • Execute data processing, filtering, aggregation, or analysis
  • Receive results in DataFrame, Series, or dictionary format
  • Get optimized output with memory management

Step 4: Generate Visualizations

LLM calls generate_chartjs_tool to create interactive charts:

  • Transform processed data into Chart.js compatible format
  • Generate interactive HTML charts with customization controls
  • Create bar, line, or pie charts based on data characteristics
  • Output responsive visualizations for analysis presentation

How interpret_column_data Complements read_metadata

The interpret_column_data function is designed to complement the read_metadata_tool by providing deeper insights into column values:

  • read_metadata_tool: Focuses on file structure, data types, and statistical summaries

    • Provides high-level understanding of the dataset
    • Offers sample values and basic statistics
    • Helps LLM understand the overall data architecture
  • interpret_column_data: Focuses on detailed value analysis of specific columns

    • Reveals all unique values and their frequencies
    • Helps LLM understand categorical data patterns
    • Enables more precise filtering and grouping operations
    • Particularly valuable when working with multiple CSV files where consistent value understanding across datasets is crucial

This two-step approach ensures that the LLM has both structural and value-level understanding before generating pandas code, resulting in more accurate and effective data analysis operations, especially when processing multiple CSV files.

🚀 MCP Server Tools

The server exposes four main tools for LLM integration:

1. read_metadata_tool - File Analysis

Extract comprehensive metadata from Excel and CSV files including:

  • File type, size, encoding, and structure
  • Column names, data types, and sample values
  • Statistical summaries (null counts, unique values, min/max/mean)
  • Data quality warnings and suggested operations
  • Memory-optimized processing for large files

Purpose: Provides LLM with a high-level understanding of the data structure and characteristics, serving as the foundation for data analysis.

MCP Tool Usage:

```json
{
  "tool": "read_metadata_tool",
  "args": {
    "file_path": "/path/to/sales_data.xlsx"
  }
}
```

2. interpret_column_data - Column Value Interpretation

Interpret specific columns to understand their value patterns:

  • Extract all unique values with their counts from specified columns
  • Support for single or multiple column interpretation
  • Automatic pattern recognition for common data types
  • Complete value distribution without sampling

Purpose: Complements read_metadata_tool by providing deep insights into column values, enabling LLM to generate more precise filtering, grouping, and analysis operations, especially when working with multiple CSV files that require consistent value understanding across datasets.

Best Use Cases:

  • Most Valuable: Categorical fields with limited unique values (Region, Status, Category)
  • Most Valuable: Code/abbreviation fields that need interpretation (StatusCode "A", "B", "C")
  • Less Valuable: ID fields, email fields, or numeric percentage fields
  • Context-Dependent: Time fields (useful for non-standard formats)

Response Format

The function returns a structured response with the following format:

```json
{
  "columns_interpretation": [
    {
      "column_name": "Region",
      "data_type": "object",
      "total_values": 1000,
      "null_count": 5,
      "unique_count": 4,
      "unique_values_with_counts": [
        ["North", 350],
        ["South", 280],
        ["East", 220],
        ["West", 145]
      ]
    }
  ]
}
```

Key Features

  • Complete Value Distribution: Returns all unique values with their exact counts
  • Sorted by Frequency: Values are sorted in descending order of occurrence
  • Data Type Analysis: Identifies the underlying data type (object, int64, etc.)
  • Quality Metrics: Provides null count and total values for data quality assessment
  • Multi-column Support: Can analyze multiple columns in a single request
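These features can be sketched with plain pandas. The following is an illustrative reimplementation, not the server's actual code (the function name is an assumption, but the field names follow the documented response format):

```python
import pandas as pd

def interpret_column(series: pd.Series) -> dict:
    """Build a per-column summary matching the documented response format."""
    counts = series.value_counts(dropna=True)  # sorted by frequency, descending
    return {
        "column_name": series.name,
        "data_type": str(series.dtype),
        "total_values": len(series),
        "null_count": int(series.isna().sum()),
        "unique_count": int(counts.size),
        "unique_values_with_counts": [[value, int(n)] for value, n in counts.items()],
    }

region = pd.Series(["North"] * 3 + ["South"] * 2 + [None], name="Region")
summary = interpret_column(region)
# Most frequent value first; nulls reported separately
```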

MCP Tool Usage:

```json
{
  "tool": "interpret_column_data",
  "args": {
    "file_path": "/path/to/sales_data.csv",
    "column_names": ["Region", "Status"]
  }
}
```

3. run_pandas_code_tool - Secure Code Execution

Execute pandas operations with:

  • Security filtering against malicious code
  • Memory optimization for large datasets
  • Comprehensive error handling and debugging
  • Support for DataFrame, Series, and dictionary results

Purpose: Leverages insights from both read_metadata_tool and interpret_column_data to execute precise data analysis operations, particularly valuable when processing multiple CSV files with consistent value patterns.

Forbidden Operations

The following operations are blocked for security reasons:

  • System Access: os., sys., subprocess. - Prevents file system and system access
  • File Access & Code Execution: open(), exec(), eval() - Blocks file access and dynamic code execution
  • Dangerous Imports: import os, import sys - Prevents specific harmful imports
  • Browser/DOM Access: document., window., XMLHttpRequest - Blocks browser operations
  • JavaScript/Remote: fetch(), eval(), Function() - Prevents remote code execution
  • Script Injection: script, javascript: - Blocks script injection attempts
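A minimal sketch of how such blacklist filtering can work. The pattern list and matching logic in core/config.py may differ; simple substring matching is an assumption here:

```python
# Illustrative pattern list; the server's actual BLACKLIST may differ.
BLACKLIST = [
    "os.", "sys.", "subprocess.",        # system access
    "open(", "exec(", "eval(",           # file access / dynamic execution
    "import os", "import sys",           # dangerous imports
    "document.", "window.", "fetch(",    # browser / remote access
]

def security_check(code):
    """Return the blacklisted patterns found in the submitted code."""
    return [pattern for pattern in BLACKLIST if pattern in code]

violations = security_check("import os\nresult = os.listdir('/')")
# Catches both the import statement and the attribute access
```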

Requirements:

  • Final result must be assigned to result variable
  • Code should include necessary imports (pandas available as pd)
  • All code goes through security filtering before execution
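For example, code submitted to the tool might look like the following. An inline DataFrame stands in for a file read with pd.read_csv/read_excel so the sketch is self-contained:

```python
import pandas as pd

# In real use the data would come from pd.read_csv()/pd.read_excel()
# on the file analyzed in Steps 1-2 of the workflow.
df = pd.DataFrame({
    "Region": ["North", "South", "North", "East"],
    "Sales": [100, 200, 150, 50],
})

# The final output must be assigned to the `result` variable
result = df.groupby("Region")["Sales"].sum()
```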

MCP Tool Usage:

```json
{
  "tool": "run_pandas_code_tool",
  "args": {
    "code": "import pandas as pd\ndf = pd.read_excel('/path/to/data.xlsx')\nresult = df.groupby('Region')['Sales'].sum()"
  }
}
```

4. generate_chartjs_tool - Interactive Visualizations

Generate interactive charts with Chart.js:

  • Bar charts - For categorical comparisons
  • Line charts - For trend analysis
  • Pie charts - For proportional data
  • Interactive HTML templates with customization controls

Chart Output

  • File Format: All charts are generated as standalone HTML files
  • Save Location: Charts are saved in the ./charts/ directory by default
  • File Naming: Files are automatically named with timestamp and chart type (e.g., bar_chart_20250710_143022.html)
  • Accessibility: HTML files can be opened in any web browser and shared easily

MCP Tool Usage:

```json
{
  "tool": "generate_chartjs_tool",
  "args": {
    "data": {
      "columns": [
        {
          "name": "Region",
          "type": "string",
          "examples": ["North", "South", "East", "West"]
        },
        {
          "name": "Sales",
          "type": "number",
          "examples": [15000, 12000, 18000, 9000]
        }
      ]
    },
    "chart_types": ["bar"],
    "title": "Sales by Region"
  }
}
```

🚀 Usage

CLI Interface (Testing & Development)

The cli.py script provides a convenient command-line interface for testing the MCP server functionality without requiring an MCP client:

Interactive Mode

```bash
python cli.py
```

Launches a guided menu system with:

  • Step-by-step workflow guidance
  • Automatic input validation
  • Clear error messages
  • Support for file paths with spaces

Command-Line Mode

```bash
# Read metadata
python cli.py metadata data.xlsx

# Interpret column values (useful for multiple CSV files)
python cli.py interpret data.csv --columns "Region,Status"

# Execute pandas code
python cli.py execute analysis.py

# Generate charts
python cli.py chart data.json --type bar --title "Sales Analysis"
```

Chart Output Information

When generating charts using the CLI:

  • Output Format: Charts are saved as interactive HTML files
  • Default Location: All charts are saved in the ./charts/ directory
  • File Naming: Automatic naming with timestamp and chart type
  • Viewing Charts: Open the HTML files in any web browser to view interactive visualizations
  • Sharing: HTML files can be easily shared with others

🔍 Code Logic & Architecture

Core Components

1. Server Architecture (server.py)

  • FastMCP Integration: Uses FastMCP framework for MCP protocol implementation
  • Logging System: Unified logging with rotation and memory tracking
  • Tool Registration: Exposes four main tools with proper error handling
  • Memory Monitoring: Tracks memory usage before/after operations

2. Metadata Processing (core/metadata.py)

Key Logic:

  • File validation (existence, size limits)
  • Encoding detection for CSV files
  • Memory-optimized data processing (100-row samples)
  • Comprehensive statistical analysis
  • Data quality assessment and warnings
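The server uses chardet for encoding detection; the idea can be illustrated without it by trying candidate encodings in order (a simplified stand-in, not the actual implementation):

```python
def detect_encoding(raw: bytes) -> str:
    """Try common encodings in order (the real server uses chardet instead)."""
    for encoding in ("utf-8", "latin-1"):
        try:
            raw.decode(encoding)
            return encoding
        except UnicodeDecodeError:
            continue
    return "utf-8"  # unreachable here: latin-1 accepts any byte sequence

# Latin-1 bytes like 0xE9 ('é') are not valid UTF-8, so the fallback kicks in
csv_encoding = detect_encoding("Région,Ventes".encode("latin-1"))
```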

Memory Optimization:

  • Uses category dtype for string columns with low cardinality
  • Converts float64 to float32 for memory efficiency
  • Processes only first 100 rows for metadata extraction
  • Forces garbage collection after processing
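These optimizations can be sketched as follows. The 50% cardinality threshold is an assumption; the actual heuristics live in core/metadata.py:

```python
import numpy as np
import pandas as pd

def optimize_dtypes(df: pd.DataFrame) -> pd.DataFrame:
    """Downcast floats and categorize low-cardinality string columns."""
    out = df.copy()
    for col in out.columns:
        if out[col].dtype == np.float64:
            out[col] = out[col].astype(np.float32)
        elif out[col].dtype == object:
            # Assumed threshold: categorize when <50% of values are unique
            if out[col].nunique() < 0.5 * len(out):
                out[col] = out[col].astype("category")
    return out

df = pd.DataFrame({"value": np.random.rand(1000), "city": ["NY", "LA"] * 500})
optimized = optimize_dtypes(df)
# float32 halves the numeric column; category shrinks the strings far more
```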

3. Code Execution (core/execution.py)

Security Features:

  • Blacklist filtering for dangerous operations
  • Sandboxed execution environment
  • Output capture and error handling
  • Memory monitoring for large results

Execution Flow:

  1. Security check against BLACKLIST patterns
  2. Syntax validation through compilation
  3. Code execution in isolated environment
  4. Result formatting and memory optimization
  5. Output capture and error reporting
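Steps 1-3 of this flow can be sketched like this (a simplified illustration; the real execution.py adds output capture, result formatting, and memory monitoring):

```python
import pandas as pd

# Illustrative subset of the server's blacklist
BLACKLIST = ("import os", "import sys", "open(", "exec(", "eval(")

def run_user_code(code):
    # 1. Security check against blacklist patterns
    for pattern in BLACKLIST:
        if pattern in code:
            raise PermissionError(f"blocked pattern: {pattern}")
    # 2. Syntax validation through compilation
    compiled = compile(code, "<user_code>", "exec")
    # 3. Execution in an isolated namespace with pandas pre-loaded
    namespace = {"pd": pd}
    exec(compiled, namespace)
    # 4. Follow the `result` variable convention
    return namespace.get("result")

out = run_user_code("result = pd.Series([1, 2, 3]).sum()")
```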

4. Chart Generation (core/visualization.py)

Architecture:

  • Template-based HTML generation
  • Chart.js integration via CDN
  • Interactive controls for customization
  • Automatic file naming and organization

Chart Types:

  • Bar Charts: Categorical data with bar width and Y-axis controls
  • Line Charts: Trend analysis with line styling options
  • Pie Charts: Proportional data with donut hole and percentage display

5. Column Interpretation (core/column_interpretation.py)

Functionality:

  • Complete value distribution analysis for specified columns
  • Unique value extraction with exact counts
  • Data type identification and quality metrics
  • Multi-column processing in a single request
  • Particularly valuable when working with multiple CSV files to ensure consistent value understanding across datasets

Key Features:

  • Value frequency sorting (descending order)
  • Null value detection and reporting
  • Memory-optimized processing for large datasets
  • Support for categorical and numerical data
  • Enables consistent data analysis across multiple CSV files

6. Chart Generators (core/chart_generators/)

Base Class (base.py):

  • Abstract base class for all chart generators
  • Template management and file I/O
  • Common chart configuration

Specific Generators:

  • BarChartGenerator: Bar charts with interactive controls
  • LineChartGenerator: Line charts with tension and styling
  • PieChartGenerator: Pie charts with legend and percentage options

Data Flow Architecture

```
User Input → Security Check → Processing → Result → Output
    ↓              ↓            ↓         ↓         ↓
  CLI/MCP → BLACKLIST → Memory Opt → Format → Log/Display
```

Memory Management Strategy

  1. Chunked Processing: Large files processed in 10KB chunks
  2. Type Optimization: Automatic dtype conversion (float64→float32, object→category)
  3. Limited Sampling: Only first 100 rows processed for metadata
  4. Garbage Collection: Forced cleanup after major operations
  5. Memory Monitoring: psutil integration for tracking usage
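The chunked-processing idea can be illustrated with pandas' own chunksize parameter (the server's 10KB chunks are byte-level reads used during encoding detection; the row-level chunking shown here is an analogous sketch):

```python
import io
import pandas as pd

# Stand-in for a large CSV file on disk
csv_buffer = io.StringIO("Region,Sales\n" + "North,10\nSouth,20\n" * 1000)

# Aggregate chunk by chunk instead of loading all 2000 rows at once
total_sales = 0
for chunk in pd.read_csv(csv_buffer, chunksize=500):
    total_sales += chunk["Sales"].sum()
```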

📁 Project Structure

```
pandas-mcp-server/
├── server.py                 # MCP server implementation
├── cli.py                    # CLI interface for testing
├── requirements.txt          # Python dependencies
├── core/                     # Core functionality
│   ├── config.py            # Configuration and constants
│   ├── data_types.py        # Data type utilities
│   ├── metadata.py          # File metadata extraction
│   ├── column_interpretation.py   # Column value analysis
│   ├── execution.py         # Pandas code execution
│   ├── visualization.py     # Chart generation orchestration
│   └── chart_generators/    # Chart-specific implementations
│       ├── __init__.py
│       ├── base.py          # Base chart generator
│       ├── bar.py           # Bar chart generator
│       ├── line.py          # Line chart generator
│       ├── pie.py           # Pie chart generator
│       └── templates/       # HTML templates for charts
├── charts/                  # Generated chart files
├── logs/                    # Application logs
├── csv_metadata_format.md   # CSV metadata documentation
└── test_*.py                # Test files
```

🔧 Configuration

Core Configuration (core/config.py)

  • MAX_FILE_SIZE: 100MB file size limit
  • BLACKLIST: Security restrictions for code execution
  • CHARTS_DIR: Directory for generated charts
  • Logging: Comprehensive logging with rotation

Security Features

  • Code execution sandboxing
  • Blacklisted operations (file system, network, eval)
  • Memory usage monitoring
  • Input validation and sanitization

📊 Chart Generation Details

HTML Chart Files

All charts generated by the Pandas-MCP Server are saved as standalone HTML files with the following characteristics:

File Structure

  • Self-contained: Each HTML file includes all necessary CSS and JavaScript
  • Interactive: Charts include controls for customization (zoom, filter, etc.)
  • Responsive: Charts adapt to different screen sizes
  • Minimal Dependencies: the only external resource is the Chart.js library, loaded from a CDN, so an internet connection is needed when a chart is first opened

Save Location

  • Default Directory: ./charts/ (created automatically if it doesn't exist)
  • Configuration: Can be changed in core/config.py by modifying CHARTS_DIR
  • File Naming Pattern: {chart_type}_chart_{timestamp}.html
    • Example: bar_chart_20250710_143022.html
    • Example: line_chart_20250710_143547.html
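The documented naming pattern can be reproduced in a few lines (the helper function name is illustrative):

```python
from datetime import datetime
from pathlib import Path

def chart_path(chart_type, charts_dir="charts"):
    """Build a path following the {chart_type}_chart_{timestamp}.html pattern."""
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return Path(charts_dir) / f"{chart_type}_chart_{timestamp}.html"

path = chart_path("bar")
# e.g. charts/bar_chart_20250710_143022.html
```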

Usage

  1. Viewing: Simply double-click the HTML file to open it in a web browser
  2. Sharing: Send the HTML file to others - no additional software needed
  3. Embedding: HTML files can be embedded in web pages or iframes
  4. Printing: Use browser's print function to save charts as PDF

Template System

Charts are generated using HTML templates with:

  • Chart.js integration via CDN
  • Interactive controls for customization
  • Responsive design with mobile support
  • Real-time parameter adjustment
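In spirit, the template system works like this minimal sketch. The real templates in core/chart_generators/templates/ add interactive controls and styling, and the CDN URL here is illustrative:

```python
import json

# Hypothetical minimal template: Chart.js from a CDN plus an inlined config
TEMPLATE = """<!DOCTYPE html>
<html>
<head><script src="https://cdn.jsdelivr.net/npm/chart.js"></script></head>
<body>
<canvas id="chart"></canvas>
<script>
new Chart(document.getElementById("chart"), {config});
</script>
</body>
</html>"""

config = {
    "type": "bar",
    "data": {
        "labels": ["North", "South", "East", "West"],
        "datasets": [{"label": "Sales", "data": [15000, 12000, 18000, 9000]}],
    },
}
html = TEMPLATE.format(config=json.dumps(config))
```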

Chart Types

Bar Charts

  • Interactive controls for bar width and Y-axis scaling
  • Responsive design with zoom capabilities
  • Data labels and tooltips
  • Multiple dataset support

Line Charts

  • Multiple line series support
  • Adjustable line tension and styling
  • Point size and style customization
  • Stepped line options

Pie Charts

  • Interactive donut hole adjustment
  • Percentage/value toggle display
  • Legend positioning and styling
  • Border width and color controls

🧪 Testing

Running Tests

```bash
# Test metadata extraction
python test_metadata.py

# Test pandas code execution
python test_execution.py

# Test chart generation
python test_generate_barchart.py

# Test all chart types
python test_generate_pyecharts.py
```

Test Data Requirements

  • Excel files (.xlsx) with multiple sheets
  • CSV files with various encodings
  • JSON files with structured data for chart generation

📈 Performance Optimization

Memory Management

  • Chunked processing for large files
  • Automatic garbage collection
  • Memory usage logging
  • Dataset size limits

File Processing

  • Optimized dtype inference
  • Category encoding for string columns
  • Float32 precision for numeric data
  • Streaming CSV reading

🔍 Logging

Log Structure

  • mcp_server.log: Main application log
  • memory_usage: Memory consumption tracking
  • metadata: File processing details

Log Levels

  • DEBUG: Detailed processing information
  • INFO: General operation status
  • WARNING: Non-critical issues
  • ERROR: Processing failures

🐛 Troubleshooting

Common Issues

MCP Connection Issues

  • Verify server path in Claude Desktop configuration
  • Check Python environment and dependencies
  • Ensure server.py is executable
  • Review MCP server logs for connection errors

File Not Found

  • Verify file path is absolute
  • Check file permissions
  • Ensure file exists before processing

Memory Issues

  • Reduce file size or use chunked processing
  • Monitor memory usage in logs
  • Consider data sampling for large datasets

Chart Generation Errors

  • Verify data structure matches expected format
  • Check for required columns (string + numeric)
  • Ensure Chart.js CDN accessibility

Debug Mode

Enable debug logging by setting environment variable:

```bash
export LOG_LEVEL=DEBUG
python server.py
```

📄 Additional Documentation

  • CSV Metadata Format: See csv_metadata_format.md for detailed CSV processing documentation
  • API Documentation: Check our API documentation for detailed usage instructions

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support

Getting Help

Frequently Asked Questions

Check our FAQ page for answers to common questions.

Contact Information


🌟 Show Your Support

If you find this project useful, please consider giving it a ⭐️ star on GitHub! Your support helps us:

  • Increase project visibility
  • Attract more users
  • Motivate continued development
  • Build a stronger community





Thank you for your support! Please give us a ⭐️ Star if you find this project helpful!

🔝 Back to top
