Pandas-MCP Server

🚀 A powerful tool for AI-powered data analysis - through the MCP protocol, it enables LLMs to safely and efficiently execute pandas code and generate visualizations

If you find this project helpful, please consider giving it a ⭐️ star!
 
A comprehensive Model Context Protocol (MCP) server that enables LLMs to execute pandas code through a standardized workflow for data analysis and visualization.
✨ Key Features
- 🔒 Secure Execution Environment - Sandboxed code execution prevents malicious operations and protects system security
 
- 📊 Intelligent Data Analysis - Automatically extracts file metadata, understands data structure, and provides intelligent analysis suggestions
 
- 🎨 Interactive Visualizations - One-click generation of various interactive charts with real-time parameter adjustment
 
- 🧠 Memory Optimization - Intelligent memory management supports large file processing with automatic data type optimization
 
- 🔧 Easy Integration - Simple configuration for seamless integration with AI assistants like Claude Desktop
 
- 📝 CLI Support - Provides command-line interface for convenient testing and development
 
🎯 MCP Server Overview
The Pandas-MCP Server is designed as a Model Context Protocol (MCP) server that provides LLMs with powerful data processing capabilities. MCP is a standardized protocol that allows AI models to interact with external tools and services in a secure, structured way.
🛠️ Installation
Prerequisites
- Python 3.8+
 
- pip package manager
 
- Git (for cloning the repository)
 
Step 1: Clone the Repository
git clone https://github.com/marlonluo2018/pandas-mcp-server.git
cd pandas-mcp-server
Step 2: Install Dependencies
pip install -r requirements.txt
Step 3: Verify Installation
# Test the CLI interface
python cli.py
# Or test the MCP server directly
python server.py
Dependencies
- pandas>=2.0.0 - Data manipulation and analysis
 
- fastmcp>=1.0.0 - MCP server framework
 
- chardet>=5.0.0 - Character encoding detection
 
- psutil - System monitoring for memory optimization
 
Claude Desktop Configuration
Add this configuration to your Claude Desktop settings:
{
  "mcpServers": {
    "pandas-server": {
      "type": "stdio",
      "command": "python",
      "args": ["/path/to/your/pandas-mcp-server/server.py"]
    }
  }
}
Note: Replace /path/to/your/pandas-mcp-server/server.py with the actual path where you cloned the repository.
Example paths:
- Windows: "C:\\Users\\YourName\\pandas-mcp-server\\server.py"
- macOS/Linux: "/home/username/pandas-mcp-server/server.py"
Configuration File Location
- Windows: %APPDATA%\Claude\claude_desktop_config.json
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Linux: ~/.config/Claude/claude_desktop_config.json
Verification
After configuration, restart Claude Desktop. The server should appear in the MCP tools list with four available tools:
- read_metadata_tool - File analysis
- interpret_column_data - Column value interpretation
- run_pandas_code_tool - Code execution
- generate_chartjs_tool - Chart generation
🔄 Workflow
The pandas MCP server follows a structured workflow for data analysis and visualization:
Step 1: Read File Metadata
The LLM calls read_metadata_tool to understand the file structure:
- Extract file type, size, encoding, and column information
 
- Get data types, sample values, and statistical summaries
 
- Receive data quality warnings and suggested operations
 
- Understand the dataset structure before processing
 
Step 2: Interpret Column Values (Optional)
The LLM calls interpret_column_data to understand specific columns:
- Extract all unique values from important columns
 
- Identify patterns in categorical data
 
- Understand the meaning behind codes or abbreviations
 
- Key Purpose: Complements the metadata by providing a deep understanding of column values, which helps the LLM generate more accurate and effective pandas code in the next step, especially when working with multiple CSV files
 
When to Use interpret_column_data:
- High Value: Categorical fields with limited unique values (Region, Status, Category)
 
- High Value: Code fields that need interpretation (StatusCode "A", "B", "C")
 
- High Value: Fields with abbreviations or cryptic values
 
- Low Value: ID fields (usually unique values with no patterns)
 
- Low Value: Email fields (typically unique identifiers)
 
- Low Value: Numeric percentage fields (already self-explanatory)
 
- Conditional: Time fields (useful for non-standard formats or categorical time)
 
Step 3: Execute Pandas Operations
The LLM calls run_pandas_code_tool based on metadata and column analysis:
- Formulate pandas operations using the understood file structure
 
- Execute data processing, filtering, aggregation, or analysis
 
- Receive results in DataFrame, Series, or dictionary format
 
- Get optimized output with memory management
 
Step 4: Generate Visualizations
The LLM calls generate_chartjs_tool to create interactive charts:
- Transform processed data into Chart.js compatible format
 
- Generate interactive HTML charts with customization controls
 
- Create bar, line, or pie charts based on data characteristics
 
- Output responsive visualizations for analysis presentation
 
How interpret_column_data Complements read_metadata
The interpret_column_data function is designed to complement the read_metadata_tool by providing deeper insights into column values:
- read_metadata_tool: Focuses on file structure, data types, and statistical summaries
  - Provides a high-level understanding of the dataset
  - Offers sample values and basic statistics
  - Helps the LLM understand the overall data architecture
- interpret_column_data: Focuses on detailed value analysis of specific columns
  - Reveals all unique values and their frequencies
  - Helps the LLM understand categorical data patterns
  - Enables more precise filtering and grouping operations
  - Particularly valuable when working with multiple CSV files, where consistent value understanding across datasets is crucial
This two-step approach ensures that the LLM has both structural and value-level understanding before generating pandas code, resulting in more accurate and effective data analysis operations, especially when processing multiple CSV files.
🚀 MCP Server Tools
The server exposes four main tools for LLM integration:
1. read_metadata_tool - File Analysis
Extract comprehensive metadata from Excel and CSV files including:
- File type, size, encoding, and structure
 
- Column names, data types, and sample values
 
- Statistical summaries (null counts, unique values, min/max/mean)
 
- Data quality warnings and suggested operations
 
- Memory-optimized processing for large files
 
Purpose: Provides the LLM with a high-level understanding of the data structure and characteristics, serving as the foundation for data analysis.
MCP Tool Usage:
{
  "tool": "read_metadata_tool",
  "args": {
    "file_path": "/path/to/sales_data.xlsx"
  }
}
2. interpret_column_data - Column Value Interpretation
Interpret specific columns to understand their value patterns:
- Extract all unique values with their counts from specified columns
 
- Support for single or multiple column interpretation
 
- Automatic pattern recognition for common data types
 
- Complete value distribution without sampling
 
Purpose: Complements read_metadata_tool by providing deep insights into column values, enabling the LLM to generate more precise filtering, grouping, and analysis operations, especially when working with multiple CSV files that require consistent value understanding across datasets.
Best Use Cases:
- Most Valuable: Categorical fields with limited unique values (Region, Status, Category)
 
- Most Valuable: Code/abbreviation fields that need interpretation (StatusCode "A", "B", "C")
 
- Less Valuable: ID fields, email fields, or numeric percentage fields
 
- Context-Dependent: Time fields (useful for non-standard formats)
 
Response Format
The function returns a structured response with the following format:
{
  "columns_interpretation": [
    {
      "column_name": "Region",
      "data_type": "object",
      "total_values": 1000,
      "null_count": 5,
      "unique_count": 4,
      "unique_values_with_counts": [
        ["North", 350],
        ["South", 280],
        ["East", 220],
        ["West", 145]
      ]
    }
  ]
}
Key Features
- Complete Value Distribution: Returns all unique values with their exact counts
 
- Sorted by Frequency: Values are sorted in descending order of occurrence
 
- Data Type Analysis: Identifies the underlying data type (object, int64, etc.)
 
- Quality Metrics: Provides null count and total values for data quality assessment
 
- Multi-column Support: Can analyze multiple columns in a single request
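The value-distribution logic above can be sketched with plain pandas. This is an illustrative sketch only: the function name and the way fields are assembled are assumptions that mirror the documented response format, not the server's actual core/column_interpretation.py code.

```python
import pandas as pd

def interpret_column(df: pd.DataFrame, column: str) -> dict:
    """Summarize one column: dtype, null count, and full value distribution."""
    counts = df[column].value_counts()  # all unique values, sorted by frequency desc
    return {
        "column_name": column,
        "data_type": str(df[column].dtype),
        "total_values": int(len(df)),
        "null_count": int(df[column].isna().sum()),
        "unique_count": int(df[column].nunique()),
        "unique_values_with_counts": [[v, int(c)] for v, c in counts.items()],
    }

# Hypothetical sample data
df = pd.DataFrame({"Region": ["North", "North", "South", "East", None]})
print(interpret_column(df, "Region"))
```

Because value_counts() drops nulls and sorts by frequency, the output matches the documented shape: the most frequent value appears first, and null_count is reported separately.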
 
MCP Tool Usage:
{
  "tool": "interpret_column_data",
  "args": {
    "file_path": "/path/to/sales_data.csv",
    "column_names": ["Region", "Status"]
  }
}
3. run_pandas_code_tool - Secure Code Execution
Execute pandas operations with:
- Security filtering against malicious code
 
- Memory optimization for large datasets
 
- Comprehensive error handling and debugging
 
- Support for DataFrame, Series, and dictionary results
 
Purpose: Leverages insights from both read_metadata_tool and interpret_column_data to execute precise data analysis operations, particularly valuable when processing multiple CSV files with consistent value patterns.
Forbidden Operations
The following operations are blocked for security reasons:
- System Access: os., sys., subprocess. - Prevents file system and system access
- Code Execution: open(), exec(), eval() - Blocks dynamic code execution
- Dangerous Imports: import os, import sys - Prevents specific harmful imports
- Browser/DOM Access: document., window., XMLHttpRequest - Blocks browser operations
- JavaScript/Remote: fetch(), eval(), Function() - Prevents remote code execution
- Script Injection: script, javascript: - Blocks script injection attempts
Requirements:
- Final result must be assigned to the result variable
- Code should include necessary imports (pandas is available as pd)
- All code goes through security filtering before execution
 
MCP Tool Usage:
{
  "tool": "run_pandas_code_tool",
  "args": {
    "code": "import pandas as pd\ndf = pd.read_excel('/path/to/data.xlsx')\nresult = df.groupby('Region')['Sales'].sum()"
  }
}
4. generate_chartjs_tool - Interactive Visualizations
Generate interactive charts with Chart.js:
- Bar charts - For categorical comparisons
 
- Line charts - For trend analysis
 
- Pie charts - For proportional data
 
- Interactive HTML templates with customization controls
 
Chart Output
- File Format: All charts are generated as standalone HTML files
 
- Save Location: Charts are saved in the ./charts/ directory by default
- File Naming: Files are automatically named with timestamp and chart type (e.g., bar_chart_20250710_143022.html)
- Accessibility: HTML files can be opened in any web browser and shared easily
 
MCP Tool Usage:
{
  "tool": "generate_chartjs_tool",
  "args": {
    "data": {
      "columns": [
        {
          "name": "Region",
          "type": "string",
          "examples": ["North", "South", "East", "West"]
        },
        {
          "name": "Sales",
          "type": "number",
          "examples": [15000, 12000, 18000, 9000]
        }
      ]
    },
    "chart_types": ["bar"],
    "title": "Sales by Region"
  }
}
🚀 Usage
CLI Interface (Testing & Development)
The cli.py script provides a convenient command-line interface for testing the MCP server's functionality without requiring an MCP client:
Interactive Mode
python cli.py
Launches a guided menu system with:
- Step-by-step workflow guidance
 
- Automatic input validation
 
- Clear error messages
 
- Support for file paths with spaces
 
Command-Line Mode
# Read metadata
python cli.py metadata data.xlsx
# Interpret column values (useful for multiple CSV files)
python cli.py interpret data.csv --columns "Region,Status"
# Execute pandas code
python cli.py execute analysis.py
# Generate charts
python cli.py chart data.json --type bar --title "Sales Analysis"
Chart Output Information
When generating charts using the CLI:
- Output Format: Charts are saved as interactive HTML files
 
- Default Location: All charts are saved in the ./charts/ directory
- File Naming: Automatic naming with timestamp and chart type
 
- Viewing Charts: Open the HTML files in any web browser to view interactive visualizations
 
- Sharing: HTML files can be easily shared with others
 
🔍 Code Logic & Architecture
Core Components
1. Server Architecture (server.py)
- FastMCP Integration: Uses FastMCP framework for MCP protocol implementation
 
- Logging System: Unified logging with rotation and memory tracking
 
- Tool Registration: Exposes four main tools with proper error handling
 
- Memory Monitoring: Tracks memory usage before/after operations
 
2. Metadata Processing (core/metadata.py)
Key Logic:
- File validation (existence, size limits)
 
- Encoding detection for CSV files
 
- Memory-optimized data processing (100-row samples)
 
- Comprehensive statistical analysis
 
- Data quality assessment and warnings
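The sampled-metadata idea above can be sketched with plain pandas. This is a simplified illustration (the real core/metadata.py also handles Excel files, encoding detection, and quality warnings); the function name and output fields are assumptions.

```python
import io
import pandas as pd

def extract_metadata(csv_source, sample_rows: int = 100) -> list:
    """Read only the first `sample_rows` rows and summarize each column."""
    df = pd.read_csv(csv_source, nrows=sample_rows)  # cap rows to limit memory use
    return [
        {
            "name": col,
            "dtype": str(df[col].dtype),
            "null_count": int(df[col].isna().sum()),
            "unique_count": int(df[col].nunique()),
            "sample_values": df[col].dropna().head(3).tolist(),
        }
        for col in df.columns
    ]

# Hypothetical in-memory CSV standing in for a file on disk
csv_text = "Region,Sales\nNorth,15000\nSouth,12000\nEast,18000\n"
meta = extract_metadata(io.StringIO(csv_text))
print(meta)
```

Reading with nrows keeps the metadata pass cheap even for files near the size limit, since only the sample ever reaches memory.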
 
Memory Optimization:
- Uses category dtype for string columns with low cardinality
- Converts float64 to float32 for memory efficiency
 
- Processes only first 100 rows for metadata extraction
 
- Forces garbage collection after processing
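The dtype optimizations above might look roughly like this. A hedged sketch, not the actual core/metadata.py implementation; the cardinality threshold is an assumption.

```python
import pandas as pd

def optimize_dtypes(df: pd.DataFrame, cardinality_ratio: float = 0.5) -> pd.DataFrame:
    """Downcast float64 to float32 and low-cardinality string columns to category."""
    out = df.copy()
    for col in out.columns:
        if out[col].dtype == "float64":
            out[col] = out[col].astype("float32")  # halves numeric memory
        elif out[col].dtype == "object":
            # Convert only when distinct values are few relative to row count
            if out[col].nunique() / max(len(out), 1) < cardinality_ratio:
                out[col] = out[col].astype("category")
    return out

df = pd.DataFrame({"Region": ["North", "South"] * 50, "Price": [1.5, 2.5] * 50})
optimized = optimize_dtypes(df)
print(optimized.dtypes)
```

On repetitive string columns the category conversion typically yields the largest savings, since each distinct value is stored once and rows hold only small integer codes.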
 
3. Code Execution (core/execution.py)
Security Features:
- Blacklist filtering for dangerous operations
 
- Sandboxed execution environment
 
- Output capture and error handling
 
- Memory monitoring for large results
 
Execution Flow:
- Security check against BLACKLIST patterns
 
- Syntax validation through compilation
 
- Code execution in isolated environment
 
- Result formatting and memory optimization
 
- Output capture and error reporting
 
4. Chart Generation (core/visualization.py)
Architecture:
- Template-based HTML generation
 
- Chart.js integration via CDN
 
- Interactive controls for customization
 
- Automatic file naming and organization
 
Chart Types:
- Bar Charts: Categorical data with bar width and Y-axis controls
 
- Line Charts: Trend analysis with line styling options
 
- Pie Charts: Proportional data with donut hole and percentage display
 
5. Column Interpretation (core/column_interpretation.py)
Functionality:
- Complete value distribution analysis for specified columns
 
- Unique value extraction with exact counts
 
- Data type identification and quality metrics
 
- Multi-column processing in a single request
 
- Particularly valuable when working with multiple CSV files to ensure consistent value understanding across datasets
 
Key Features:
- Value frequency sorting (descending order)
 
- Null value detection and reporting
 
- Memory-optimized processing for large datasets
 
- Support for categorical and numerical data
 
- Enables consistent data analysis across multiple CSV files
 
6. Chart Generators (core/chart_generators/)
Base Class (base.py):
- Abstract base class for all chart generators
 
- Template management and file I/O
 
- Common chart configuration
 
Specific Generators:
- BarChartGenerator: Bar charts with interactive controls
- LineChartGenerator: Line charts with tension and styling
- PieChartGenerator: Pie charts with legend and percentage options
Data Flow Architecture
User Input → Security Check → Processing → Result → Output
    ↓              ↓            ↓         ↓         ↓
  CLI/MCP → BLACKLIST → Memory Opt → Format → Log/Display
Memory Management Strategy
- Chunked Processing: Large files processed in 10KB chunks
 
- Type Optimization: Automatic dtype conversion (float64→float32, object→category)
 
- Limited Sampling: Only first 100 rows processed for metadata
 
- Garbage Collection: Forced cleanup after major operations
 
- Memory Monitoring: psutil integration for tracking usage
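Chunked processing can be illustrated with pandas' own streaming reader. A sketch under assumptions: the real server's chunk sizing and aggregation logic differ, and the function name is hypothetical.

```python
import io
import pandas as pd

def chunked_sum(csv_source, column: str, chunksize: int = 2) -> float:
    """Aggregate one column of a large CSV without loading it all at once."""
    total = 0.0
    # read_csv with chunksize yields DataFrames of at most `chunksize` rows
    for chunk in pd.read_csv(csv_source, chunksize=chunksize):
        total += chunk[column].sum()
    return total

csv_text = "Region,Sales\nNorth,100\nSouth,200\nEast,300\nWest,400\n"
print(chunked_sum(io.StringIO(csv_text), "Sales"))
```

Because each chunk is discarded after its partial sum is folded in, peak memory stays proportional to the chunk size rather than the file size.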
 
📁 Project Structure
pandas-mcp-server/
├── server.py                 # MCP server implementation
├── cli.py                    # CLI interface for testing
├── requirements.txt          # Python dependencies
├── core/                     # Core functionality
│   ├── config.py            # Configuration and constants
│   ├── data_types.py        # Data type utilities
│   ├── metadata.py          # File metadata extraction
│   ├── column_interpretation.py   # Column value analysis
│   ├── execution.py         # Pandas code execution
│   ├── visualization.py     # Chart generation orchestration
│   └── chart_generators/    # Chart-specific implementations
│       ├── __init__.py
│       ├── base.py          # Base chart generator
│       ├── bar.py           # Bar chart generator
│       ├── line.py          # Line chart generator
│       └── pie.py           # Pie chart generator
│       └── templates/       # HTML templates for charts
├── charts/                  # Generated chart files
├── logs/                    # Application logs
├── csv_metadata_format.md   # CSV metadata documentation
└── test_*.py               # Test files
🔧 Configuration
Core Configuration (core/config.py)
- MAX_FILE_SIZE: 100MB file size limit
 
- BLACKLIST: Security restrictions for code execution
 
- CHARTS_DIR: Directory for generated charts
 
- Logging: Comprehensive logging with rotation
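A core/config.py in this style might look like the sketch below. All values and the exact blacklist contents are assumptions for illustration; only the constant names come from the documentation above.

```python
from pathlib import Path

MAX_FILE_SIZE = 100 * 1024 * 1024   # 100 MB upper bound for input files
CHARTS_DIR = Path("./charts")       # output directory for generated HTML charts
BLACKLIST = (                       # substrings rejected before code execution
    "os.", "sys.", "subprocess.", "open(", "exec(", "eval(",
    "document.", "window.", "XMLHttpRequest", "fetch(",
)
```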
 
Security Features
- Code execution sandboxing
 
- Blacklisted operations (file system, network, eval)
 
- Memory usage monitoring
 
- Input validation and sanitization
 
📊 Chart Generation Details
HTML Chart Files
All charts generated by the Pandas-MCP Server are saved as standalone HTML files with the following characteristics:
File Structure
- Self-contained: Each HTML file includes all necessary CSS and JavaScript
 
- Interactive: Charts include controls for customization (zoom, filter, etc.)
 
- Responsive: Charts adapt to different screen sizes
 
- No Extra Software: HTML files open in any web browser; note that Chart.js itself is loaded from a CDN, so an internet connection may be needed on first view
 
Save Location
- Default Directory: ./charts/ (created automatically if it doesn't exist)
- Configuration: Can be changed in core/config.py by modifying CHARTS_DIR
- File Naming Pattern: {chart_type}_chart_{timestamp}.html (e.g., bar_chart_20250710_143022.html, line_chart_20250710_143547.html)
 
Usage
- Viewing: Simply double-click the HTML file to open it in a web browser
 
- Sharing: Send the HTML file to others - no additional software needed
 
- Embedding: HTML files can be embedded in web pages or iframes
 
- Printing: Use browser's print function to save charts as PDF
 
Template System
Charts are generated using HTML templates with:
- Chart.js integration via CDN
 
- Interactive controls for customization
 
- Responsive design with mobile support
 
- Real-time parameter adjustment
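A minimal template fill in this spirit might look like the following. This is illustrative only: the real templates in core/chart_generators/templates/ are far richer, and the CDN URL and function name here are assumptions.

```python
import json
from string import Template

CHART_TEMPLATE = Template("""<!DOCTYPE html>
<html>
<head><script src="https://cdn.jsdelivr.net/npm/chart.js"></script></head>
<body>
<canvas id="chart"></canvas>
<script>
new Chart(document.getElementById("chart"), {
  type: "$chart_type",
  data: {labels: $labels, datasets: [{label: "$title", data: $values}]}
});
</script>
</body>
</html>
""")

def render_bar_chart(title, labels, values):
    """Fill the Chart.js template, JSON-encoding data so it is valid JavaScript."""
    return CHART_TEMPLATE.substitute(
        chart_type="bar",
        title=title,
        labels=json.dumps(labels),
        values=json.dumps(values),
    )

html = render_bar_chart("Sales by Region", ["North", "South"], [15000, 12000])
print(html[:60])
```

Writing the returned string to a file under ./charts/ would yield a standalone page that any browser can open directly.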
 
Chart Types
Bar Charts
- Interactive controls for bar width and Y-axis scaling
 
- Responsive design with zoom capabilities
 
- Data labels and tooltips
 
- Multiple dataset support
 
Line Charts
- Multiple line series support
 
- Adjustable line tension and styling
 
- Point size and style customization
 
- Stepped line options
 
Pie Charts
- Interactive donut hole adjustment
 
- Percentage/value toggle display
 
- Legend positioning and styling
 
- Border width and color controls
 
🧪 Testing
Running Tests
# Test metadata extraction
python test_metadata.py
# Test pandas code execution
python test_execution.py
# Test chart generation
python test_generate_barchart.py
# Test all chart types
python test_generate_pyecharts.py
Test Data Requirements
- Excel files (.xlsx) with multiple sheets
 
- CSV files with various encodings
 
- JSON files with structured data for chart generation
 
📈 Performance Optimization
Memory Management
- Chunked processing for large files
 
- Automatic garbage collection
 
- Memory usage logging
 
- Dataset size limits
 
File Processing
- Optimized dtype inference
 
- Category encoding for string columns
 
- Float32 precision for numeric data
 
- Streaming CSV reading
 
🔍 Logging
Log Structure
- mcp_server.log: Main application log
 
- memory_usage: Memory consumption tracking
 
- metadata: File processing details
 
Log Levels
- DEBUG: Detailed processing information
 
- INFO: General operation status
 
- WARNING: Non-critical issues
 
- ERROR: Processing failures
 
🐛 Troubleshooting
Common Issues
MCP Connection Issues
- Verify server path in Claude Desktop configuration
 
- Check Python environment and dependencies
 
- Ensure server.py is executable
 
- Review MCP server logs for connection errors
 
File Not Found
- Verify file path is absolute
 
- Check file permissions
 
- Ensure file exists before processing
 
Memory Issues
- Reduce file size or use chunked processing
 
- Monitor memory usage in logs
 
- Consider data sampling for large datasets
 
Chart Generation Errors
- Verify data structure matches expected format
 
- Check for required columns (string + numeric)
 
- Ensure Chart.js CDN accessibility
 
Debug Mode
Enable debug logging by setting environment variable:
export LOG_LEVEL=DEBUG
python server.py
📄 Additional Documentation
- CSV Metadata Format: See csv_metadata_format.md for detailed CSV processing documentation
- API Documentation: Check our API documentation for detailed usage instructions
 
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🆘 Support
Getting Help
- Report Issues: Submit an issue on GitHub
 
- Discussions: Join our GitHub Discussions
 
Frequently Asked Questions
Check our FAQ page for answers to common questions.
🌟 Show Your Support
If you find this project useful, please consider giving it a ⭐️ star on GitHub! Your support helps us:
- Increase project visibility
 
- Attract more users
 
- Motivate continued development
 
- Build a stronger community
 
