
Zen
MCP server enabling Claude to collaborate with Gemini, O3, and other AI models for enhanced development.
https://github.com/user-attachments/assets/8097e18e-b926-4d8b-ba14-a979e4c58bda
The ultimate development partners for Claude - a Model Context Protocol server that gives Claude access to multiple AI models for enhanced code analysis, problem-solving, and collaborative development.
Features true AI orchestration with conversations that continue across tasks - Give Claude a complex task and let it orchestrate between models automatically. Claude stays in control, performs the actual work, but gets perspectives from the best AI for each subtask. Claude can switch between different tools and models mid-conversation, with context carrying forward seamlessly.
Example Workflow - Claude Code:
- `analyze` the code in question for a second opinion
- `chat` about its findings
- `precommit` review

All within a single conversation thread! Gemini Pro in step 6 knows what was recommended by O3 in step 3, and takes that context and review into consideration to aid with its pre-commit review.
Think of it as Claude Code for Claude Code. This MCP isn't magic. It's just super-glue.
Getting Started
Tools Reference
Advanced Topics
Resources
Claude is brilliant, but sometimes you need:
- `chat`
- `thinkdeep`
- `codereview`
- `precommit`
- `debug`

This server orchestrates multiple AI models as your development team, with Claude automatically selecting the best model for each task or allowing you to choose specific models for different strengths.
Prompt Used:
Study the code properly, think deeply about what this does and then see if there's any room for improvement in
terms of performance optimizations, brainstorm with gemini on this to get feedback and then confirm any change by
first adding a unit test with `measure` and measuring current code and then implementing the optimization and
measuring again to ensure it improved, then share results. Check with gemini in between as you make tweaks.
The final implementation resulted in a 26% improvement in JSON parsing performance for the selected library, reducing processing time through targeted, collaborative optimizations guided by Gemini’s analysis and Claude’s refinement.
Option A: OpenRouter (Access multiple models with one API)

Model configuration for OpenRouter lives in `conf/openrouter_models.json`.
Option B: Native APIs
Note: Using both OpenRouter and native APIs creates ambiguity about which provider serves each model. If both are configured, native APIs will take priority for `gemini` and `o3`.
```bash
# Clone to your preferred location
git clone https://github.com/BeehiveInnovations/zen-mcp-server.git
cd zen-mcp-server

# One-command setup (includes Redis for AI conversations)
./setup-docker.sh
```
What this does:
- Creates a `.env` file (automatically uses `$GEMINI_API_KEY` and `$OPENAI_API_KEY` if set in environment)

```bash
# Edit .env to add your API keys (if not already set in environment)
nano .env

# The file will contain the following; at least one key should be set:
# GEMINI_API_KEY=your-gemini-api-key-here    # For Gemini models
# OPENAI_API_KEY=your-openai-api-key-here    # For O3 model
# OPENROUTER_API_KEY=your-openrouter-key     # For OpenRouter (see docs/openrouter.md)
# WORKSPACE_ROOT=/Users/your-username        # (automatically configured)

# Note: At least one API key is required
```
Run the following commands on the terminal to add the MCP directly to Claude Code
```bash
# Add the MCP server directly via Claude Code CLI
claude mcp add zen -s user -- docker exec -i zen-mcp-server python server.py

# List your MCP servers to verify
claude mcp list

# Remove when needed
claude mcp remove zen -s user

# You may need to remove an older version of this MCP after it was renamed:
claude mcp remove gemini -s user
```
Now run `claude` on the terminal for it to connect to the newly added MCP server. If you were already running a Claude Code session, please exit and start a new session.
This will open a folder revealing `claude_desktop_config.json`.
The setup script shows you the exact configuration. When you ran `setup-docker.sh`, it should have produced a configuration for you to copy:
{ "mcpServers": { "zen": { "command": "docker", "args": [ "exec", "-i", "zen-mcp-server", "python", "server.py" ] } } }
Paste the above into `claude_desktop_config.json`. If you have several other MCP servers listed, simply add this entry below the rest, separated by a comma:
```json
... other mcp servers ...,

"zen": {
  "command": "docker",
  "args": [
    "exec",
    "-i",
    "zen-mcp-server",
    "python",
    "server.py"
  ]
}
```
Just ask Claude naturally:
- `thinkdeep`
- `codereview`
- `debug`
- `analyze`
Remember: Claude remains in control — but you are the true orchestrator.
You're the prompter, the guide, the puppeteer.
Your prompt decides when Claude brings in Gemini, Flash, O3 — or handles it solo.
Quick Tool Selection Guide:
- `chat` (brainstorm ideas, get second opinions, validate approaches)
- `thinkdeep` (extends analysis, finds edge cases)
- `codereview` (bugs, security, performance issues)
- `precommit` (validate git changes before committing)
- `debug` (root cause analysis, error tracing)
- `analyze` (architecture, patterns, dependencies)
- `get_version` (version and configuration details)

Auto Mode: When `DEFAULT_MODEL=auto`, Claude automatically picks the best model for each task. You can override with: "Use flash for quick analysis" or "Use o3 to debug this".
Model Selection Examples:
Pro Tip: Thinking modes (for Gemini models) control depth vs token cost. Use "minimal" or "low" for quick tasks, "high" or "max" for complex problems. Learn more
Tools Overview:
- `chat` - Collaborative thinking and development conversations
- `thinkdeep` - Extended reasoning and problem-solving
- `codereview` - Professional code review with severity levels
- `precommit` - Validate git changes before committing
- `debug` - Root cause analysis and debugging
- `analyze` - General-purpose file and code analysis
- `get_version` - Get server version and configuration

`chat` - General Development Chat & Collaborative Thinking

Your thinking partner - bounce ideas, get second opinions, brainstorm collaboratively
Thinking Mode: Default is `medium` (8,192 tokens). Use `low` for quick questions to save tokens, or `high` for complex discussions when thoroughness matters.
Chat with zen and pick the best model for this job. I need to pick between Redis and Memcached for session storage
and I need an expert opinion for the project I'm working on. Get a good idea of what the project does, pick one of the two options
and then debate with the other models to give me a final verdict
Key Features:
"Use gemini to explain this algorithm with context from algorithm.py"
`thinkdeep` - Extended Reasoning Partner

Get a second opinion to augment Claude's own extended thinking
Thinking Mode: Default is `high` (16,384 tokens) for deep analysis. Claude will automatically choose the best mode based on complexity - use `low` for quick validations, `medium` for standard problems, `high` for complex issues (default), or `max` for extremely complex challenges requiring deepest analysis.
Think deeper about my authentication design with pro using max thinking mode and brainstorm to come up
with the best architecture for my project
Key Features:
"Use gemini to think deeper about my API design with reference to api/routes.py"
`codereview` - Professional Code Review

Comprehensive code analysis with prioritized feedback
Thinking Mode: Default is `medium` (8,192 tokens). Use `high` for security-critical code (worth the extra tokens) or `low` for quick style checks (saves ~6k tokens).
Perform a codereview with gemini pro and review auth.py for security issues and potential vulnerabilities.
I need an actionable plan but break it down into smaller quick-wins that we can implement and test rapidly
Key Features:
"Use gemini to review src/ against PEP8 standards"
"Get gemini to review auth/ - only report critical vulnerabilities"
`precommit` - Pre-Commit Validation

Comprehensive review of staged/unstaged git changes across multiple repositories
Thinking Mode: Default is `medium` (8,192 tokens). Use `high` or `max` for critical releases when thorough validation justifies the token cost.
Prompt Used:
Now use gemini and perform a review and precommit and ensure original requirements are met, no duplication of code or
logic, everything should work as expected
How beautiful is that? Claude used `precommit` twice and `codereview` once and actually found and fixed two critical errors before commit!
Use zen and perform a thorough precommit ensuring there aren't any new regressions or bugs introduced
Key Features:
Parameters:
- `path`: Starting directory to search for repos (default: current directory)
- `original_request`: The requirements for context
- `compare_to`: Compare against a branch/tag instead of local changes
- `review_type`: full|security|performance|quick
- `severity_filter`: Filter by issue severity
- `max_depth`: How deep to search for nested repos

`debug` - Expert Debugging Assistant

Root cause analysis for complex problems
Thinking Mode: Default is `medium` (8,192 tokens). Use `high` for tricky bugs (investment in finding root cause) or `low` for simple errors (save tokens).
Basic Usage:
"Use gemini to debug this TypeError: 'NoneType' object has no attribute 'split'"
"Get gemini to debug why my API returns 500 errors with the full stack trace: [paste traceback]"
Key Features:
`analyze` - Smart File Analysis

General-purpose code understanding and exploration
Thinking Mode: Default is `medium` (8,192 tokens). Use `high` for architecture analysis (comprehensive insights worth the cost) or `low` for quick file overviews (save ~6k tokens).
Basic Usage:
"Use gemini to analyze main.py to understand how it works"
"Get gemini to do an architecture analysis of the src/ directory"
Key Features:
- `use_websearch` (default: true): the model can request Claude to perform web searches and share results back to enhance analysis with current documentation, design patterns, and best practices

`get_version` - Server Information

"Get zen to show its version"
All tools that work with files support both individual files and entire directories. The server automatically expands directories, filters for relevant code files, and manages token limits.
`analyze` - Analyze files or directories

- `files`: List of file paths or directories (required)
- `question`: What to analyze (required)
- `model`: auto|pro|flash|o3|o3-mini (default: server default)
- `analysis_type`: architecture|performance|security|quality|general
- `output_format`: summary|detailed|actionable
- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only)
- `use_websearch`: Enable web search for documentation and best practices - allows model to request Claude perform searches (default: true)

"Analyze the src/ directory for architectural patterns" (auto mode picks best model)
"Use flash to quickly analyze main.py and tests/ to understand test coverage"
"Use o3 for logical analysis of the algorithm in backend/core.py"
"Use pro for deep analysis of the entire backend/ directory structure"
`codereview` - Review code files or directories

- `files`: List of file paths or directories (required)
- `model`: auto|pro|flash|o3|o3-mini (default: server default)
- `review_type`: full|security|performance|quick
- `focus_on`: Specific aspects to focus on
- `standards`: Coding standards to enforce
- `severity_filter`: critical|high|medium|all
- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only)

"Review the entire api/ directory for security issues" (auto mode picks best model)
"Use pro to review auth/ for deep security analysis"
"Use o3 to review logic in algorithms/ for correctness"
"Use flash to quickly review src/ with focus on performance, only show critical issues"
`debug` - Debug with file context

- `error_description`: Description of the issue (required)
- `model`: auto|pro|flash|o3|o3-mini (default: server default)
- `error_context`: Stack trace or logs
- `files`: Files or directories related to the issue
- `runtime_info`: Environment details
- `previous_attempts`: What you've tried
- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only)
- `use_websearch`: Enable web search for error messages and solutions - allows model to request Claude perform searches (default: true)

"Debug this logic error with context from backend/" (auto mode picks best model)
"Use o3 to debug this algorithm correctness issue"
"Use pro to debug this complex architecture problem"
`thinkdeep` - Extended analysis with file context

- `current_analysis`: Your current thinking (required)
- `model`: auto|pro|flash|o3|o3-mini (default: server default)
- `problem_context`: Additional context
- `focus_areas`: Specific aspects to focus on
- `files`: Files or directories for context
- `thinking_mode`: minimal|low|medium|high|max (default: max, Gemini only)
- `use_websearch`: Enable web search for documentation and insights - allows model to request Claude perform searches (default: true)

"Think deeper about my design with reference to src/models/" (auto mode picks best model)
"Use pro to think deeper about this architecture with extended thinking"
"Use o3 to think deeper about the logical flow in this algorithm"
Think hard about designing and developing a fun calculator app in swift. Review your design plans with o3, taking in
their suggestions but keep the feature-set realistic and doable without adding bloat. Begin implementing and in between
implementation, get a codereview done by Gemini Pro and chat with Flash if you need to for creative directions.
Implement a new screen where the locations taken from the database display on a map, with pins falling from
the top and landing with animation. Once done, codereview with gemini pro and o3 both and ask them to critique your
work. Fix medium to critical bugs / concerns / issues and show me the final product
Take a look at these log files saved under subfolder/diagnostics.log there's a bug where the user says the app
crashes at launch. Think hard and go over each line, tallying it with corresponding code within the project. After
you've performed initial investigation, ask gemini pro to analyze the log files and the related code where you
suspect lies the bug and then formulate and implement a bare minimal fix. Must not regress. Perform a precommit
with zen in the end using gemini pro to confirm we're okay to publish the fix
To help choose the right tool for your needs:
Decision Flow:
- `debug`
- `codereview`
- `analyze`
- `thinkdeep`
- `chat`
Key Distinctions:
- `analyze` vs `codereview`: analyze explains, codereview prescribes fixes
- `chat` vs `thinkdeep`: chat is open-ended, thinkdeep extends specific analysis
- `debug` vs `codereview`: debug diagnoses runtime errors, review finds static issues

Claude automatically manages thinking modes based on task complexity, but you can also manually control Gemini's reasoning depth to balance between response quality and token consumption. Each thinking mode uses a different amount of tokens, directly affecting API costs and response time.
These only apply to models that support customizing token usage for extended thinking, such as Gemini 2.5 Pro.
| Mode | Token Budget | Use Case | Cost Impact |
|---|---|---|---|
| minimal | 128 tokens | Simple, straightforward tasks | Lowest cost |
| low | 2,048 tokens | Basic reasoning tasks | 16x more than minimal |
| medium | 8,192 tokens | Default - Most development tasks | 64x more than minimal |
| high | 16,384 tokens | Complex problems requiring thorough analysis (default for `thinkdeep`) | 128x more than minimal |
| max | 32,768 tokens | Exhaustive reasoning | 256x more than minimal |
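To make the cost relationship concrete, here is a small sketch that mirrors the token budgets in the table above (the dictionary is illustrative, not the server's actual configuration):

```python
# Thinking-mode token budgets, copied from the table above for illustration.
THINKING_MODE_BUDGETS = {
    "minimal": 128,
    "low": 2_048,
    "medium": 8_192,
    "high": 16_384,
    "max": 32_768,
}

def relative_cost(mode: str, baseline: str = "minimal") -> float:
    """How many times more thinking tokens a mode uses compared to the baseline."""
    return THINKING_MODE_BUDGETS[mode] / THINKING_MODE_BUDGETS[baseline]

print(relative_cost("max"))     # 256.0 -> matches "256x more than minimal"
print(relative_cost("medium"))  # 64.0  -> matches "64x more than minimal"
```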
Claude automatically selects appropriate thinking modes, but you can override this by explicitly requesting a specific mode in your prompts. Remember: higher thinking modes = more tokens = higher cost but better quality:
In most cases, let Claude automatically manage thinking modes for optimal balance of cost and quality. Override manually when you have specific requirements:
Use lower modes (`minimal`, `low`) to save tokens when:

Use higher modes (`high`, `max`) when quality justifies the cost:
Token Cost Examples:
- `minimal` (128 tokens) vs `max` (32,768 tokens) = 256x difference in thinking tokens
- Using `minimal` instead of the default `medium` saves ~8,000 thinking tokens
- `high` or `max` mode is a worthwhile investment for complex problems

Examples by scenario:
# Quick style check with flash
"Use flash to review formatting in utils.py"
# Security audit with o3
"Get o3 to do a security review of auth/ with thinking mode high"
# Complex debugging, letting claude pick the best model
"Use zen to debug this race condition with max thinking mode"
# Architecture analysis with Gemini 2.5 Pro
"Analyze the entire src/ directory architecture with high thinking using pro"
This server enables true AI collaboration between Claude and multiple AI models (Gemini, O3), where they can coordinate and question each other's approaches:
How it works:
- Start with one tool (e.g., `analyze`) and continue with another (e.g., `codereview`) using the same conversation thread

Example of Multi-Model AI Coordination:
Asynchronous workflow example:
Enhanced collaboration features:
Cross-tool & Cross-Model Continuation Example:
1. Claude: "Analyze /src/auth.py for security issues"
→ Auto mode: Claude picks Gemini Pro for deep security analysis
→ Pro analyzes and finds vulnerabilities, provides continuation_id
2. Claude: "Review the authentication logic thoroughly"
→ Uses same continuation_id, but Claude picks O3 for logical analysis
→ O3 sees previous Pro analysis and provides logic-focused review
3. Claude: "Debug the auth test failures"
→ Same continuation_id, Claude keeps O3 for debugging
→ O3 provides targeted debugging with full context from both previous analyses
4. Claude: "Quick style check before committing"
→ Same thread, but Claude switches to Flash for speed
→ Flash quickly validates formatting with awareness of all previous fixes
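A follow-up call in the same thread simply carries the continuation ID returned by the previous tool. The sketch below is an assumption of what such a request could look like; the exact field name and schema come from the server's tool definitions:

```python
# Hypothetical follow-up request reusing the continuation_id from step 1 above.
# Field names beyond those documented in the tools reference are assumptions.
followup_args = {
    "files": ["/src/auth.py"],
    "question": "Given the earlier findings, is the session handling safe?",
    "model": "o3",                                   # Claude switches models mid-thread
    "continuation_id": "id-from-previous-response",  # returned by the earlier analyze call
}
```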
The MCP protocol has a combined request+response limit of approximately 25K tokens. This server intelligently works around this limitation by automatically handling large prompts as files:
How it works:
- When an incoming prompt exceeds the limit, the server asks Claude to save the prompt text to a `prompt.txt` file and resend the request with that file included, as shown in the example below
Example scenario:
# You have a massive code review request with detailed context
User: "Use gemini to review this code: [50,000+ character detailed analysis]"
# Server detects the large prompt and responds:
Zen MCP: "The prompt is too large for MCP's token limits (>50,000 characters).
Please save the prompt text to a temporary file named 'prompt.txt' and resend
the request with an empty prompt string and the absolute file path included
in the files parameter, along with any other files you wish to share as context."
# Claude automatically handles this:
- Saves your prompt to /tmp/prompt.txt
- Resends: "Use gemini to review this code" with files=["/tmp/prompt.txt", "/path/to/code.py"]
# Server processes the large prompt through Gemini's 1M context
# Returns comprehensive analysis within MCP's response limits
This feature ensures you can send arbitrarily large prompts to Gemini without hitting MCP's protocol limitations, while maximizing the available space for detailed responses.
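Under the hood this amounts to a simple size check before the prompt is forwarded to the model. A minimal sketch, assuming a character-based limit and hypothetical field names (the real server logic may differ):

```python
# Illustrative only: detect an oversized prompt and ask Claude to resend it as a file.
MCP_PROMPT_SIZE_LIMIT = 50_000  # characters, matching the example above

def check_prompt_size(prompt: str) -> dict | None:
    """Return a 'resend as file' instruction if the prompt is too large, else None."""
    if len(prompt) > MCP_PROMPT_SIZE_LIMIT:
        return {
            "status": "resend_prompt_as_file",  # hypothetical status value
            "message": (
                "The prompt is too large for MCP's token limits. Save it to "
                "'prompt.txt' and resend the request with the file path in 'files'."
            ),
        }
    return None  # prompt fits within MCP limits; proceed normally
```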
Tools can request additional context from Claude during execution. When Gemini needs more information to provide a thorough analysis, it will ask Claude for specific files or clarification, enabling true collaborative problem-solving.
Example: If Gemini is debugging an error but needs to see a configuration file that wasn't initially provided, it can request:
{ "status": "requires_clarification", "question": "I need to see the database configuration to understand this connection error", "files_needed": ["config/database.yml", "src/db_connection.py"] }
Claude will then provide the requested files and Gemini can continue with a more complete analysis.
Smart web search recommendations for enhanced analysis
Web search is now enabled by default for all tools. Instead of performing searches directly, Gemini intelligently analyzes when additional information from the web would enhance its response and provides specific search recommendations for Claude to execute.
How it works:
Example:
User: "Use gemini to debug this FastAPI async error"
Gemini's Response:
[... debugging analysis ...]
**Recommended Web Searches for Claude:**
- "FastAPI async def vs def performance 2024" - to verify current best practices for async endpoints
- "FastAPI BackgroundTasks memory leak" - to check for known issues with the version you're using
- "FastAPI lifespan context manager pattern" - to explore proper resource management patterns
Claude can then search for these specific topics and provide you with the most current information.
Benefits:
Web search control: Web search is enabled by default, allowing models to request Claude perform searches for current documentation and solutions. If you prefer the model to work only with its training data, you can disable web search:
"Use gemini to review this code with use_websearch false"
The server includes several configurable properties that control its behavior:
🎯 Auto Mode (Recommended):
Set `DEFAULT_MODEL=auto` in your `.env` file and Claude will intelligently select the best model for each task:
```bash
# .env file
DEFAULT_MODEL=auto  # Claude picks the best model automatically

# API Keys (at least one required)
GEMINI_API_KEY=your-gemini-key    # Enables Gemini Pro & Flash
OPENAI_API_KEY=your-openai-key    # Enables O3, O3-mini
```
How Auto Mode Works:
Supported Models & When Claude Uses Them:
| Model | Provider | Context | Strengths | Auto Mode Usage |
|---|---|---|---|---|
| `pro` (Gemini 2.5 Pro) | Google | 1M tokens | Extended thinking (up to 32K tokens), deep analysis | Complex architecture, security reviews, deep debugging |
| `flash` (Gemini 2.0 Flash) | Google | 1M tokens | Ultra-fast responses | Quick checks, formatting, simple analysis |
| `o3` | OpenAI | 200K tokens | Strong logical reasoning | Debugging logic errors, systematic analysis |
| `o3-mini` | OpenAI | 200K tokens | Balanced speed/quality | Moderate complexity tasks |
| Any model | OpenRouter | Varies | Access to GPT-4, Claude, Llama, etc. | User-specified or based on task requirements |
Manual Model Selection: You can specify a default model instead of auto mode:
```bash
# Use a specific model by default
DEFAULT_MODEL=gemini-2.5-pro-preview-06-05  # Always use Gemini Pro
DEFAULT_MODEL=flash                         # Always use Flash
DEFAULT_MODEL=o3                            # Always use O3
```
Per-Request Model Override: Regardless of your default setting, you can specify models per request:
Model Capabilities:
Different tools use optimized temperature settings:
- `TEMPERATURE_ANALYTICAL`: 0.2 - Used for code review and debugging (focused, deterministic)
- `TEMPERATURE_BALANCED`: 0.5 - Used for general chat (balanced creativity/accuracy)
- `TEMPERATURE_CREATIVE`: 0.7 - Used for deep thinking and architecture (more creative)

Control logging verbosity via the `LOG_LEVEL` environment variable:
- `DEBUG`: Shows detailed operational messages, tool execution flow, conversation threading
- `INFO`: Shows general operational messages (default)
- `WARNING`: Shows only warnings and errors
- `ERROR`: Shows only errors

Set in your `.env` file:
```bash
LOG_LEVEL=DEBUG  # For troubleshooting
LOG_LEVEL=INFO   # For normal operation (default)
```
For Docker:
```bash
# In .env file
LOG_LEVEL=DEBUG

# Or set directly when starting
LOG_LEVEL=DEBUG docker compose up
```
All file paths must be absolute paths.
When using any Gemini tool, always provide absolute paths:
✅ "Use gemini to analyze /Users/you/project/src/main.py"
❌ "Use gemini to analyze ./src/main.py" (will be rejected)
By default, the server allows access to files within your home directory. This is necessary for the server to work with any file you might want to analyze from Claude.
For Docker environments, the `WORKSPACE_ROOT` environment variable is used to map your local directory to the internal `/workspace` directory, enabling the MCP to translate absolute file references correctly:
"env": { "GEMINI_API_KEY": "your-key", "WORKSPACE_ROOT": "/Users/you/project" // Maps to /workspace inside Docker }
This allows Claude to use absolute paths that will be correctly translated between your local filesystem and the Docker container.
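Conceptually the translation is just a prefix swap between `WORKSPACE_ROOT` on the host and `/workspace` inside the container. A minimal sketch of the idea (not the server's actual implementation):

```python
import os

WORKSPACE_ROOT = "/Users/you/project"  # host path from the env block above
CONTAINER_WORKSPACE = "/workspace"     # mount point inside the Docker container

def to_container_path(host_path: str) -> str:
    """Translate an absolute host path into its in-container equivalent."""
    relative = os.path.relpath(host_path, WORKSPACE_ROOT)
    return os.path.join(CONTAINER_WORKSPACE, relative)

print(to_container_path("/Users/you/project/src/main.py"))
# -> /workspace/src/main.py
```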
The server uses carefully crafted system prompts to give each tool specialized expertise:
- Tool-specific prompts are defined in `prompts/tool_prompts.py`
- Each tool inherits from `BaseTool` and implements `get_system_prompt()`
- The flow: User Request → Tool Selection → System Prompt + Context → Gemini Response
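For illustration, a custom tool would follow the same pattern. This is a hedged sketch assuming only what is stated above (a `BaseTool` base class and a `get_system_prompt()` method); the import path and the class itself are hypothetical:

```python
from tools.base import BaseTool  # assumed import path; adjust to the project layout

class SecurityAuditTool(BaseTool):
    """Hypothetical tool whose system prompt focuses the model on security review."""

    def get_system_prompt(self) -> str:
        return (
            "You are a security-focused reviewer. Rank every finding by severity "
            "and always suggest a concrete remediation."
        )
```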
Each tool has a unique system prompt that defines its role and approach:
- `thinkdeep`: Acts as a senior development partner, challenging assumptions and finding edge cases
- `codereview`: Expert code reviewer with security/performance focus, uses severity levels
- `debug`: Systematic debugger providing root cause analysis and prevention strategies
- `analyze`: Code analyst focusing on architecture, patterns, and actionable insights

To modify tool behavior, you can:
- Edit `prompts/tool_prompts.py` for global changes
- Override `get_system_prompt()` in a tool class for tool-specific changes
- Use the `temperature` parameter to adjust response style (0.2 for focused, 0.7 for creative)

The project includes comprehensive unit tests that use mocks and don't require a Gemini API key:
```bash
# Run all unit tests
python -m pytest tests/ -v

# Run with coverage
python -m pytest tests/ --cov=. --cov-report=html
```
To test the MCP server with comprehensive end-to-end simulation:
```bash
# Set your API keys (at least one required)
export GEMINI_API_KEY=your-gemini-api-key-here
export OPENAI_API_KEY=your-openai-api-key-here

# Run all simulation tests (default: uses existing Docker containers)
python communication_simulator_test.py

# Run specific tests only
python communication_simulator_test.py --tests basic_conversation content_validation

# Run with Docker rebuild (if needed)
python communication_simulator_test.py --rebuild-docker

# List available tests
python communication_simulator_test.py --list-tests
```
The simulation tests validate:
The project includes GitHub Actions workflows that:
The CI pipeline works without any secrets and will pass all tests using mocked responses. Simulation tests require API key secrets (`GEMINI_API_KEY` and/or `OPENAI_API_KEY`) to run the communication simulator.
"Connection failed" in Claude Desktop
docker compose ps
docker ps
to see actual container names"API key environment variable is required"
docker compose restart
Container fails to start
docker compose logs zen-mcp
docker compose build --no-cache
"spawn ENOENT" or execution issues
docker compose ps
Testing your Docker setup:
```bash
# Check if services are running
docker compose ps

# Test manual connection
docker exec -i zen-mcp-server echo "Connection test"

# View logs
docker compose logs -f
```
MIT License - see LICENSE file for details.
Built with the power of Multi-Model AI collaboration 🤝