MCP as a Judge
AI coding validation layer with quality gates for plan, code, and testing approval
mcp-name: io.github.OtherVibes/mcp-as-a-judge
MCP as a Judge acts as a validation layer between AI coding assistants and LLMs, helping ensure safer and higher-quality code.
MCP as a Judge is a behavioral MCP that strengthens AI coding assistants by requiring explicit LLM evaluations of plans, code changes, and tests.
It enforces evidence-based research, reuse over reinvention, and human-in-the-loop decisions.
If your IDE has rules/agents (Copilot, Cursor, Claude Code), keep using them—this Judge adds enforceable approval gates on plan, code diffs, and tests.
| Tool | What it solves |
|---|---|
| `set_coding_task` | Creates/updates task metadata; classifies `task_size`; returns next-step workflow guidance |
| `get_current_coding_task` | Recovers the latest `task_id` and metadata to resume work safely |
| `judge_coding_plan` | Validates plan/design; requires library selection and internal reuse maps; flags risks |
| `judge_code_change` | Reviews unified Git diffs for correctness, reuse, security, and code quality |
| `judge_testing_implementation` | Validates tests using real runner output and optional coverage |
| `judge_coding_task_completion` | Final gate ensuring plan, code, and test approvals before completion |
| `raise_missing_requirements` | Elicits missing details and decisions to unblock progress |
| `raise_obstacle` | Engages the user on trade-offs, constraints, and enforced changes |
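Under the hood these are standard MCP tools, so any MCP client can invoke them over JSON-RPC. As a rough sketch (the argument names below are illustrative assumptions, not the server's actual schema), starting a task with `set_coding_task` might look like:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "set_coding_task",
    "arguments": {
      "task_description": "Add input validation to the signup endpoint"
    }
  }
}
```

In practice your AI assistant issues these calls for you; you never write the JSON-RPC by hand.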
MCP as a Judge relies heavily on the MCP Sampling and MCP Elicitation features for its core functionality:
| AI Assistant | Platform | MCP Support | Status | Notes | 
|---|---|---|---|---|
| GitHub Copilot | Visual Studio Code | ✅ Full | Recommended | Complete MCP integration with sampling and elicitation | 
| Claude Code | - | ⚠️ Partial | Requires LLM API key | Sampling and elicitation support are open feature requests |
| Cursor | - | ⚠️ Partial | Requires LLM API key | MCP support available, but sampling/elicitation limited | 
| Augment | - | ⚠️ Partial | Requires LLM API key | MCP support available, but sampling/elicitation limited | 
| Qodo | - | ⚠️ Partial | Requires LLM API key | MCP support available, but sampling/elicitation limited | 
✅ Recommended setup: GitHub Copilot + VS Code — full MCP sampling; no API key needed.
⚠️ Critical: For assistants without full MCP sampling (Cursor, Claude Code, Augment, Qodo), you MUST set LLM_API_KEY. Without it, the server cannot evaluate plans or code. See LLM API Configuration.
💡 Tip: Prefer large context models (≥ 1M tokens) for better analysis and judgments.
For troubleshooting, visit the FAQs section.
Configure MCP as a Judge in your MCP-enabled client:
Configure MCP Settings:
Add this to your MCP client configuration file:
```json
{
  "command": "docker",
  "args": ["run", "--rm", "-i", "--pull=always", "ghcr.io/othervibes/mcp-as-a-judge:latest"],
  "env": {
    "LLM_API_KEY": "your-openai-api-key-here",
    "LLM_MODEL_NAME": "gpt-4o-mini"
  }
}
```
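In many MCP clients, the server entry above sits under a top-level key such as `mcpServers` with a server name of your choosing. The exact wrapper key varies by client, so check your client's documentation; a minimal sketch:

```json
{
  "mcpServers": {
    "mcp-as-a-judge": {
      "command": "docker",
      "args": ["run", "--rm", "-i", "--pull=always", "ghcr.io/othervibes/mcp-as-a-judge:latest"],
      "env": {
        "LLM_API_KEY": "your-openai-api-key-here",
        "LLM_MODEL_NAME": "gpt-4o-mini"
      }
    }
  }
}
```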
📝 Configuration Options (All Optional):
The `--pull=always` flag ensures you always get the latest version automatically. To update manually instead:
```shell
# Pull the latest version
docker pull ghcr.io/othervibes/mcp-as-a-judge:latest
```
Install the package:
```shell
uv tool install mcp-as-a-judge
```
Configure MCP Settings:
The MCP server may be automatically detected by your MCP‑enabled client.
📝 Notes:
To update to the latest version:
```shell
# Update MCP as a Judge to the latest version
uv tool upgrade mcp-as-a-judge
```
For AI assistants without full MCP sampling support, you can configure an LLM API key as a fallback. This ensures MCP as a Judge works even when the client doesn't support MCP sampling.
Set `LLM_API_KEY` (a unified key). The vendor is auto-detected from the key format; optionally set `LLM_MODEL_NAME` to override the default model.

| Rank | Provider | API Key Format | Default Model | Notes |
|---|---|---|---|---|
| 1 | OpenAI | sk-... | gpt-4.1 | Fast and reliable model optimized for speed | 
| 2 | Anthropic | sk-ant-... | claude-sonnet-4-20250514 | High-performance with exceptional reasoning | 
| 3 | Google Gemini | AIza... | gemini-2.5-pro | Most advanced model with built-in thinking |
| 4 | Azure OpenAI | [a-f0-9]{32} | gpt-4.1 | Same as OpenAI but via Azure | 
| 5 | AWS Bedrock | AWS credentials | anthropic.claude-sonnet-4-20250514-v1:0 | Aligned with Anthropic | 
| 6 | Vertex AI | Service Account JSON | gemini-2.5-pro | Enterprise Gemini via Google Cloud | 
| 7 | Groq | gsk_... | deepseek-r1 | Best reasoning model with speed advantage | 
| 8 | OpenRouter | sk-or-... | deepseek/deepseek-r1 | Best reasoning model available | 
| 9 | xAI | xai-... | grok-code-fast-1 | Latest coding-focused model (Aug 2025) | 
| 10 | Mistral | [a-f0-9]{64} | pixtral-large | Most advanced model (124B params) | 
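For example (the key value below is a placeholder shown only to illustrate the format), configuring the fallback for an OpenAI-format key looks like:

```shell
# Illustrative only: a single LLM_API_KEY env var is used regardless of vendor;
# the provider is inferred from the key's format (e.g. "sk-..." → OpenAI).
export LLM_API_KEY="sk-proj-example"   # placeholder, not a real key
export LLM_MODEL_NAME="gpt-4o-mini"    # optional: override the default gpt-4.1
echo "key prefix: ${LLM_API_KEY%%-*}"
```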
Open Cursor Settings:
1. Go to File → Preferences → Cursor Settings
2. Open the MCP tab
3. Click + Add to add a new MCP server

Add MCP Server Configuration:
```json
{
  "command": "uv",
  "args": ["tool", "run", "mcp-as-a-judge"],
  "env": {
    "LLM_API_KEY": "your-openai-api-key-here",
    "LLM_MODEL_NAME": "gpt-4.1"
  }
}
```
📝 Configuration Options:
Add MCP Server via CLI:
```shell
# Set environment variables first (optional model override)
export LLM_API_KEY="your_api_key_here"
export LLM_MODEL_NAME="claude-3-5-haiku"  # Optional: faster/cheaper model

# Add MCP server
claude mcp add mcp-as-a-judge -- uv tool run mcp-as-a-judge
```
Alternative: Manual Configuration:
Edit `~/.config/claude-code/mcp_servers.json`:

```json
{
  "command": "uv",
  "args": ["tool", "run", "mcp-as-a-judge"],
  "env": {
    "LLM_API_KEY": "your-anthropic-api-key-here",
    "LLM_MODEL_NAME": "claude-3-5-haiku"
  }
}
```
📝 Configuration Options:
For other MCP-compatible clients, use the standard MCP server configuration:
```json
{
  "command": "uv",
  "args": ["tool", "run", "mcp-as-a-judge"],
  "env": {
    "LLM_API_KEY": "your-openai-api-key-here",
    "LLM_MODEL_NAME": "gpt-5"
  }
}
```
📝 Configuration Options:
Primary Mode: MCP Sampling
Fallback Mode: LLM API Key
With `LLM_API_KEY` set as a fallback, the server calls your chosen LLM provider only to perform judgments (plan/code/test) on the evaluation content you provide.

We welcome contributions! Please see CONTRIBUTING.md for guidelines.
```shell
# Clone the repository
git clone https://github.com/OtherVibes/mcp-as-a-judge.git
cd mcp-as-a-judge

# Install dependencies with uv
uv sync --all-extras --dev

# Install pre-commit hooks
uv run pre-commit install

# Run tests
uv run pytest

# Run all checks
uv run pytest && uv run ruff check && uv run ruff format --check && uv run mypy src
```
© 2025 OtherVibes and Zvi Fried. The "MCP as a Judge" concept, the "behavioral MCP" approach, the staged workflow (plan → code → test → completion), tool taxonomy/descriptions, and prompt templates are original work developed in this repository.
While “LLM‑as‑a‑judge” is a broadly known idea, this repository defines the original “MCP as a Judge” behavioral MCP pattern by OtherVibes and Zvi Fried. It combines task‑centric workflow enforcement (plan → code → test → completion), explicit LLM‑based validations, and human‑in‑the‑loop elicitation, along with the prompt templates and tool taxonomy provided here. Please attribute as: “OtherVibes – MCP as a Judge (Zvi Fried)”.
| Feature | IDE Rules | Subagents | MCP as a Judge | 
|---|---|---|---|
| Static behavior guidance | ✓ | ✓ | ✗ | 
| Custom system prompts | ✓ | ✓ | ✓ | 
| Project context integration | ✓ | ✓ | ✓ | 
| Specialized task handling | ✗ | ✓ | ✓ | 
| Active quality gates | ✗ | ✗ | ✓ | 
| Evidence-based validation | ✗ | ✗ | ✓ | 
| Approve/reject with feedback | ✗ | ✗ | ✓ | 
| Workflow enforcement | ✗ | ✗ | ✓ | 
| Cross-assistant compatibility | ✗ | ✗ | ✓ | 
This project is licensed under the MIT License (see LICENSE).