
DINO-X

STDIO

Fine-grained object detection and image understanding for LLMs using DINO-X technology.

DINO-X MCP


Enables large language models to perform fine-grained object detection and image understanding, powered by the DINO-X and Grounding DINO 1.6 APIs.

💡 Why DINO-X MCP?

Although multimodal models can understand and describe images, they often lack precise localization and high-quality structured outputs for visual content.

With DINO-X MCP, you can:

🧠 Achieve fine-grained image understanding — both full-scene recognition and targeted detection based on natural language.

🎯 Accurately obtain object count, position, and attributes, enabling tasks such as visual question answering.

🧩 Integrate with other MCP Servers to build multi-step visual workflows.

🛠️ Build natural language-driven visual agents for real-world automation scenarios.

🎬 Use Case

| 🎯 Scenario | 📝 Input | ✨ Output |
| --- | --- | --- |
| Detection & Localization | 💬 Prompt: Detect the fire areas in the forest and visualize with Canvas. 🖼️ Input Image: [image] | [image] |
| Object Counting | 💬 Prompt: Please analyze this warehouse image, detect all the cardboard boxes, count the total number. 🖼️ Input Image: [image] | [image] |
| Feature Detection | 💬 Prompt: Find all red cars in the image. 🖼️ Input Image: [image] | [image] |
| Attribute Reasoning | 💬 Prompt: Find the tallest person in the image, describe their clothing. 🖼️ Input Image: [image] | [image] |
| Full Scene Detection | 💬 Prompt: Find the fruit with the highest vitamin C content in the image. 🖼️ Input Image: [image] | [image] Answer: Kiwi fruit (93mg/100g) |
| Pose Analysis | 💬 Prompt: Please analyze what yoga pose this is. 🖼️ Input Image: [image] | [image] |
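The Object Counting and Attribute Reasoning scenarios both rely on structured detections rather than free-form text. The sketch below illustrates the kind of post-processing a client or agent could do on such results. The `Detection` shape, the `[xmin, ymin, xmax, ymax]` box layout, and the helper names are assumptions made for illustration; they are not taken from the DINO-X MCP output format.

```typescript
// Hypothetical result shape for illustration; the server's real output may differ.
interface Detection {
  category: string;
  // Assumed box layout: [xmin, ymin, xmax, ymax] in pixels.
  bbox: [number, number, number, number];
}

// Count detections per category, as in the "Object Counting" scenario.
function countByCategory(detections: Detection[]): { [category: string]: number } {
  const counts: { [category: string]: number } = {};
  for (const d of detections) {
    counts[d.category] = (counts[d.category] ?? 0) + 1;
  }
  return counts;
}

// Height of a box under the assumed layout.
function boxHeight(d: Detection): number {
  return d.bbox[3] - d.bbox[1];
}

// Return the detection of the given category with the greatest box height.
// This is a rough stand-in for "tallest" that ignores perspective and distance.
function tallestOf(detections: Detection[], category: string): Detection | undefined {
  let best: Detection | undefined = undefined;
  for (const d of detections) {
    if (d.category !== category) continue;
    if (best === undefined || boxHeight(d) > boxHeight(best)) best = d;
  }
  return best;
}

// Made-up sample data, not real model output.
const sampleDetections: Detection[] = [
  { category: "person", bbox: [10, 40, 60, 200] },  // height 160
  { category: "person", bbox: [80, 20, 130, 210] }, // height 190
  { category: "box", bbox: [150, 150, 200, 200] },
];

const sampleCounts = countByCategory(sampleDetections);
const sampleTallest = tallestOf(sampleDetections, "person");
```

In practice the language model usually performs this reasoning itself from the tool output; explicit helpers like these are mainly useful when results are passed on to another program.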

🚀 Quick Start

1. Prerequisites

DINO-X MCP runs on Node.js. You can install Node.js using one of the following methods:

Option A: Command 👍

    # For MacOS or Linux
    # 1. Install nvm (Node Version Manager)
    curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash
    # OR
    wget -qO- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash

    # 2. Add these lines to your profile (~/.bash_profile, ~/.zshrc, ~/.profile, or ~/.bashrc)
    export NVM_DIR="$HOME/.nvm"
    [ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh"
    [ -s "$NVM_DIR/bash_completion" ] && \. "$NVM_DIR/bash_completion"

    # 3. Activate nvm in current shell
    source ~/.bashrc
    # Or
    source ~/.zshrc

    # 4. Verify nvm installation
    command -v nvm

    # 5. Install and use LTS version of Node.js
    nvm install --lts
    nvm use --lts

    # For Windows
    winget install OpenJS.NodeJS.LTS
    # Or using PowerShell (Administrator)
    iwr -useb https://raw.githubusercontent.com/chocolatey/chocolatey/master/chocolateyInstall/InstallChocolatey.ps1 | iex
    choco install nodejs-lts -y

Option B: Manual Installation

Download the installer from nodejs.org

You also need an AI assistant or application that supports MCP as a client.

2. Configure MCP Server

You can use the DINO-X MCP server in two ways:

Option A: Using NPM Package 👍

Add the following configuration in your MCP client:

    {
      "mcpServers": {
        "dinox-mcp": {
          "command": "npx",
          "args": ["-y", "@deepdataspace/dinox-mcp"],
          "env": {
            "DINOX_API_KEY": "your-api-key-here"
          }
        }
      }
    }

Option B: Using Local Project

First, clone and build the project:

    # Clone the project
    git clone https://github.com/IDEA-Research/DINO-X-MCP.git
    cd DINO-X-MCP

    # Install dependencies
    pnpm install

    # Build the project
    pnpm run build

Then configure your MCP client:

    {
      "mcpServers": {
        "dinox-mcp": {
          "command": "node",
          "args": ["/path/to/DINO-X-MCP/build/index.js"],
          "env": {
            "DINOX_API_KEY": "your-api-key-here"
          }
        }
      }
    }

3. Get API Key

Get your API key from the DINO-X Platform (a free quota is available for new users).

Replace your-api-key-here in the configuration above with your actual API key.
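If you generate the client configuration from a script instead of editing it by hand, a small guard can catch the common mistake of leaving the placeholder key in place. The following is a minimal sketch that reproduces the NPM-package configuration shown above; the function and type names (`buildDinoxConfig`, `McpClientConfig`, `McpServerEntry`) are hypothetical, and `"example-key"` is a stand-in for a real key.

```typescript
// Shape of the configuration shown in this README; type names are hypothetical.
interface McpServerEntry {
  command: string;
  args: string[];
  env: { [name: string]: string };
}

interface McpClientConfig {
  mcpServers: { [serverName: string]: McpServerEntry };
}

// Build the NPM-package variant of the configuration,
// refusing an empty key or the README placeholder.
function buildDinoxConfig(apiKey: string): McpClientConfig {
  const key = apiKey.trim();
  if (key === "" || key === "your-api-key-here") {
    throw new Error("Set a real DINO-X API key before writing the config");
  }
  return {
    mcpServers: {
      "dinox-mcp": {
        command: "npx",
        args: ["-y", "@deepdataspace/dinox-mcp"],
        env: { DINOX_API_KEY: key },
      },
    },
  };
}

// Serialize for pasting into the MCP client's configuration file.
// "example-key" is a stand-in, not a working key.
const dinoxConfigJson = JSON.stringify(buildDinoxConfig("example-key"), null, 2);
```

Where the resulting JSON has to be saved depends on the MCP client you use, so the sketch stops at producing the string.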

4. Available Tools

Restart your MCP client, and you should be able to use the following tools:

| Method Name | Description | Input | Output |
| --- | --- | --- | --- |
| detect-all-objects | Detects and localizes all recognizable objects in an image. | Image | Category names + bounding boxes + captions |
| object-detection-by-text | Detects and localizes objects in an image based on a natural language prompt. | Image + Text prompt | Bounding boxes + object captions |
| detect-human-pose-keypoints | Detects 17 human body keypoints per person in an image for pose estimation. | Image | Keypoint coordinates and captions |
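Your MCP client normally issues tool calls for you, but it can help when debugging to see what such a call looks like on the wire. MCP clients invoke tools with a JSON-RPC 2.0 request whose method is `tools/call`. The sketch below builds one for the tools listed above. The argument names `imageFileUri` and `textPrompt` are assumptions made for illustration; check the input schema the server reports (for example in MCP Inspector) for the real names.

```typescript
// JSON-RPC 2.0 envelope used by MCP for tool invocation.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: { [key: string]: string } };
}

// Tool names taken from the table above.
const DINOX_TOOL_NAMES = [
  "detect-all-objects",
  "object-detection-by-text",
  "detect-human-pose-keypoints",
];

// Build a tools/call request, rejecting tool names this server does not list.
function buildToolCall(
  id: number,
  toolName: string,
  args: { [key: string]: string },
): ToolCallRequest {
  if (DINOX_TOOL_NAMES.indexOf(toolName) === -1) {
    throw new Error(`Unknown tool: ${toolName}`);
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// Hypothetical argument names and a placeholder image URL.
const exampleRequest = buildToolCall(1, "object-detection-by-text", {
  imageFileUri: "https://example.com/cars.jpg",
  textPrompt: "red car",
});
```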

📝 Usage

Supported Image Formats

  • Remote URLs starting with https:// 👍
  • Local file paths (starting with file://)
  • Common image formats: jpg, jpeg, png, webp
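The rules in this list can be checked before an image reference is handed to a tool. The following is a minimal sketch, not the server's own validation: `checkImageRef` and `ImageRefCheck` are hypothetical names, and requiring a file extension on remote URLs is a conservative assumption, since a server may also accept URLs without one.

```typescript
const SUPPORTED_EXTENSIONS = ["jpg", "jpeg", "png", "webp"];

interface ImageRefCheck {
  ok: boolean;
  kind?: "remote" | "local";
  reason?: string;
}

// Check an image reference against the scheme and format rules listed above.
function checkImageRef(ref: string): ImageRefCheck {
  let kind: "remote" | "local";
  if (/^https:\/\//i.test(ref)) {
    kind = "remote";
  } else if (/^file:\/\//i.test(ref)) {
    kind = "local";
  } else {
    return { ok: false, reason: "expected an https:// URL or a file:// path" };
  }

  // Ignore any query string or fragment when reading the extension.
  const path = ref.split(/[?#]/)[0] ?? "";
  const match = /\.([A-Za-z0-9]+)$/.exec(path);
  const ext = (match ? match[1] ?? "" : "").toLowerCase();
  if (SUPPORTED_EXTENSIONS.indexOf(ext) === -1) {
    return { ok: false, reason: "expected a jpg, jpeg, png, or webp file" };
  }
  return { ok: true, kind };
}

// Example inputs; the URLs and paths are placeholders.
const remoteCheck = checkImageRef("https://example.com/a/photo.JPG?size=large");
const localCheck = checkImageRef("file:///home/user/scan.webp");
const plainHttpCheck = checkImageRef("http://example.com/a.png");
```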

API Docs

Please refer to the DINO-X Platform for API usage limits and pricing information.

🛠️ Development

Watch Mode

During development, you can use watch mode for automatic rebuilding:

pnpm run watch

Debugging

Use MCP Inspector to debug the server:

pnpm run inspector

License

Apache License 2.0
