
DINO-X
Fine-grained object detection and image understanding for LLMs using DINO-X technology.
DINO-X MCP enables large language models (LLMs) to perform fine-grained object detection and image understanding, powered by the DINO-X and Grounding DINO 1.6 APIs.
Although multimodal models can understand and describe images, they often lack precise localization and high-quality structured outputs for visual content.
With DINO-X MCP, you can:
- 🧠 Achieve fine-grained image understanding: both full-scene recognition and targeted detection based on natural language.
- 🎯 Accurately obtain object counts, positions, and attributes, enabling tasks such as visual question answering.
- 🧩 Integrate with other MCP servers to build multi-step visual workflows.
- 🛠️ Build natural-language-driven visual agents for real-world automation scenarios.
| 🎯 Scenario | 📝 Example prompt |
|---|---|
| Detection & Localization | Detect the fire areas in the forest and visualize with Canvas |
| Object Counting | Please analyze this warehouse image, detect all the cardboard boxes, count the total number |
| Feature Detection | Find all red cars in the image |
| Attribute Reasoning | Find the tallest person in the image, describe their clothing |
| Full Scene Detection | Find the fruit with the highest vitamin C content in the image. Answer: Kiwi fruit (93 mg/100 g) |
| Pose Analysis | Please analyze what yoga pose this is |
The server runs on Node.js. You can install Node.js using one of the following methods:
```bash
# For macOS or Linux
# 1. Install nvm (Node Version Manager)
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash
# OR
wget -qO- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash

# 2. Add these lines to your profile (~/.bash_profile, ~/.zshrc, ~/.profile, or ~/.bashrc)
export NVM_DIR="$HOME/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh"
[ -s "$NVM_DIR/bash_completion" ] && \. "$NVM_DIR/bash_completion"

# 3. Activate nvm in the current shell
source ~/.bashrc  # or: source ~/.zshrc

# 4. Verify the nvm installation
command -v nvm

# 5. Install and use the LTS version of Node.js
nvm install --lts
nvm use --lts
```

```powershell
# For Windows
winget install OpenJS.NodeJS.LTS

# Or install Chocolatey first (run PowerShell as Administrator), then Node.js LTS
Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))
choco install nodejs-lts -y
```
Alternatively, download the official installer from nodejs.org.
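Both client configurations below launch the server through `npx` or `node`, so it can help to confirm those commands are on your `PATH` before editing the client configuration. The following is a minimal sketch assuming a bash shell; `require_cmds` is a hypothetical helper, not part of DINO-X MCP.

```bash
# Hypothetical preflight helper (not part of DINO-X MCP):
# prints "ok: <cmd>" for each command found on PATH and
# "missing: <cmd>" on stderr otherwise; returns 1 if any is missing.
require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      printf 'ok: %s\n' "$cmd"
    else
      printf 'missing: %s\n' "$cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `require_cmds node npx && node --version` prints an `ok:` line for each command and then the Node.js version. A `missing:` line usually means the installation, or the step that loads nvm into your shell, did not take effect in the current session.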
You also need an AI assistant or application that can act as an MCP client.
You can run the DINO-X MCP server in two ways.
Option 1: use the published npm package. Add the following configuration to your MCP client:
```json
{
  "mcpServers": {
    "dinox-mcp": {
      "command": "npx",
      "args": ["-y", "@deepdataspace/dinox-mcp"],
      "env": {
        "DINOX_API_KEY": "your-api-key-here"
      }
    }
  }
}
```
Option 2: run from source. First, clone and build the project:
```bash
# Clone the project
git clone https://github.com/IDEA-Research/DINO-X-MCP.git
cd DINO-X-MCP

# Install dependencies
pnpm install

# Build the project
pnpm run build
```
Then configure your MCP client:
```json
{
  "mcpServers": {
    "dinox-mcp": {
      "command": "node",
      "args": ["/path/to/DINO-X-MCP/build/index.js"],
      "env": {
        "DINOX_API_KEY": "your-api-key-here"
      }
    }
  }
}
```
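When running from source, the `args` entry needs the real absolute path to `build/index.js` in place of the `/path/to/DINO-X-MCP/build/index.js` placeholder. A minimal sketch assuming bash: the hypothetical helper `dinox_entry_path` (not part of the project) checks that the build output exists and prints its absolute path.

```bash
# Hypothetical helper (not part of DINO-X MCP): print the absolute path of
# the built entry file so it can be pasted into the "args" array.
# Argument: path to the cloned repository (defaults to the current directory).
dinox_entry_path() {
  local repo="${1:-.}"
  if [ ! -f "$repo/build/index.js" ]; then
    printf 'error: %s/build/index.js not found; run "pnpm run build" first\n' "$repo" >&2
    return 1
  fi
  # cd inside a subshell resolves a relative path to an absolute one
  # without changing the caller's working directory.
  (cd "$repo" && printf '%s/build/index.js\n' "$(pwd)")
}
```

For example, running `dinox_entry_path DINO-X-MCP` from the directory that contains the clone prints a path such as `/home/you/src/DINO-X-MCP/build/index.js`, which is the value to put in `args`.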
Get your API key from the DINO-X Platform (new users receive a free quota).
Replace `your-api-key-here` in the configuration above with your actual API key.
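The configuration passes the key to the server through the `DINOX_API_KEY` environment variable, and a frequent setup mistake is leaving the placeholder in place. As a minimal sketch assuming bash, the hypothetical helper `check_dinox_key` rejects an empty value or the literal `your-api-key-here`; it cannot tell whether a key is actually valid, which only the DINO-X Platform can confirm.

```bash
# Hypothetical sanity check (not part of DINO-X MCP): reject an empty key
# or the placeholder value used in the example configurations.
check_dinox_key() {
  local key="${1-}"
  if [ -z "$key" ]; then
    echo "error: API key is empty" >&2
    return 1
  fi
  if [ "$key" = "your-api-key-here" ]; then
    echo "error: replace the placeholder with your real API key" >&2
    return 1
  fi
  return 0
}
```

If you keep the key in your shell environment, `check_dinox_key "$DINOX_API_KEY"` runs the same check before you copy the value into the client configuration.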
Restart your MCP client, and you should be able to use the following tools:
| Tool | Description | Input | Output |
|---|---|---|---|
| `detect-all-objects` | Detects and localizes all recognizable objects in an image. | Image | Category names, bounding boxes, and captions |
| `object-detection-by-text` | Detects and localizes the objects described by a natural-language prompt. | Image + text prompt | Bounding boxes and object captions |
| `detect-human-pose-keypoints` | Detects 17 body keypoints per person in an image for pose estimation. | Image | Keypoint coordinates and captions |
Supported image inputs: `https://` URLs and `file://` paths. Supported formats: jpg, jpeg, png, webp.
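To catch unsupported inputs before a tool call, a client-side pre-check can mirror the list above. This is a minimal sketch assuming bash; `is_supported_image` is hypothetical and does not reproduce the server's own validation. It assumes extensions are matched case-insensitively and ignores any `?query` or `#fragment` suffix on the reference.

```bash
# Hypothetical pre-check (not part of DINO-X MCP): succeed only for
# https:// or file:// references whose extension is jpg, jpeg, png, or webp.
is_supported_image() {
  local ref="$1"
  # Drop a query string or fragment, then take the text after the last dot.
  local stem="${ref%%[?#]*}"
  local ext="${stem##*.}"
  ext="$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')"
  case "$ref" in
    https://*|file://*) ;;
    *) return 1 ;;
  esac
  case "$ext" in
    jpg|jpeg|png|webp) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_supported_image "file:///tmp/photo.png"` succeeds, while a plain `http://` URL, a bare local path without `file://`, or a `.gif` file is rejected.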
Refer to the DINO-X Platform for API usage limits and pricing.
During development, you can use watch mode for automatic rebuilding:
```bash
pnpm run watch
```
Use the MCP Inspector to debug the server:

```bash
pnpm run inspector
```
Apache License 2.0