DINO-X
Fine-grained object detection and image understanding for LLMs using DINO-X technology.
The official DINO-X MCP server, powered by the DINO-X and Grounding DINO models, brings fine-grained object detection and image understanding to your multimodal applications.
With DINO-X MCP you get:

- Fine-grained understanding: full-image detection, object detection, and region-level descriptions.
- Structured outputs: object categories, counts, locations, and attributes for VQA and multi-step reasoning tasks.
- Composability: works alongside other MCP servers to build end-to-end visual agents or automation pipelines.
DINO-X MCP supports two transport modes:
| Feature | STDIO (default) | Streamable HTTP | 
|---|---|---|
| Runtime | Local | Local or Cloud | 
| Transport | Standard I/O | HTTP (streaming responses) | 
| Input source | file:// and https:// | https:// only | 
| Visualization | Supported (saves annotated images locally) | Not supported (for now) | 
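The "Input source" row is the main practical difference when choosing a transport. A small guard such as the one below can reject unsupported image references before a tool call is made. It is a hypothetical helper written from the table, not part of the server:

```typescript
// Hypothetical helper (not part of DINO-X MCP): checks an image reference
// against the "Input source" row of the transport table above.
type Transport = "stdio" | "http";

// Returns the URL scheme (e.g. "https:" or "file:"), or null if the string
// is not an absolute URL.
function protocolOf(source: string): string | null {
  try {
    return new URL(source).protocol;
  } catch {
    return null;
  }
}

function isSupportedImageSource(source: string, transport: Transport): boolean {
  const protocol = protocolOf(source);
  if (protocol === "https:") return true; // accepted by both transports
  // Local files are only readable when the server runs locally over STDIO.
  return transport === "stdio" && protocol === "file:";
}
```

For example, `isSupportedImageSource("file:///tmp/shelf.jpg", "http")` is `false`, because a remote HTTP server cannot read the caller's local files.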
Any MCP-compatible client works.
To get an API key, request one on the DINO-X platform (new users get a free quota).
To use the hosted server over Streamable HTTP, add the following to your MCP client config, replacing `your-api-key` with your API key:
{ "mcpServers": { "dinox-mcp": { "url": "https://mcp.deepdataspace.com/mcp?key=your-api-key" } } }
To run the server locally with npx, install Node.js first:
Download the installer from nodejs.org
Or install it from the command line:

    # macOS / Linux
    curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash
    # or
    wget -qO- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash

    # load nvm into the current shell (choose the one you use)
    source ~/.bashrc || true
    source ~/.zshrc || true

    # install and use the LTS release of Node.js
    nvm install --lts
    nvm use --lts

    # Windows (one of the following)
    winget install OpenJS.NodeJS.LTS
    # or with Chocolatey (in an admin PowerShell)
    iwr -useb https://raw.githubusercontent.com/chocolatey/chocolatey/master/chocolateyInstall/InstallChocolatey.ps1 | iex
    choco install nodejs-lts -y

Configure your MCP client:

    {
      "mcpServers": {
        "dinox-mcp": {
          "command": "npx",
          "args": ["-y", "@deepdataspace/dinox-mcp"],
          "env": {
            "DINOX_API_KEY": "your-api-key-here",
            "IMAGE_STORAGE_DIRECTORY": "/path/to/your/image/directory"
          }
        }
      }
    }

Note: replace `your-api-key-here` with your real key, and set `IMAGE_STORAGE_DIRECTORY` to the directory where annotated images should be saved.
To build from source, make sure Node.js is installed (see the Node.js installation steps above), then:

    # clone
    git clone https://github.com/IDEA-Research/DINO-X-MCP.git
    cd DINO-X-MCP

    # install dependencies
    npm install

    # build
    npm run build

Configure your MCP client:

    {
      "mcpServers": {
        "dinox-mcp": {
          "command": "node",
          "args": ["/path/to/DINO-X-MCP/build/index.js"],
          "env": {
            "DINOX_API_KEY": "your-api-key-here",
            "IMAGE_STORAGE_DIRECTORY": "/path/to/your/image/directory"
          }
        }
      }
    }

Common flags:

- `--http`: start in Streamable HTTP mode (STDIO is the default)
- `--stdio`: force STDIO mode
- `--dinox-api-key=...`: set the API key
- `--enable-client-key`: allow clients to pass the API key via the URL query `?key=` (Streamable HTTP only)
- `--port=8080`: HTTP port (default 3020)

Environment variables:

- `DINOX_API_KEY` (conditionally required): DINO-X platform API key; not needed when the key is passed with `--dinox-api-key` or supplied by clients via `--enable-client-key`
- `IMAGE_STORAGE_DIRECTORY` (optional, STDIO): directory to save annotated images
- `AUTH_TOKEN` (optional, HTTP): if set, clients must send `Authorization: Bearer <token>`

Examples:

    # STDIO (local)
    node build/index.js --dinox-api-key=your-api-key

    # Streamable HTTP (server provides a shared API key)
    node build/index.js --http --dinox-api-key=your-api-key

    # Streamable HTTP (custom port)
    node build/index.js --http --dinox-api-key=your-api-key --port=8080

    # Streamable HTTP (require client-provided API key via URL)
    node build/index.js --http --enable-client-key

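To make the flag and environment-variable rules concrete, here is a hypothetical sketch (not the server's actual code) of how they could be folded into one settings object. Beyond the documented flags, it assumes that `--dinox-api-key` overrides `DINOX_API_KEY`, that `--stdio` wins if both transport flags are given, that unknown arguments are ignored, and that a missing key is tolerated only in HTTP mode with `--enable-client-key`:

```typescript
// Hypothetical sketch: combine CLI flags and environment variables into
// settings. Precedence rules here are assumptions, not documented behavior.
interface ServerSettings {
  mode: "stdio" | "http";
  apiKey?: string;
  enableClientKey: boolean;
  port: number;
}

function parseServerArgs(
  argv: string[],
  env: Record<string, string | undefined>,
): ServerSettings {
  const settings: ServerSettings = {
    mode: "stdio",             // STDIO is the documented default
    apiKey: env.DINOX_API_KEY, // assumption: the flag below overrides this
    enableClientKey: false,
    port: 3020,                // documented default HTTP port
  };
  let sawHttp = false;
  let sawStdio = false;

  for (const arg of argv) {
    if (arg === "--http") sawHttp = true;
    else if (arg === "--stdio") sawStdio = true;
    else if (arg === "--enable-client-key") settings.enableClientKey = true;
    else if (arg.startsWith("--dinox-api-key=")) {
      settings.apiKey = arg.slice("--dinox-api-key=".length);
    } else if (arg.startsWith("--port=")) {
      const port = Number(arg.slice("--port=".length));
      if (!Number.isInteger(port) || port < 1 || port > 65535) {
        throw new Error(`invalid port: ${arg}`);
      }
      settings.port = port;
    }
    // Any other argument is ignored in this sketch.
  }

  // Assumption: "--stdio" forces STDIO even when "--http" is also present.
  settings.mode = sawHttp && !sawStdio ? "http" : "stdio";

  // Assumption: without a server-side key, only HTTP mode with client keys works.
  if (!settings.apiKey && !(settings.mode === "http" && settings.enableClientKey)) {
    throw new Error("DINOX_API_KEY is required (or use --http --enable-client-key)");
  }
  return settings;
}
```

With this sketch, `parseServerArgs(["--http", "--port=8080"], { DINOX_API_KEY: "k" })` yields HTTP mode on port 8080 using the key from the environment.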
Client config when using ?key=:
{ "mcpServers": { "dinox-mcp": { "url": "http://localhost:3020/mcp?key=your-api-key" } } }
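On the server side, accepting `?key=` amounts to reading a query parameter from the request URL. The sketch below is hypothetical; in particular, the precedence (client key first, shared server key as fallback) is an assumption rather than documented behavior:

```typescript
// Hypothetical sketch: choose the API key for one Streamable HTTP request.
// Assumption: ?key= is honored only when --enable-client-key is set, and it
// takes precedence over a shared server key.
function resolveRequestApiKey(
  requestUrl: string,
  serverKey: string | undefined,
  enableClientKey: boolean,
): string | undefined {
  if (enableClientKey) {
    // A base URL is supplied because HTTP servers often see only "/mcp?key=...".
    const clientKey = new URL(requestUrl, "http://localhost").searchParams.get("key");
    if (clientKey) return clientKey;
  }
  return serverKey; // may be undefined: the caller should then reject the request
}
```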
To require a bearer token, set `AUTH_TOKEN`; clients then connect through a gateway that injects `Authorization: Bearer <token>`:
AUTH_TOKEN=my-token node build/index.js --http --enable-client-key
Client example with supergateway:

    {
      "mcpServers": {
        "dinox-mcp": {
          "command": "npx",
          "args": [
            "-y",
            "supergateway",
            "--streamableHttp",
            "http://localhost:3020/mcp?key=your-api-key",
            "--oauth2Bearer",
            "my-token"
          ]
        }
      }
    }

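The `AUTH_TOKEN` rule can be expressed as a small predicate. This is a hypothetical sketch of the documented behavior, not code taken from the server:

```typescript
// Hypothetical sketch of the AUTH_TOKEN rule: when a token is configured,
// the request must carry "Authorization: Bearer <token>"; otherwise it passes.
function isAuthorized(
  authorizationHeader: string | undefined,
  authToken: string | undefined,
): boolean {
  if (!authToken) return true; // AUTH_TOKEN unset: no bearer check
  if (!authorizationHeader) return false;
  // The auth scheme name is case-insensitive, so accept "bearer" as well.
  const match = /^Bearer\s+(.+)$/i.exec(authorizationHeader.trim());
  return match !== null && match[1] === authToken;
}
```

A production check would compare tokens in constant time (for example with Node's `crypto.timingSafeEqual`) instead of with `===`.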
| Capability | Tool ID | Transport | Input | Output | 
|---|---|---|---|---|
| Full-scene object detection | detect-all-objects | STDIO / HTTP | Image URL | Category + bbox + (optional) captions | 
| Text-prompted object detection | detect-objects-by-text | STDIO / HTTP | Image URL + English nouns (dot-separated for multiple, e.g., person.car) | Target object bbox + (optional) captions | 
| Human pose estimation | detect-human-pose-keypoints | STDIO / HTTP | Image URL | 17 keypoints + bbox + (optional) captions | 
| Visualization | visualize-detection-result | STDIO only | Image URL + detection results array | Local path to annotated image | 
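Under the hood, an MCP client invokes these tools with a JSON-RPC `tools/call` request. The sketch below builds one for `detect-objects-by-text`. The `tools/call` envelope follows the MCP specification, but the argument names `imageFileUri` and `textPrompt` are hypothetical placeholders; take the real input schema from the server's `tools/list` response:

```typescript
// Sketch of an MCP "tools/call" request. The argument names used below are
// HYPOTHETICAL; check the server's tools/list output for the real schema.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Multiple targets are English nouns separated by dots, e.g. "person.car".
function toTextPrompt(nouns: string[]): string {
  return nouns
    .map((noun) => noun.trim())
    .filter((noun) => noun.length > 0)
    .join(".");
}

function buildDetectByTextRequest(
  id: number,
  imageUrl: string,
  nouns: string[],
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: "detect-objects-by-text",
      arguments: {
        imageFileUri: imageUrl,          // hypothetical argument name
        textPrompt: toTextPrompt(nouns), // hypothetical argument name
      },
    },
  };
}
```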
| 🎯 Scenario | 📝 Input | ✨ Output | 
|---|---|---|
| Detection & Localization | 💬 Prompt: Detect and visualize the fire areas in the forest | |
| Object Counting | 💬 Prompt: Please analyze this warehouse image, detect all the cardboard boxes, count the total number | |
| Feature Detection | 💬 Prompt: Find all red cars in the image | |
| Attribute Reasoning | 💬 Prompt: Find the tallest person in the image, describe their clothing | |
| Full Scene Detection | 💬 Prompt: Find the fruit with the highest vitamin C content in the image | Answer: Kiwi fruit (93 mg/100 g) |
| Pose Analysis | 💬 Prompt: Please analyze what yoga pose this is | |
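Scenarios such as object counting come down to simple post-processing of the detection results. The sketch assumes a hypothetical result shape (`{ category, bbox, score? }`); adapt the field names to what the tools actually return:

```typescript
// Hypothetical shape of one detection; the real tool output may differ.
interface Detection {
  category: string;
  bbox: [number, number, number, number]; // assumed [x1, y1, x2, y2] in pixels
  score?: number;                         // assumed confidence in [0, 1]
}

// Count detections per category, optionally dropping low-confidence boxes.
// Detections without a score are always kept.
function countByCategory(detections: Detection[], minScore = 0): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const d of detections) {
    if ((d.score ?? 1) < minScore) continue;
    counts[d.category] = (counts[d.category] ?? 0) + 1;
  }
  return counts;
}
```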
Use watch mode to auto-rebuild during development:
npm run watch
Use MCP Inspector for debugging:
npm run inspector
Apache License 2.0