Gemini Video Recognition
A multimedia recognition server based on Gemini AI
An MCP (Model Context Protocol) server that provides tools for image, audio, and video recognition using Google's Gemini AI.
Clone the repository:
git clone https://github.com/yourusername/mcp-video-recognition.git
cd mcp-video-recognition
Install dependencies:
npm install
Build the project:
npm run build
To integrate this MCP server with Cline or other MCP clients via configuration files:
Open your Cline settings:
Add the server configuration to the mcpServers object:
{
  "mcpServers": {
    "video-recognition": {
      "command": "node",
      "args": [
        "/path/to/mcp-video-recognition/dist/index.js"
      ],
      "disabled": false,
      "autoApprove": []
    }
  }
}
Replace /path/to/mcp-video-recognition/dist/index.js with the actual path to the index.js file in your project directory. On Windows, use forward slashes (/) or double backslashes (\\) in the path.
Save the settings file. Cline should automatically connect to the server.
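The configuration entry above can also be generated programmatically, which avoids hand-escaping Windows paths. The following is a minimal sketch: the `buildServerConfig` helper and the example path are hypothetical and not part of this project; only the JSON shape is taken from the configuration shown above.

```typescript
// Shape of one entry in the "mcpServers" object, as shown in the configuration above.
interface McpServerEntry {
  command: string;
  args: string[];
  disabled: boolean;
  autoApprove: string[];
}

// Hypothetical helper (not part of the project): builds the settings fragment
// for a given path to dist/index.js.
function buildServerConfig(indexJsPath: string): { mcpServers: Record<string, McpServerEntry> } {
  // Convert backslashes to forward slashes so the path needs no JSON escaping.
  const normalized = indexJsPath.replace(/\\/g, "/");
  return {
    mcpServers: {
      "video-recognition": {
        command: "node",
        args: [normalized],
        disabled: false,
        autoApprove: [],
      },
    },
  };
}

// Example path is an assumption for illustration only.
const exampleConfig = buildServerConfig("C:\\projects\\mcp-video-recognition\\dist\\index.js");
console.log(JSON.stringify(exampleConfig, null, 2));
```

The printed JSON can be merged into the existing settings file by hand.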
The server is configured using environment variables:
GOOGLE_API_KEY (required): Your Google Gemini API key
TRANSPORT_TYPE: Transport type to use (stdio or sse, defaults to stdio)
PORT: Port number for SSE transport (defaults to 3000)
LOG_LEVEL: Logging level (verbose, debug, info, warn, or error; defaults to info)

To start the server with the default stdio transport:
GOOGLE_API_KEY=your_api_key npm start

To start the server with the SSE transport:
GOOGLE_API_KEY=your_api_key TRANSPORT_TYPE=sse PORT=3000 npm start
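The environment variables and defaults listed above could be read along the lines of the sketch below. The `loadConfig` function, the `ServerConfig` shape, and the validation rules (for example the port range check) are assumptions for illustration; the project's actual code may handle these differently.

```typescript
type TransportType = "stdio" | "sse";
type LogLevel = "verbose" | "debug" | "info" | "warn" | "error";

// Hypothetical config shape; field names are illustrative.
interface ServerConfig {
  googleApiKey: string;
  transportType: TransportType;
  port: number;
  logLevel: LogLevel;
}

const TRANSPORTS: readonly TransportType[] = ["stdio", "sse"];
const LOG_LEVELS: readonly LogLevel[] = ["verbose", "debug", "info", "warn", "error"];

// Type guard: narrows a plain string to one of the allowed literal values.
function isOneOf<T extends string>(value: string, allowed: readonly T[]): value is T {
  return allowed.some((candidate) => candidate === value);
}

// Takes the environment as a parameter so it can be exercised without touching process.env.
function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const googleApiKey = env.GOOGLE_API_KEY;
  if (!googleApiKey) {
    throw new Error("GOOGLE_API_KEY is required");
  }

  const transportType = env.TRANSPORT_TYPE ?? "stdio";
  if (!isOneOf(transportType, TRANSPORTS)) {
    throw new Error(`Unsupported TRANSPORT_TYPE: ${transportType}`);
  }

  const port = Number(env.PORT ?? "3000");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }

  const logLevel = env.LOG_LEVEL ?? "info";
  if (!isOneOf(logLevel, LOG_LEVELS)) {
    throw new Error(`Unsupported LOG_LEVEL: ${logLevel}`);
  }

  return { googleApiKey, transportType, port, logLevel };
}

// Mirrors the SSE start command above.
console.log(loadConfig({ GOOGLE_API_KEY: "your_api_key", TRANSPORT_TYPE: "sse", PORT: "3000" }));
```

Passing the environment in as an argument, rather than reading a global, keeps the defaults easy to check in isolation.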
The server provides three tools that can be called by MCP clients:
{
  "name": "image_recognition",
  "arguments": {
    "filepath": "/path/to/image.jpg",
    "prompt": "Describe this image in detail",
    "modelname": "gemini-2.0-flash"
  }
}

{
  "name": "audio_recognition",
  "arguments": {
    "filepath": "/path/to/audio.mp3",
    "prompt": "Transcribe this audio",
    "modelname": "gemini-2.0-flash"
  }
}

{
  "name": "video_recognition",
  "arguments": {
    "filepath": "/path/to/video.mp4",
    "prompt": "Describe what happens in this video",
    "modelname": "gemini-2.0-flash"
  }
}
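On the wire, MCP clients send tool calls like the ones above as JSON-RPC 2.0 requests with the method `tools/call`, with the tool name and arguments nested under `params`. The sketch below shows one way to build such a message; the `buildToolCall` helper is hypothetical, and an MCP client library would normally construct and send this request for you.

```typescript
// JSON-RPC 2.0 envelope for an MCP tool call.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments: Record<string, unknown>;
  };
}

// Hypothetical helper: wraps a tool name and its arguments in the request envelope.
function buildToolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Same arguments as the video_recognition example above.
const request = buildToolCall(1, "video_recognition", {
  filepath: "/path/to/video.mp4",
  prompt: "Describe what happens in this video",
  modelname: "gemini-2.0-flash",
});
console.log(JSON.stringify(request, null, 2));
```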
All tools accept the following parameters:
filepath (required): Path to the media file to analyze
prompt (optional): Custom prompt for the recognition (defaults to "Describe this content")
modelname (optional): Gemini model to use for recognition (defaults to "gemini-2.0-flash")

For development, start the server with:
GOOGLE_API_KEY=your_api_key npm run dev
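The tool parameter defaults described above (for `prompt` and `modelname`) can be expressed as a small normalization step. This is a sketch under assumptions: the `RecognitionArgs` type and the `withDefaults` function are hypothetical names, and the real server may validate arguments differently.

```typescript
// Arguments accepted by all three tools, per the parameter list above.
interface RecognitionArgs {
  filepath: string;
  prompt?: string;
  modelname?: string;
}

// Default values as documented above.
const DEFAULT_PROMPT = "Describe this content";
const DEFAULT_MODEL = "gemini-2.0-flash";

// Hypothetical helper: checks the required field and fills in the optional ones.
function withDefaults(args: RecognitionArgs): Required<RecognitionArgs> {
  if (!args.filepath) {
    throw new Error("filepath is required");
  }
  return {
    filepath: args.filepath,
    prompt: args.prompt ?? DEFAULT_PROMPT,
    modelname: args.modelname ?? DEFAULT_MODEL,
  };
}

// Only filepath is supplied, so both optional fields fall back to their defaults.
console.log(withDefaults({ filepath: "/path/to/audio.mp3" }));
```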
src/index.ts: Entry point
src/server.ts: MCP server implementation
src/tools/: Tool implementations
src/services/: Service implementations (Gemini API)
src/types/: Type definitions
src/utils/: Utility functions

License: MIT