Minimax Image TTS
A Model Context Protocol (MCP) server implementation with Minimax API integration for AI-powered image generation and text-to-speech functionality.
Create or update your MCP configuration file:
{
  "mcpServers": {
    "minimax-mcp-tools": {
      "command": "npx",
      "args": [
        "minimax-mcp-tools"
      ],
      "env": {
        "MINIMAX_API_KEY": "your-minimax-api-key",
        "MINIMAX_GROUP_ID": "your-minimax-group-id"
      }
    }
  }
}
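Because the server reads its credentials from the two environment variables in the configuration above, it can help to check them before launching. The following is a minimal sketch, not part of minimax-mcp-tools: the helper name `missingMinimaxEnv` is hypothetical, and treating an empty string as missing is an assumption.

```typescript
// Hypothetical pre-flight check; not part of minimax-mcp-tools.
// The two variable names come from the MCP configuration shown above.
const REQUIRED_ENV = ["MINIMAX_API_KEY", "MINIMAX_GROUP_ID"] as const;

// Returns the names of required variables that are unset or blank.
function missingMinimaxEnv(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV.filter((name) => {
    const value = env[name];
    // Assumption: a blank value is as unusable as a missing one.
    return value === undefined || value.trim() === "";
  });
}

// Usage: only the API key is supplied, so the group ID is reported.
const missing = missingMinimaxEnv({ MINIMAX_API_KEY: "your-minimax-api-key" });
console.log(missing);
```

In a real setup the argument would typically be the process environment of whatever launches the server.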
Generate images based on text prompts:
// Example parameters for image generation
{
  "prompt": "A mountain landscape at sunset",
  "aspectRatio": "16:9",
  "n": 1,
  "outputFile": "/absolute/path/to/image.jpg",
  "subjectReference": "/path/to/reference.jpg" // Optional: local file or URL
}
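The parameter notes for this tool say that output files are numbered sequentially when more than one image is requested (for example 'image-1.jpg', 'image-2.jpg'). The sketch below shows one way a caller might predict those filenames. The function name `numberedOutputFiles` is hypothetical, and the exact naming scheme used by the server is an assumption inferred from that example only.

```typescript
// Hypothetical helper; the server's real naming logic may differ.
// Derives "<stem>-<i><ext>" names from outputFile when n > 1.
function numberedOutputFiles(outputFile: string, n: number): string[] {
  // The documented range for n is 1-9.
  if (!Number.isInteger(n) || n < 1 || n > 9) {
    throw new RangeError("n must be an integer between 1 and 9");
  }
  if (n === 1) return [outputFile];

  // Locate the extension within the final path segment only.
  const slash = Math.max(outputFile.lastIndexOf("/"), outputFile.lastIndexOf("\\"));
  const dot = outputFile.lastIndexOf(".");
  const hasExt = dot > slash + 1; // ignore dotfiles and dots in directory names
  const stem = hasExt ? outputFile.slice(0, dot) : outputFile;
  const ext = hasExt ? outputFile.slice(dot) : "";

  return Array.from({ length: n }, (_, i) => `${stem}-${i + 1}${ext}`);
}

// Usage: two images requested for the example output path.
console.log(numberedOutputFiles("/absolute/path/to/image.jpg", 2));
```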
Parameters:
prompt (required): Description of the image to generate.
outputFile (required): Absolute path to save the generated image file. The directory must already exist. When generating multiple images (n>1), files will be named with sequential numbers (e.g., 'image-1.jpg', 'image-2.jpg').
aspectRatio (optional): Aspect ratio of the image (default: "1:1", options: "1:1", "16:9", "4:3", "3:2", "2:3", "3:4", "9:16", "21:9").
n (optional): Number of images to generate (default: 1, range: 1-9). When n>1, the output filenames will be automatically numbered.
subjectReference (optional): Path to a local image file or a public URL for character reference. When provided, the generated image will use this as a reference for character appearance. Supported formats: JPG, JPEG, PNG.

Convert text to speech with various customization options:
// Example parameters for text-to-speech
{
  "text": "Hello, this is a test of the text-to-speech functionality.",
  "model": "speech-02-hd",
  "voiceId": "female-shaonv",
  "speed": 1.0,
  "volume": 1.0,
  "pitch": 0,
  "emotion": "happy",
  "format": "mp3",
  "outputFile": "/absolute/path/to/audio.mp3",
  "subtitleEnable": true
}
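The text-to-speech parameters carry documented limits (text length, and ranges for speed, volume, and pitch), so a caller may want to check a request before sending it. The following is a minimal sketch under stated assumptions: `TtsParams`, `inRange`, and `validateTtsParams` are hypothetical names, the character count uses UTF-16 code units, and the absolute-path test is a simple heuristic. It is not the validation the server itself performs.

```typescript
// Hypothetical client-side check; not part of minimax-mcp-tools.
interface TtsParams {
  text: string;
  outputFile: string;
  speed?: number;  // documented range 0.5-2.0, default 1.0
  volume?: number; // documented range 0.1-10.0, default 1.0
  pitch?: number;  // documented range -12 to 12, default 0
}

const inRange = (v: number, lo: number, hi: number): boolean =>
  Number.isFinite(v) && v >= lo && v <= hi;

// Returns a list of human-readable problems; an empty list means no issue was found.
function validateTtsParams(p: TtsParams): string[] {
  const problems: string[] = [];
  // Assumption: UTF-16 length approximates the documented 10,000-character limit.
  if (p.text.length === 0 || p.text.length > 10000) {
    problems.push("text must contain 1 to 10,000 characters");
  }
  // Assumption: "absolute" means a POSIX path or a Windows drive path.
  if (!/^(\/|[A-Za-z]:[\\\/])/.test(p.outputFile)) {
    problems.push("outputFile must be an absolute path");
  }
  if (!inRange(p.speed ?? 1.0, 0.5, 2.0)) problems.push("speed must be within 0.5-2.0");
  if (!inRange(p.volume ?? 1.0, 0.1, 10.0)) problems.push("volume must be within 0.1-10.0");
  if (!inRange(p.pitch ?? 0, -12, 12)) problems.push("pitch must be within -12 to 12");
  return problems;
}

// Usage: an out-of-range speed is reported, everything else passes.
console.log(
  validateTtsParams({
    text: "Hello, this is a test of the text-to-speech functionality.",
    outputFile: "/absolute/path/to/audio.mp3",
    speed: 3,
  })
);
```

Collecting all problems in a list, rather than throwing on the first one, lets a caller report every out-of-range value at once.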
Parameters:
text (required): Text to convert to speech (max 10,000 characters).
outputFile (required): Absolute path to save the generated audio file.
model (optional): Model version to use (default: "speech-02-hd", options: "speech-02-hd", "speech-02-turbo").
  speech-02-hd: High-definition model with excellent timbre similarity, rhythm stability, and studio-grade audio quality.
  speech-02-turbo: Fast model with excellent performance and low latency, and enhanced multilingual capabilities.
voiceId (optional): Voice ID to use (default: "male-qn-qingse").
speed (optional): Speech speed (default: 1.0, range: 0.5-2.0).
volume (optional): Speech volume (default: 1.0, range: 0.1-10.0).
pitch (optional): Speech pitch (default: 0, range: -12 to 12).
emotion (optional): Emotion of the speech (default: "neutral", options: "happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral").
timberWeights (optional): Voice mixing settings; allows mixing up to 4 different voices with weights.
  "timberWeights": [
    { "voice_id": "male-qn-qingse", "weight": 70 },
    { "voice_id": "female-shaonv", "weight": 30 }
  ]
format (optional): Audio format (default: "mp3", options: "mp3", "pcm", "flac", "wav").
sampleRate (optional): Sample rate in Hz (default: 32000, options: 8000, 16000, 22050, 24000, 32000, 44100).
bitrate (optional): Bitrate for MP3 format (default: 128000, options: 32000, 64000, 128000, 256000).
channel (optional): Number of audio channels (default: 1, options: 1=mono, 2=stereo).
latexRead (optional): Whether to read LaTeX formulas (default: false).
pronunciationDict (optional): List of pronunciation replacements.
  "pronunciationDict": ["处理/(chu3)(li3)", "危险/dangerous"]
stream (optional): Whether to use streaming mode (default: false).
languageBoost (optional): Enhance recognition of specific languages.
subtitleEnable (optional): Whether to enable subtitle generation (default: false).

License: MIT
Contributions are welcome! Please feel free to submit a Pull Request.