
Chatterbox TTS
Text-to-speech generation with automatic playback using Chatterbox TTS model
A simplified Model Context Protocol (MCP) server that provides text-to-speech generation with automatic playback using the Chatterbox TTS model. The server loads the model automatically on first use and provides real-time progress notifications to keep users informed throughout the process.
This MCP server exposes Chatterbox TTS functionality through a single, streamlined tool that generates speech from text and plays it automatically. The server handles model loading, progress reporting, temporary file management, and audio playback seamlessly.
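The load-on-first-use behavior described above can be illustrated with a small wrapper. This is only a sketch of the general pattern, not the server's actual code: the `LazyModel` class and the `loader` callable are hypothetical names, and the real server may organize model loading differently.

```python
import threading
from typing import Any, Callable, Optional


class LazyModel:
    """Load a model the first time it is requested, then reuse it."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        # `loader` is a hypothetical callable that returns the loaded TTS model.
        self._loader = loader
        self._model: Optional[Any] = None
        self._loaded = False
        # The lock prevents two concurrent requests from loading the model twice.
        self._lock = threading.Lock()

    @property
    def loaded(self) -> bool:
        return self._loaded

    def get(self) -> Any:
        if not self._loaded:
            with self._lock:
                # Re-check after acquiring the lock: another thread may have finished loading.
                if not self._loaded:
                    self._model = self._loader()
                    self._loaded = True
        return self._model
```

Calling `get()` at startup gives the same effect as pre-loading; calling it only inside the request handler defers the cost to the first request.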
speak_text
The speak_text tool provides complete text-to-speech functionality:
Parameters:
text (required): The text to convert to speech
exaggeration (optional): Controls expressiveness (0.0-1.0, default 0.5)
cfg_weight (optional): Controls classifier-free guidance (0.0-1.0, default 0.5)
Features:
Automatic playback using afplay
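For readers who want to call the tool outside a chat client, the following sketch builds the JSON-RPC `tools/call` message that MCP clients send. The helper name `build_speak_text_request` is hypothetical, and the client-side range check is an assumption based on the documented 0.0-1.0 ranges; the server may validate its inputs differently.

```python
import json
from typing import Any, Dict


def build_speak_text_request(
    text: str,
    exaggeration: float = 0.5,
    cfg_weight: float = 0.5,
    request_id: int = 1,
) -> Dict[str, Any]:
    """Build an MCP tools/call request for the speak_text tool."""
    if not text.strip():
        raise ValueError("text is required")
    # Assumed client-side check mirroring the documented 0.0-1.0 ranges.
    for name, value in (("exaggeration", exaggeration), ("cfg_weight", cfg_weight)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "speak_text",
            "arguments": {
                "text": text,
                "exaggeration": exaggeration,
                "cfg_weight": cfg_weight,
            },
        },
    }


if __name__ == "__main__":
    request = build_speak_text_request("This is amazing!", exaggeration=0.9)
    print(json.dumps(request, indent=2))
```

Omitting `exaggeration` and `cfg_weight` sends the documented defaults of 0.5.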
chatterbox://model-info
Get information about the TTS model status and device capabilities:
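As a rough illustration, the resource can be requested with a standard MCP `resources/read` message. The helper names below are hypothetical, and the fields inside the returned text are not documented here, so the sketch only extracts the raw text content without interpreting it.

```python
from typing import Any, Dict, List


def build_model_info_request(request_id: int = 1) -> Dict[str, Any]:
    """Build an MCP resources/read request for the model-info resource."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "resources/read",
        "params": {"uri": "chatterbox://model-info"},
    }


def extract_text_contents(response: Dict[str, Any]) -> List[str]:
    """Collect the text entries from a resources/read response."""
    contents = response.get("result", {}).get("contents", [])
    return [item["text"] for item in contents if "text" in item]
```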
The server provides detailed progress notifications throughout the speech generation process. They cover the model loading phase, the speech generation phase, the playback phase, and general status updates.
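To make the notification flow concrete, here is a minimal sketch of the MCP `notifications/progress` message shape. The phase names follow the phases listed above, but the percentages and message strings are invented for illustration; the server's actual values are not documented here.

```python
from typing import Any, Dict, List, Optional, Union


def progress_notification(
    token: Union[str, int],
    progress: float,
    total: float = 100.0,
    message: Optional[str] = None,
) -> Dict[str, Any]:
    """Build an MCP notifications/progress message (notifications carry no id)."""
    params: Dict[str, Any] = {"progressToken": token, "progress": progress, "total": total}
    if message is not None:
        params["message"] = message
    return {"jsonrpc": "2.0", "method": "notifications/progress", "params": params}


def example_sequence(token: Union[str, int]) -> List[Dict[str, Any]]:
    # Hypothetical percentages and messages, one per phase.
    steps = [
        (10.0, "Loading model"),
        (50.0, "Generating speech"),
        (90.0, "Playing audio"),
        (100.0, "Done"),
    ]
    return [progress_notification(token, pct, message=msg) for pct, msg in steps]
```

The `progressToken` is supplied by the client in its request, and the progress value should only increase across successive notifications.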
Install dependencies:
pip install mcp torch torchaudio
Install Chatterbox TTS:
Follow the Chatterbox TTS installation instructions to ensure the chatterbox.tts module is available.
By default, the server stores audio files in ~/.chatterbox/audio. You can configure a custom location using:
Command line argument:
python chatterbox_mcp_server.py --audio-dir /path/to/custom/audio/directory
Environment variable:
export CHATTERBOX_AUDIO_DIR="/path/to/custom/audio/directory"
python chatterbox_mcp_server.py
Priority order:
--audio-dir argument (highest priority)
CHATTERBOX_AUDIO_DIR environment variable
~/.chatterbox/audio default (lowest priority)
By default, audio files are automatically cleaned up after 1 hour. You can configure a custom TTL:
Command line argument:
python chatterbox_mcp_server.py --audio-ttl-hours 24 # Keep files for 24 hours
Environment variable:
export CHATTERBOX_AUDIO_TTL_HOURS=24
python chatterbox_mcp_server.py
Priority order:
--audio-ttl-hours argument (highest priority)
CHATTERBOX_AUDIO_TTL_HOURS environment variable
By default, the TTS model is loaded on first use to minimize startup time. You can pre-load it at startup:
Command line argument:
python chatterbox_mcp_server.py --auto-load-model
This will load the model during server startup, which takes a few seconds but ensures the first TTS request is faster.
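The configuration rules above (command line argument first, then environment variable, then default) and the TTL-based cleanup can be sketched as follows. This is an illustrative reconstruction, not the server's code: the function names are hypothetical, treating the TTL as a float is an assumption, and using file modification time to judge age is one plausible choice among several.

```python
import argparse
import os
import time
from pathlib import Path
from typing import Mapping, Optional, Sequence, Tuple

DEFAULT_AUDIO_DIR = "~/.chatterbox/audio"
DEFAULT_TTL_HOURS = 1.0


def resolve_settings(argv: Sequence[str], env: Mapping[str, str]) -> Tuple[Path, float, bool]:
    """Resolve audio dir, TTL, and auto-load flag: CLI > environment > default."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--audio-dir")
    parser.add_argument("--audio-ttl-hours", type=float)
    parser.add_argument("--auto-load-model", action="store_true")
    args = parser.parse_args(argv)

    audio_dir = args.audio_dir or env.get("CHATTERBOX_AUDIO_DIR") or DEFAULT_AUDIO_DIR
    if args.audio_ttl_hours is not None:
        ttl_hours = args.audio_ttl_hours
    else:
        ttl_hours = float(env.get("CHATTERBOX_AUDIO_TTL_HOURS", DEFAULT_TTL_HOURS))
    # expanduser() turns a leading ~ into the user's home directory.
    return Path(audio_dir).expanduser(), ttl_hours, args.auto_load_model


def remove_expired(audio_dir: Path, ttl_hours: float, now: Optional[float] = None) -> int:
    """Delete files older than the TTL (by modification time); return the count removed."""
    now = time.time() if now is None else now
    cutoff = now - ttl_hours * 3600
    removed = 0
    for path in audio_dir.glob("*"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed += 1
    return removed


if __name__ == "__main__":
    import sys

    directory, ttl, auto_load = resolve_settings(sys.argv[1:], os.environ)
    print(directory, ttl, auto_load)
```

Passing `argv` and `env` explicitly, rather than reading `sys.argv` and `os.environ` inside the function, keeps the priority logic easy to test.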
Audio Storage Features:
Stored audio is available through chatterbox://audio/{resource_id} resources
Paths support ~ home directory notation
Standalone:
python chatterbox_mcp_server.py
With MCP tools:
mcp dev chatterbox_mcp_server.py
Add to your Claude Desktop MCP configuration:
Basic configuration:
{
  "mcpServers": {
    "chatterbox-tts": {
      "command": "python",
      "args": ["/path/to/chatterbox_mcp_server.py"],
      "env": {}
    }
  }
}
With custom configuration:
{
  "mcpServers": {
    "chatterbox-tts": {
      "command": "python",
      "args": [
        "/path/to/chatterbox_mcp_server.py",
        "--audio-dir", "/custom/audio/path",
        "--auto-load-model",
        "--audio-ttl-hours", "24"
      ],
      "env": {
        "CHATTERBOX_AUDIO_DIR": "/custom/audio/path",
        "CHATTERBOX_AUDIO_TTL_HOURS": "24"
      }
    }
  }
}
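If you prefer to generate this entry rather than edit the JSON by hand, a small helper along these lines may be useful. It is a sketch only: `add_chatterbox_server` is a hypothetical name, and the location of your Claude Desktop configuration file is left for you to supply.

```python
import json
from typing import Any, Dict, List, Mapping, Optional


def add_chatterbox_server(
    config: Mapping[str, Any],
    server_script: str,
    extra_args: Optional[List[str]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Return a copy of an MCP client config with a chatterbox-tts entry added."""
    merged = dict(config)
    # Copy the existing server map so other configured servers are preserved.
    servers = dict(merged.get("mcpServers", {}))
    servers["chatterbox-tts"] = {
        "command": "python",
        "args": [server_script, *(extra_args or [])],
        "env": dict(env or {}),
    }
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    updated = add_chatterbox_server(
        {},
        "/path/to/chatterbox_mcp_server.py",
        extra_args=["--audio-ttl-hours", "24"],
    )
    print(json.dumps(updated, indent=2))
```

The helper returns a new dictionary instead of modifying its input, so the original configuration can be kept as a backup before writing the result to disk.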
Basic text-to-speech:
Please use the speak_text tool to say "Hello, welcome to the Chatterbox TTS demonstration!"
Expressive speech:
Use speak_text to generate enthusiastic speech for "This is amazing!" with high expressiveness
The tool will automatically load the model if needed, generate the audio, and play it back.
chatterbox-mcp/
├── chatterbox_mcp_server.py # MCP server implementation
└── README.md # This documentation
Single speak_text tool instead of multiple tools
Common Issues:
Model loading slow:
Audio playback issues:
The afplay command is macOS-specific
Memory issues:
Device selection:
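The server's own device selection logic is not shown in this document. As a point of reference when debugging, the following sketch shows a common PyTorch pattern for choosing a device; the function name `pick_device` and the CUDA, then MPS, then CPU preference order are assumptions, not the server's confirmed behavior.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what PyTorch reports as available."""
    try:
        import torch
    except ImportError:
        # Without PyTorch installed there is nothing to probe; report CPU.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # The MPS backend (Apple Silicon) is absent from older PyTorch builds.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(f"Selected device: {pick_device()}")
```

Running this in the same Python environment as the server shows which accelerator PyTorch can see there, which helps narrow down slow model loading or memory problems.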
This MCP server implementation follows the same license as the underlying Chatterbox TTS model.