Text-to-Speech

STDIO

MCP server providing text-to-speech functionality with multiple TTS service options

mcp-tts

MCP Server for TTS (Text-to-Speech)


What? 🤔

Adds Text-to-Speech to things like Claude Desktop and Cursor IDE.

It registers four TTS tools:

  • say_tts
  • elevenlabs_tts
  • google_tts
  • openai_tts

say_tts

Uses the macOS say binary to speak the text with built-in system voices

elevenlabs_tts

Uses the ElevenLabs text-to-speech API to speak the text with premium AI voices

google_tts

Uses Google's Gemini TTS models to speak the text with 30 high-quality voices. Available voices include:

  • Zephyr (Bright), Puck (Upbeat), Charon (Informative)
  • Kore (Firm), Fenrir (Excitable), Leda (Youthful)
  • Orus (Firm), Aoede (Breezy), Callirhoe (Easy-going)
  • Autonoe (Bright), Enceladus (Breathy), Iapetus (Clear)
  • And 18 more voices with various characteristics

openai_tts

Uses OpenAI's Text-to-Speech API to speak the text with 11 natural-sounding voices:

  • alloy (Warm, conversational, modern)
  • ash (Confident, assertive, slightly textured)
  • ballad (Gentle, melodious, slightly lyrical)
  • coral (Cheerful, fresh, upbeat)
  • echo (Neutral, calm, balanced)
  • fable (Storyteller-like, expressive)
  • nova (Clear, precise, slightly formal)
  • onyx (Deep, authoritative, resonant)
  • sage (Soothing, empathetic, reassuring)
  • shimmer (Bright, animated, playful)
  • verse (Versatile, expressive)

Supports three quality models:

  • gpt-4o-mini-tts - Default, optimized quality and speed
  • tts-1 - Standard quality, faster generation
  • tts-1-hd - High definition audio, premium quality

Additional features:

  • Speed control from 0.25x to 4.0x (default: 1.0x)
  • Custom voice instructions (e.g., "Speak in a cheerful and positive tone") via parameter or OPENAI_TTS_INSTRUCTIONS environment variable
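
As a rough illustration of the limits listed above, the following Go sketch checks a set of OpenAI TTS options against the documented voices, models, and speed range before a request is sent. The `OpenAIOptions` type and its field names are hypothetical and are not taken from the mcp-tts source. Only the model and speed defaults are stated in this README, so the sketch leaves an empty voice unchecked.

```go
package main

import "fmt"

// Voices and models copied from the lists in this README.
var openAIVoices = map[string]bool{
	"alloy": true, "ash": true, "ballad": true, "coral": true,
	"echo": true, "fable": true, "nova": true, "onyx": true,
	"sage": true, "shimmer": true, "verse": true,
}

var openAIModels = map[string]bool{
	"gpt-4o-mini-tts": true, "tts-1": true, "tts-1-hd": true,
}

const (
	minSpeed = 0.25
	maxSpeed = 4.0
)

// OpenAIOptions is a hypothetical container for illustration only.
type OpenAIOptions struct {
	Voice string
	Model string
	Speed float64
}

// withDefaults fills in the documented defaults for model and speed.
func (o OpenAIOptions) withDefaults() OpenAIOptions {
	if o.Model == "" {
		o.Model = "gpt-4o-mini-tts"
	}
	if o.Speed == 0 {
		o.Speed = 1.0
	}
	return o
}

// validate rejects values outside the documented lists and speed range.
// An empty voice is accepted because no default voice is documented here.
func (o OpenAIOptions) validate() error {
	if o.Voice != "" && !openAIVoices[o.Voice] {
		return fmt.Errorf("unknown voice %q", o.Voice)
	}
	if !openAIModels[o.Model] {
		return fmt.Errorf("unknown model %q", o.Model)
	}
	if o.Speed < minSpeed || o.Speed > maxSpeed {
		return fmt.Errorf("speed %.2f is outside %.2f to %.2f", o.Speed, minSpeed, maxSpeed)
	}
	return nil
}

func main() {
	opts := OpenAIOptions{Voice: "nova", Speed: 1.2}.withDefaults()
	fmt.Println(opts.Model, opts.validate())
	fmt.Println(OpenAIOptions{Model: "tts-1", Speed: 5}.validate())
}
```

Whether the server itself rejects or clamps out-of-range values is not stated here, so a client-side check like this is only a convenience.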

Configuration

Sequential vs Concurrent TTS

By default, the TTS server enforces sequential speech operations - only one TTS request can play audio at a time. This prevents multiple agents from speaking simultaneously and creating an unintelligible cacophony. Subsequent requests will wait in a queue until the current speech completes.

Multi-Instance Protection: The lock applies both within a single MCP server process and across multiple Claude Desktop instances. When several instances run at the same time, they coordinate via a system-wide file lock to prevent overlapping speech.

To allow concurrent TTS operations (multiple speeches playing simultaneously):

Environment Variable:

export MCP_TTS_ALLOW_CONCURRENT=true

Command Line Flag:

mcp-tts --sequential-tts=false

Note: Concurrent TTS may result in overlapping audio that's difficult to understand. Use this option only when you explicitly want multiple TTS operations to run simultaneously.
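
To make the difference between the two modes concrete, here is a minimal Go sketch of in-process serialization with a mutex. It is not the server's actual implementation: the `speaker` type is hypothetical, playback is simulated with a sleep, and the cross-process file lock described above is not covered.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// speaker serializes playback unless allowConcurrent is set.
type speaker struct {
	mu              sync.Mutex
	allowConcurrent bool
	active          atomic.Int32 // speeches playing right now
	peak            atomic.Int32 // highest overlap observed
}

// speak stands in for real audio playback; it sleeps instead of playing sound.
func (s *speaker) speak(text string, d time.Duration) {
	if !s.allowConcurrent {
		s.mu.Lock() // later callers block here until the current speech ends
		defer s.mu.Unlock()
	}
	n := s.active.Add(1)
	for { // record the largest number of simultaneous speeches
		p := s.peak.Load()
		if n <= p || s.peak.CompareAndSwap(p, n) {
			break
		}
	}
	time.Sleep(d)
	s.active.Add(-1)
}

// run fires n requests at once and reports the peak overlap.
func run(allowConcurrent bool, n int) int32 {
	s := &speaker{allowConcurrent: allowConcurrent}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			s.speak(fmt.Sprintf("message %d", i), 20*time.Millisecond)
		}(i)
	}
	wg.Wait()
	return s.peak.Load()
}

func main() {
	fmt.Println("sequential peak overlap:", run(false, 3))
	fmt.Println("concurrent peak overlap:", run(true, 3))
}
```

In sequential mode the peak overlap is always 1. Note that `sync.Mutex` does not guarantee strict first-come, first-served ordering, so a real queue would need extra bookkeeping.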

Suppressing "Speaking:" Output

By default, TTS tools return a message like "Speaking: [text]" when speech completes. This can interfere with LLM responses. To suppress this output:

Environment Variable:

export MCP_TTS_SUPPRESS_SPEAKING_OUTPUT=true

Command Line Flag:

mcp-tts --suppress-speaking-output

When enabled, tools return "Speech completed" instead of echoing the spoken text.
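
The behavior described above can be sketched in a few lines of Go. The helper names `envTrue` and `resultMessage` are hypothetical, and treating the value case-insensitively is an assumption of this sketch; the README only documents the value "true".

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// envTrue reports whether the named variable is set to "true".
// Case-insensitive matching is an assumption of this sketch.
func envTrue(getenv func(string) string, name string) bool {
	return strings.EqualFold(strings.TrimSpace(getenv(name)), "true")
}

// resultMessage returns the text a tool would hand back to the client.
func resultMessage(text string, suppress bool) string {
	if suppress {
		return "Speech completed"
	}
	return "Speaking: " + text
}

func main() {
	suppress := envTrue(os.Getenv, "MCP_TTS_SUPPRESS_SPEAKING_OUTPUT")
	fmt.Println(resultMessage("Hello, world!", suppress))
}
```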

Getting Started

Install

go install github.com/blacktop/mcp-tts@latest

❱ mcp-tts --help
TTS (text-to-speech) MCP Server.

Provides multiple text-to-speech services via MCP protocol:

  • say_tts - Uses macOS built-in 'say' command (macOS only)
  • elevenlabs_tts - Uses ElevenLabs API for high-quality speech synthesis
  • google_tts - Uses Google's Gemini TTS models for natural speech
  • openai_tts - Uses OpenAI's TTS API with various voice options

Each tool supports different voices, rates, and configuration options.
Requires appropriate API keys for cloud-based services.

Designed to be used with the MCP (Model Context Protocol).

Usage:
  mcp-tts [flags]

Flags:
  -h, --help                       help for mcp-tts
      --sequential-tts             Enforce sequential TTS (prevent concurrent speech) (default true)
      --suppress-speaking-output   Suppress 'Speaking:' text output
  -v, --verbose                    Enable verbose debug logging

Set Claude Desktop Config

{
  "mcpServers": {
    "say": {
      "command": "mcp-tts",
      "env": {
        "ELEVENLABS_API_KEY": "********",
        "ELEVENLABS_VOICE_ID": "1SM7GgM6IMuvQlz2BwM3",
        "GOOGLE_AI_API_KEY": "********",
        "OPENAI_API_KEY": "********",
        "OPENAI_TTS_INSTRUCTIONS": "Speak in a cheerful and positive tone",
        "MCP_TTS_SUPPRESS_SPEAKING_OUTPUT": "true",
        "MCP_TTS_ALLOW_CONCURRENT": "false"
      }
    }
  }
}

Environment Variables

  • ELEVENLABS_API_KEY: Your ElevenLabs API key (required for elevenlabs_tts)
  • ELEVENLABS_VOICE_ID: ElevenLabs voice ID (optional, defaults to a built-in voice)
  • GOOGLE_AI_API_KEY or GEMINI_API_KEY: Your Google AI API key (required for google_tts)
  • OPENAI_API_KEY: Your OpenAI API key (required for openai_tts)
  • OPENAI_TTS_INSTRUCTIONS: Custom voice instructions for OpenAI TTS (optional, e.g., "Speak in a cheerful and positive tone")
  • MCP_TTS_SUPPRESS_SPEAKING_OUTPUT: Set to "true" to suppress "Speaking:" output (optional)
  • MCP_TTS_ALLOW_CONCURRENT: Set to "true" to allow concurrent TTS operations (optional, defaults to sequential)
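
A quick pre-flight check can show which tools have their documented prerequisites in place. The Go sketch below is a hypothetical helper, not part of mcp-tts. The server registers all four tools regardless, so this only predicts which calls are likely to succeed.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// availableTools lists the tools whose documented prerequisites are met:
// macOS for say_tts, and an API key for each cloud service.
func availableTools(goos string, getenv func(string) string) []string {
	var tools []string
	if goos == "darwin" {
		tools = append(tools, "say_tts")
	}
	if getenv("ELEVENLABS_API_KEY") != "" {
		tools = append(tools, "elevenlabs_tts")
	}
	// Either variable name is accepted for Google, per the list above.
	if getenv("GOOGLE_AI_API_KEY") != "" || getenv("GEMINI_API_KEY") != "" {
		tools = append(tools, "google_tts")
	}
	if getenv("OPENAI_API_KEY") != "" {
		tools = append(tools, "openai_tts")
	}
	return tools
}

func main() {
	fmt.Println(availableTools(runtime.GOOS, os.Getenv))
}
```

Passing the OS name and the lookup function as parameters keeps the check easy to exercise without touching the real environment.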

Test

Test macOS TTS

cat test/say.json | go run main.go --verbose
2025/03/23 22:41:49 INFO Starting MCP server name="Say TTS Service" version=1.0.0
2025/03/23 22:41:49 DEBU Say tool called request="{Request:{Method:tools/call Params:{Meta:<nil>}} Params:{Name:say_tts Arguments:map[text:Hello, world!] Meta:<nil>}}"
2025/03/23 22:41:49 DEBU Executing say command args="[--rate 200 Hello, world!]"
2025/03/23 22:41:49 INFO Speaking text text="Hello, world!"
{"jsonrpc":"2.0","id":3,"result":{"content":[{"type":"text","text":"Speaking: Hello, world!"}]}}
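
The contents of test/say.json are not shown here, but the debug log reveals the shape of the call: method `tools/call`, tool name `say_tts`, and a `text` argument. The Go sketch below builds a request of that shape and extracts the text from a response line like the one above. The struct and function names are hypothetical, and the response id of 3 suggests the real test file likely contains earlier messages (such as an initialization exchange) that this sketch omits.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toolCall mirrors a JSON-RPC 2.0 tools/call request.
type toolCall struct {
	JSONRPC string     `json:"jsonrpc"`
	ID      int        `json:"id"`
	Method  string     `json:"method"`
	Params  callParams `json:"params"`
}

type callParams struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments"`
}

// toolResult covers only the response fields used in this sketch.
type toolResult struct {
	ID     int `json:"id"`
	Result struct {
		Content []struct {
			Type string `json:"type"`
			Text string `json:"text"`
		} `json:"content"`
	} `json:"result"`
}

// buildSayRequest encodes a say_tts call with a single text argument.
func buildSayRequest(id int, text string) ([]byte, error) {
	return json.Marshal(toolCall{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "tools/call",
		Params:  callParams{Name: "say_tts", Arguments: map[string]any{"text": text}},
	})
}

// firstText returns the text of the first content item in a response line.
func firstText(line []byte) (string, error) {
	var r toolResult
	if err := json.Unmarshal(line, &r); err != nil {
		return "", err
	}
	if len(r.Result.Content) == 0 {
		return "", fmt.Errorf("response %d has no content", r.ID)
	}
	return r.Result.Content[0].Text, nil
}

func main() {
	req, err := buildSayRequest(3, "Hello, world!")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(req))

	resp := []byte(`{"jsonrpc":"2.0","id":3,"result":{"content":[{"type":"text","text":"Speaking: Hello, world!"}]}}`)
	text, err := firstText(resp)
	if err != nil {
		panic(err)
	}
	fmt.Println(text)
}
```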

Test Google TTS

cat test/google_tts.json | go run main.go --verbose
2025/05/23 18:26:45 INFO Starting MCP server name="Say TTS Service" version=""
2025/05/23 18:26:45 DEBU Google TTS tool called request="{...}"
2025/05/23 18:26:45 DEBU Generating TTS audio model=gemini-2.5-flash-preview-tts voice=Kore text="Hello! This is a test of Google's TTS API. How does it sound?"
2025/05/23 18:26:49 INFO Playing TTS audio via beep speaker bytes=181006
2025/05/23 18:26:53 INFO Speaking via Google TTS text="Hello! This is a test of Google's TTS API. How does it sound?" voice=Kore
{"jsonrpc":"2.0","id":4,"result":{"content":[{"type":"text","text":"Speaking: Hello! This is a test of Google's TTS API. How does it sound? (via Google TTS with voice Kore)"}]}}

Test OpenAI TTS

cat test/openai_tts.json | go run main.go --verbose
2025/05/23 19:15:32 INFO Starting MCP server name="Say TTS Service" version=""
2025/05/23 19:15:32 DEBU OpenAI TTS tool called request="{...}"
2025/05/23 19:15:32 DEBU Generating OpenAI TTS audio model=tts-1 voice=nova speed=1.2 text="Hello! This is a test of OpenAI's text-to-speech API. I'm using the nova voice at 1.2x speed."
2025/05/23 19:15:34 DEBU Decoding MP3 stream from OpenAI
2025/05/23 19:15:34 DEBU Initializing speaker for OpenAI TTS sampleRate=22050
2025/05/23 19:15:36 INFO Speaking text via OpenAI TTS text="Hello! This is a test of OpenAI's text-to-speech API. I'm using the nova voice at 1.2x speed." voice=nova model=tts-1 speed=1.2
{"jsonrpc":"2.0","id":5,"result":{"content":[{"type":"text","text":"Speaking: Hello! This is a test of OpenAI's text-to-speech API. I'm using the nova voice at 1.2x speed. (via OpenAI TTS with voice nova)"}]}}

Test the mutex behavior with multiple TTS requests

# Sequential mode (default) - speeches play one after another
cat test/sequential.json | go run main.go --verbose

# Concurrent mode - allows overlapping speech
cat test/sequential.json | go run main.go --verbose --sequential-tts=false

License

MIT Copyright (c) 2025 blacktop
