
# Voice Mode

> Voice-based interaction system for AI assistants using natural conversations
Install via `uvx voice-mode` or `pip install voice-mode` | getvoicemode.com
Natural voice conversations for AI assistants. Voice Mode brings human-like voice interactions to Claude, ChatGPT, and other LLMs through the Model Context Protocol (MCP).
Runs on: Linux • macOS • Windows (WSL) | Python: 3.10+
## Quick Start

All you need to get started:

```bash
# Install Claude Code CLI
npm install -g @anthropic-ai/claude-code

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Register Voice Mode as an MCP server
claude mcp add --scope user voice-mode uvx voice-mode

# Set your OpenAI API key
export OPENAI_API_KEY=your-openai-key

# Start a voice conversation
claude converse
```

📖 Using a different tool? See our Integration Guides for Cursor, VS Code, Gemini CLI, and more!
Watch Voice Mode in action:

See Voice Mode working with Google's Gemini CLI (Google's counterpart to Claude Code):
## Example Usage

Once configured, try these prompts with Claude:

- "Let's have a voice conversation"
- "Ask me about my day using voice"
- "Tell me a joke" (Claude will speak and wait for your response)
- "Say goodbye" (Claude will speak without waiting)

The new `converse` function makes voice interactions more natural - it automatically waits for your response by default.
Voice Mode works with your favorite AI coding assistants: Claude Code, Claude Desktop, Cline, Continue, Cursor, VS Code, Zed, and Roo Code.

## Installation

Install uv:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Install system dependencies:

**Ubuntu/Debian:**

```bash
sudo apt install python3-dev libasound2-dev libportaudio2 portaudio19-dev ffmpeg
```

**Fedora/RHEL:**

```bash
sudo dnf install python3-devel alsa-lib-devel portaudio-devel ffmpeg
```

**macOS:**

```bash
# Install Homebrew if not already installed
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install dependencies
brew install portaudio ffmpeg
```

**Windows (WSL):** Follow the Ubuntu/Debian instructions above within WSL.

Install Voice Mode:

```bash
# Using Claude Code (recommended)
claude mcp add --scope user voice-mode uvx voice-mode

# Using UV
uvx voice-mode

# Using pip
pip install voice-mode
```
## Configuration for AI Coding Assistants

📖 Looking for detailed setup instructions? Check our comprehensive Integration Guides for step-by-step setup for each tool!
Below are quick configuration snippets. For full installation and setup instructions, see the integration guides above.
### Claude Code

```bash
claude mcp add voice-mode -- uvx voice-mode
```

Or with environment variables:

```bash
claude mcp add voice-mode --env OPENAI_API_KEY=your-openai-key -- uvx voice-mode
```
### Claude Desktop

Add to your Claude Desktop configuration file:

- macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
- Windows: `%APPDATA%\Claude\claude_desktop_config.json`

```json
{
  "mcpServers": {
    "voice-mode": {
      "command": "uvx",
      "args": ["voice-mode"],
      "env": {
        "OPENAI_API_KEY": "your-openai-key"
      }
    }
  }
}
```
### Cline

Add to your Cline MCP settings:

**Windows:**

```json
{
  "mcpServers": {
    "voice-mode": {
      "command": "cmd",
      "args": ["/c", "uvx", "voice-mode"],
      "env": {
        "OPENAI_API_KEY": "your-openai-key"
      }
    }
  }
}
```

**macOS/Linux:**

```json
{
  "mcpServers": {
    "voice-mode": {
      "command": "uvx",
      "args": ["voice-mode"],
      "env": {
        "OPENAI_API_KEY": "your-openai-key"
      }
    }
  }
}
```
### Continue

Add to your `.continue/config.json`:

```json
{
  "experimental": {
    "modelContextProtocolServers": [
      {
        "transport": {
          "type": "stdio",
          "command": "uvx",
          "args": ["voice-mode"],
          "env": {
            "OPENAI_API_KEY": "your-openai-key"
          }
        }
      }
    ]
  }
}
```
### Cursor

Add to `~/.cursor/mcp.json`:

```json
{
  "mcpServers": {
    "voice-mode": {
      "command": "uvx",
      "args": ["voice-mode"],
      "env": {
        "OPENAI_API_KEY": "your-openai-key"
      }
    }
  }
}
```
### VS Code

Add to your VS Code MCP config:

```json
{
  "mcpServers": {
    "voice-mode": {
      "command": "uvx",
      "args": ["voice-mode"],
      "env": {
        "OPENAI_API_KEY": "your-openai-key"
      }
    }
  }
}
```
{ "mcpServers": { "voice-mode": { "command": "uvx", "args": ["voice-mode"], "env": { "OPENAI_API_KEY": "your-openai-key" } } } }
### Zed

Add to your Zed `settings.json`:

```json
{
  "context_servers": {
    "voice-mode": {
      "command": {
        "path": "uvx",
        "args": ["voice-mode"],
        "env": {
          "OPENAI_API_KEY": "your-openai-key"
        }
      }
    }
  }
}
```
### Roo Code

Add to your Roo Code MCP configuration:

```json
{
  "mcpServers": {
    "voice-mode": {
      "command": "uvx",
      "args": ["voice-mode"],
      "env": {
        "OPENAI_API_KEY": "your-openai-key"
      }
    }
  }
}
```
### Docker

```bash
docker run -it --rm \
  -e OPENAI_API_KEY=your-openai-key \
  --device /dev/snd \
  -v /tmp/.X11-unix:/tmp/.X11-unix \
  -e DISPLAY=$DISPLAY \
  ghcr.io/mbailey/voicemode:latest
```
### Pipx

```bash
pipx install voice-mode
```
### From source

```bash
git clone https://github.com/mbailey/voicemode.git
cd voicemode
pip install -e .
```
## Tools

| Tool | Description | Key Parameters |
|------|-------------|----------------|
| `converse` | Have a voice conversation - speak and optionally listen | `message`, `wait_for_response` (default: true), `listen_duration` (default: 30s), `transport` (auto/local/livekit) |
| `listen_for_speech` | Listen for speech and convert to text | `duration` (default: 5s) |
| `check_room_status` | Check LiveKit room status and participants | None |
| `check_audio_devices` | List available audio input/output devices | None |
| `start_kokoro` | Start the Kokoro TTS service | `models_dir` (optional, defaults to `~/Models/kokoro`) |
| `stop_kokoro` | Stop the Kokoro TTS service | None |
| `kokoro_status` | Check the status of the Kokoro TTS service | None |
**Note:** The `converse` tool is the primary interface for voice interactions, combining speaking and listening in a natural flow.
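For orientation, here is a minimal sketch of calling `converse` from a standalone MCP client, assuming the official `mcp` Python SDK (`pip install mcp`). Only the tool name and parameters come from the table above; everything else is illustrative, not part of Voice Mode:

```python
import asyncio
import os

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    # Launch the Voice Mode server over stdio, the same way Claude does.
    # Pass the parent environment through so OPENAI_API_KEY reaches the server.
    params = StdioServerParameters(
        command="uvx", args=["voice-mode"], env=dict(os.environ)
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Speak a message and (by default) wait for a spoken reply.
            result = await session.call_tool(
                "converse",
                arguments={"message": "How was your day?", "wait_for_response": True},
            )
            print(result.content)


asyncio.run(main())
```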
## Configuration

The only required configuration is your OpenAI API key:

```bash
export OPENAI_API_KEY="your-key"
```

Optional settings:

```bash
# Custom STT/TTS services (OpenAI-compatible)
export STT_BASE_URL="http://127.0.0.1:2022/v1"  # Local Whisper
export TTS_BASE_URL="http://127.0.0.1:8880/v1"  # Local TTS
export TTS_VOICE="alloy"                        # Voice selection

# LiveKit (for room-based communication)
# See docs/livekit/ for setup guide
export LIVEKIT_URL="wss://your-app.livekit.cloud"
export LIVEKIT_API_KEY="your-api-key"
export LIVEKIT_API_SECRET="your-api-secret"

# Debug mode
export VOICEMODE_DEBUG="true"

# Save all audio (TTS output and STT input)
export VOICEMODE_SAVE_AUDIO="true"

# Audio format configuration (default: pcm)
export VOICEMODE_AUDIO_FORMAT="pcm"      # Options: pcm, mp3, wav, flac, aac, opus
export VOICEMODE_TTS_AUDIO_FORMAT="pcm"  # Override for TTS only (default: pcm)
export VOICEMODE_STT_AUDIO_FORMAT="mp3"  # Override for STT upload

# Format-specific quality settings
export VOICEMODE_OPUS_BITRATE="32000"  # Opus bitrate (default: 32kbps)
export VOICEMODE_MP3_BITRATE="64k"     # MP3 bitrate (default: 64k)
```
Voice Mode uses the PCM audio format by default for TTS streaming, for optimal real-time performance. The chosen format is automatically validated against each provider's capabilities, and Voice Mode will fall back to a supported format if needed.
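As a rough illustration of that validation step (the function name and fallback order below are hypothetical, not Voice Mode's actual internals):

```python
# Hypothetical sketch of format negotiation; names and fallback order are
# illustrative, not taken from the Voice Mode codebase.
FALLBACK_ORDER = ["pcm", "opus", "mp3", "wav"]

def negotiate_format(configured: str, provider_formats: set[str]) -> str:
    """Use the configured format if the provider supports it, else fall back."""
    if configured in provider_formats:
        return configured
    for fmt in FALLBACK_ORDER:
        if fmt in provider_formats:
            return fmt
    raise ValueError(f"No supported audio format among {sorted(provider_formats)}")

# A provider that cannot stream raw PCM falls back to the next option:
print(negotiate_format("pcm", {"mp3", "opus"}))  # -> "opus"
```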
## Local STT/TTS Services

For privacy-focused or offline usage, Voice Mode supports local speech services:

- **Whisper** - local speech-to-text (see `STT_BASE_URL` above)
- **Kokoro** - local text-to-speech (see `TTS_BASE_URL` and the `start_kokoro` tool)

These services provide the same API interface as OpenAI, allowing seamless switching between cloud and local processing.
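Because the endpoints are OpenAI-compatible, the stock OpenAI Python SDK can talk to them directly. A minimal sketch, assuming a local Kokoro server on port 8880 (as in the configuration above) that implements the `/v1/audio/speech` endpoint:

```python
from openai import OpenAI

# Point the client at the local TTS service instead of api.openai.com.
client = OpenAI(
    base_url="http://127.0.0.1:8880/v1",
    api_key="not-needed",  # local servers typically ignore the key
)

# Same call shape as the hosted API; the local server maps the model name.
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Hello from a local TTS server!",
)
speech.write_to_file("hello.mp3")
```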
## OpenAI API Compatibility

By strictly adhering to OpenAI's API standard, Voice Mode enables powerful deployment flexibility: any OpenAI-compatible endpoint can be swapped in by changing the base URL, with no code changes required.

Example: Simply set `OPENAI_BASE_URL` to point to your custom router:
```bash
export OPENAI_BASE_URL="https://router.example.com/v1"
export OPENAI_API_KEY="your-key"
# Voice Mode now uses your router for all OpenAI API calls
```
The OpenAI SDK handles this automatically - no Voice Mode configuration needed!
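For example, the OpenAI Python SDK picks both variables up at client construction time, with no extra wiring (a quick sketch, not Voice Mode code):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY and OPENAI_BASE_URL from the environment
print(client.base_url)  # -> https://router.example.com/v1 if exported as above
```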
## Architecture

```
┌───────────────────────┐     ┌──────────────────┐     ┌─────────────────────┐
│      Claude/LLM       │     │  LiveKit Server  │     │   Voice Frontend    │
│     (MCP Client)      │◄───►│    (Optional)    │◄───►│      (Optional)     │
└───────────────────────┘     └──────────────────┘     └─────────────────────┘
            │                          │
            │                          │
            ▼                          ▼
┌───────────────────────┐     ┌──────────────────┐
│   Voice MCP Server    │     │  Audio Services  │
│ • converse            │     │ • OpenAI APIs    │
│ • listen_for_speech   │◄───►│ • Local Whisper  │
│ • check_room_status   │     │ • Local TTS      │
│ • check_audio_devices │     └──────────────────┘
└───────────────────────┘
```
## Troubleshooting

If `uvx` is not found, install uv:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

For API errors, check that `OPENAI_API_KEY` is set correctly.

Enable detailed logging and audio file saving:
```bash
export VOICEMODE_DEBUG=true
```

Debug audio files are saved to `~/voicemode_recordings/`.
For WSL audio issues, run the diagnostic script to check your setup:

```bash
python scripts/diagnose-wsl-audio.py
```

This will check for required packages and audio services, and provide specific recommendations.
To save all audio files (both TTS output and STT input):

```bash
export VOICEMODE_SAVE_AUDIO=true
```

Audio files are saved to `~/voicemode_audio/` with timestamps in the filenames.
## License

MIT - A Failmode Project