Voice Recorder Whisper
STDIO MCP server for recording audio and transcribing with OpenAI's Whisper model.
An MCP server for recording audio and transcribing it using OpenAI's Whisper model. Designed to work as a Goose custom extension or standalone MCP server.

    # Install from source
    git clone https://github.com/DefiBax/voice-recorder-mcp.git
    cd voice-recorder-mcp
    pip install -e .


    # Run with default settings (base.en model)
    voice-recorder-mcp

    # Use a specific Whisper model
    voice-recorder-mcp --model medium.en

    # Adjust sample rate
    voice-recorder-mcp --sample-rate 44100

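The flags above map naturally onto a small argument parser. The following is a minimal sketch of how such a command line could be parsed with Python's argparse; it is not the project's actual implementation. The function name build_parser and the 16000 Hz default sample rate are assumptions made for illustration (only the base.en default comes from this README).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown above."""
    parser = argparse.ArgumentParser(prog="voice-recorder-mcp")
    # base.en is the documented default model.
    parser.add_argument("--model", default="base.en", help="Whisper model name")
    # 16000 Hz is an assumed default, not taken from the project.
    parser.add_argument("--sample-rate", type=int, default=16000,
                        help="Audio sample rate in Hz")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--model", "medium.en", "--sample-rate", "44100"])
    print(args.model, args.sample_rate)
```

Note that argparse exposes --sample-rate as the attribute sample_rate, so the hyphen in the flag becomes an underscore in code.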
The MCP Inspector provides an interactive interface to test your server:

    # Install the MCP Inspector
    npm install -g @modelcontextprotocol/inspector

    # Run your server with the inspector
    npx @modelcontextprotocol/inspector voice-recorder-mcp

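The Inspector talks to the server using MCP's JSON-RPC 2.0 messages over standard input and output. As a rough illustration of what a tool invocation looks like on the wire, the sketch below builds a tools/call request for the record_and_transcribe tool. The helper name make_tool_call and the duration argument are assumptions; check the tool's schema in the Inspector for the real parameter names.

```python
import json


def make_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as one line of JSON."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# "duration" is a hypothetical argument name used only for illustration.
line = make_tool_call(1, "record_and_transcribe", {"duration": 5})
print(line)
```

In practice the Inspector (or Goose) builds and sends these messages for you, including the initialization handshake that must precede any tool call.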
1. Open Goose and go to Settings > Extensions > Add > Command Line Extension.
2. Set the name to voice-recorder.
3. In the Command field, enter the full path to the voice-recorder-mcp executable:

       /full/path/to/voice-recorder-mcp

   Or for a specific model:

       /full/path/to/voice-recorder-mcp --model medium.en

   To find the path, run:

       which voice-recorder-mcp

4. No environment variables are needed for basic functionality.
5. Start a conversation with Goose and introduce the recorder with: "I want you to take action from transcriptions returned by voice-recorder. For example, if I dictate a calculation like 1+1, please return the result."
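The path lookup in the steps above can also be done from Python, which is handy where the which command is unavailable (for example on Windows). This sketch uses shutil.which from the standard library; the helper name goose_command is hypothetical, and its result is simply the string to paste into the Command field.

```python
import shutil
from typing import Optional


def goose_command(executable: str = "voice-recorder-mcp",
                  model: Optional[str] = None) -> Optional[str]:
    """Return the full command line for Goose, or None if not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    # Append the optional --model flag documented above.
    return f"{path} --model {model}" if model else path


print(goose_command(model="medium.en"))
```

If this prints None, the executable is not on your PATH; activating the virtual environment where the package was installed is the usual fix.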
- start_recording: Start recording audio from the default microphone
- stop_and_transcribe: Stop recording and transcribe the audio to text
- record_and_transcribe: Record audio for a specified duration and transcribe it

This extension supports various Whisper model sizes:

Model | Speed | Accuracy | Memory Usage | Use Case |
---|---|---|---|---|
tiny.en | Fastest | Lowest | Minimal | Testing, quick transcriptions |
base.en | Fast | Good | Low | Everyday use (default) |
small.en | Medium | Better | Moderate | Good balance |
medium.en | Slow | High | High | Important recordings |
large | Slowest | Highest | Very High | Critical transcriptions |
The .en suffix indicates models specialized for English, which are faster and more accurate for English content.
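To make the naming rule concrete, here is a small sketch that validates a model name against the table above and reports whether it is an English-only variant. The names KNOWN_MODELS and is_english_only are illustrative and not part of the server.

```python
# Model names listed in the table above.
KNOWN_MODELS = ("tiny.en", "base.en", "small.en", "medium.en", "large")


def is_english_only(model: str) -> bool:
    """True for models carrying the .en suffix."""
    if model not in KNOWN_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    return model.endswith(".en")


for name in KNOWN_MODELS:
    kind = "English-only" if is_english_only(name) else "multilingual"
    print(f"{name}: {kind}")
```

Of the models in the table, only large lacks the suffix, so it is the one to choose for non-English speech.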
You can configure the server using environment variables:

    # Set Whisper model
    export WHISPER_MODEL=small.en

    # Set audio sample rate
    export SAMPLE_RATE=44100

    # Set maximum recording duration (seconds)
    export MAX_DURATION=120

    # Then run the server
    voice-recorder-mcp

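A server might read these variables at startup roughly as follows. This is a sketch, not the project's code: the Settings class, the load_settings helper, and the fallback values for SAMPLE_RATE (16000) and MAX_DURATION (60) are assumptions; only the base.en default is stated in this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    model: str
    sample_rate: int   # Hz
    max_duration: int  # seconds


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from environment variables, with fallbacks."""
    return Settings(
        model=env.get("WHISPER_MODEL", "base.en"),
        # The two fallbacks below are assumed values for illustration.
        sample_rate=int(env.get("SAMPLE_RATE", "16000")),
        max_duration=int(env.get("MAX_DURATION", "60")),
    )


print(load_settings({"WHISPER_MODEL": "small.en",
                     "SAMPLE_RATE": "44100",
                     "MAX_DURATION": "120"}))
```

Passing the environment in as a mapping keeps the function easy to test; calling load_settings() with no argument reads the real process environment.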
Contributions are welcome! Please feel free to submit a Pull Request.
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push the branch (git push origin feature/amazing-feature)

This project is licensed under the MIT License. See the LICENSE file for details.