Voice Recording and Transcription
An MCP server (STDIO transport) for recording audio and transcribing it with OpenAI's Whisper model. It is designed to work as a Goose custom extension or as a standalone MCP server.
# Install from source
git clone https://github.com/DefiBax/voice-recorder-mcp.git
cd voice-recorder-mcp
pip install -e .
# Run with default settings (base.en model)
voice-recorder-mcp

# Use a specific Whisper model
voice-recorder-mcp --model medium.en

# Adjust sample rate
voice-recorder-mcp --sample-rate 44100
The MCP Inspector provides an interactive interface to test your server:
# Install the MCP Inspector
npm install -g @modelcontextprotocol/inspector

# Run your server with the inspector
npx @modelcontextprotocol/inspector voice-recorder-mcp
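For scripted checks alongside the Inspector, it can help to see what a tool invocation looks like on the wire. MCP is built on JSON-RPC 2.0, and tools are invoked with the `tools/call` method. The sketch below only builds the request message and does not talk to the server. The tool name `record_and_transcribe` is one of this server's tools; the `duration` argument name is an assumption and should be checked against the input schema the Inspector displays.

```python
import json


def build_tool_call(request_id, tool_name, arguments=None):
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments or {}},
    }


# NOTE: "duration" is a hypothetical argument name used for illustration;
# confirm the real parameter name in the tool's input schema.
request = build_tool_call(1, "record_and_transcribe", {"duration": 5})

# Over the STDIO transport, each message is sent as one line of JSON.
print(json.dumps(request))
```

A real session must complete the MCP `initialize` handshake before sending `tools/call`, which the Inspector handles for you.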
Open Goose and go to Settings > Extensions > Add > Command Line Extension
Set the name to voice-recorder
In the Command field, enter the full path to the voice-recorder-mcp executable:
/full/path/to/voice-recorder-mcp
Or for a specific model:
/full/path/to/voice-recorder-mcp --model medium.en
To find the path, run:
which voice-recorder-mcp
No environment variables are needed for basic functionality.
Start a conversation with Goose and introduce the recorder with: "I want you to take action from transcriptions returned by voice-recorder. For example, if I dictate a calculation like 1+1, please return the result."
start_recording: Start recording audio from the default microphone
stop_and_transcribe: Stop recording and transcribe the audio to text
record_and_transcribe: Record audio for a specified duration and transcribe it

This extension supports various Whisper model sizes:
Model | Speed | Accuracy | Memory Usage | Use Case |
---|---|---|---|---|
tiny.en | Fastest | Lowest | Minimal | Testing, quick transcriptions |
base.en | Fast | Good | Low | Everyday use (default) |
small.en | Medium | Better | Moderate | Good balance |
medium.en | Slow | High | High | Important recordings |
large | Slowest | Highest | Very High | Critical transcriptions |
The .en suffix indicates models specialized for English, which are faster and more accurate for English content.
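The naming convention in the table above can be checked programmatically, for example before passing a name to `--model`. This is a minimal sketch: the list contains only the names shown in the table, and the server may well accept other Whisper model names. The helper names `is_english_only` and `validate_model` are made up for illustration.

```python
# Model names taken from the table above, ordered fastest to slowest.
KNOWN_MODELS = ["tiny.en", "base.en", "small.en", "medium.en", "large"]
DEFAULT_MODEL = "base.en"  # documented default


def is_english_only(model):
    """Return True for models carrying the English-only `.en` suffix."""
    return model.endswith(".en")


def validate_model(model):
    """Return the model name unchanged, or raise if it is not in the table."""
    if model not in KNOWN_MODELS:
        raise ValueError(
            f"unknown model {model!r}; expected one of {', '.join(KNOWN_MODELS)}"
        )
    return model


english_models = [m for m in KNOWN_MODELS if is_english_only(m)]
print(english_models)
```

In this list, `large` is the only entry without the `.en` suffix, so it is the one to choose for non-English audio.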
You can configure the server using environment variables:
# Set Whisper model
export WHISPER_MODEL=small.en

# Set audio sample rate
export SAMPLE_RATE=44100

# Set maximum recording duration (seconds)
export MAX_DURATION=120

# Then run the server
voice-recorder-mcp
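As an illustration of how these variables might be consumed, the sketch below reads them into a small config object. It is not the project's actual implementation. The variable names and the `base.en` default come from this README, while the fallback values for `SAMPLE_RATE` (16000) and `MAX_DURATION` (60) are assumptions, as are the `RecorderConfig` and `load_config` names.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RecorderConfig:
    whisper_model: str
    sample_rate: int   # Hz
    max_duration: int  # seconds


def load_config(env=None):
    """Read recorder settings from environment variables, with fallbacks."""
    env = os.environ if env is None else env
    return RecorderConfig(
        whisper_model=env.get("WHISPER_MODEL", "base.en"),  # documented default
        sample_rate=int(env.get("SAMPLE_RATE", "16000")),   # ASSUMED default
        max_duration=int(env.get("MAX_DURATION", "60")),    # ASSUMED default
    )


# Passing a plain dict instead of os.environ keeps the example reproducible.
config = load_config(
    {"WHISPER_MODEL": "small.en", "SAMPLE_RATE": "44100", "MAX_DURATION": "120"}
)
print(config)
```

Converting the numeric values with `int` at load time means a malformed setting such as `SAMPLE_RATE=fast` fails immediately with a `ValueError` rather than later during recording.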
Contributions are welcome! Please feel free to submit a Pull Request.
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add some amazing feature')
Push to the branch (git push origin feature/amazing-feature)

This project is licensed under the MIT License; see the LICENSE file for details.