Voice Recorder MCP Server

An MCP server that records audio and transcribes it with OpenAI's Whisper model. It works either as a Goose custom extension or as a standalone MCP server.

Features

Record audio from the default microphone
Transcribe recordings using Whisper
Integrates with Goose AI agent as a custom extension
Includes prompts for common recording scenarios

Installation

```
# Install from source
git clone https://github.com/DefiBax/voice-recorder-mcp.git
cd voice-recorder-mcp
pip install -e .
```

Usage

As a Standalone MCP Server

```
# Run with default settings (base.en model)
voice-recorder-mcp

# Use a specific Whisper model
voice-recorder-mcp --model medium.en

# Adjust sample rate
voice-recorder-mcp --sample-rate 44100
```

Testing with MCP Inspector

The MCP Inspector provides an interactive interface to test your server:

```
# Install the MCP Inspector
npm install -g @modelcontextprotocol/inspector

# Run your server with the inspector
npx @modelcontextprotocol/inspector voice-recorder-mcp
```

With Goose AI Agent

Open Goose and go to Settings > Extensions > Add > Command Line Extension
Set the name to voice-recorder
In the Command field, enter the full path to the voice-recorder-mcp executable:
```
/full/path/to/voice-recorder-mcp
```
Or for a specific model:
```
/full/path/to/voice-recorder-mcp --model medium.en
```
To find the path, run:
```
which voice-recorder-mcp
```
No environment variables are needed for basic functionality
Start a conversation with Goose and introduce the recorder with: "I want you to take action from transcriptions returned by voice-recorder. For example, if I dictate a calculation like 1+1, please return the result."

Available Tools

start_recording: Start recording audio from the default microphone
stop_and_transcribe: Stop recording and transcribe the audio to text
record_and_transcribe: Record audio for a specified duration and transcribe it
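
As a rough illustration of what a client sends when it invokes these tools: MCP carries tool invocations as JSON-RPC 2.0 `tools/call` requests. The sketch below only builds the messages and does not talk to the server. `tool_call_request` is a hypothetical helper, and the `duration` argument name is an assumption, since the tools' input schemas are not listed here.

```python
import json
from itertools import count

# JSON-RPC request ids must be unique within a session.
_request_ids = count(1)


def tool_call_request(tool_name, arguments=None):
    """Build the JSON-RPC 2.0 message an MCP client sends to invoke a tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments or {}},
    }


# Fixed-length recording. "duration" is an assumed parameter name.
timed = tool_call_request("record_and_transcribe", {"duration": 5})

# Open-ended recording: start now, stop and transcribe later.
start = tool_call_request("start_recording")
stop = tool_call_request("stop_and_transcribe")

print(json.dumps(timed, indent=2))
```

In practice an MCP client such as Goose or the MCP Inspector builds and sends these messages for you; the actual argument names can be read from the server's tool listing.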

Whisper Models

This extension supports various Whisper model sizes:

| Model | Speed | Accuracy | Memory Usage | Use Case |
|-------|-------|----------|--------------|----------|
| `tiny.en` | Fastest | Lowest | Minimal | Testing, quick transcriptions |
| `base.en` | Fast | Good | Low | Everyday use (default) |
| `small.en` | Medium | Better | Moderate | Good balance |
| `medium.en` | Slow | High | High | Important recordings |
| `large` | Slowest | Highest | Very High | Critical transcriptions |

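The `.en` suffix marks English-only models, which tend to be more accurate on English audio than multilingual models of the same size, particularly for `tiny.en` and `base.en`. The `large` model is multilingual only.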

Requirements

Python 3.12+
An audio input device (microphone)

Configuration

You can configure the server using environment variables:

```
# Set Whisper model
export WHISPER_MODEL=small.en

# Set audio sample rate
export SAMPLE_RATE=44100

# Set maximum recording duration (seconds)
export MAX_DURATION=120

# Then run the server
voice-recorder-mcp
```
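
One way a server could combine these environment variables with the `--model` and `--sample-rate` flags is sketched below. This is not the project's actual code: `load_config` is a hypothetical helper, and the precedence (flags override environment variables, which override defaults) is an assumption. The defaults `base.en` and `16000` come from this README; no default is assumed for `MAX_DURATION`.

```python
import argparse
import os


def load_config(argv=None, env=None):
    """Resolve settings from CLI flags, then environment variables, then defaults."""
    env = os.environ if env is None else env

    parser = argparse.ArgumentParser(prog="voice-recorder-mcp")
    # Environment values act as the flag defaults, so an explicit flag wins (assumed order).
    parser.add_argument("--model", default=env.get("WHISPER_MODEL", "base.en"))
    parser.add_argument(
        "--sample-rate", type=int, default=int(env.get("SAMPLE_RATE", "16000"))
    )
    args = parser.parse_args(argv)

    # MAX_DURATION is only documented as an environment variable; None means "not set".
    max_duration = int(env["MAX_DURATION"]) if "MAX_DURATION" in env else None

    return {
        "model": args.model,
        "sample_rate": args.sample_rate,
        "max_duration": max_duration,
    }


# Example: the environment selects small.en, but the flag overrides it.
config = load_config(["--model", "medium.en"], {"WHISPER_MODEL": "small.en"})
print(config)
```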

Troubleshooting

Common Issues

No audio being recorded: Check your microphone permissions and settings
Model download errors: Ensure you have a stable internet connection for the initial model download
Integration with Goose: Make sure the command path is correct
Audio quality issues: Try adjusting the sample rate (default: 16000 Hz)
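
For the Goose integration issue, a quick check that the command is reachable can be sketched in Python with the standard library's `shutil.which`, which does the same lookup as the `which` shell command. `find_server` is a hypothetical helper name used only for this illustration.

```python
import shutil


def find_server(command="voice-recorder-mcp"):
    """Return the path of the command if it is found on PATH, otherwise None."""
    return shutil.which(command)


path = find_server()
if path is None:
    # Typical cause: the virtual environment the package was installed into is not active.
    print("voice-recorder-mcp was not found on PATH")
else:
    print(f"Use this path in the Goose Command field: {path}")
```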

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add some amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.
