Whisper
A Model Context Protocol (MCP) server for advanced audio transcription and processing using OpenAI's Whisper and GPT-4o models.
MCP Server Whisper provides a standardized way to process audio files through OpenAI's latest transcription and speech services. By implementing the Model Context Protocol, it enables AI assistants like Claude to seamlessly interact with audio processing capabilities.
Key features:

- Audio file discovery with filtering and sorting (by name pattern, format, size, duration, and modification time)
- Transcription with whisper-1, gpt-4o-transcribe, and gpt-4o-mini-transcribe
- Interactive audio analysis with GPT-4o audio models
- Enhanced transcription templates (detailed, storytelling, professional, analytical)
- Text-to-speech generation via OpenAI's TTS API
- Automatic conversion and compression of files that exceed API limits
Note: This project is unofficial and not affiliated with, endorsed by, or sponsored by OpenAI. It provides a Model Context Protocol interface to OpenAI's publicly available APIs.
```bash
# Clone the repository
git clone https://github.com/arcaputo3/mcp-server-whisper.git
cd mcp-server-whisper

# Using uv
uv sync

# Set up pre-commit hooks
uv run pre-commit install
```
Create a .env file based on the provided .env.example:
```bash
cp .env.example .env
```
Edit .env with your actual values:
```
OPENAI_API_KEY=your_openai_api_key
AUDIO_FILES_PATH=/path/to/your/audio/files
```
Note: Environment variables must be available at runtime. For local development with Claude, use a tool like dotenv-cli to load them (see Usage section below).
The project includes a .mcp.json configuration file for local development with Claude. To use it:
1. Ensure your .env file is configured with the required environment variables
2. Launch Claude through dotenv-cli:

```bash
bunx dotenv-cli -- claude
```
This will:

- Load the environment variables from your .env file
- Start Claude with the MCP server defined in .mcp.json

The .mcp.json configuration:
{ "mcpServers": { "whisper": { "command": "uv", "args": ["run", "mcp-server-whisper"], "env": { "OPENAI_API_KEY": "${OPENAI_API_KEY}", "AUDIO_FILES_PATH": "${AUDIO_FILES_PATH}" } } } }
- list_audio_files - Lists audio files with comprehensive filtering and sorting options. Returns FilePathSupportParams with full metadata.
- get_latest_audio - Gets the most recently modified audio file with model support info.
- convert_audio - Converts audio files to supported formats (mp3 or wav). Returns AudioProcessingResult with the output path.
- compress_audio - Compresses audio files that exceed size limits. Returns AudioProcessingResult with the output path.
- transcribe_audio - Advanced transcription using OpenAI's models. Supports whisper-1, gpt-4o-transcribe, and gpt-4o-mini-transcribe. Returns TranscriptionResult with text, usage data, and optional timestamps.
- chat_with_audio - Interactive audio analysis using GPT-4o audio models. Supports gpt-4o-audio-preview (recommended) and dated versions; gpt-4o-mini-audio-preview has limitations with audio chat and is not recommended. Returns ChatResult with the response text.
- transcribe_with_enhancement - Enhanced transcription with specialized templates. Returns TranscriptionResult with the enhanced output.
  - detailed - Includes tone, emotion, and background details
  - storytelling - Transforms the transcript into a narrative form
  - professional - Creates formal, business-appropriate transcriptions
  - analytical - Adds analysis of speech patterns and key points
- create_audio - Generates text-to-speech audio using OpenAI's TTS API. Supports gpt-4o-mini-tts (preferred) and other speech models. Returns TTSResult with the output path.

| Model | Supported Formats |
|---|---|
| Transcribe | flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm | 
| Chat | mp3, wav | 
Note: Files larger than 25MB are automatically compressed to meet API limits.
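As a rough illustration of that flow, here is a minimal sketch of the compress-then-transcribe sequence using the OpenAI Python SDK and pydub. The function name, bitrate, and file paths are illustrative assumptions, not the server's actual implementation:

```python
# Sketch of the compress-then-transcribe flow (illustrative only; names,
# bitrate, and paths are assumptions, not the server's actual code).
from pathlib import Path

from openai import OpenAI
from pydub import AudioSegment

MAX_BYTES = 25 * 1024 * 1024  # OpenAI's 25MB upload limit


def compress_if_needed(path: Path) -> Path:
    """Re-encode as a lower-bitrate mp3 if the file exceeds the API limit."""
    if path.stat().st_size <= MAX_BYTES:
        return path
    audio = AudioSegment.from_file(path)
    out = path.with_name(f"{path.stem}_compressed.mp3")
    audio.export(out, format="mp3", bitrate="64k")
    return out


client = OpenAI()  # reads OPENAI_API_KEY from the environment

audio_path = compress_if_needed(Path("interview.mp3"))
with audio_path.open("rb") as f:
    result = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",  # or "whisper-1" / "gpt-4o-mini-transcribe"
        file=f,
    )
print(result.text)
```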
Claude, please transcribe my latest audio file with detailed insights.
Claude will automatically:

- Call get_latest_audio to find your most recent file
- Call transcribe_with_enhancement using the "detailed" template

Claude, list all my audio files that are longer than 5 minutes and were created after January 1st, 2024, sorted by size.
Claude will:
- Call list_audio_files with appropriate filters:
  - min_duration_seconds: 300 (5 minutes)
  - min_modified_time: <timestamp for Jan 1, 2024>
  - sort_by: "size"
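For reference, the resulting tool call might carry arguments like these (names taken from the example above; the exact schema may differ):

```python
# Hypothetical list_audio_files arguments, as an MCP client might send them
# (names from the example above; the server's exact schema may differ).
arguments = {
    "min_duration_seconds": 300,      # 5 minutes
    "min_modified_time": 1704067200,  # Unix timestamp for 2024-01-01 00:00 UTC
    "sort_by": "size",
}
```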
Claude, find all MP3 files with "interview" in the filename and create professional transcripts for each one.

Claude will:
- Call list_audio_files with pattern and format filters
- Issue parallel transcribe_with_enhancement tool calls (MCP handles parallelism natively)
- Pass enhancement_type: "professional"; each call returns a typed TranscriptionResult

Claude, create audio with this script: "Welcome to our podcast! Today we'll be discussing artificial intelligence trends in 2025." Use the shimmer voice.
Claude will:
- Call the create_audio tool with:
  - text_prompt containing the script
  - voice: "shimmer"
  - model: "gpt-4o-mini-tts" (default high-quality model)
  - instructions: "Speak in an enthusiastic, podcast host style" (optional)
  - speed: 1.0 (default, can be adjusted)
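Under the hood, a call like that plausibly maps onto OpenAI's speech API roughly as follows (a sketch with an assumed output filename and streaming style, not the server's actual code):

```python
# Hedged sketch of the TTS request create_audio likely wraps (assumptions:
# the output filename and streaming-response style; not the actual server code).
from openai import OpenAI

client = OpenAI()

with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="shimmer",
    input=(
        "Welcome to our podcast! Today we'll be discussing "
        "artificial intelligence trends in 2025."
    ),
    instructions="Speak in an enthusiastic, podcast host style",
) as response:
    response.stream_to_file("podcast_intro.mp3")
```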
{ "mcpServers": { "whisper": { "command": "uvx", "args": ["mcp-server-whisper"], "env": { "OPENAI_API_KEY": "your_openai_api_key", "AUDIO_FILES_PATH": "/path/to/your/audio/files" } } } }
For example, set AUDIO_FILES_PATH to /Users/<user>/Movies/Omi Screen Recorder, replacing <user> with your username.

This project uses modern Python development tools including uv, pytest, ruff, and mypy.
```bash
# Run tests
uv run pytest

# Run with coverage
uv run pytest --cov=src

# Format code
uv run ruff format src

# Lint code
uv run ruff check src

# Run type checking (strict mode)
uv run mypy --strict src

# Run the pre-commit hooks
pre-commit run --all-files
```
The project uses GitHub Actions for CI/CD:

- CI: runs tests, linting, and type checking across supported Python versions (including 3.14t)
- Release: creates a GitHub release and publishes to PyPI when a version tag is pushed
Note: Python 3.14t is the free-threaded build (without the GIL), used to test true parallelism.
The release workflow supports two approaches:
Option 1: Automated Release (Recommended)
Push a tag to automatically create a release and publish to PyPI:
```bash
# 1. Update version in pyproject.toml
#    Edit the version field manually, e.g., "1.0.0" -> "1.1.0"

# 2. Update __version__ in src/mcp_server_whisper/__init__.py to match

# 3. Update the lock file
uv lock

# 4. Commit the version bump
git add pyproject.toml src/mcp_server_whisper/__init__.py uv.lock
git commit -m "chore: bump version to 1.1.0"

# 5. Create and push the version tag
git tag v1.1.0
git push origin main
git push origin v1.1.0
```
This will:

- Create a GitHub release for the tag
- Publish the package to PyPI automatically
Option 2: Manual Release
Create a release manually via the GitHub UI and publish it when you're ready:
When you publish the release, the workflow will automatically publish to PyPI. You can also create a draft release to delay publishing.
MCP Server Whisper follows a flat, type-safe API design optimized for MCP clients:
- Flat tool structure: each capability is a single, directly callable tool
- Typed results: every tool returns a structured model (TranscriptionResult, ChatResult, AudioProcessingResult, TTSResult)

This design makes it significantly easier for AI assistants to use the tools correctly and handle results reliably.
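For illustration, a result model of this shape could be declared like the sketch below. The field names are inferred from the tool descriptions above, and Pydantic itself is an assumption; the server's actual definitions may differ:

```python
# Illustrative result model (field names inferred from the tool descriptions
# above; Pydantic and the exact fields are assumptions, not the actual code).
from pydantic import BaseModel


class TranscriptionResult(BaseModel):
    text: str                             # the transcript itself
    usage: dict | None = None             # token/duration usage data, if any
    timestamps: list[dict] | None = None  # optional segment timestamps
```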
For detailed architecture information, see Architecture Documentation.
MCP Server Whisper is built on the Model Context Protocol, which standardizes how AI models interact with external tools and data sources. The server:

- Exposes audio discovery, transcription, chat, and text-to-speech capabilities as MCP tools
- Returns typed, structured results that MCP clients can consume reliably
Under the hood, it uses:
- pydub for audio file manipulation (with audioop-lts for Python 3.13+)
- anyio for structured concurrency and task group management
- aioresult for collecting results from parallel task groups
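To show how those last two pieces fit together, here is a minimal sketch of fanning work out across a task group and collecting the results (the transcribe_one coroutine and the file list are hypothetical placeholders, not the server's actual functions):

```python
# Minimal anyio + aioresult sketch (transcribe_one and the file list are
# hypothetical placeholders, not the server's actual functions).
import anyio
from aioresult import ResultCapture


async def transcribe_one(path: str) -> str:
    """Placeholder for an async transcription call for a single file."""
    await anyio.sleep(0.1)  # stand-in for the real API call
    return f"transcript of {path}"


async def main() -> None:
    files = ["a.mp3", "b.mp3", "c.mp3"]
    async with anyio.create_task_group() as tg:
        captures = [ResultCapture.start_soon(tg, transcribe_one, f) for f in files]
    # The task group has finished, so every result is now available.
    for capture in captures:
        print(capture.result())


anyio.run(main)
```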
Contributions are welcome! Please follow these steps:

1. Fork the repository and create your feature branch (git checkout -b feature/amazing-feature)
2. Make your changes and verify them (uv run pytest && uv run ruff check src && uv run mypy --strict src)
3. Commit your changes (git commit -m 'Add some amazing feature')
4. Push to the branch (git push origin feature/amazing-feature)
5. Open a pull request

This project is licensed under the MIT License - see the LICENSE file for details.