Posts · 5/1/2026 · by Jacky Liang

New Audio APIs for Speech and Transcription

OpenRouter now has two dedicated audio endpoints: /api/v1/audio/speech for text-to-speech and /api/v1/audio/transcriptions for speech-to-text.

These new endpoints deliver specialized models that are generally faster and more cost-efficient than the general audio models we already support, though they are more narrowly scoped to specific audio tasks.

You can now generate speech from text with OpenAI, Google, or Mistral voices and transcribe audio files with OpenAI Whisper, all with the same routing, billing, and key management you already use for text, video, and image generation.

Speech models · Transcription models · Speech docs · Transcription docs

Choosing a model: Audio vs. Speech vs. Transcription

Choosing a model is a balance of specialization, cost, and speed. We've enabled the full breadth of options so you can pick the right path for each use case:

| | Audio models | Speech models | Transcription models |
|---|---|---|---|
| What it does | Understands audio input and reasons over it, like a voice-native LLM | Converts text into lifelike spoken audio | Converts audio into text |
| Input → Output | Text/audio → text/audio | Text → audio | Audio → text |
| Best for | Voice agents, mixed-modality conversations, audio Q&A | Reading text aloud with built-in voices and streaming | Meeting notes, subtitles, feeding voice input into text pipelines |
| Endpoint | /chat/completions | /audio/speech | /audio/transcriptions |
| Trade-offs | More powerful, but heavier and more expensive | Simpler, faster, cheaper (no reasoning needed) | Purpose-built for accuracy across languages and accents |
| Browse models | Audio models | Speech models | Transcription models |
| Docs | Audio output guide | Speech docs | Transcription docs |

Try it in the Playground

Both Speech and Transcription have dedicated Playground tabs on model pages (here's GPT-4o Mini TTS's Playground and GPT-4o Transcribe's Playground as examples). For speech models, pick a voice from the dropdown, type your text, and hear the result. For transcription models, drag and drop an audio file and see the transcription.

Each model page also shows quickstart code in Python, TypeScript, curl, and the OpenRouter SDK, so you can copy a working example and have audio running in your app in minutes.

Getting started with Speech models

Send text, get audio back. The response is a raw byte stream you can pipe straight to a file or audio player.
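As a minimal sketch of that flow, assuming an OpenAI-compatible request shape (`model`, `input`, `voice`) and an illustrative model ID and voice name — check the quickstart on each model's page for the exact schema:

```python
import json
import os
import urllib.request

# Endpoint path is from the announcement; the body fields below assume an
# OpenAI-compatible schema and may differ per provider.
API_URL = "https://openrouter.ai/api/v1/audio/speech"


def build_speech_request(text, model="openai/gpt-4o-mini-tts", voice="alloy"):
    """Assemble the JSON body for a text-to-speech call.

    The default model ID and voice name are illustrative assumptions.
    """
    return {"model": model, "input": text, "voice": voice}


def synthesize(text, out_path="speech.mp3"):
    """POST the request and pipe the raw audio bytes straight to a file."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_speech_request(text)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())


if __name__ == "__main__" and os.environ.get("OPENROUTER_API_KEY"):
    synthesize("Hello from OpenRouter!")
```

Because the response is raw bytes rather than JSON, you can also stream it directly into an audio player instead of writing a file.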


Speech providers currently include OpenAI (GPT-4o Mini TTS), Google (Gemini Flash TTS), and Mistral (Voxtral Mini TTS). Each model brings its own voice set, and you can browse available voices on each model's page. Output comes in MP3 or PCM format.

Provider-specific options pass through cleanly. For example, OpenAI's speech models accept an instructions field for tone control (e.g., "speak in a warm, friendly tone").
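For instance, a request body with that pass-through field might look like the following sketch (the model ID, voice name, and instruction text are illustrative; other providers may ignore the field):

```python
# Example text-to-speech body with OpenAI's pass-through "instructions"
# field for tone control. All values here are illustrative.
payload = {
    "model": "openai/gpt-4o-mini-tts",
    "input": "Your order has shipped and will arrive Friday.",
    "voice": "alloy",
    "instructions": "Speak in a warm, friendly tone.",
}
```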

Getting started with Transcription models

The transcription endpoint takes a base64-encoded audio file and returns text. It supports WAV, MP3, FLAC, and other common formats.
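A minimal sketch of that call, assuming `audio` and `language` as the request field names and a `text` key in the JSON response — the model ID is also illustrative, so consult the transcription docs for the exact schema:

```python
import base64
import json
import os
import urllib.request

# Endpoint path is from the announcement; the field names below are
# assumptions and may differ per provider.
API_URL = "https://openrouter.ai/api/v1/audio/transcriptions"


def build_transcription_request(audio_bytes, model="openai/whisper-1", language=None):
    """Base64-encode the audio and assemble the JSON body."""
    body = {"model": model, "audio": base64.b64encode(audio_bytes).decode("ascii")}
    if language:
        body["language"] = language  # optional hint for non-English audio, e.g. "fr"
    return body


def transcribe(path, **kwargs):
    """POST an audio file and return the transcribed text."""
    with open(path, "rb") as f:
        payload = build_transcription_request(f.read(), **kwargs)
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["text"]
```

Base64 inflates the payload by roughly a third, so for long recordings keep an eye on request-size limits.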


Transcription providers currently include OpenAI (Whisper, GPT-4o Transcribe, GPT-4o Mini Transcribe), Google (Chirp 3), and Groq (with their fast Whisper inference). You can optionally pass a language hint to improve accuracy for non-English audio.

What's next

We're actively adding more providers and voices. If there's a speech or transcription model you want to see on OpenRouter, tell us on Discord.
