Audio & Speech Models
Speech-to-text, text-to-speech, and audio processing models.
19 models
| Model | Creator | Context Length | Input Price | Output Price | Image Generation | Video Generation | Open Weight |
|---|---|---|---|---|---|---|---|
| Codex Security | OpenAI | 1,050,000 tokens | text: $2.5 | text: $15 | - | - | No |
| Gemini 2.5 Flash Native Audio | - | 131,072 tokens | - | - | - | - | No |
| Gemini 2.5 Flash Preview TTS | - | 8,192 tokens | - | - | - | - | No |
| Gemini 2.5 Pro Preview TTS | - | 8,192 tokens | - | - | - | - | No |
| GPT-4o Audio Preview | OpenAI | 128,000 tokens | audio: $40; text: $2.5 | audio: $80; text: $10 | - | - | No |
| GPT-4o Mini Audio Preview | OpenAI | 128,000 tokens | audio: $10; text: $0.15 | audio: $20; text: $0.6 | - | - | No |
| GPT-4o Mini Realtime Preview | OpenAI | 16,000 tokens | audio: $10; text: $0.6 | audio: $20; text: $2.4 | - | - | No |
| GPT-4o Mini Transcribe | OpenAI | 16,000 tokens | audio: $3; text: $1.25 | text: $5 | - | - | No |
| GPT-4o Mini TTS | OpenAI | 2,000 tokens | text: $0.6 | audio: $12 | - | - | No |
| GPT-4o Realtime Preview | OpenAI | 32,000 tokens | audio: $40; text: $5 | audio: $80; text: $20 | - | - | No |
| GPT-4o Transcribe | OpenAI | 16,000 tokens | audio: $6; text: $2.5 | text: $10 | - | - | No |
| GPT-4o Transcribe Diarize | OpenAI | 16,000 tokens | audio: $6; text: $2.5 | text: $10 | - | - | No |
| GPT-5.3 Instant | OpenAI | 400,000 tokens | text: $1.75 | text: $14 | - | - | No |
| GPT-Audio | OpenAI | 128,000 tokens | audio: $32; text: $2.5 | audio: $64; text: $10 | - | - | No |
| GPT-Audio 1.5 | OpenAI | 128,000 tokens | audio: $32; text: $2.5 | audio: $64; text: $10 | - | - | No |
| GPT-Audio Mini | OpenAI | 128,000 tokens | text: $0.6 | text: $2.4 | - | - | No |
| GPT-Realtime 1.5 | OpenAI | 32,000 tokens | audio: $32; image: $5; text: $4 | audio: $64; text: $16 | - | - | No |
| TTS-1 | OpenAI | - | - | - | - | - | No |
| TTS-1 HD | OpenAI | 100,000 tokens | text: $30 | audio: $30 | - | - | No |
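The table prices each model per modality (audio, image, text) but does not state the billing unit. As a minimal sketch, assuming the listed prices are USD per 1 million tokens (a common convention that this page does not confirm), a request's cost is the sum of token count × price ÷ 1,000,000 over every modality on the input and output sides. `ModelPricing`, `PRICING`, and `estimate_cost` are hypothetical names, and only three rows of the table are transcribed.

```python
from dataclasses import dataclass

# ASSUMPTION: the table does not state a billing unit; we assume USD per 1M tokens.
TOKENS_PER_PRICE_UNIT = 1_000_000


@dataclass(frozen=True)
class ModelPricing:
    """Per-modality prices for one model, copied from the table (unit assumed)."""
    input_prices: dict[str, float]
    output_prices: dict[str, float]


# Hypothetical lookup keyed by the table's "Model" column (three rows only).
PRICING = {
    "GPT-4o Mini Transcribe": ModelPricing(
        input_prices={"audio": 3.0, "text": 1.25},
        output_prices={"text": 5.0},
    ),
    "GPT-4o Mini TTS": ModelPricing(
        input_prices={"text": 0.6},
        output_prices={"audio": 12.0},
    ),
    "GPT-Realtime 1.5": ModelPricing(
        input_prices={"audio": 32.0, "image": 5.0, "text": 4.0},
        output_prices={"audio": 64.0, "text": 16.0},
    ),
}


def _side_cost(model: str, side: str, prices: dict[str, float],
               tokens: dict[str, int]) -> float:
    """Sum count * price / unit over each modality on one side of a request."""
    total = 0.0
    for modality, count in tokens.items():
        if modality not in prices:
            # The table lists no price for this modality, so refuse to guess.
            raise ValueError(f"{model} has no {side} price for {modality!r}")
        total += count * prices[modality] / TOKENS_PER_PRICE_UNIT
    return total


def estimate_cost(model: str, input_tokens: dict[str, int],
                  output_tokens: dict[str, int]) -> float:
    """Estimated USD cost of one request under the per-1M-token assumption."""
    pricing = PRICING[model]
    return (_side_cost(model, "input", pricing.input_prices, input_tokens)
            + _side_cost(model, "output", pricing.output_prices, output_tokens))


if __name__ == "__main__":
    # 1M audio tokens in and 200k text tokens out: 3.0 + 1.0, prints 4.0
    print(estimate_cost("GPT-4o Mini Transcribe",
                        {"audio": 1_000_000}, {"text": 200_000}))
```

Tokens in a modality the table leaves unpriced (for example, audio input to GPT-4o Mini TTS) raise an error instead of being billed at zero, so a missing table entry cannot silently understate a cost.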