OpenAI Unveils Three New Audio Models. What They Can Do and Which Businesses Are Already Using Them
OpenAI introduced three audio models in the API — GPT‑Realtime‑2, GPT‑Realtime‑Translate, and GPT‑Realtime‑Whisper.
"The models we’re launching take real-time audio from simple call and response to voice interfaces that can actually do things: listen, reason, translate, transcribe, and take other actions during a conversation," the company said in a blog post announcing the models.
OpenAI describes GPT-Realtime-2 as the first voice model with GPT-5-class reasoning, able to handle complex requests and carry on a natural conversation.
GPT-Realtime-Translate is a new model for live translation that can translate user speech from over 70 input languages into 13 output languages, while keeping up with the speaker.
GPT-Realtime-Whisper, in turn, adds streaming speech-to-text: it transcribes speech in real time as the speaker talks.
"As voice becomes a more natural way to use software, we’re seeing developers build their products around three new voice AI models," OpenAI says.
According to the company, large businesses are already testing the audio models, including online real estate site Zillow, online travel agency Priceline, and telecommunications company Deutsche Telekom.
GPT-Realtime-2 pricing starts at $32 per million audio input tokens, GPT-Realtime-Translate costs $0.034 per minute, and GPT-Realtime-Whisper costs $0.017 per minute.
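To put these rates in perspective, here is a minimal sketch of the arithmetic in Python. It uses only the prices quoted above; the function names are hypothetical, and the token count passed for GPT-Realtime-2 is an assumed input, since the article does not say how many audio tokens a minute of speech consumes.

```python
# Rates as quoted in the article (USD). Actual billing may include
# other charges (for example, output tokens) that are not covered here.
REALTIME_2_USD_PER_MILLION_AUDIO_INPUT_TOKENS = 32.0
USD_PER_MINUTE = {
    "GPT-Realtime-Translate": 0.034,
    "GPT-Realtime-Whisper": 0.017,
}


def per_minute_cost(model: str, minutes: float) -> float:
    """Cost of a per-minute-priced model for the given audio duration."""
    return USD_PER_MINUTE[model] * minutes


def realtime_2_audio_input_cost(audio_input_tokens: int) -> float:
    """Cost of GPT-Realtime-2 audio input for an assumed token count."""
    return REALTIME_2_USD_PER_MILLION_AUDIO_INPUT_TOKENS * audio_input_tokens / 1_000_000


if __name__ == "__main__":
    # One hour of audio on each per-minute model.
    print(f"${per_minute_cost('GPT-Realtime-Translate', 60):.2f}")  # $2.04
    print(f"${per_minute_cost('GPT-Realtime-Whisper', 60):.2f}")    # $1.02
    # 500,000 audio input tokens (an illustrative figure, not from the article).
    print(f"${realtime_2_audio_input_cost(500_000):.2f}")           # $16.00
```

At the quoted rates, an hour of live translation costs about $2.04 and an hour of streaming transcription about $1.02.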
The day before, dev.ua also reported that OpenAI had updated the default ChatGPT model: GPT-5.5 Instant hallucinates 52% less often and gives shorter responses.



