OpenAI launches GPT-Realtime-2 with live translation and transcription features
- By Web Desk -
- May 08, 2026

OpenAI announced on Thursday that its API now includes several new voice intelligence features to help developers build applications that can speak with users, transcribe their speech, and translate conversations.
The company’s latest model, GPT-Realtime-2, is a new voice model designed to produce natural-sounding speech in conversational interactions. Unlike its predecessor, GPT-Realtime-1.5, it is built with GPT-5-level reasoning, which OpenAI claims enables it to handle more complex user requests.
The company is also introducing GPT-Realtime-Translate, a real-time translation service designed to keep pace with live conversations. It supports more than 70 input languages and 13 output languages.
A new transcription model, GPT-Realtime-Whisper, has also been launched, providing live speech-to-text capabilities during interactions.
“Together, these models advance real-time audio from simple call-and-response interactions to voice interfaces that can listen, reason, translate, transcribe, and take action as conversations develop,” the company stated.
These updates are likely to benefit companies looking to enhance customer service capabilities. However, OpenAI also highlights that its new features could be applied across various sectors such as education, media, events, and creator platforms.
While these tools offer significant enterprise potential, they also carry a risk of misuse. The company says it has implemented guardrails against abuse such as spam, fraud, and other forms of online harm, including triggers that halt conversations which violate its content policies.
All the new voice models are accessible through OpenAI’s Realtime API. The Translate and Whisper features are billed per minute, whereas GPT-Realtime-2 is billed based on token usage.
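For developers, access would presumably follow the event-based pattern of OpenAI’s existing Realtime API. Below is a minimal sketch of what a session configuration payload for the new translation model might look like; the model name is taken from this article, while the event shape mirrors the current Realtime API’s `session.update` event, and the `output_language` field is purely hypothetical.

```python
import json

def build_session_update(model: str, output_language: str) -> str:
    """Serialize a hypothetical session configuration event for a
    Realtime API connection. Illustrative only: the event shape is
    modeled on OpenAI's existing "session.update" event; the
    "output_language" field is an assumption, not documented API."""
    event = {
        "type": "session.update",
        "session": {
            "model": model,
            "modalities": ["audio", "text"],
            # Hypothetical field: target language for live translation.
            "output_language": output_language,
        },
    }
    return json.dumps(event)

# Build a payload targeting Spanish output with the model named in the article.
payload = build_session_update("gpt-realtime-translate", "es")
print(payload)
```

In the real Realtime API, a payload like this would be sent over a WebSocket connection after authenticating with an API key; consult OpenAI’s documentation for the actual session fields the new models accept.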
