OpenAI unveils three audio models for real-time voice tasks

OpenAI introduced three audio models for its developer platform on Thursday, aiming to make voice-based software agents more conversational and capable of completing tasks in real time.

The launch of the application programming interface (API) moves the ChatGPT-maker beyond transcription and chat toward agents that can listen, translate and act during live conversations.

The new models are GPT-Realtime-2, GPT-Realtime-Translate and GPT-Realtime-Whisper. OpenAI said they are available to test in its developer playground.

GPT-Realtime-2 is designed to manage harder requests, call tools, handle interruptions and maintain context across longer voice sessions.

The second model, GPT-Realtime-Translate, supports translation from more than 70 languages into 13 output languages, targeting customer support, education and other settings.

GPT-Realtime-Whisper provides live speech-to-text, allowing captions, meeting notes and workflow updates to be generated as a speaker talks.

Customers testing the models include online real estate marketplace Zillow, online travel agency Priceline and European telecommunications firm Deutsche Telekom.

Pricing for GPT-Realtime-2 starts at $32 per million audio input tokens, while GPT-Realtime-Translate costs $0.034 per minute and GPT-Realtime-Whisper costs $0.017 per minute.
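
To put those rates in context, the arithmetic can be sketched in a few lines of Python. This is an illustrative estimate only: the function names are hypothetical, the 250,000-token figure is an assumed workload (the article does not say how many tokens a minute of audio consumes), and the GPT-Realtime-2 rate is the quoted starting price for audio input alone.

```python
# Prices as quoted in the article, in U.S. dollars.
REALTIME_2_PER_MILLION_INPUT_TOKENS = 32.0  # "starts at" rate, audio input only
TRANSLATE_PER_MINUTE = 0.034
WHISPER_PER_MINUTE = 0.017


def per_minute_cost(minutes: float, rate_per_minute: float) -> float:
    """Cost of a per-minute-billed model for a given duration."""
    return minutes * rate_per_minute


def audio_input_token_cost(
    tokens: int,
    rate_per_million: float = REALTIME_2_PER_MILLION_INPUT_TOKENS,
) -> float:
    """Cost of audio input tokens at a per-million-token rate."""
    return tokens / 1_000_000 * rate_per_million


if __name__ == "__main__":
    # One hour of live translation and one hour of live transcription.
    print(f"Translate, 60 min: ${per_minute_cost(60, TRANSLATE_PER_MINUTE):.2f}")
    print(f"Whisper, 60 min:   ${per_minute_cost(60, WHISPER_PER_MINUTE):.2f}")
    # Assumed workload of 250,000 audio input tokens (hypothetical figure).
    print(f"Realtime-2, 250k input tokens: ${audio_input_token_cost(250_000):.2f}")
```

At the quoted rates, an hour of translation comes to about $2.04 and an hour of transcription to about $1.02. Output tokens and any other charges for GPT-Realtime-2 are not covered by the figures above.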