OpenAI is reportedly working on a new AI model optimized for audio with more natural speech.
OpenAI is reportedly developing a new AI model specifically optimized for audio applications. The focus is said to be on natural-sounding speech and real-time interaction.
The Information reports this, citing sources familiar with the plans. The model is expected to outperform OpenAI's current audio models, especially in conversations that require rapid back-and-forth.
More natural speech
According to the report, the new model is built on a different architecture. OpenAI's current real-time audio model, GPT-realtime, is based on the classic transformer architecture. It remains unclear whether OpenAI will abandon transformers entirely or opt for a modified or more efficient variant.
Some systems process raw audio directly, while others first convert it to spectrograms. As with Whisper and its other audio models, OpenAI will likely offer multiple variants of the new model with different quality and performance profiles.
io Products
OpenAI is also reportedly working on an audio device expected to launch within about a year. According to earlier reports, this could grow into a complete product line, including a smart speaker and smart glasses.
That ambition is backed by the acquisition of io Products, Jony Ive's design firm, which OpenAI valued at $6.5 billion last year. The Financial Times reported in October that Ive is working on a compact device designed to sit on a desk or table.
The new audio model is expected to launch by the end of March.
