OpenAI has introduced its latest generative AI model, GPT-4o, which is set to be integrated into both consumer and developer products over the coming weeks. The "o" in GPT-4o stands for "omni," a reference to the model's ability to work across multiple modes of communication: text, vision, and now audio.
During a recent event at OpenAI's San Francisco headquarters, CTO Mira Murati said that GPT-4o matches the intelligence of its predecessor, GPT-4, while improving on that model's capabilities across different forms of media. "GPT-4o can process voice, text, and visual information," Murati stated, describing it as a step toward the future of human-machine interaction.
Previously, GPT-4 Turbo, an advanced iteration of GPT-4, handled tasks involving both text and images. It was adept at extracting text from pictures and interpreting image content. GPT-4o adds audio processing to that skill set, widening the range of applications that can be built on the model.
One significant improvement is the enhancement of the ChatGPT experience. ChatGPT, OpenAI’s widely used AI chatbot, benefits immensely from GPT-4o's capabilities, particularly in how it handles voice interactions. It now supports interruptions during responses, recognizes emotional cues in the user's voice, and responds in various emotive tones, including singing.
GPT-4o also extends ChatGPT's image processing capabilities. For instance, it can analyze a photograph or a computer screen and quickly answer questions about its contents, such as details of the software code on display or the brand of a piece of clothing.
Murati also touched on future enhancements, suggesting that GPT-4o could eventually offer real-time explanations of live events, like sports games, directly through ChatGPT. The model also boasts improved multilingual support, and in OpenAI's API it runs faster and at a lower cost than previous versions. Not all features, such as voice, are immediately available to every user, however, because of concerns about potential misuse.
Starting today, GPT-4o is available in the free tier of ChatGPT and to subscribers of OpenAI's premium ChatGPT Plus and Team plans, who now get significantly higher usage limits.
OpenAI is also revamping the ChatGPT web interface, introducing a more interactive home screen and message layout. A desktop application for macOS, which lets users ask questions via a keyboard shortcut and discuss screenshots with ChatGPT, is rolling out as well, with a Windows version expected later this year.
Additionally, the GPT Store, a repository of third-party chatbots powered by OpenAI’s models, is now accessible to users of the free tier of ChatGPT.
These users will also enjoy new features, such as a memory function that allows ChatGPT to retain user preferences across sessions.