top of page

Say hello to GPT-4o !?

  • May 13, 2024
  • 2 min read

Say hello to GPT-4o, our new flagship model which can reason across audio, vision, and text in real time.


Prior to GPT-4o, you could use Voice Mode to talk to ChatGPT with latencies of 2.8 seconds (GPT-3.5) and 5.4 seconds (GPT-4) on average. To achieve this, Voice Mode is a pipeline of three separate models: one simple model transcribes audio to text, GPT-3.5 or GPT-4 takes in text and outputs text, and a third simple model converts that text back to audio. This process means that the main source of intelligence, GPT-4, loses a lot of information—it can’t directly observe tone, multiple speakers, or background noises, and it can’t output laughter, singing, or express emotion.

With GPT-4o, we trained a single new model end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network. Because GPT-4o is our first model combining all of these modalities, we are still just scratching the surface of exploring what the model can do and its limitations.


Model availability

GPT-4o is our latest step in pushing the boundaries of deep learning, this time in the direction of practical usability. We spent a lot of effort over the last two years working on efficiency improvements at every layer of the stack. As a first fruit of this research, we’re able to make a GPT-4 level model available much more broadly. GPT-4o’s capabilities will be rolled out iteratively (with extended red team access starting today).

GPT-4o’s text and image capabilities are starting to roll out today in ChatGPT. We are making GPT-4o available in the free tier, and to Plus users with up to 5x higher message limits. We'll roll out a new version of Voice Mode with GPT-4o in alpha within ChatGPT Plus in the coming weeks.

Developers can also now access GPT-4o in the API as a text and vision model. GPT-4o is 2x faster, half the price, and has 5x higher rate limits compared to GPT-4 Turbo. We plan to launch support for GPT-4o's new audio and video capabilities to a small group of trusted partners in the API in the coming weeks.





ad

MOFFATT AND THE AI.

Visit the gallery.

Print on Canvas available



 
 
 

Comments


  • Facebook
  • TikTok

SofaTV Network is a content curation site.
Content curation (etymologically from the Latin curare: to take care and from the English content curation or data curation) is a practice that consists of selecting, editing and sharing the most relevant content on the Web for a given request or subject. Curation is used and claimed by sites that wish to offer greater visibility and better readability to content (texts, documents, images, videos, sounds, etc.) that they deem useful to Internet users and whose sharing can help them or interest them.

Sofa Tv Network is optimized for desktop and tablet computers. We still work on the mobile version.
SofaTv also invites you to connect your devices to your big screen for quality listening. Enjoy our proposals.

SofaTv.ca

© 2023 par SofaTv.ca

bottom of page