Why OpenAI's Announcement Was A Bigger Deal Than People Think
Summary
TL;DR: OpenAI's recent product event introduced a divisive update that has sparked significant debate. The event, initially rumored to reveal a new search engine or personal assistant, unveiled GPT-4o ("o" for "omni"), a model with GPT-4-level intelligence that operates faster and adds real-time interaction across audio, vision, and text. Described as a significant step towards natural human-computer interaction, the model can accept and generate any combination of text, audio, and images, with response times comparable to human conversation. The update also made GPT-4-level models accessible for free, gave paying users five times the capacity and priority access to new features, and cut API prices by 50%. Live demos showcased the model's conversational abilities, emotional awareness, and vision capabilities. While reactions varied, with some underwhelmed and others impressed, the update's significance lies in its transformative potential for human-computer interaction, its free accessibility, and its truly native multimodal functionality. OpenAI CEO Sam Altman emphasized the company's mission to provide capable AI tools for free or at a great price, and the potential for AI to enable unprecedented productivity and new ways of interacting with technology.
Takeaways
- 📅 OpenAI held a significant product event, which was highly anticipated and divisive among the audience.
- 🚀 The event introduced a new flagship model called GPT-4o, described as having GPT-4-level intelligence but with faster responses and improved ways to interact.
- 🔊 GPT-4o reasons across audio, vision, and text in real time, and can accept and generate any combination of text, audio, and image inputs and outputs.
- 🏎️ The model has a quick response time to audio inputs, averaging around 320 milliseconds, which is comparable to human conversational response times.
- 🆓 OpenAI made a GPT-4 level model available for free, which is a substantial increase in accessibility for all users.
- 📈 The update also included a 50% reduction in the cost of the API, making it more accessible for developers.
- 🎉 Live demos showcased the real-time conversational abilities, emotional awareness, and the ability to generate voice in various styles.
- 🖼️ GPT-4o's new vision capabilities were demonstrated through solving a linear equation and describing what was seen on screen after code execution.
- 🗣️ Real-time translation and emotion recognition from facial expressions were also demonstrated, highlighting the model's multimodal capabilities.
- 🤔 Reactions to the event were mixed, with some expressing disappointment while others found the updates to be groundbreaking and magical.
- 🌐 OpenAI's CEO, Sam Altman, emphasized the mission to provide capable AI tools for free or at a low cost, and positioned the new voice and video mode as a significant leap in human-computer interaction.
Q & A
What was the main focus of OpenAI's spring update event?
-The main focus of OpenAI's spring update event was the announcement of their new flagship model, GPT-4o, a multimodal model capable of processing text, audio, and visual inputs simultaneously.
Why was the GPT-4o model described as divisive?
-The GPT-4o model was described as divisive because it sparked mixed reactions regarding its capabilities and the level of innovation it presented, with some people feeling underwhelmed compared to previous OpenAI releases.
What are the significant features of GPT-4o?
-Significant features of GPT-4o include its ability to process inputs and generate outputs across text, audio, and image modalities in real time, response speeds similar to human conversational timing, and enhanced voice modulation and emotional awareness.
How did OpenAI enhance accessibility with GPT-4o?
-OpenAI enhanced accessibility by making GPT-4o available to free users, providing access to a GPT-4-level model, custom GPTs, and the GPT Store, previously available only to paying users.
What does the 'o' in GPT-4o stand for?
-In GPT-4o, the 'o' stands for 'omni', indicating the model's capability to operate across multiple modalities (text, audio, vision) simultaneously, aiming for more natural human-computer interaction.
What was the public's reaction to the live demos of GPT-4o during the event?
-The live demos received mixed reactions. Some attendees were impressed by the real-time capabilities and the natural-sounding AI voice, while others found the updates underwhelming compared to previous demonstrations like Google's Duplex demo.
How did OpenAI address the expectations surrounding GPT-4.5 or GPT-5 at the event?
-OpenAI made it clear prior to the event that they would not be releasing GPT-4.5 or GPT-5, setting the stage for the introduction of GPT-4o instead.
What does the reduced API cost with the introduction of GPT-4o imply for developers?
-The 50% reduction in API cost with the introduction of GPT-4o means that developers and businesses can integrate OpenAI's capabilities into their services at a lower cost, potentially broadening the model's usage and accessibility.
How did GPT-4o handle real-time translation in the demos?
-During the demos, GPT-4o showcased its ability to perform real-time translation effectively. For example, it translated spoken English into Italian almost instantaneously, demonstrating its proficiency in handling live multilingual communication.
What future enhancements did Sam Altman highlight regarding GPT-4o?
-Sam Altman highlighted potential future enhancements like adding personalization, improving access to information, and enabling the AI to take actions on behalf of users, which he believes will significantly enrich the human-computer interaction experience.
Outlines
📢 OpenAI's Spring Update: A Divisive Milestone
The video discusses OpenAI's recent product event, which introduced updates that sparked varied reactions. The event was initially anticipated to reveal a search engine to rival Google, but instead focused on a personal assistant update with enhanced voice features. Notably, Sam Altman was not the presenter, which some initially read as a sign the announcement would be less significant than hoped. The update included a ChatGPT desktop app, an updated user interface, and the introduction of GPT-4o, a model with GPT-4-level intelligence that processes audio, vision, and text in real time. The model's capabilities were demonstrated through various live demos showcasing its speed, emotional responsiveness, and multimodal functionality. Despite initial skepticism, the update's significance lies in its potential to redefine human-computer interaction and its accessibility, with free access to a GPT-4-level model for all users.
🤖 GPT-4o: Multimodal Magic and Mixed Receptions
The second paragraph delves into the technical aspects and public reception of GPT-4o. It highlights the model's real-time conversational abilities, its emotional awareness, and its new vision capabilities. The paragraph also discusses the accessibility of the technology, with free users gaining access to a GPT-4-level model and paying users receiving increased capacity limits. The API also became more affordable, with prices dropping by 50%. Reactions to the update varied widely, with some critics finding it underwhelming compared to Google's offerings, while others were impressed by its capabilities. The paragraph also touches on the potential strategic timing of the announcement, aimed at preempting similar developments from Apple and Google. The significance of GPT-4o's native multimodality is emphasized, as it processes all modalities within a single neural network, offering real-time voice translation and advanced image generation.
🚀 The Future of AI Interaction: OpenAI's Bold Bet
The final paragraph of the script reflects on the broader implications of OpenAI's update and the varied reactions from users and industry experts. It emphasizes the transformative potential of making a high-quality AI model freely accessible and the company's commitment to a new mode of human-computer interaction. The paragraph also speculates on the strategic timing of the announcement in relation to Google IO and Apple's ecosystem developments. The discussion includes the potential impact on productivity and society, with some commentators suggesting that the update may be more significant than initially perceived. The paragraph concludes by acknowledging the uncertainty of how these technologies will be adopted in the real world but asserts that OpenAI's update represents a significant step towards the future of AI interaction.
Keywords
💡OpenAI
💡Product Event
💡GPT-4o
💡Multimodality
💡Real-time Interaction
💡Accessibility
💡API
💡Emotion Recognition
💡Personal Assistant
💡Demos
💡Divisive
Highlights
OpenAI held a product event that was highly anticipated and divisive, focusing on the OpenAI Spring Update.
Speculation suggested a potential search engine to compete with Google and updates to personal assistant features, particularly voice capabilities.
Sam Altman was not the presenter, indicating the possibility of a less significant announcement than expected.
CTO Mira Murati announced three key components: a ChatGPT desktop app, an updated ChatGPT UI, and a new flagship model called GPT-4o.
GPT-4o is described as having GPT-4-level intelligence, faster response times, and improved interaction methods.
The model can reason across audio, vision, and text in real-time and is designed for more natural human-computer interaction.
GPT-4o can accept and generate any combination of text, audio, and image inputs and outputs.
Response times to audio inputs are as fast as 232 milliseconds, comparable to human conversational response times.
Free users now have access to a GPT-4 level model, with paying users gaining five times the capacity limits and priority for new features.
The API for GPT-4o will be 50% cheaper, making it more accessible for developers.
Live demos showcased the real-time conversational capabilities, including emotional awareness and voice modulation.
GPT-4o demonstrated advanced capabilities such as solving equations, real-time translation, and emotion recognition from facial expressions.
The update was met with mixed reactions, with some finding it underwhelming while others were impressed by its capabilities.
Sam Altman emphasized the mission to provide capable AI tools for free or at a great price, and the potential for AI to create benefits for the world.
The new voice and video mode is considered a significant leap in computer interfaces, resembling AI from movies with human-like response times and expressiveness.
The update represents a transformation in accessibility, multimodality, and a new mode of human-computer interaction.
GPT-4o's native multimodality allows for processing text, audio, and vision in a single neural network, offering real-time voice translation as a special case.
The update is seen as a strategic move to counter potential competition from Apple and Google, who are also integrating AI into their voice assistants.
Despite initial reactions, some believe the update is underrated and will have a significant impact on productivity and the future of AI interaction.
Transcripts
OpenAI just held a product event, and it's easily their most divisive yet. In this video, we're going to talk about why it was actually a bigger deal than it might seem at first. Welcome back to the AI Daily Brief. Today is one of those days, kind of the opposite of some of the ones we've had recently, where everyone is talking about just one thing. So instead of doing our whole normal brief-then-main-episode sort of conversation, we are just going to focus on the big thing that everyone is talking about, which is of course OpenAI's spring update.

Now, this is the event that had been rumored for a couple of weeks. For a while there was speculation that we were going to see a search engine, some sort of competition with Google and Perplexity, but towards the end of last week, as the event apparently got delayed a couple of days, it started to come into view that the most likely candidate was some sort of personal assistant update, particularly around voice features. This, I believe, will go down as one of the most initially divisive product updates that OpenAI has ever released. So what we're going to do on this show is first talk about what they actually shared, and then we'll get into the reactions and why I think it's actually more significant, not less significant, than it seems at first.

Right away, the first thing you noticed when it kicked off was that Sam Altman was not the one presenting. I could be totally wrong, but I initially took this as a sign that perhaps it wasn't going to be as big an announcement as we might have thought, sort of with the idea that they were keeping Sam in the background for the big major updates like GPT-4.5 or GPT-5. Now, one of the things you'll hear a lot throughout this assessment of what happened is that I think people's expectations, or hopes really more than expectations, of GPT-4.5 or GPT-5 colored the way they received what was actually shared. This is of course in spite of the fact that OpenAI did make it clear in advance that we were not getting GPT-4.5 or GPT-5.

Quickly, CTO Mira Murati homed in on three big pieces of the announcement. First, there was a ChatGPT desktop app. Second, there was an updated ChatGPT UI. And third, and obviously the most important, there was a new flagship model called GPT-4o. Basically, this was described as GPT-4-level intelligence but faster and with better ways to interact. On OpenAI's website, they call it their new flagship model that can reason across audio, vision, and text in real time. The "o," they write, stands for "omni," and it is a quote "step towards much more natural human-computer interaction." It accepts as input any combination of text, audio, and image, and generates any combination of text, audio, and image outputs. Plus, they say it's really fast: it can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation.
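For developers curious what "any combination of text and image" looks like in practice, here is a minimal sketch using OpenAI's Python SDK. At launch, the API exposed text and image inputs, with audio slated for later per reporting at the time; the prompt and image URL below are placeholders, not anything from the event.

```python
# Minimal sketch of a multimodal GPT-4o request via the OpenAI Python SDK
# (pip install openai). Assumes OPENAI_API_KEY is set in the environment;
# the image URL is a placeholder.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's happening in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```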
Before they got into the demos, the next part of the announcement had to do with accessibility. Specifically, they said that with the efficiencies of GPT-4o, "we can bring this to everyone." What that meant was that free users now have access to a GPT-4-level model, custom GPTs, the GPT Store: basically everything that you were paying for before. Paying users didn't have access to any differentiated technology anymore; instead, they had five times the capacity limits, and they would also be first in line for new features, as we saw later in the day as GPT-4o started immediately rolling out. As we'll discuss in a little bit, the improvement in what's available at the free base level is hugely significant, and the only reason I think it wasn't talked about as such is that the vast majority of people who spend their time watching an OpenAI product video are probably already springing for the ChatGPT Plus account. In other words, the free access part doesn't benefit them, so it's easier for them to overlook its significance in aggregate. We'll come back to that, though, in a few minutes. GPT-4o was also going to impact the API: specifically, it was going to make it 50% cheaper, which is obviously a significant change.
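To make that 50% concrete, here's a back-of-the-envelope sketch assuming the list prices reported around launch: GPT-4 Turbo at $10 per million input tokens and $30 per million output tokens, GPT-4o at $5 and $15. Treat these numbers as illustrative of the announcement, not current pricing.

```python
# Rough cost comparison for a hypothetical monthly workload, assuming
# the per-million-token list prices reported around launch. Check
# current pricing before relying on these figures.
PRICES = {  # dollars per 1M tokens
    "gpt-4-turbo": {"input": 10.00, "output": 30.00},
    "gpt-4o": {"input": 5.00, "output": 15.00},
}

def monthly_cost(model: str, input_tokens: float, output_tokens: float) -> float:
    p = PRICES[model]
    return (input_tokens / 1e6) * p["input"] + (output_tokens / 1e6) * p["output"]

# Hypothetical workload: 50M input tokens, 10M output tokens per month.
for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 50e6, 10e6):,.2f}")
# gpt-4-turbo: $800.00
# gpt-4o: $400.00  (the advertised 50% cut)
```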
From there, we got into the live demos of the real-time conversational capacity of the ChatGPT app. When Mira Murati asked what's different from the existing voice mode that we have, the presenters answered that you can butt in whenever without throwing it off, that it has real-time responsiveness, that the model picks up on emotion, and that it can generate voice in a wide variety of styles. This emotional awareness is pretty significant. One of the demos they did was telling a bedtime story, and the two presenters kept asking it to change its modulated speech based on some new criteria. First they wanted it to be more dramatic, then even more dramatic, then the most dramatic of all, which it did each time very successfully. Then they switched it to dramatic but in a robot voice, and then they had it sing the end of the story. I will note here that even for people who weren't that impressed with anything else, many had the same thought that Cassette AI had when they said, "Got to give GPT-4o props. That's the most natural-sounding AI voice I've ever heard."

Next up, they showed off the new vision capabilities. First they did a linear equation, where they asked ChatGPT to help walk them through how to solve it. So instead of just pointing the screen at an equation on a piece of paper and asking it to solve it, the presenters were really using it as a tutor more than anything else, and in that way I think it reflected what they were really showing off, which is these features not as somehow standalone, but as part of a complete assistant experience. And speaking of that assistant capability, they also did a demo where they brought up the ChatGPT desktop app, specifically the conversational version of it, and were able to ask it about the code they were writing in a different application simply by copying it into the ChatGPT window. They also showed off ChatGPT describing what it saw on screen after the code was run. The two other demos they did, theoretically from audience input, were real-time translation, where one of the presenters spoke in English and Mira responded in Italian with ChatGPT operating as the translator in real time, and then finally asking ChatGPT to recognize emotions just by looking at someone's face.
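Worth noting: there's no dedicated translation feature behind that demo; it's just prompting. Here's a rough sketch of the same two-way translator pattern against the text API, where the system prompt is my paraphrase of the demo's instruction rather than OpenAI's exact wording.

```python
# Sketch of the two-way translator pattern from the demo, using the text
# API. The system prompt is a paraphrase of the demo instruction, not
# OpenAI's exact wording.
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You are a real-time translator. Whenever you receive English, "
    "translate it into Italian. Whenever you receive Italian, translate "
    "it into English. Reply with the translation only."
)

def translate(utterance: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": utterance},
        ],
    )
    return response.choices[0].message.content

print(translate("Hey, how has your week been going?"))
```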
And then that was it. It was a tight half hour. There was no big "one more thing" Steve Jobs type of moment, and like I said, there were a lot of underwhelmed responses. Abacus AI CEO Bindu Reddy writes, "Is this me or was that it? What even? That was the single most underwhelming thing I've seen this year. I'm not sure what's cool about this. That Google Duplex demo from 2019 was way better. The only highlight, if any, was the tone modulation, which wasn't even that spectacular." Theo Jaffee writes, "Maybe I'll be crucified for this, but I actually wasn't blown away by this demo like I was for the releases of ChatGPT and GPT-4. This seems more like a product update than a foundational new capability breakthrough." On the flip side, you had folks like Pete from The Neuron, who wrote, "GPT-4o is magical. Absolutely magical." Rory wrote, "Blown away that more people aren't blown away. We just went from smartphone to iPhone." Chris France writes, "LOL, new OpenAI model is better than all existing models at everything, supports real-time vision and audio, and is free. What?" But what about the team at OpenAI? What story were they trying to tell? Well, Sam Altman wrote it up explicitly on his blog.
He said that he wanted to highlight two parts of the announcement. First, he said, "A key part of our mission is to put very capable AI tools in the hands of people for free or at a great price. I'm very proud that we've made the best model in the world available for free in ChatGPT, without ads or anything like that." "Our initial conception," he continues, "when we started OpenAI was that we'd create AI and use it to create all sorts of benefits for the world. Instead, it now looks like we'll create AI and then other people will use it to create all sorts of amazing things that we all benefit from. We are a business and we'll find plenty of things to charge for, and that will help us provide free, outstanding AI service to hopefully billions of people."

Second, Sam writes, "The new voice and video mode is the best computer interface I've ever used. It feels like AI from the movies, and it's still a bit surprising to me that it's real. Getting to human-level response times and expressiveness turns out to be a big change. The original ChatGPT showed a hint of what was possible with language interfaces. This new thing feels viscerally different. It is fast, smart, fun, natural, and helpful. Talking to a computer has never felt really natural for me; now it does. As we add optional personalization, access to your information, the ability to take actions on your behalf, and more, I can really see an exciting future where we are able to use computers to do much more than ever before."

And so I think Sam is getting here at two of the three biggest parts of the announcement: the transformation this represents when you make it free, and OpenAI's bet on a new mode of human-computer interaction. I'm going to talk about each of those in some more detail, but the third that I want to point out is truly native multimodality.
This was an announcement that was not for a technical audience, or at least it didn't seem to be to me. All of it was in incredibly simple language, and they didn't even show off some of the capabilities. In fact, because they didn't explain it, some people questioned what was going on underneath the hood. Andrew Gao writes, "For my technical audience: thoughts on what's behind GPT-4o. Is it really multimodal and not converting things to text? I.e., you could replicate the demo by using Whisper to convert speech to text, using regular GPT-4, and then converting the response to speech using ElevenLabs. It would be entirely different if OpenAI was actually going from audio waves to audio waves, end to end, without other models in between. Definitely possible, and it would explain the ability to understand and hear breathing in the demo, but this is also doable without that, necessarily."
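For reference, the cascaded pipeline Andrew describes would look roughly like the sketch below. It substitutes OpenAI's own tts-1 endpoint for ElevenLabs purely to keep everything in one SDK, and the filenames are placeholders; the point is that each stage is a separate model, which is exactly what native multimodality removes.

```python
# Sketch of the cascaded speech pipeline described above: speech-to-text,
# then a text-only model, then text-to-speech. Each stage is a separate
# model call; tts-1 stands in for ElevenLabs, and filenames are placeholders.
from openai import OpenAI

client = OpenAI()

# 1. Transcribe the user's speech with Whisper.
with open("user_question.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1", file=audio_file
    )

# 2. Generate a text reply with a text-only model.
reply = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
)

# 3. Synthesize the reply back into speech.
speech = client.audio.speech.create(
    model="tts-1", voice="alloy", input=reply.choices[0].message.content
)
with open("assistant_reply.mp3", "wb") as f:
    f.write(speech.content)
```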
Well, Andrej Karpathy, previously of the founding team of OpenAI, explained it this way. He said, "They are releasing a combined text-audio-vision model that processes all three modalities in one single neural network, which can then do real-time voice translation as a special-case afterthought, if you ask it to." In other words, yes, this is true native multimodality; it is not taking language tokens and then converting them. Will Depue, who works on video generation at OpenAI, says, "I think people are misunderstanding GPT-4o. It isn't a text model with a voice or image attachment. It's a natively multimodal token-in, multimodal token-out model. You want it to talk fast? Just prompt it to. Need to translate into whale noises? Just use few-shot examples." An example that he showed was character-consistent image generation, just by conditioning it on previous images. And if any of you have spent any time trying to get consistent characters with workarounds like style reference on Midjourney, or creating a custom GPT as I've done, or using a third-party application like Scenario, the fact that it might just natively have these capabilities is pretty significant.
So to me, the three biggest parts of this announcement were: one, the fact that this best-in-class model was free for everyone; two, the fact that it was truly natively multimodal; and three, the fact that OpenAI was clearly making such a huge bet on this new type of human-computer interaction as the future of how we interact with AI.

But what about when people started to get their hands on it? How were the reactions then? Well, Sully Omarr from Cognosys writes, "GPT-4o is way, way faster than GPT-4. It feels like an entirely different model. Insanely fast." Andrew Gao again writes, "To everyone disappointed by OpenAI today: don't be. The livestream was for a general consumer audience. The cool stuff is hidden on their site." Some of the examples he gives are text-to-3D and hugely advanced text in AI-generated images; Andrew points out that they're so confident in their text-in-image abilities that they can create fonts with GPT-4o, and a bunch of other huge things as well. Sully again writes, "Okay, I get where ChatGPT is going. Ultimate workflow = screen share with ChatGPT. ChatGPT operates the computer for you. You can interject, chat, all through voice. It's like having someone there directly working with you." In fact, right now as we're recording this, streaming live on X is someone coding in Cursor with GPT-4o, basically as a live coding companion. Others pointed out that the timing of this was no accident.
Robert Scoble writes, "What was just announced by OpenAI was designed to blunt attacks by Apple and Google, as both companies are about to change their voice assistants to LLM-based systems that will fix most of the things we hate about both. Apple has lots of advantages that it can brag about, like you'll be able to change the brightness on your phone by talking to Siri, or be integrated into Apple's ecosystem, i.e., 'Can you put something on my Reminders app?'" Others pointed out that the ChatGPT demo today was basically the demo that everyone freaked out about from Gemini Ultra back in December, which everyone then found out had been edited to death and wasn't actually representative of its true capabilities. Even more than that, though, Google I/O is happening tomorrow, and Logan Kilpatrick, who notably used to work at OpenAI, shared a video of what is presumably a Gemini assistant looking at the I/O stage and explaining it to the person holding the phone. So it seems highly likely that tomorrow we're going to be having a very similar conversation, comparing whatever they announce at Google I/O to what we got from OpenAI today. Oh, and as one fun little aside: they did confirm that the "im-also-a-good-gpt2-chatbot" that everyone has been freaking out about on LMSYS is indeed a version of GPT-4o that they've been testing.
When it comes to real-world response, certainly the real-time translation demo seems to have had an impact: 1littlecoder pointed out a 5% drop in Duolingo's stock price in the wake of the demo. Siqi Chen summed up where I think a lot of people will end up in the long run when he wrote, "This will prove to be, in retrospect, by far the most underrated OpenAI event ever." He even went further and said, "TLDR: GPT-4o is a significantly larger improvement over GPT-4 than 3.5 was over 3. GPT-4o = GPT-4.75." I think the point here, one that will ultimately be proven out or not by our interactions with it, is that this native multimodality, plus the ability to input on the basis of vision and video, transforms the use cases of ChatGPT in a huge way that we're probably underestimating initially.

Another part of this, though, was summed up by Aaron Levie from Box, who wrote, quote, "The productivity unlock for humanity is pretty insane when AI can bring this level of intelligence to anyone." Like I said, I think the reason we're not talking more about just how significant the free shift is, is that most of us who are doing the talking right now have been paying for ChatGPT since the moment we could. Giving billions of people access to that for free, though, is just likely to have an enormous, enormous impact on work, society, and everything in between.

Ultimately, we'll see. I think it is in no way guaranteed that the way people will want to interact with these technologies is through these sorts of chat modalities or interactions with video; the real world will show us that one way or another. Regardless of what plays out, though, it's pretty clear that OpenAI believes this is truly the future of interaction with AI. And I think that just because Sam Altman wasn't doing the presentation, just because they might have rushed this a little to get in ahead of Google I/O, and just because they didn't formally announce GPT-4.5 or GPT-5, it would be a mistake to underestimate how significant this update is in the minds of OpenAI themselves. However, there is going to be a lot more to discuss, especially with Google I/O coming tomorrow. So that is going to do it for this edition of the AI Daily Brief. Until next time, peace.