Why OpenAI's Announcement Was A Bigger Deal Than People Think

The AI Breakdown
13 May 2024 · 13:38

Summary

TL;DR: OpenAI's recent product event introduced a divisive update that has sparked significant debate. The event, initially rumored to reveal a search engine or a personal assistant, unveiled GPT-4o ("omni"), a model with GPT-4-level intelligence that operates faster and offers real-time interaction across audio, vision, and text. Described as a significant step towards natural human-computer interaction, the model can accept and generate any combination of text, audio, and images, with response times comparable to human conversation. The update also made a GPT-4-level model accessible for free, gave paying users five times the capacity limits, and prioritized them for new features. The live demos showcased the model's conversational abilities, emotional awareness, and vision capabilities. Reactions varied, with some viewers underwhelmed and others impressed, but the update's significance lies in its transformative potential for human-computer interaction, its free accessibility, and its truly native multimodal functionality. OpenAI's CEO, Sam Altman, emphasized the company's mission to provide capable AI tools for free or at a great price, and the potential for AI to enable unprecedented productivity and new ways of interacting with technology.

Takeaways

  • 📅 OpenAI held a significant product event, which was highly anticipated and divisive among the audience.
  • 🚀 The event introduced a new flagship model called GPT-4o ("Omni"), described as having GPT-4-level intelligence but being faster and offering improved ways to interact.
  • 🔊 GPT-4o reasons across audio, vision, and text in real time, and can accept and generate any combination of text, audio, and image inputs and outputs.
  • 🏎️ The model responds to audio inputs in as little as 232 milliseconds, with an average of around 320 milliseconds, comparable to human conversational response times.
  • 🆓 OpenAI made a GPT-4 level model available for free, which is a substantial increase in accessibility for all users.
  • 📈 The update also included a 50% reduction in the cost of the API, making it more accessible for developers.
  • 🎉 Live demos showcased the real-time conversational abilities, emotional awareness, and the ability to generate voice in various styles.
  • 🖼️ GPT-4o's new vision capabilities were demonstrated by walking through a linear equation on paper and by describing what appeared on screen after code was executed.
  • 🗣️ Real-time translation and emotion recognition from facial expressions were also demonstrated, highlighting the model's multimodal capabilities.
  • 🤔 Reactions to the event were mixed, with some expressing disappointment while others found the updates to be groundbreaking and magical.
  • 🌐 OpenAI's CEO, Sam Altman, emphasized the mission to provide capable AI tools for free or at a low cost, and positioned the new voice and video mode as a significant leap in human-computer interaction.

Q & A

  • What was the main focus of OpenAI's spring update event?

    -The main focus of OpenAI's spring update event was the announcement of their new flagship model, GPT-4o, a natively multimodal model capable of processing text, audio, and visual inputs.

  • Why was the GPT-4o announcement described as divisive?

    -The announcement was described as divisive because it sparked mixed reactions regarding its capabilities and the level of innovation it presented, with some people feeling underwhelmed compared to previous OpenAI releases.

  • What are the significant features of GPT-4o?

    -Significant features of GPT-4o include its ability to process inputs and generate outputs across text, audio, and image modalities in real time, its response speed, which is similar to human conversational response times, and its enhanced voice modulation and emotional awareness.

  • How did OpenAI enhance accessibility with GPT-4o?

    -OpenAI enhanced accessibility by making GPT-4o available to free users, giving them access to a GPT-4-level model, custom GPTs, and the GPT Store, which were previously available only to paying users.

  • What does the 'o' in GPT-4o stand for?

    -The 'o' in GPT-4o stands for 'omni', indicating the model's capability to operate across multiple modalities (text, audio, vision) within a single model, aiming for a more natural human-computer interaction.

  • What was the public's reaction to the live demos of GPT-4o during the event?

    -The live demos received mixed reactions. Some viewers were impressed by the real-time capabilities and the natural-sounding AI voice, while others found the updates underwhelming compared to previous demonstrations like Google's Duplex demo.

  • How did OpenAI address the expectations surrounding GPT-4.5 or GPT-5 at the event?

    -OpenAI made it clear prior to the event that they would not be releasing GPT-4.5 or GPT-5, setting the stage for the introduction of GPT-4o instead.

  • What does the reduced API cost with the introduction of GPT-4o imply for developers?

    -The 50% reduction in API cost with the introduction of GPT-4o means that developers and businesses can integrate OpenAI's capabilities into their services at a lower cost, potentially broadening the model's usage and accessibility.

  • How did GPT-4o handle real-time translation in the demos?

    -During the demos, GPT-4o showcased its ability to perform real-time translation effectively. For example, it translated spoken English into Italian almost instantaneously, demonstrating its proficiency in handling live multilingual communication.

  • What future enhancements did Sam Altman highlight regarding GPT-4o?

    -Sam Altman highlighted potential future enhancements such as adding personalization, access to the user's information, and the ability for the AI to take actions on the user's behalf, which he believes will significantly enrich the human-computer interaction experience.

Outlines

00:00

📢 OpenAI's Spring Update: A Divisive Milestone

The video discusses OpenAI's recent product event, which introduced updates that have sparked varied reactions. The event was initially anticipated to reveal a search engine to rival Google, but instead focused on a personal assistant update with enhanced voice features. Sam Altman's absence from the presentation was notable; the host initially read it as a sign that the announcement might be smaller than hoped. The update included a ChatGPT desktop app, an updated user interface, and the introduction of GPT-4o, a model with GPT-4-level intelligence that processes audio, vision, and text in real time. The model's capabilities were demonstrated through various live demos showcasing its speed, emotional responsiveness, and multimodal functionality. Despite initial skepticism, the update's significance lies in its potential to redefine human-computer interaction and its accessibility, with free access to a GPT-4-level model for all users.

05:01

🤖 GPT-4o: Multimodal Magic and Mixed Receptions

The second section delves into the technical aspects and public reception of GPT-4o. It highlights the model's real-time conversational abilities, its emotional awareness, and its new vision capabilities. It also discusses the accessibility of the technology, with free users gaining access to a GPT-4-level model and paying users receiving increased capacity limits. The API was also made more affordable, with prices dropping by 50%. Reactions to the update varied widely, with some critics finding it underwhelming compared to Google's earlier Duplex demo, while others were impressed by its capabilities. The section also touches on the potential strategic timing of the announcement, aimed at preempting similar developments from Apple and Google. The significance of GPT-4o's native multimodality is emphasized, as it processes all modalities within a single neural network, offering real-time voice translation and advanced image generation.

10:01

🚀 The Future of AI Interaction: OpenAI's Bold Bet

The final paragraph of the script reflects on the broader implications of OpenAI's update and the varied reactions from users and industry experts. It emphasizes the transformative potential of making a high-quality AI model freely accessible and the company's commitment to a new mode of human-computer interaction. The paragraph also speculates on the strategic timing of the announcement in relation to Google IO and Apple's ecosystem developments. The discussion includes the potential impact on productivity and society, with some commentators suggesting that the update may be more significant than initially perceived. The paragraph concludes by acknowledging the uncertainty of how these technologies will be adopted in the real world but asserts that OpenAI's update represents a significant step towards the future of AI interaction.

Keywords

💡OpenAI

OpenAI is an AI research and deployment company whose stated mission is to ensure that artificial general intelligence (AGI) benefits humanity. In the video, it is the organization that held the product event at the center of the discussion; its updates and announcements are the main subject of the video.

💡Product Event

A product event is a formal presentation where a company unveils new products or updates to existing ones. In this context, OpenAI's product event is pivotal as it introduces new features and models, such as GPT-4o, which are discussed in detail throughout the video.

💡GPT-4o (Omni)

GPT-4o is the new flagship AI model introduced by OpenAI, described as having GPT-4-level intelligence but with faster response times and improved interaction capabilities. It is a significant keyword as it represents the technological advancement that enables multimodal interaction (text, audio, and image) and is a core component of the event's announcements.

💡Multimodality

Multimodality refers to the ability of a system to process and understand multiple modes of input and output, such as text, audio, and images. In the video, GPT-4o's native multimodality is a key feature, allowing it to accept and generate various combinations of inputs and outputs, which is a major step towards more natural human-computer interaction.
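
For readers who want to see what this looks like from the developer side, the sketch below sends one text part and one image part in a single message, assuming the publicly documented chat completions endpoint of the OpenAI Python SDK and the "gpt-4o" model name; the image URL and prompt are placeholders, and the audio modality shown in the live demos is not covered by this text-and-image example.

```python
# Sketch: one user message combining a text part and an image part.
# Assumes the `openai` Python SDK (v1+) and an OPENAI_API_KEY env variable.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "What equation is written on this page, and how would you solve it?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/linear-equation.jpg"}},  # placeholder image
            ],
        }
    ],
)
print(response.choices[0].message.content)
```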

💡Real-time Interaction

Real-time interaction implies that the system can respond to inputs immediately, without significant delay. The video emphasizes GPT-4o's capability to respond to audio inputs in as little as 232 milliseconds, which is comparable to human response times, highlighting a significant leap in AI interactivity.
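
The 232-millisecond figure is OpenAI's reported end-to-end audio response time and cannot be reproduced directly through the text API; still, a rough feel for perceived latency can be had by timing the first streamed token. A minimal sketch, assuming the streaming option of the OpenAI Python SDK's chat completions call:

```python
# Sketch: measure time-to-first-token over the streaming text API.
# This approximates perceived responsiveness; it is not the audio pipeline
# behind the 232/320 ms figures quoted in the announcement.
import time
from openai import OpenAI

client = OpenAI()

start = time.perf_counter()
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Say hello in five words."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f"Time to first token: {elapsed_ms:.0f} ms")
        break
```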

💡Accessibility

Accessibility, in the context of this video, refers to making advanced AI technology available to a broader audience. OpenAI's announcement about making GPT-4 level models available for free to all users is a major point of discussion, as it signifies a shift towards democratizing access to powerful AI tools.

💡API

API stands for Application Programming Interface, which is a set of protocols and tools that allows different software applications to communicate with each other. In the video, it is mentioned that the new updates will make the API 50% cheaper, which is significant for developers and businesses that rely on OpenAI's technology.
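
For orientation, a basic request against the API looks like the sketch below; it assumes the OpenAI Python SDK and the "gpt-4o" model identifier, and the announced price reduction shows up only as lower per-token billing, not as anything visible in the code.

```python
# Sketch: a plain text request to GPT-4o via the chat completions API.
# Assumes the `openai` Python SDK (v1+) with OPENAI_API_KEY set.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize the GPT-4o announcement in one sentence."},
    ],
)
print(response.choices[0].message.content)
```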

💡Emotion Recognition

Emotion recognition is the ability of a system to identify and respond to human emotions. The video discusses a demo where GPT-4o recognizes emotions by looking at someone's face, showcasing the new model's ability to understand human emotional cues.

💡Personal Assistant

A personal assistant, in the context of this video, refers to the AI's role in aiding with various tasks and providing personalized responses. The updates discussed are aimed at enhancing the personal assistant capabilities of OpenAI's models, making them more integrated into daily workflows and interactions.

💡Demos

Demos are demonstrations of how a product or technology works, often showcasing its features and capabilities. The video script describes several live demos that were part of OpenAI's product event, including real-time translation and vision capabilities, which serve to illustrate the practical applications of the new AI model.
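
The live translation demo ran through the new voice mode, which was not exposed in the public API at the time of the announcement; a rough, text-only approximation of the same idea, with a hypothetical translate helper and an assumed system prompt, might look like this:

```python
# Sketch: using GPT-4o as an English-to-Italian translator over text,
# loosely mirroring the live demo. `translate` is a hypothetical helper.
from openai import OpenAI

client = OpenAI()

def translate(text: str, target_language: str = "Italian") -> str:
    """Ask the model to return only the translation of the given text."""
    result = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": f"You are a translator. Reply with the {target_language} "
                        "translation of the user's message and nothing else."},
            {"role": "user", "content": text},
        ],
    )
    return result.choices[0].message.content

print(translate("Hey, how has your week been going?"))
```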

💡Divisive

Divisive refers to something that causes disagreement or controversy. The video discusses how the OpenAI product event and the subsequent updates have been divisive, with some viewers being underwhelmed and others being impressed, indicating a split in the audience's reception and expectations.

Highlights

OpenAI held a product event that was highly anticipated and divisive, focusing on the OpenAI Spring Update.

Speculation suggested a potential search engine to compete with Google and updates to personal assistant features, particularly voice capabilities.

Sam Altman was not the presenter, which the host initially read as a sign that the announcement might be less significant than expected.

CTO Mira Murati announced three key components: a ChatGPT desktop app, an updated ChatGPT UI, and a new flagship model called GPT-4o.

GPT-4o is described as having GPT-4-level intelligence, faster response times, and improved interaction methods.

The model can reason across audio, vision, and text in real-time and is designed for more natural human-computer interaction.

GPT-4o can accept and generate any combination of text, audio, and image inputs and outputs.

Response times to audio inputs are as fast as 232 milliseconds, comparable to human conversational response times.

Free users now have access to a GPT-4 level model, with paying users gaining five times the capacity limits and priority for new features.

The API for GPT-4o will be 50% cheaper, making it more accessible for developers.

Live demos showcased the real-time conversational capabilities, including emotional awareness and voice modulation.

GPT-4o demonstrated advanced capabilities such as solving equations, real-time translation, and emotion recognition from facial expressions.

The update was met with mixed reactions, with some finding it underwhelming while others were impressed by its capabilities.

Sam Altman emphasized the mission to provide capable AI tools for free or at a great price, and the potential for AI to create benefits for the world.

The new voice and video mode is considered a significant leap in computer interfaces, resembling AI from movies with human-like response times and expressiveness.

The update represents a transformation in accessibility, multimodality, and a new mode of human-computer interaction.

GPT-4o's native multimodality allows it to process text, audio, and vision in a single neural network, offering real-time voice translation as a special case.
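
To make the contrast concrete, the older "cascaded" approach that native multimodality is said to replace chains three separate models together. The sketch below illustrates that pre-existing pattern (speech-to-text, text reasoning, text-to-speech) using OpenAI's documented Whisper and TTS endpoints; it is an illustration of the old pipeline under assumed file names, not a description of how GPT-4o works internally.

```python
# Sketch of the pre-4o cascaded voice pipeline: three separate model calls.
# GPT-4o's claim is that a single natively multimodal model replaces this chain.
from openai import OpenAI

client = OpenAI()

# 1. Speech to text with Whisper.
with open("question.mp3", "rb") as audio_file:  # placeholder input recording
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio_file)

# 2. Text-only reasoning over the transcript.
answer = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
).choices[0].message.content

# 3. Synthesize the reply back to audio.
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=answer)
speech.write_to_file("answer.mp3")  # save the spoken reply
```

Each hop in that chain adds latency and strips paralinguistic cues such as tone, interruptions, and breathing, which is precisely what a single end-to-end network is meant to preserve.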

The update is seen as a strategic move to counter potential competition from Apple and Google, who are also integrating AI into their voice assistant systems.

Despite initial reactions, some believe the update is underrated and will have a significant impact on productivity and the future of AI interaction.

Transcripts

OpenAI just held a product event, and it's easily their most divisive yet. In this video, we're going to talk about why it was actually a bigger deal than it might seem at first. Welcome back to the AI Daily Brief. Today is one of those days, kind of the opposite of some of the ones we've had recently, where everyone is talking about just one thing, and so instead of doing our whole normal brief-then-main-episode sort of conversation, we are just going to focus on the big thing that everyone is talking about, which is of course OpenAI's spring update.

Now, this is the event that has been rumored for a couple of weeks. For a while there was speculation that we were going to see a search engine, some sort of competition with Google and Perplexity, but towards the end of last week, as the event apparently got delayed a couple of days, it started to come into view that the most likely candidate was some sort of personal assistant update, particularly around voice features. Now, this I believe will go down as one of the most divisive, initially, product updates that OpenAI has ever released. So what we're going to do on this show is first talk about what they actually shared, and then we'll get into the reactions and why I think it's actually more significant, not less significant, than it seems at first.

Right away, the first thing you noticed when it kicked off was that Sam Altman was not the one presenting. I could be totally wrong, but I initially took this as a sign that perhaps it wasn't going to be as big an announcement as we might have thought, sort of with the idea that they were keeping Sam in the background for the big major updates like GPT-4.5 or GPT-5. Now, one of the things that you'll hear a lot throughout this assessment of what happened is that I think people's expectations, or hopes really more than expectations, of GPT-4.5 or GPT-5 colored the way that they received what was actually shared. This is of course in spite of the fact that OpenAI did make it clear in advance that we were not getting GPT-4.5 or GPT-5.

Quickly, CTO Mira Murati honed in on three big pieces of the announcement. First, there was a ChatGPT desktop app. Second, there was an updated ChatGPT UI. And third, and obviously the most important, there was a new flagship model called GPT-4o. Basically, this was described as GPT-4-level intelligence but faster and with better ways to interact. On OpenAI's website they call it their new flagship model that can reason across audio, vision, and text in real time. The "o," they write, stands for "omni" and is a, quote, "step towards much more natural human-computer interaction." It accepts any combination of text, audio, and image as input and generates any combination of text, audio, and image outputs. Plus, they say it's really fast: it can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation.

Before they got into the demos, the next part of the announcement had to do with accessibility. Specifically, they said that with the efficiencies of GPT-4o, they can bring this to everyone. What that meant was that free users now have access to a GPT-4-level model, custom GPTs, the GPT Store, basically everything that you were paying for before. Paying users didn't have access to any differentiated technology anymore; instead, they had five times the capacity limits, and they also would be first in line for new features, as we saw later in the day as GPT-4o started immediately rolling out. And as we'll discuss in a little bit, the improvement in what's available at the free base level is hugely significant, and the only reason I think it wasn't talked about as such is that the vast majority of people who are spending their time watching an OpenAI product video are probably already springing for the ChatGPT Plus account. In other words, the free access part doesn't benefit them, so it's easier for them to overlook the significance in aggregate. We'll come back to that, though, in a few minutes.

GPT-4o was also going to impact the API; specifically, it was going to make it 50% cheaper, which is obviously a significant change. From there we got into the live demos of the real-time conversational capacity of the ChatGPT app. When Mira Murati asked what's different from the existing voice mode that we have, the presenters answered that you can butt in whenever without throwing it off, that it has real-time responsiveness, that the model picks up on emotion, and that it can generate voice in a wide variety of styles. This emotional awareness is pretty significant. One of the demos they did was telling a bedtime story, and the two presenters kept asking it to change its modulated speech based on some new criteria. So first they wanted it to be more dramatic, then even more dramatic, then most dramatic of all, which it did each time very successfully. Then they switched it to dramatic but in a robot voice, and then they had it sing the end of the story. I will note here that even for people who weren't that impressed with anything else, many had the same thought that Cassette AI had when they said, got to give GPT-4o props, that's the most natural-sounding AI voice I've ever heard.

Next up, they showed off the new vision capabilities. First they did a linear equation, where they asked ChatGPT to help walk them through how to solve it. So instead of just pointing the screen at an equation on a piece of paper and asking it to solve it, the presenters were really using it as a tutor more than anything else, and in that way I think it reflected what they were really showing off, which is these features not as somehow standalone but as part of a complete assistant experience. And speaking of that assistant capability, they also did a demo where they brought up the ChatGPT desktop app, specifically the conversational version of it, and were able to ask it about the code that they were writing in a different application, simply by copying it into the ChatGPT window. They also showed off ChatGPT describing what it saw on screen after the code was run. The two other demos they did, theoretically from audience input, were real-time translation, where one of the presenters spoke in English and then Mira responded in Italian with ChatGPT operating as the translator in real time, and then finally they asked ChatGPT to recognize emotions by looking at someone's face. And then that was it. It was a tight half an hour; there was no big "one more thing" Steve Jobs type of moment.

And like I said, there were a lot of underwhelmed responses. Abacus.AI CEO Bindu Reddy writes, "Is this me or was that it? What even? That was the single most underwhelming thing I've seen this year. I'm not sure what's cool about this. That Google Duplex demo from 2019 was way better. The only highlight, if any, was the tone modulation, which wasn't even that spectacular." Theo Jaffee writes, "Maybe I'll be crucified for this, but I actually wasn't blown away by this demo like I was for the releases of ChatGPT and GPT-4. This seems more like a product update than a foundational new capability breakthrough." On the flip side, you had folks like Pete from The Neuron, who wrote, "GPT-4o is magical. Absolutely magical." Rory wrote, "Blown away that more people aren't blown away. We just went from smartphone to iPhone." Chris France writes, "LOL, new OpenAI model is better than all existing models at everything, supports real-time vision and audio, and is free. What?"

But what about the team at OpenAI? What story were they trying to tell? Well, Sam Altman wrote it up explicitly on his blog. He said that he wanted to highlight two parts of the announcement. First, he said, "A key part of our mission is to put very capable AI tools in the hands of people for free or at a great price. I'm very proud that we've made the best model in the world available for free in ChatGPT, without ads or anything like that." "Our initial conception when we started OpenAI," he continues, "was that we'd create AI and use it to create all sorts of benefits for the world. Instead, it now looks like we'll create AI and then other people will use it to create all sorts of amazing things that we all benefit from. We are a business and will find plenty of things to charge for, and that will help us provide free, outstanding AI service to hopefully billions of people." Second, Sam writes, "The new voice and video mode is the best computer interface I've ever used. It feels like AI from the movies, and it's still a bit surprising to me that it's real. Getting to human-level response times and expressiveness turns out to be a big change. The original ChatGPT showed a hint of what was possible with language interfaces; this new thing feels viscerally different. It is fast, smart, fun, natural, and helpful. Talking to a computer has never felt really natural for me; now it does. As we add optional personalization, access to your information, the ability to take actions on your behalf, and more, I can really see an exciting future where we are able to use computers to do much more than ever before."

And so I think Sam is getting at two of the three biggest parts of the announcement here: the transformation that this represents when you make it free, and OpenAI's bet on a new mode of human-computer interaction. I'm going to talk about each of those in some more detail, but the third that I want to point out is the truly native multimodality of this. This was an announcement that was not for a technical audience, at least it didn't seem to be to me; all of it was in incredibly simple language, and they didn't even show off some of the capabilities. In fact, because they didn't explain it, some people questioned what was going on underneath the hood. Andrew Gal writes, "For my technical audience, thoughts on what's behind GPT-4o: is it really multimodal and not converting things to text? I.e., you can replicate the demo by using Whisper to convert speech to text, use regular GPT-4, and then convert the response to speech using ElevenLabs. It would be entirely different if OpenAI was actually going from audio waves to audio waves end to end without other models in between. Definitely possible, and it would explain the ability to understand and hear breathing in the demo, but this is also doable without that, necessarily."

Well, Andrej Karpathy, previously of the founding team of OpenAI, explained it this way. He said they are releasing a combined text-audio-vision model that processes all three modalities in one single neural network, which can then do real-time voice translation as a special-case afterthought if you ask it to. In other words, yes, this is true native multimodality; it is not taking language tokens and then converting them. Will Depue, who works on video generation at OpenAI, says, "I think people are misunderstanding GPT-4o. It isn't a text model with a voice or image attachment. It's a natively multimodal token-in, multimodal token-out model. You want it to talk fast? Just prompt it to. Need to translate into whale noises? Just use few-shot examples." An example that he showed was character-consistent image generation, just by conditioning it on previous images. He then showed an example, and if any of you have spent any time trying to get consistent characters with workarounds like style reference on Midjourney, or creating a custom GPT as I've done, or using a third-party application like Scenario, the fact that it might just natively have these capabilities is pretty significant.

So to me, the three biggest parts of this announcement were: one, the fact that this best-in-class model was free for everyone; two, the fact that it was truly natively multimodal; and three, the fact that OpenAI was clearly making such a huge bet on this new type of human-computer interaction as the future of how we interact with AI. But what about when people started to get their hands on it? How were the reactions then? Well, Sully Omarr from Cognosys writes, "GPT-4o is way, way faster than GPT-4. It feels like an entirely different model. Insanely fast." Andrew Gal again writes, "To everyone disappointed by OpenAI today: don't be. The livestream was for a general consumer audience. The cool stuff is hidden on their site." Some of the examples he gives are text-to-3D and hugely advanced text in AI-generated images; Andrew points out they're so confident in their text-in-image abilities that they can create fonts with GPT-4o, and a bunch of other huge things as well. Sully again writes, "Okay, I get where ChatGPT is going. The ultimate workflow: screen share with ChatGPT, ChatGPT operates the computer for you, you can interject, chat all through voice. It's like having someone there directly working with you." In fact, right now as we're recording this, streaming live on X is someone coding in Cursor with GPT-4o basically as a live coding companion.

Others pointed out that the timing of this was no accident. Robert Scoble writes, "What was just announced by OpenAI was designed to blunt attacks by Apple and Google, as both companies are about to change their voice assistants to LLM-based systems that will fix most of the things we hate about both. Apple has lots of advantages that it can brag about, like you'll be able to change the brightness on your phone by talking to Siri, or being integrated into Apple's ecosystem, i.e., can you put something on my Reminders app." Others pointed out that the ChatGPT demo today was basically the demo that everyone freaked out about from Gemini Ultra back in December, which everyone then found out was edited to death and not actually representative of its true capabilities. Even more than that, though, Google I/O is happening tomorrow, and Logan Kilpatrick, who notably used to work at OpenAI, shared a video of what is presumably a Gemini assistant looking at the I/O stage and explaining it to the person holding the phone. So it seems highly likely that tomorrow we're going to be having a very similar conversation, comparing whatever they announce at Google I/O to what we got from OpenAI today. Oh, and as one fun little aside, they did confirm that the "im-also-a-good-gpt2-chatbot" that everyone has been freaking out about on LMSYS is indeed a version of GPT-4o that they've been testing.

When it comes to real-world response, certainly the real-time translation demo seems to have had an impact: 1littlecoder pointed out a 5% drop in Duolingo's stock price in the wake of the demo. Siqi Chen summed up where I think a lot of people will end up in the long run when he wrote, "This will prove to be, in retrospect, by far the most underrated OpenAI event ever." He even went further and said, "TL;DR: GPT-4o is a significantly larger improvement over GPT-4 than 3.5 was over 3. GPT-4o equals GPT-4.75." I think the point here, one that will ultimately be proven out or not by our interactions with it, is that this native multimodality, plus the ability to input on the basis of vision and video, transforms the use cases of ChatGPT in a huge way that we're probably underestimating initially. Another part of this, though, was summed up by Aaron Levie from Box. He wrote, quote, "The productivity unlock for humanity is pretty insane when AI can bring this level of intelligence to anyone." Like I said, I think the reason that we're not talking more about just how significant the free shift is, is that most of us who are doing the talking right now have been paying for ChatGPT since the moment we could. Giving billions of people access to that for free, though, is just likely to have an enormous, enormous impact on work, society, and everything in between.

Ultimately, we'll see. I think it is in no way guaranteed that the way people will want to interact with these technologies is through these sorts of chat modalities or interactions with video; the real world will show us that one way or another. Regardless of what plays out, though, it's pretty clear that OpenAI believes that this is truly the future of interaction with AI. I think just because Sam Altman wasn't doing the presentation, just because they might have rushed this a little to get in ahead of Google I/O, and just because they didn't formally announce 4.5 or GPT-5, it would be a mistake to underestimate how significant this update is in the minds of OpenAI themselves. However, there is going to be a lot more to discuss with this, especially with Google I/O coming tomorrow. So that is going to do it for this edition of the AI Daily Brief. Until next time, peace.


Related Tags
OpenAI, GPT-4, Multimodal AI, Product Update, Real-time Interaction, AI Assistant, Voice Features, AI Daily Brief, Tech Event, Innovation, Artificial Intelligence