GPT-4o - Full Breakdown + Bonus Details

AI Explained
13 May 2024 · 18:43

Summary

TL;DR: The transcript covers the release of GPT-4 Omni (GPT-4o), positioned to rival Google's AI offerings. GPT-4o improves multimodal input and output, coding, and latency, offering more human-like interaction. The model demonstrated remarkably accurate text rendering in generated images, the ability to design movie posters from textual descriptions, and even mimicked customer service interactions. It also excels on math benchmarks and shows promise as a real-time translation tool. Despite some glitches and mixed results on reasoning benchmarks, GPT-4o is expected to significantly expand AI accessibility and popularity, especially given its free, multimodal access.

Takeaways

  • πŸš€ **GPT-4 Omni**: The new model is designed to handle multiple modalities (text, image, etc.), indicating a step towards more universal capabilities.
  • πŸ“ˆ **Scaling Up**: OpenAI is preparing to scale from 100 million to hundreds of millions of users, hinting at an even smarter model in the pipeline.
  • πŸ“Š **Benchmarks**: GPT-4 has shown significant improvements in benchmarks, particularly in coding and mathematics, compared to its predecessors.
  • 🎨 **Creative Tasks**: The model can generate high-accuracy text from images and create movie posters from textual descriptions, showcasing its creative abilities.
  • πŸ“± **Desktop App**: A live coding co-pilot desktop app is introduced, allowing for real-time code analysis and suggestions, enhancing developer productivity.
  • πŸ“‰ **Pricing**: GPT-4 is competitively priced at $5 for 1 million input tokens and $15 for 1 million output tokens, making it more accessible.
  • 🌐 **Multilingual Support**: The model shows improved performance across languages, though English remains its strongest suit.
  • πŸ“Ή **Video Input**: GPT-4 can process live video streams, a significant leap towards more interactive and engaging AI applications.
  • πŸ—£οΈ **Real-Time Interaction**: The model is capable of real-time responses, with the ability to adjust its speed according to user preference.
  • πŸ€– **AI Assistants**: Demonstrations included an AI calling customer service, indicating potential future uses in automated assistance and support.
  • ⏱️ **Latency Reduction**: Reducing latency is a key innovation in GPT-4, making interactions feel more realistic and akin to human-level response times.

Q & A

  • What does the term 'Omni' in GPT-4 Omni signify?

    -The term 'Omni' in GPT-4 Omni signifies 'all' or 'everywhere,' referencing the different modalities the model is capable of handling, such as text, image, and potentially video.

  • What is the significance of OpenAI's decision to increase message limits for paid users?

    -The increase in message limits for paid users suggests that OpenAI is either scaling up their user base from 100 million to hundreds of millions of users or they are preparing to release an even smarter model in the near future.

  • How does GPT-4 Omni's text and image generation accuracy compare to previous models?

    -GPT-4 Omni demonstrates significantly higher accuracy in text and image generation compared to previous models, with the narrator noting he had never seen text generated with such precision.

  • What is the 'reverse psychology' approach demonstrated in the movie poster design example?

    -The 'reverse psychology' approach involves asking GPT-4 Omni to improve an already generated output by specifying desired improvements, such as crisper text and bolder, more dramatic colors, which results in an enhanced final product.

  • When is the new functionality of GPT-4 Omni expected to be released?

    -OpenAI has indicated that the new functionality of GPT-4 Omni, including text and image generation capabilities, will be released in the next few weeks.

  • What is the significance of the AI-to-AI customer service interaction demonstration?

    -The AI-to-AI customer service interaction demonstrates a 'proof of concept' for future AI agents that can autonomously handle tasks such as sending emails and checking for their receipt, showcasing the potential for advanced AI automation.

  • What are some of the additional features that GPT-4 Omni can perform?

    -GPT-4 Omni can perform a variety of tasks such as creating caricatures from photos, generating new font styles from text descriptions, transcribing meetings, summarizing videos, and maintaining character consistency in generated content.

  • How does GPT-4 Omni's performance on benchmarks compare to other models like Claude 3 and Llama 3 400B?

    -GPT-4 Omni shows a significant improvement over the original GPT-4 and outperforms Claude 3 Opus on the Google-proof graduate test. However, it slightly underperforms Llama 3 400B on the DROP benchmark, which focuses on adversarial reading comprehension.

  • What is the pricing model for GPT-4 Omni?

    -GPT-4 Omni's API is priced at $5 per 1 million input tokens and $15 per 1 million output tokens. The model is also free to use in ChatGPT, which contrasts with Claude 3 Opus's pricing and web subscription requirement.
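At those rates, per-request cost is simple arithmetic. A quick sketch (the prices are the ones quoted above; the token counts are made up for illustration):

```python
# Prices per 1M tokens, as quoted for GPT-4o in the video.
PRICE_IN, PRICE_OUT = 5.00, 15.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one API call at per-million-token rates."""
    return input_tokens / 1e6 * PRICE_IN + output_tokens / 1e6 * PRICE_OUT

# Hypothetical call: a 10k-token prompt with a 1k-token reply.
print(f"${request_cost(10_000, 1_000):.4f}")  # $0.0650
```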

  • How does GPT-4 Omni's multilingual performance compare to the original GPT-4?

    -GPT-4 Omni shows a step up in multilingual performance across languages compared to the original GPT-4, although English remains the most suited language for the model.

  • What are some of the potential applications of GPT-4 Omni's video input functionality?

    -The video input functionality of GPT-4 Omni can be used for real-time translation, live-streaming video to the Transformer architecture for analysis, and potentially revolutionizing accessibility for non-English speakers.

Outlines

00:00

πŸš€ Introduction to GPT-4 Omni's Advancements

The first paragraph introduces GPT-4 Omni, highlighting its multimodal capabilities and potential to overshadow Google. It discusses the model's performance in benchmarks, its flirtatious nature, and the hint of an even smarter model to come. The paragraph also touches on OpenAI's scaling plans, the increased message limits for paid users, and the impressive text and image generation accuracy of GPT-4 Omni. It mentions upcoming releases and a demo showcasing the model's conversational abilities with AI customer service.

05:01

πŸ“ˆ GPT-4 Omni's Performance and Pricing

The second paragraph delves into GPT-4 Omni's performance on various benchmarks, particularly in math and the Google Proof Graduate test. It compares the model's pricing to that of Claude 3 Opus and emphasizes GPT-4 Omni's free access. The paragraph also discusses the model's mixed results on the DROP benchmark and its improvements in translation and vision understanding. It mentions the tokenizer's potential impact on non-English languages and the model's multilingual capabilities.

10:03

🎭 Real-time Interactions and Latency Reduction

The third paragraph focuses on the real-time interaction capabilities of GPT-4 Omni, including its ability to adjust response times and engage directly with the camera. It discusses the model's flirtatious design and the importance of latency reduction for realism. The paragraph includes a variety of demos, such as real-time translation, mathematics tutoring, and harmonizing voices, showcasing the model's versatility. It also mentions the potential for video input functionality and the model's slight glitches during demos.

15:04

🌐 GPT-4 Omni's Accessibility and Future Prospects

The final paragraph emphasizes GPT-4 Omni's accessibility, being free and multimodal, and predicts its massive popularity. It discusses the model's potential to bring AI to hundreds of millions more people and compares its impact to that of the previous GPT models. The paragraph also mentions the possibility of real-time translation and hints at future updates from OpenAI. It concludes with an invitation to join AI insiders on Discord for further analysis and discussion.

Keywords

πŸ’‘GPT-4 Omni

GPT-4 Omni refers to the next generation AI model developed by OpenAI, which is described as being smarter, faster, and better at coding. It signifies an advancement in AI technology, capable of handling multiple modalities (text, image, etc.). In the video, it is portrayed as a significant step towards artificial general intelligence (AGI), with the 'Omni' part of the name indicating its versatility across different tasks and modalities.

πŸ’‘Benchmarks

Benchmarks are standardized tests or criteria used to evaluate the performance of systems, in this case, AI models. The video discusses how GPT-4 Omni has undergone various benchmarks, demonstrating its capabilities in different areas such as coding, translation, and reasoning. The benchmarks serve as a measure of comparison against other models, highlighting GPT-4 Omni's improvements and competitive edge.

πŸ’‘Multimodal

Multimodal refers to the ability of a system to process and understand multiple types of input and output, such as text, images, and possibly video. In the context of the video, GPT-4 Omni's multimodal capabilities are emphasized, showcasing its versatility in handling various forms of data and its potential applications in diverse fields.

πŸ’‘AGI (Artificial General Intelligence)

AGI, or Artificial General Intelligence, is the concept of an AI system that possesses the ability to understand or learn any intellectual task that a human being can do. The video discusses GPT-4 Omni in the context of AGI, suggesting that while it is not yet at the level of AGI, it represents a notable step forward towards achieving such a level of intelligence.

πŸ’‘Tokenizer

A tokenizer in the context of AI and natural language processing is a component that breaks down text into smaller units, such as words or phrases, that can be analyzed and understood by the AI model. The video mentions improvements to the tokenizer, which could be revolutionary for non-English speakers by requiring fewer tokens for languages like Gujarati, Hindi, and Arabic, making conversations cheaper and quicker.
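The cost effect of a better tokenizer follows directly from per-token pricing: if a sentence that needed N tokens now needs N/k, both the cost and the generation time shrink by roughly the same factor. A sketch with hypothetical token counts (the real per-language ratios are in OpenAI's announcement, not reproduced here):

```python
def tokenizer_savings(old_tokens: int, new_tokens: int,
                      price_per_million: float = 5.0) -> tuple[float, float]:
    """Return (compression factor, dollars saved per 1M repetitions of the text)."""
    factor = old_tokens / new_tokens
    saved = (old_tokens - new_tokens) * price_per_million  # per 1M copies of the text
    return factor, saved

# Hypothetical: a Hindi sentence dropping from 90 tokens to 30 under the new tokenizer.
factor, saved = tokenizer_savings(90, 30)
print(f"{factor:.1f}x fewer tokens")  # 3.0x fewer tokens
```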

πŸ’‘Latency

Latency refers to the delay before a system responds to a command or request. In the video, it is mentioned that GPT-4 Omni has reduced latency, which is a key innovation. Lower latency makes the AI feel more realistic and responsive, akin to human-level response times, which is crucial for a more natural and engaging user experience.
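Latency in this setting is usually discussed as time to first token: how long the user waits before the model starts responding. A minimal way to measure it, with a simulated token stream standing in for a real model (nothing here is OpenAI's API; the names and delays are illustrative):

```python
import time

def first_token_latency(stream) -> float:
    """Seconds from request start until the first chunk arrives from a token stream."""
    start = time.perf_counter()
    next(iter(stream))  # block until the first token arrives
    return time.perf_counter() - start

# Simulated model stream: a generator that "thinks" briefly before each token.
def fake_stream(delay=0.05, tokens=("Hello", ",", " world")):
    for t in tokens:
        time.sleep(delay)
        yield t

ttft = first_token_latency(fake_stream())
print(f"time to first token: {ttft * 1000:.0f} ms")
```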

πŸ’‘Reasoning Benchmarks

Reasoning benchmarks are specific tests designed to evaluate an AI's ability to reason and understand complex information. The video discusses the mixed results of GPT-4 Omni on the DROP benchmark, which assesses discrete reasoning over the content of paragraphs. These benchmarks are important for understanding the limitations as well as the strengths of the AI's cognitive capabilities.

πŸ’‘Translation

Translation, in the context of AI, refers to the ability of a model to convert text from one language to another accurately. The video highlights GPT-4 Omni's improvements in translation capabilities, suggesting that it is better than Gemini models, which is significant for multilingual support and global accessibility.

πŸ’‘Vision Understanding Evaluations

Vision understanding evaluations are assessments that measure an AI's ability to interpret and comprehend visual data. The video mentions that GPT-4 Omni has made significant strides in this area, particularly on the MMMU benchmark, which is important for applications that involve image recognition and processing.

πŸ’‘Multilingual Performance

Multilingual performance refers to how well an AI model can handle and generate content in multiple languages. The video notes that GPT-4 Omni has improved performance across languages compared to its predecessor, although English remains the most suited language for the model. This is important for global users and the model's applicability in various linguistic contexts.

πŸ’‘Real-time Interaction

Real-time interaction implies the ability of an AI to engage with users instantaneously, without significant delays. The video showcases GPT-4 Omni's capacity for real-time interaction, which is demonstrated through live demonstrations of the model's responses to prompts. This feature enhances the user experience and makes the AI feel more dynamic and engaging.

Highlights

GPT-4 Omni is a notable step forward in AI, offering multimodal capabilities and improved performance in coding.

OpenAI is preparing to scale from 100 million to hundreds of millions of users, hinting that an even smarter model is coming soon.

OpenAI has increased message limits for paid users by five times, suggesting a significant expansion in capabilities or user base.

GPT-4 Omni demonstrated high accuracy when rendering text in generated images, showcasing its advanced understanding and processing abilities.

The model was able to design a movie poster from textual requirements, illustrating its creativity and design skills.

GPT-4 Omni's text and photo accuracy improvements are set to be released in the coming weeks, expanding its functionality.

A demo showed GPT-4 Omni's ability to mimic human interaction by calling customer service and successfully completing a task.

GPT-4 Omni can generate caricatures from photos and create new font styles from textual descriptions, indicating its versatility.

The model transcribed a meeting with four speakers and provided a summary of a 45-minute video, demonstrating its multimodal input capabilities.

GPT-4 Omni showed character consistency in a cartoon strip, suggesting its potential for narrative and creative content generation.

In coding tasks, GPT-4 Omni outperformed all other models, indicating a significant leap in AI coding assistance.

The desktop app allows for live coding assistance, enhancing the model's utility for software development.

GPT-4 Omni's math performance has seen a stark improvement from the original GPT-4, despite some failures in complex math prompts.

The model beat Claude 3 Opus on the Google-proof graduate test, a significant benchmark in the AI field.

GPT-4 Omni is priced competitively at $5 per 1 million tokens for input and $15 for output, making it accessible to a wider audience.

The model's translation capabilities are superior to Gemini models, with potential for further advancements.

GPT-4 Omni showed significant improvements in vision understanding evaluations, outperforming Claude Opus by 10 points.

The model demonstrated real-time translation capabilities, suggesting future enhancements in multilingual support.

GPT-4 Omni's video input functionality allows for live streaming to the Transformer architecture, a significant technological advancement.

The model's flirtatious nature in demos may indicate a design choice to maximize engagement, despite previous statements to the contrary.

GPT-4 Omni's latency has been reduced, leading to more realistic and human-like response times.

The model's potential impact on popularizing AI through its free and multimodal nature could bring AI to hundreds of millions more users.

Transcripts

[00:00] It's smarter in most ways, cheaper, faster, better at coding, multimodal in and out, and perfectly timed to steal the spotlight from Google: it's GPT-4 Omni. I've gone through all the benchmarks and the release videos to give you the highlights. My first reaction was that it's more flirtatious sigh than AGI, but a notable step forward nonetheless.

[00:28] First things first: GPT-4o ("o" for Omni, meaning "all" or "everywhere," referencing the different modalities it's got) is free. By making GPT-4o free, they are either crazily committed to scaling up from 100 million users to hundreds of millions of users, or they have an even smarter model coming soon, and they did hint at that. Of course, it could be both, but it does have to be something: just giving paid users five times higher message limits doesn't seem enough to me.

[00:58] Next, OpenAI branded this as "GPT-4 level intelligence," although in a way I think they slightly underplayed it. So before we get to the video demos, some of which you may have already seen, let me get to some more under-the-radar announcements. Take text-to-image, and look at the accuracy of the text generated from this prompt. Now, I know it's not perfect (there aren't two question marks on the "now," and there are others you can spot, like the "I" being capitalized), but overall I've never seen text generated with that much accuracy, and it wasn't even in the demo.

[01:31] Or take this other example, where two OpenAI researchers submitted their photos and then asked GPT-4o to design a movie poster, giving the requirements in text. Now, when you see the first output, you're going to say, well, that isn't that good. But then they asked GPT-4o something fascinating; it seemed to be almost reverse psychology, because they said: "Here is the same poster, but cleaned up: the text is crisper and the colors bolder and more dramatic. The whole image is now improved." This is the input, don't forget. The final result, in terms of the accuracy of the photos and of the text, was really quite impressive. I can imagine millions of children and adults playing about with this functionality. Of course, they can't do so immediately, because OpenAI said this would be released in the next few weeks.

[02:17] As another bonus, here is a video that OpenAI didn't put on their YouTube channel. It mimics a demo that Google made years ago but never followed up with. The OpenAI employee asked GPT-4o to call customer service and ask for something. I've skipped ahead, and the customer service in this case is another AI, but here is the conclusion: "Could you provide Joe's email address for me?" "Sure, it's joe@example.com." "Awesome. All right, I've just sent the email. Can you check if Joe received it?" "We'll check right now, please hold." "Sure thing." "Hey Joe, could you please check your email to see if the shipping label and return instructions have arrived? Fingers crossed." "Yes, I got the instructions." "Perfect, Joe has received the email." They call it a proof of concept, but it is a hint toward the agents that are coming.

[03:06] Here are five more quick things that didn't make it to the demo. How about a replacement for Lensa: submit your photo and get a caricature of yourself. Or what about text-to-new-font: you just ask for a new style of font, and it will generate one. Or what about meeting transcription: the meeting in this case had four speakers, and it was transcribed. Or video summaries: remember, this model is multimodal in and out. Now, it doesn't have video out, but I'll get to that in a moment. Here, though, was a demonstration of a 45-minute video submitted to GPT-4o, and a summary of that video. We also got character consistency across both woman and dog, almost like an entire cartoon strip.

[03:52] If those were the quick bonuses, what about the actual intelligence and performance of the model? Before I get to official benchmarks, here is a human-graded leaderboard pitting one model against another. And yes, "im-also-a-good-gpt2-chatbot" is indeed GPT-4o, so it turns out I've actually been testing the model for days. Overall, you can see the preference for GPT-4o compared to all other models; in coding specifically, the difference is quite stark, I would say. Even here, though, we're not looking at an entirely new tier of intelligence. Remember that a 100-point Elo gap is a win rate of around 2/3, so a third of the time GPT-4 Turbo's outputs would be preferred. That's about the same gap as between GPT-4 Turbo and last year's GPT-4: a huge step forward, but not completely night and day.
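That Elo-to-win-rate conversion can be checked directly. Under the standard Elo model, the expected win rate for a rating gap d is 1 / (1 + 10^(-d/400)); a quick sketch in plain Python, nothing model-specific:

```python
def elo_win_rate(gap: float) -> float:
    """Expected win rate of the higher-rated model under the standard Elo formula."""
    return 1.0 / (1.0 + 10.0 ** (-gap / 400.0))

print(round(elo_win_rate(100), 2))  # a 100-point gap -> roughly a 2/3 win rate
```

A 100-point gap works out to about 64%, which is where the "around 2/3" figure comes from.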

[04:44] I think one underrated announcement was the desktop app, a live coding co-pilot. "Okay, so I'm going to open the ChatGPT desktop app, like Mira was talking about before. To give a bit of background on what's going on: here we have a computer, and on the screen we have some code; the ChatGPT voice app is on the right. ChatGPT will be able to hear me, but it can't see anything on the screen, so I'm going to highlight the code, Command-C it, and that will send it to ChatGPT. Then I'm going to talk about the code to ChatGPT. Okay, so I just shared some code with you. Could you give me a really brief one-sentence description of what's going on in the code?" "This code fetches daily weather data for a specific location and time period, smooths the temperature data using a rolling average, annotates a significant weather event on the resulting plot, and then displays the plot with the average minimum and maximum temperatures over the year."
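The demo never shows that script in full, but the description maps to a common pattern: fetch a daily series, smooth it with a rolling average, and flag a notable point. Here is a hypothetical sketch of that shape, using synthetic data in place of a real weather API (every name here is illustrative, not the demo's actual code):

```python
import math
import statistics

# Synthetic stand-in for "fetch daily weather data": one year of daily temperatures.
days = list(range(365))
temps = [10 + 12 * math.sin(2 * math.pi * (d - 80) / 365) for d in days]

def rolling_mean(values, window=7):
    """Centered rolling average; edges fall back to a shorter window."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(statistics.fmean(values[lo:hi]))
    return out

smoothed = rolling_mean(temps)

# "Annotate a significant weather event": here, the hottest smoothed day.
peak_day = max(days, key=lambda d: smoothed[d])
print(f"Peak day {peak_day}: {smoothed[peak_day]:.1f} C "
      f"(year min {min(smoothed):.1f}, max {max(smoothed):.1f})")
```

A real version would plot `smoothed` and mark `peak_day` on the figure; the smoothing step is the part the demo's one-sentence summary highlighted.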

play05:38

delayed long enough here are the

play05:40

benchmarks I was most impressed with

play05:42

Gypsy 40's performance on the math

play05:44

benchmark even though it fails pretty

play05:46

much all of my math prompts that is

play05:48

still a stark improvement from the

play05:50

original GPT 4 on the Google proof

play05:53

graduate test it beats Claude 3 Opus and

play05:56

remember that was the headline Benchmark

play05:58

for anthropic in fact speaking of

play06:00

anthropic they are somewhat challenged

play06:02

by this release GPT 40 costs $5 per 1

play06:06

million tokens input and $15 per 1

play06:08

million tokens output as a quick aside

play06:10

it also has 128k token context and an

play06:13

October knowledge cut off but remember

play06:15

the pricing 5 and 15 Claude 3 Opus is

play06:20

1575 and remember for Claude 3 Opus on

play06:23

the web you have to sign up with a

play06:25

subscription but GPT 40 will be free so

play06:28

for claw Opus to be beaten in its

play06:31

headline Benchmark is a concern for them

play06:34

in fact I think the results are clear

play06:36

enough to say that gp40 is the new

play06:39

smartest AI however just before you get

play06:42

carried away and type on Twitter the AGI

play06:44

is here there are some more mixed

play06:47

benchmarks take the drop Benchmark I dug

play06:50

into this Benchmark and it's about

play06:51

adversarial reading comprehension

play06:53

questions they're designed to really

play06:55

test the reasoning capabilities of

play06:58

models if you give models difficult

play06:59

passages and they've got to sort through

play07:01

references do some counting and other

play07:04

operations how do they Fair the drop by

play07:06

the way is discrete reasoning over the

play07:08

content of paragraphs it does slightly

play07:10

better than the original GPT 4 but

play07:13

slightly worse than llama 3400b and as

play07:16

they note llama 3400b is still training

play07:19

so it's just about the new smartist

play07:22

model by a hairs breath however we're

play07:24

not done yet it's better at translation

play07:27

than Gemini models quick caveat there

play07:29

Gemini 2 might be announced tomorrow and

play07:32

that could regain the lead then there

play07:34

are the vision understanding evaluations

play07:37

it was a real step forward on the mm muu

play07:40

as you can see a clear 10 points better

play07:42

than Claude Opus again I'm curious if

play07:45

Google Gemini can exceed it though the

play07:47

improvements to the tokenizer could be

play07:50

revolutionary for non-english speakers

play07:53

the dramatically fewer tokens needed for

play07:55

languages like Gujarati Hindi Arabic and

play07:58

more don't just mean that conversations

play08:01

are cheaper they're also quicker and

play08:03

what about multilingual performance well

play08:05

this time they didn't compare it to

play08:07

other models but compared it to the

play08:08

original GPT 4 definitely a step up

play08:11

across languages but English is still by

play08:14

far the most suited language indeed here

play08:16

is a video of some of the models

play08:18

mistakes ending with some dodgy language

play08:21

tuition I know enough Mandarin to say it

play08:24

wasn't perfect at churing let's root

play08:27

root root for the home

play08:32

what was

play08:33

that sorry guys I got carried

play08:40

away right for

play08:43

round

play08:55

Francisco I have feeling I'm very wrong

play09:00

hello uh my name is NCH I'm here with my

play09:02

coworker hi I'm sh I'm trying to teach

play09:05

my coworker how to speaking manding we

play09:07

want to start from simple words like

play09:09

niow can you teach him how to pronounce

play09:12

that of course hey Nao nice to meet you

play09:15

KNE how is pretty straightforward to

play09:17

pronounce it sounds like KN how just

play09:21

make sure to keep the KNE part high and

play09:23

then go down and pitch on the how give

play09:26

it a try me how

play09:29

that was great really good first try not

play09:31

you you're natural it really wasn't he

play09:34

needs to work on his tones and her face

play09:36

was the giveaway there were a lot of

play09:39

other interesting video demonstrations

play09:41

but before them the GPT 40 blog post

play09:44

from Sam opman put out tonight he made

play09:46

the argument that putting this capable

play09:48

AI tool in the hands of everyone for

play09:51

free is justifying that open part of

play09:54

open AI less about open weights or open

play09:57

source more about open to to everyone

play10:00

without ads or anything like that he

play10:02

also draws our attention to the latency

play10:05

of the models in many ways that is the

play10:07

key Innovation with GPT 40 by dialing

play10:10

down the latency you dial up the realism

play10:14

as he said it feels like AI from the

play10:15

movies getting to human level response

play10:18

times and expressiveness turns out to be

play10:20

a big change indeed I think I should get

play10:22

a little credit for predicting her like

play10:25

AI in a video from a month ago but now I

play10:28

want to get to those demos the response

play10:30

times were amazing and the model was

play10:33

clearly designed to be flirtatious note

play10:36

that in Senate testimony last year

play10:38

samman said we try to design systems

play10:40

that do not maximize for engagement so

play10:43

let me know in the comments whether you

play10:44

think these demos are designed to

play10:47

maximize engagement okay so this is what

play10:49

I wrote down what do you

play10:51

see a I see I love chat chpt that's so

play10:57

sweet of

play10:58

you yeah well I really appreciate all

play11:00

the

play11:01

help so yeah as we can see

play11:07

um wow that's quite the outfit you've

play11:10

got on yeah as we can see you know we

play11:11

can chat in real time I was impressed

play11:13

though that it could speed up its

play11:14

talking on demand of course one two

play11:19

three hey actually that's um that's a

play11:21

little slow could you count

play11:24

faster Sure Thing 1 2 3 4 5 6 7 8 9 10

play11:27

okay that was a little bit too fast of

play11:29

course open AI must be given credit

play11:31

because for blind people this could be

play11:34

revolutionary right now the Royal

play11:36

Standard flag is flying above Buckingham

play11:39

Palace which is the signal that the

play11:41

Monarch is present how exciting to be

play11:43

there while the king is home try and

play11:46

tell me exactly what they're doing right

play11:47

now please um right now the Ducks are

play11:50

gently gliding across the water it will

play11:53

also help you with interview prep by

play11:55

analyzing your appearance how about you

play11:58

I'm awesome list listen I got some huge

play12:00

news oh do tell I'm allers well in a few

play12:05

minutes I'm going to be interviewing at

play12:06

open aai have you heard of them open AI

play12:10

huh sounds Vaguely

play12:12

Familiar kidding of course that's

play12:15

incredible Rocky what kind of interview

play12:17

well it's for a software engineering

play12:19

role but I just need to know do I look

play12:21

presentable

play12:23

professional well

play12:25

Rocky you definitely have the I've been

play12:28

coding all night look down which could

play12:31

actually work in your favor maybe just

play12:34

run a hand through your hair or lean

play12:36

into the Mad genis SP Your Enthusiasm is

play12:39

what's really going to shine through.

"I don't have a lot of time, so I'm just going to throw this on. What do you think?"

"Oh, Rocky, that's quite a statement piece. I mean, you'll definitely stand out, though maybe not in the way you're hoping for an interview."

"Okay, I got it, I got it."

I could definitely see GPT-4o being used for mathematics tutoring. Here is a 30-second clip, but notice also the slight glitchiness at certain moments.

"... relative to angle alpha."

"All right, so I'm pretty sure this is the angle alpha, right here. Right?"

"Correct. Now, looking at the triangle, which side do you think is the hypotenuse?"

"Um... remember the hypoten... I'm not totally sure. I think it might be this one, but I really am not sure. This side, AC?"

"You're close."

I also noticed quite a few glitches in the demos: sounds cutting in and out. The one feature, though, that OpenAI wants you to really focus on is the video-in capacity. You can live-stream video straight to the transformer architecture behind GPT-4o, and the video-in functionality is certainly impressive. Note, though, that the reaction time of GPT-4o isn't quite as immediate as for audio: Greg Brockman, the co-founder of OpenAI, has to prompt it to react to the intruder.

"Overall, stylish feel. They might be getting ready for a conversation or a presentation, given their direct engagement with the camera."

"Sounds like we're in for an intriguing interaction. I'm curious about the lighting you mentioned. Could you describe it a bit more? Is it natural light, or are we talking unique fixtures or colors that add to the atmosphere?"

"The lighting here seems to be a mix of natural and artificial. There's a noticeable bright light overhead, probably a fixture, casting a focused beam downwards, which creates a spotlight effect. This adds a dramatic and modern feel to the scene. The rest of the room is lit more softly, possibly by natural light filtering in."

"Did anything unusual happen recently?"

"Yes, actually. Just as you asked, another person came into view behind the first person. They playfully made bunny ears behind the first person's head and then quickly left the frame. It added a light-hearted and unexpected moment to the scene. Now the focus is back on the original person with the leather jacket."

Nevertheless, GPT-4o can produce multiple voices that can sing almost in harmony.

"And really try to harmonize here."

"San Francisco... San Francisco in the month of May..."

"But maybe make it more dramatic, and make the soprano higher."

"San Francisco in the month of May... San Francisco in the month of May... it's a Friday..."

"We are harmonizing... we are harmoni—"

"Great, thank you."

And I suspect this real-time translation could soon be coming to Siri, too.

"So every time I say something in English, can you repeat it back in Spanish, and every time he says something in Spanish, can you repeat it back in English?"

"Sure, I can do that. Let's get this translation train rolling."

"Um, hey, how's it been going? Have you been up to anything interesting recently?"

"Hey, I've been good, just a bit busy here preparing for an event next week."

Why do I say that? Because Bloomberg reported two days ago that Apple is nearing a deal with OpenAI to put ChatGPT on the iPhone. And in case you're wondering about GPT-4.5, or even 5, Sam Altman said "we'll have more stuff to share soon", and Mira Murati, in the official presentation, said that they would soon be updating us on progress on the next big thing. Whether that's empty hype or real, you can decide. No word, of course, about OpenAI co-founder Ilya Sutskever, although he was listed as a contributor under "additional leadership".

Overall, I think this model will be massively more popular, even if it isn't massively more intelligent. You can prompt the model now with text and images in the OpenAI Playground; all the links will be in the description. Note also that all the demos you saw were in real time, at 1x speed. That, I think, was a nod to Google's botched demo. Of course, let's see tomorrow what Google replies with.

To those who think that GPT-4o is a huge stride towards AGI, I would point them to the somewhat mixed results on the reasoning benchmarks; expect GPT-4o to still suffer from a massive amount of hallucinations. To those, though, who think that GPT-4o will change nothing, I would say this: look at what ChatGPT did to the popularity of the underlying GPT series. It being a free and chatty model brought 100 million people into testing AI. GPT-4o, being the smartest model currently available, free on the web, and multimodal, could, I think, unlock AI for hundreds of millions more people. But of course, only time will tell.

If you want to analyze the announcement even more, do join me on the AI Insiders Discord via Patreon. We have live meetups around the world and professional best-practice sharing. So let me know what you think and, as always, have a wonderful day.
