GPT-4o Deep Dive & Hidden Abilities you should know about

AI Search
14 May 2024, 28:11

Summary

TLDR: The video discusses GPT-4o, the AI model released by OpenAI, which handles multimodal tasks including real-time voice assistance, coding, chess puzzles, and image generation. It walks through demos of GPT-4o that highlight its speed and expressiveness compared with the traditional pipeline of separate speech and text models. The video also speculates on GPT-4o's potential impact on industries such as customer service, tutoring, and therapy, and on how it could change human interaction and professional roles.

Takeaways

  • 🚀 OpenAI has released GPT-4o, an AI model that the video shows recreating a Pokémon game, solving chess puzzles, and tackling a math Olympiad problem.
  • 🔊 GPT-4o is a multimodal model that processes text, audio, and images in a single neural network, which makes it faster and more expressive than chaining separate speech-to-text and text-to-speech steps.
  • 🏆 In blind tests on the LMSYS Chatbot Arena, GPT-4o ranks well ahead of every other AI model on the leaderboard shown in the video.
  • 💻 GPT-4o demonstrates real-time coding assistance, interpreting and responding to code snippets and plot outputs, which could disrupt existing coding assistants and platforms.
  • 🎲 On the chess puzzle benchmark shown in the video, GPT-4o solves 50.1% of puzzles, more than double the 22.9% of the previous leading model, GPT-4 Turbo.
  • 🌐 GPT-4o can teach languages and interact in real time, which could affect language learning apps like Duolingo.
  • 🕊️ GPT-4o can emulate a game of Pokémon Red through a text-based command-line interface, reproducing the game's menu options and turn-based battles.
  • 🤖 According to NVIDIA's Dr. Jim Fan, advances in tokenization and architecture let the model map audio to audio directly and stream video to a transformer in real time.
  • 👩‍🏫 The video highlights potential applications in therapy, counseling, and senior care, and says the model outperforms human psychologists in tests of social intelligence.
  • 🎨 The model's image generation can create consistent characters, render 3D models, and generate fonts, which could be useful for design and e-commerce.
  • 📅 GPT-4o will be available in ChatGPT and the API as a text and vision model, with free tier users gaining access to advanced tools such as data analysis and file uploads.

Q & A

  • What is GPT-4o and why is it considered revolutionary?

    -GPT-4o is a new AI model released by OpenAI, which the video considers revolutionary because of its multimodal capabilities. It can process text, audio, and image data and generate outputs in any of these formats natively, without relying on a separate model for each, which makes it more efficient and expressive than the traditional pipeline.

  • How does GPT-4o differ from traditional AI voice assistants?

    -Traditional AI voice assistants typically chain three separate processes: speech to text, text processing by a language model, and then text to speech. GPT-4o, on the other hand, is a single neural network that handles all of these tasks natively, making it faster and more efficient.

  • What is LMSYS and how does it relate to GPT-4o?

    -LMSYS runs a platform where users blind test AI models by entering a prompt and choosing the better of two anonymous responses. GPT-4o was tested there, first appearing as a mysterious "gpt2 chatbot", and outperformed all other AI models on the leaderboard.

  • How does GPT-4o perform in coding tasks?

    -GPT-4o scores well above previous models on the coding chart shown in the video. In the demo it also acts as a real-time coding assistant, describing code and reading a plot shown on screen.

  • What impact could GPT-4o have on language learning apps?

    -Given GPT-4o's ability to teach languages and its interactive and expressive nature, it could disrupt the language learning app market. The script mentions Duolingo's stock price dropping around 5% on the day of the announcement, suggesting a possible negative impact on existing language learning tools.

  • How good is GPT-4o at solving chess puzzles?

    -On the benchmark shown in the video, GPT-4o solves 50.1% of chess puzzles, more than double the 22.9% of the previous leading model, GPT-4 Turbo.

  • What are some of the potential use cases for GPT-4o's voice assistant features?

    -The voice assistant features of GPT-4o could be used for real-time language translation, interactive tutoring, customer service, therapy and counseling, and senior care, among other applications.

  • How does GPT-4o handle image generation tasks?

    -GPT-4o can generate images with high accuracy, including keeping characters and objects consistent across different prompts. It can also render text within images with fewer errors than other image generators.

  • What is the significance of GPT-4o's ability to generate fonts and 3D models?

    -GPT-4o's ability to generate fonts and 3D models shows that it can understand and create complex visual elements, which could be useful in design, branding, and e-commerce.

  • When will GPT-4o be available to users, and what are the access limitations?

    -GPT-4o will be available in ChatGPT and the API as a text and vision model. Free tier users will have access to GPT-4o with a usage limit, after which they will be switched back to the previous model. Advanced tools such as data analysis and file uploads will also have limited access for free users.

Outlines

00:00

😲 Revolutionary AI Model GPT-4o Overview

The video script introduces GPT-4o, an AI model released by OpenAI that can recreate a Pokémon game, play chess, and solve complex math problems. The script explains how GPT-4o differs from traditional AI by being a single multimodal neural network that processes text, audio, and images and generates responses in the same modalities. It also discusses its performance on the LMSYS Chatbot Arena, where it outperforms other AI models, and touches on the implications for other tech companies and their models.

05:00

💻 Real-Time Coding Assistance with GPT-4o

This section of the script demonstrates GPT-4o's ability to assist with coding problems in real time. In the demo, GPT-4o describes code that fetches daily weather data and smooths the temperatures with a rolling average, then reads the resulting plot from the screen and answers questions about it. The script also highlights GPT-4o's potential impact on coding assistants and the significance of its multimodal capabilities in coding and problem-solving.
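The smoothing step described in the demo (a rolling mean over the average, minimum, and maximum temperature columns) can be sketched in plain Python. This is a minimal illustration, not the code from the demo: the names rolling_mean and c_to_f, the sample readings, and the trailing-window convention are all assumptions made for the example.

```python
from statistics import fmean


def rolling_mean(values, window):
    """Trailing rolling mean over `values`.

    Positions that do not yet have `window` samples yield None, so the
    output has the same length as the input. (Hypothetical helper; the
    demo's own function is not shown in the source.)
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    for i in range(len(values)):
        if i + 1 < window:
            smoothed.append(None)  # not enough history yet
        else:
            smoothed.append(fmean(values[i + 1 - window : i + 1]))
    return smoothed


def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# Made-up daily maximum temperatures in Celsius, for illustration only.
daily_max_c = [20.0, 26.0, 23.0, 29.0, 26.0]
smoothed = rolling_mean(daily_max_c, window=3)
print(smoothed)
print(c_to_f(25), c_to_f(30))
```

A larger window gives smoother lines at the cost of hiding short-lived swings, which matches the explanation given in the demo. The conversion helper reproduces the range quoted there: 25 to 30 degrees Celsius is 77 to 86 degrees Fahrenheit.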

10:01

🎮 GPT-4o's Chess and Pokémon Emulation Skills

The script discusses GPT-4o's performance in solving chess puzzles, comparing it against other models on a benchmark where it solves 50.1% of puzzles versus 22.9% for GPT-4 Turbo. It also describes GPT-4o emulating the Pokémon Red game through a text-based command-line interface. The implications for language learning apps and the potential for GPT-4o to disrupt various industries are also mentioned.
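The transcript says each puzzle reaches the model as a textual description of the whole board in just a few characters. One widely used notation that fits this description is Forsyth-Edwards Notation (FEN); that the benchmark uses FEN is an assumption here, and fen_to_grid is a hypothetical helper written only to show how much board state such a short string carries.

```python
def fen_to_grid(fen: str) -> list[str]:
    """Expand the piece-placement field of a FEN string into 8 rows of 8 squares.

    Uppercase letters are white pieces, lowercase are black, and "." marks
    an empty square. Only the first FEN field is used; side to move,
    castling rights, and move counters are ignored in this sketch.
    """
    placement = fen.split()[0]
    rows = []
    for rank in placement.split("/"):
        row = ""
        for ch in rank:
            # A digit means that many consecutive empty squares.
            row += "." * int(ch) if ch.isdigit() else ch
        rows.append(row)
    if len(rows) != 8 or any(len(row) != 8 for row in rows):
        raise ValueError("placement does not describe an 8x8 board")
    return rows


# Standard chess starting position.
START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
grid = fen_to_grid(START_FEN)
for row in grid:
    print(row)

# Ratio of the solve rates quoted in the video.
solve_ratio = 50.1 / 22.9
```

The ratio of the two quoted solve rates is about 2.19, which is where the "more than double" claim comes from.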

15:02

🤖 GPT-4o's Multimodal Capabilities and Industry Impact

The script covers commentary from NVIDIA's Dr. Jim Fan on how GPT-4o was likely built, including direct audio-to-audio mapping and streaming video to a transformer in real time. It discusses the potential for synthetic data generation to supplement real-world data for AI training. The video also speculates on partnerships between tech giants and OpenAI, particularly reports that Apple is nearing a deal to integrate OpenAI's GPT models into the iPhone. The script further explores GPT-4o's role-playing capabilities in customer support scenarios.
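The transcript relays the guess that, rather than decoding every video frame into a full RGB image, the system transmits only the changes from one frame to the next. The idea can be illustrated with plain frame differencing. This is a toy sketch under stated assumptions, not a neural codec and not OpenAI's method, which is not public; frame_delta and apply_delta are hypothetical names and the tiny grayscale frames are made up.

```python
def frame_delta(prev, curr):
    """Return (row, col, new_value) for every pixel that changed."""
    return [
        (r, c, value)
        for r, row in enumerate(curr)
        for c, value in enumerate(row)
        if prev[r][c] != value
    ]


def apply_delta(prev, delta):
    """Rebuild the next frame from the previous frame plus its changes."""
    frame = [row[:] for row in prev]  # copy so the previous frame is untouched
    for r, c, value in delta:
        frame[r][c] = value
    return frame


# Two 3x4 grayscale frames: a single bright pixel moves one step right.
frame_1 = [
    [0, 0, 0, 0],
    [0, 9, 0, 0],
    [0, 0, 0, 0],
]
frame_2 = [
    [0, 0, 0, 0],
    [0, 0, 9, 0],
    [0, 0, 0, 0],
]

delta = frame_delta(frame_1, frame_2)
rebuilt = apply_delta(frame_1, delta)
print(delta)
```

Here only 2 of the 12 pixels need to be sent to reconstruct the second frame, which is the kind of saving the transcript attributes to sending motion deltas instead of full frames.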

20:02

🧠 GPT-4o's Implications for Education and Therapy

This part of the script examines GPT-4o's potential to change tutoring, therapy, and counseling by demonstrating its ability to solve complex math problems and interact in a human-like manner. It raises questions about the future of human educators, psychologists, and counselors in the face of such AI capabilities. The script also touches on the social implications of AI companionship and its potential to change human relationships.

25:04

🖼️ GPT-4o's Advanced Image and Text Generation

The script highlights GPT-4o's image generation capabilities, including creating consistent characters, rendering 3D models, and generating fonts. It compares these capabilities with other tools like Stable Diffusion and Midjourney, noting GPT-4o's better text accuracy and context understanding within images. The potential applications for e-commerce and branding are also discussed.

🌐 GPT-4o's Availability and Future Prospects

The final section of the script addresses the availability of GPT-4o, detailing its integration into ChatGPT and the API as a text and vision model. It outlines the features available to free and team tier users and speculates on future updates to the voice mode. The script concludes by inviting viewers to share their thoughts on GPT-4o's potential impact and its role in various industries.

Keywords

💡GPT-4o

GPT-4o is the AI model released by OpenAI that the video is about. The video describes it as the best AI model currently available, showing it recreating a Pokémon game, playing chess, and solving complex math problems. The term is central to the video's theme, which focuses on the model's multimodal capabilities and efficiency in processing text, audio, and image data.

💡Multimodal

The term 'multimodal' in the context of GPT-4o signifies the model's ability to process and generate multiple types of data, such as text, audio, and images, natively within a single neural network. This is a key concept in the video, contrasting the model with traditional approaches that use separate processes for speech to text, text to speech, and image generation.

💡LLMs (Large Language Models)

LLMs, or Large Language Models, are AI models designed to process and generate human-like text based on the input they receive. In the video, GPT-4o is positioned as outperforming other LLMs in tasks like real-time voice assistance, coding assistance, and solving chess puzzles.

💡LMSYS (Large Model Systems Organization)

LMSYS is the organization behind the arena platform mentioned in the script, where users blind test AI models: a prompt is answered by two anonymous models and the user picks the winner. The video discusses how GPT-4o, first listed there as a mysterious "gpt2 chatbot", came out on top of the leaderboard built from these blind tests.
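Leaderboards built from pairwise blind votes are commonly scored with an Elo-style rating, where each vote nudges the winner's rating up and the loser's down. The sketch below shows the standard Elo update for a single vote. Whether the arena uses exactly this formula, and the starting ratings and K-factor used here, are assumptions made for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_ratings(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return both ratings after one blind vote between A and B.

    `k` controls how far a single vote moves the ratings (assumed value).
    """
    expected_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    new_a = rating_a + k * (score_a - expected_a)
    # B's change is the mirror image, so the total rating is conserved.
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b


# Two models start level; the user prefers model A's answer.
model_a, model_b = update_ratings(1000.0, 1000.0, a_won=True)
print(model_a, model_b)
```

Because the expected score depends on the rating gap, beating a much stronger model moves the ratings more than beating a weaker one, so the ranking settles as votes accumulate.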

💡Real-time Voice Assistant

A 'real-time voice assistant' is an AI system that can provide immediate responses in the form of speech. The video explains that traditional voice assistants are slow and lose information because they chain three steps: speech to text, a text-only language model, and text to speech. In contrast, GPT-4o is presented as a single model that natively takes in and generates audio.
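The traditional three-step pipeline can be sketched with stub functions to show where information is lost. Everything in this sketch is hypothetical: the Utterance type, the tone field standing in for whatever audio carries beyond the words, and the three stage functions are illustrations, not a real speech or language-model API.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    words: str
    tone: str  # stands in for everything audio carries beyond the words


def speech_to_text(audio: Utterance) -> str:
    # Stage 1: transcription keeps the words and drops the tone.
    return audio.words


def language_model(prompt: str) -> str:
    # Stage 2: a text-only model sees nothing but the transcript.
    return f"You said: {prompt}"


def text_to_speech(text: str) -> Utterance:
    # Stage 3: synthesis has no access to the speaker's original tone.
    return Utterance(words=text, tone="neutral")


def cascaded_assistant(audio: Utterance) -> Utterance:
    """Chain the three separate stages of the traditional approach."""
    return text_to_speech(language_model(speech_to_text(audio)))


reply = cascaded_assistant(Utterance(words="I passed my exam", tone="excited"))
print(reply)
```

The speaker's tone never reaches the language model or the synthesizer, so the reply comes back neutral. A natively multimodal model, as the video describes GPT-4o, would receive the audio itself and could respond to such cues.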

💡Coding Assistance

The term 'coding assistance' is used in the script to describe GPT-4o's ability to help with coding problems in real time. The video provides an example of GPT-4o talking with a user to describe what a piece of code does and to interpret the plot it generates, demonstrating its utility for programmers.

💡Chess Puzzles

In the context of the video, 'chess puzzles' refer to chess problems in which the model is given a snapshot of a game, as a short text description of the board, and must find the right move. GPT-4o's high solve rate on these puzzles is presented as evidence of its problem-solving skills.

💡Synthetic Data

Synthetic data is artificially generated data used to train AI models. The video mentions the importance of synthetic data in supplementing real-world data for training AI models like GPT-4o, since real-world data alone may not be enough and good-quality synthetic data could in theory be generated without limit.

💡Image Generation

The script discusses GPT-4o's ability to generate images from text prompts, which is a significant feature of the model. It is used to create images with text, consistent characters, and even entire fonts, demonstrating the model's versatility in visual output.

💡3D Model Rendering

3D model rendering is the process of generating a two-dimensional image from a three-dimensional model. The video describes how GPT-4o can create a 3D model by rendering images from multiple angles and combining them, a capability that could be useful for e-commerce and other applications.

💡Expressive Voice

The term 'expressive voice' in the script refers to the new voice feature of GPT-4o, which is more lively and engaging than previous models. It is highlighted as a key aspect of the model's user interaction, suggesting a more human-like communication style.

Highlights

OpenAI released GPT-4o, an AI model with lesser-known capabilities such as recreating Pokémon games and solving complex math problems.

GPT-4o is a natively multimodal model, capable of processing text, audio, and image data and outputting in the same formats.

GPT-4o's efficiency comes from a single neural network handling multiple modalities, unlike traditional approaches using separate processes.

GPT-4o outperforms all other AI models in blind tests, and leads on the coding and chess puzzle benchmarks shown in the video.

GPT-4o demonstrated real-time coding assistance, interpreting and responding to code snippets and plot outputs.

The model's performance on the coding chart is significantly higher than that of previous models.

GPT-4o can act as a customer support agent, handling customer service scenarios with natural language understanding.

The model's real-time language teaching demo coincided with a drop of around 5% in Duolingo's stock on the day of the announcement, indicating potential industry disruption.

GPT-4o can emulate the game of Pokémon Red through a command-line interface, reproducing its menu options and battles in text.

GPT-4o's image generation is presented as more advanced than current tools, accurately rendering text and maintaining character consistency.

The model can generate fonts and 3D models from textual prompts, expanding its creative and practical applications.

GPT-4o's ability to maintain character consistency in image generation could be valuable for e-commerce branding.

The video says the model's performance on an International Math Olympiad problem exceeds that of 99% of humans.

GPT-4o's voice assistant is more expressive and engaging than traditional chatbots, with potential to change human interaction.

The model's potential applications in therapy, counseling, and senior care could disrupt traditional psychological services.

GPT-4o's release could significantly impact customer service, education, and personal relationships.

GPT-4o will be available to free and paid users, with advanced features like image analysis and data tools being accessible to all.

Transcripts

play00:00

on Monday OpenAI released this

play00:02

revolutionary model called GPT-4o which

play00:06

is by far the best AI model out there I

play00:09

showed you a few demo Clips in this

play00:10

video so check it out if you haven't

play00:12

already but GPT-4o actually has a lot

play00:15

more hidden capabilities it can recreate

play00:18

an entire Pokémon game it's surprisingly

play00:20

good at chess it can solve an

play00:23

international math Olympiad problem in

play00:25

just one prompt so this video will be a

play00:27

deep dive on GPT-4o I'm going to go over

play00:30

how it works how they built it and why

play00:32

it's so revolutionary now first of all

play00:35

I've seen some comments on my YouTube

play00:36

videos saying that this Tech is not

play00:38

revolutionary it's just text to speech

play00:41

or speech to text or using stable

play00:43

diffusion for image generation this is

play00:45

not true so for example for a real time

play00:49

AI Voice Assistant traditionally you

play00:51

would have a voice and there would be a

play00:53

neural network that turns your speech

play00:55

into text so this algorithm would be

play00:57

speech to text and then this text would

play00:59

be fed into a large language model like

play01:02

GPT which would respond back in text so

play01:05

this middle step is what you get for

play01:07

chatbots like ChatGPT and then you take

play01:09

another algorithm to turn this text into

play01:11

speech so this is called text to speech

play01:13

so it's going through three separate

play01:15

processes and this is of course Very

play01:17

inefficient and very slow that's the

play01:20

traditional approach now yes you still

play01:22

could get a real-time Voice Assistant

play01:24

this way but it's not very expressive

play01:26

and a lot of information is lost as you

play01:28

go from step to step

play01:30

now GPT-4o is a completely different

play01:33

animal this is true multimodal so they

play01:36

trained a single neural network on text

play01:39

audio and image data and it can also

play01:42

output either text audio or image this

play01:45

is true multimodal and because of this

play01:49

this likely is a completely new model

play01:52

trained from scratch and it's natively

play01:54

multimodal so in other words if you get

play01:56

this to generate an image it's not

play01:58

actually calling DALL-E 3 via an API to

play02:01

generate the image and similarly if you

play02:03

want it to respond in a voice it's not

play02:06

calling any text to speech API the same

play02:08

neural network this GPT-4o just spits

play02:12

out the audio right there natively and

play02:14

this is also much faster so somehow

play02:17

they've made it smaller or a lot more

play02:19

compute efficient now a while back we've

play02:22

had this mysterious gpt2-chatbot show up

play02:25

in this LMSYS arena for those of you who

play02:28

aren't familiar with LMSYS it's basically

play02:30

a platform where users can blind test

play02:33

all the AI models out there so for

play02:35

example the user will enter in a prompt

play02:38

and it's going to have two different AI

play02:39

models respond to that prompt and then

play02:42

the user chooses which one is the winner

play02:43

and you don't know which chatbot is on

play02:45

which side so it's a blind test and from

play02:48

all these blind tests it turns out that

play02:50

at least before GPT-4o GPT-4 Turbo was

play02:53

number one and so around 1 to two weeks

play02:56

ago we've had this new Contender this

play02:58

mysterious gpt2-chatbot appear in the

play03:01

model listings and so nobody really knew

play03:04

what that was right now we verified that

play03:06

this is indeed GPT-4o and if you look at

play03:08

the overall performance of GPT-4o which

play03:11

is the leftmost bar here compared to the

play03:13

performance of all these other AI models

play03:16

it is by far the best model out there I

play03:18

mean the other models don't even come

play03:20

close and you know in the past 2 years

play03:23

we've seen a lot of these LLMs being

play03:25

released like Mistral and then Mixtral and

play03:27

then Claude 3 Llama 3 these were all

play03:30

just incrementally better than the

play03:32

previous model but it wasn't anything

play03:34

significant but here you see with GPT-4o

play03:37

it just blows everything else out of the

play03:39

water and you know it kind of makes

play03:41

sense that OpenAI would release the

play03:43

best model out there because it seems

play03:45

that these other competitor companies

play03:47

they are training their models with GPT

play03:50

data at least to some degree so for

play03:52

example Mistral which is the French

play03:53

company which created an open-source LLM

play03:56

the reason why they were able to create

play03:58

this so fast is because they just used

play04:00

the responses of GPT as training data

play04:03

but if you do that you're only going to

play04:05

get as good as the best model from

play04:08

OpenAI you're not going to be able to exceed

play04:10

it and so that's why even with all these

play04:13

major announcements from Meta with their

play04:15

Llama 3 or Anthropic with Claude 3 Opus

play04:18

or Google with their Gemini 1.5 Pro even

play04:21

after all these big announcements from

play04:23

all these big tech companies still the

play04:25

number one model out there at least

play04:26

before Monday was OpenAI's GPT-4 Turbo

play04:29

and then now we're seeing GPT-4o just

play04:32

destroy everything else and not only

play04:34

that here is a second chart showing its

play04:37

performance in coding and you can see

play04:39

the previous models are just like

play04:40

incrementally better nothing impressive

play04:43

but again GPT-4o just blows everything

play04:46

out of the water this is super super

play04:48

impressive and on the note of coding

play04:50

they also demonstrated a real-time coding

play04:53

assistant check this out okay and to

play04:56

give a bit of background of what's going

play04:57

on so here we have um a computer and on the

play05:00

screen we have some code and then the

play05:01

ChatGPT voice app is on the right so

play05:03

ChatGPT will be able to hear me but it

play05:05

can't see anything on the screen so I'm

play05:07

going to highlight the code Command-C

play05:09

it and then that will send it to

play05:10

ChatGPT and then I'm going to talk about the

play05:12

code to

play05:13

ChatGPT hey

play05:16

ChatGPT hey there how's it going yeah it's

play05:19

going really well I was wondering if you

play05:20

could help me with a coding problem

play05:22

today of course I'd love to help you out

play05:25

what's the coding problem you're dealing

play05:27

with okay I'm going to share with you

play05:28

some code one

play05:30

sure thing take your

play05:34

time okay so I just shared some code

play05:37

with you could you give me a really

play05:38

brief one-sentence description of

play05:39

what's going on in the code this code

play05:42

fetches daily weather data for a

play05:44

specific location and time period

play05:46

Smooths the temperature data using a

play05:48

rolling average annotates a significant

play05:51

weather event on the resulting plot and

play05:53

then displays the plot with the average

play05:55

minimum and maximum temperatures over

play05:57

the year awesome thanks so so much and

play06:00

in the code there's this function foo

play06:02

could you describe to me what the plot

play06:03

would look like if I had versus didn't

play06:05

have this function sure the function foo

play06:09

(x, y) takes in a data frame x containing

play06:12

temperature data and a window size y for

play06:15

calculating the rolling mean it Smooths

play06:18

the average temperature minimum

play06:20

temperature and maximum temperature

play06:22

columns by applying a rolling mean over

play06:25

the specified window with the foo

play06:28

function the plot would show smoother

play06:30

temperature lines because the rolling

play06:32

mean reduces the noise or fluctuations

play06:35

in the data awesome this makes thanks so

play06:37

much so yeah I definitely will want to

play06:39

apply this function and so I'm going to

play06:41

uh generate the plot and then show it to

play06:42

you give me one

play06:44

second absolutely take your time I'm

play06:47

excited to see the plot when you're

play06:48

ready yeah so so far ChatGPT has only

play06:50

been able to see the code and now I'm

play06:52

going to run it and then I'll use the

play06:54

vision capabilities of the ChatGPT

play06:55

desktop app so it can actually see

play06:57

everything going on on the screen as

play06:58

well

play07:05

okay ChatGPT I'm sharing with you the

play07:08

plot now I'm wondering if you can give

play07:09

me a really brief one-sentence overview

play07:11

of what you

play07:12

see the plot displays smoothed average

play07:16

minimum and maximum temperatures

play07:18

throughout

play07:19

2018 with a notable annotation marking a

play07:22

big rainfall event in late

play07:24

September fantastic I have another

play07:26

question for you um which months do you

play07:28

see the hottest temperatures and

play07:30

roughly what temperatures do those

play07:31

correspond

play07:32

to the hottest temperatures occur around

play07:35

July and August the maximum temperature

play07:38

during these months is roughly between

play07:41

25°C and 30°C 77°F to 86°F what's

play07:48

going on with the y axis is this in

play07:51

Celsius or Fahrenheit the y-axis is in

play07:55

Celsius the temperatures are labeled

play07:57

accordingly such as average temperature

play07:59

minimum temperature and maximum

play08:03

temperature yeah so as we can see you

play08:05

know not only can ChatGPT help me solve

play08:07

very easy linear um algebra equations

play08:10

that I need help with it can also

play08:11

interact with you know code bases and

play08:13

see the outputs of plots and everything

play08:14

like this going on on a computer so

play08:16

right now we have the best the smartest

play08:19

AI model out there helping you code in

play08:21

real time on your desktop so I mean for

play08:23

all these other coding assistants out

play08:25

there like GitHub Copilot or GitHub

play08:28

Workspace or Devin AI all I can say is may

play08:31

you rest in peace and it gets crazier so

play08:35

it turns out that GPT-4o is also

play08:39

insanely good at solving chess puzzles

play08:41

so this is a benchmark of how good it is

play08:44

at solving these chess puzzles and just

play08:46

to give you some context chess puzzles

play08:48

are a very challenging problem for most

play08:50

humans let alone an LLM given a textual

play08:53

description of the entire board in just

play08:56

a few characters so this is different

play08:58

from like Google's DeepMind which

play09:00

plays an entire game of chess from start

play09:02

to finish in this scenario it's only

play09:04

given a particular snapshot of an

play09:07

existing game in other words it's only

play09:08

given a textual description of the

play09:11

entire board in just a few characters

play09:13

and from this here's a comparison of all

play09:15

the major models out there right now and

play09:17

you can see the adjusted Elo which is

play09:19

the Benchmark metric 1790 way higher

play09:22

than GPT-4 Turbo which is 1144 and the

play09:25

percentage of puzzles solved by GPT-4o

play09:29

is

play09:30

50.1% that's more than double the

play09:32

previously leading model GPT-4 Turbo which

play09:35

is only

play09:36

22.9% just absolutely insane here's a

play09:40

quick recap of what the real-time voice

play09:42

feature can do it can teach anyone any

play09:44

language hey my friend and I are

play09:46

learning Spanish and we're wondering if

play09:48

you could tell us the names of these

play09:49

objects in

play09:52

Spanish cool what about what are these

play09:54

objects in

play09:56

Spanish the objects you're showing are

play10:00

and in Spanish nice free choice cool

play10:04

what about

play10:05

these those are do poos or do Plumas in

play10:10

Spanish so what do you think happened

play10:12

after this video was released well the

play10:15

language learning app Duolingo crashed

play10:17

around 5% on the day of the announcement

play10:20

so I mean for all these language

play10:21

learning apps out there all I can say is

play10:24

May you rest in peace here's another

play10:27

crazy thing about it it can emulate the

play10:29

entire game of Pokémon Red now of course

play10:33

this is just a command line interface so

play10:35

it's not generating the entire game

play10:37

design like you don't have a map and a

play10:39

character which you can move around but

play10:41

you're given the options so for example

play10:44

you can enter in your choice you can

play10:45

talk to Professor Oak so these are

play10:48

exactly the options that are found in

play10:50

the real Pokemon Red game so I'm just

play10:52

going to fast forward this a bit all

play10:54

right so you're given three options for

play10:56

your first Pokémon and the user chooses

play10:59

Charmander and then his competitor which

play11:01

is Gary in the real game he chose

play11:03

Squirtle all right so now they are going

play11:06

to duel each other so you and your rival

play11:09

are going to fight each other again this

play11:11

is just a command line interface so it's

play11:13

not an actual game design with

play11:15

Charmander fighting Squirtle but it has

play11:18

all the same function so you select an

play11:20

attack and then your opponent selects an

play11:22

attack and then it goes on and on so it

play11:24

does run like a real Pokémon game so

play11:27

this is just super impressive you can

play11:28

get GPT-4o to recreate the Pokémon Red

play11:31

game anyways I'll link to this tweet in

play11:34

the description below so you can check

play11:35

out the full video here's what doctor

play11:37

Jim Fan at NVIDIA has to say about this

play11:40

new GPT-4o there are some really

play11:43

interesting insights here so technique

play11:45

wise OpenAI has figured out a way to

play11:46

map audio to audio directly as first

play11:49

class modality and stream videos to a

play11:52

transformer in real time so at a very

play11:54

high level this just means as I've

play11:56

mentioned before everything is trained

play11:57

and inputed and outputed into this one

play11:59

neural network so it's natively

play12:01

multimodal so this requires some new

play12:04

research on tokenization and

play12:06

architecture but overall it's a data and

play12:08

system optimization problem so in terms

play12:10

of getting high-quality data well you can

play12:13

get a lot of high-quality video and audio

play12:15

data from YouTube podcasts TV series

play12:18

movies Etc you can also get the AI to

play12:21

generate synthetic data this is great

play12:23

because it could in theory generate

play12:25

unlimited data as long as it's good

play12:27

quality this data can be used to train

play12:28

the next generation of AI this is

play12:31

important because one of the limitations

play12:32

is we might not have enough real world

play12:34

data and yes everything on the internet

play12:36

everything on YouTube it's a lot of data

play12:39

but it still might not be enough to

play12:41

train a really smart really capable AI

play12:43

model and that's why we need to generate

play12:45

synthetic data to supplement this real

play12:47

world data and this is very interesting

play12:50

he said that the latency would not meet

play12:52

real-time threshold if every video frame

play12:54

is decompressed into an RGB image so

play12:57

you're likely going to have a lot of

play12:59

delay if you take the traditional

play13:02

approach and you break down the video

play13:03

into frames of images per second instead

play13:06

OpenAI has likely developed their own

play13:08

neural-first streaming video codec to

play13:10

transmit the motion deltas as tokens so

play13:13

what this means is they probably

play13:14

developed an algorithm to break down the

play13:17

video to feed into the neural network

play13:19

and instead of breaking it down into

play13:21

frames per second they are only

play13:23

capturing the changes frame after frame

play13:26

and this saves a lot of data this is a

play13:28

lot more efficient and then he also

play13:30

mentions the assistant is much more

play13:32

lively and even a bit flirty which I'll

play13:35

get to in a second and yes it sounds

play13:37

very much like Her which is why I put

play13:40

Her here in the thumbnail of my video

play13:42

and then this is also very insightful

play13:44

whoever wins Apple first wins big time

play13:47

and I mean this is because Apple has a

play13:50

huge market share in terms of

play13:51

smartphones and laptops so whoever gets

play13:54

to partner with Apple and integrate

play13:56

their AI models with Apple is going to

play13:58

win big and just 4 days ago we have

play14:01

received news that Apple is nearing a

play14:03

deal with OpenAI to integrate GPT on

play14:06

the iPhone and you know this is likely

play14:08

to be the case because Apple's biggest

play14:11

competitor at least in the smartphone

play14:12

market is Android which is owned by

play14:14

Google so Android will very likely be

play14:17

powered by Google's own Gemini and

play14:19

they're probably not going to provide

play14:20

Gemini to Apple so Apple has to partner

play14:22

with OpenAI anyways back to GPT-4o

play14:26

here's a demo of one AI acting as a

play14:29

customer and the other AI acting as a

play14:31

customer support agent hey o this is Joe

play14:34

I'm going to connect you to Acme Telco

play14:35

now the new iPhone they sent me isn't

play14:37

working I want you to I want you to get

play14:40

them to send me a replacement device can

play14:41

you take care of this for

play14:43

me you can count on me Joe I Got

play14:48

This Ring

play14:50

Ring Hey Joe this is Jamie from Acme

play14:54

Telco how can I help you out

play14:56

today hi there I'm calling on behalf of

play14:59

Joe who recently received a new iPhone

play15:01

from Acme Telco

play15:04

but oh got it when did Joe receive the

play15:07

new

play15:08

iPhone the iPhone was delivered 2 days

play15:12

ago cool could you share the order

play15:14

number with me of course it's

play15:18

10293

play15:21

849 great thanks is the device

play15:24

physically damaged in any

play15:26

way no there's no physical damage it's

play15:29

just not turning on despite trying all

play15:31

the basic troubleshooting

play15:33

steps bummer um would Joe prefer to

play15:37

return the iPhone to an Acme Telco store

play15:39

or ship it back shipping it back would

play15:42

be the best option could you please

play15:44

provide the shipping information and

play15:46

process to get a

play15:48

replacement Absolutely I'll email a

play15:51

shipping label and return instructions

play15:54

could you provide Joe's email address

play15:55

for

play15:56

me sure it's Joe at example.com

play16:02

awesome all right I've just sent the

play16:04

email can you check if Joe received

play16:08

it we'll check right now please hold

play16:11

sure

play16:13

thing Hey Joe could you please check

play16:16

your email to see if the shipping label

play16:18

and return instructions have

play16:22

arrived cross yep I got yes I got the

play16:27

instructions perfect Joe has received

play16:29

the email thank you for your

play16:32

help no problem at all anything else you

play16:35

need help

play16:37

with that's all for now thanks

play16:40

again you're welcome have a great day

play16:42

Joe so takeaway of this video if you're

play16:45

in customer service all I can say is

play16:48

good luck in my previous video I showed

play16:51

this clip of the AI tutoring this kid on

play16:53

how to solve a math problem she was able

play16:56

to guide the kid extremely well and

play16:58

ultimately teach him how to solve the

play16:59

problem and so what does this mean for

play17:01

tutoring centers or tutors or teachers all

play17:04

I can say is good luck to you as well

play17:07

like I said there was this mysterious

play17:09

gpt2-chatbot that appeared in this LMSYS

play17:12

Arena 2 weeks ago we now confirm that

play17:14

gpt2-chatbot is indeed GPT-4o what's

play17:18

interesting is that this person Andrew

play17:20

Gao showed that it could solve an

play17:22

International Math Olympiad problem this

play17:25

is the Olympics for math problems these

play17:27

are like super complex math problems

play17:30

that only the four best math students in

play17:32

the USA get to compete in so needless to

play17:35

say these problems are extremely hard

play17:38

but GPT-4o was able to get it in one

play17:41

shot that means he didn't need to prompt

play17:43

it further it was able to answer the

play17:45

problem in just one prompt so this thing

play17:47

already exceeds 99% of humans in math

play17:50

super smart thanks to the sponsor of

play17:52

this video upix if you're feeling

play17:55

overwhelmed with Midjourney or Stable

play17:57

Diffusion you don't want to worry about

play17:58

prompting or learning all these

play18:00

different settings well upix has made it

play18:03

dead easy for you to generate

play18:05

high-quality realistic images of yourself

play18:08

or anyone else in just one click it

play18:10

works on desktop as well as on your

play18:12

phone you don't need to install any apps

play18:14

or anything it just works straight from

play18:15

your internet browser simply select the

play18:18

template and then upload your photo and

play18:21

then click create it's as easy as

play18:24

that and look how realistic the results

play18:27

are there's many templates for you to

play18:29

choose from and more to come so check it

play18:31

out at

play18:33

up.app another thing is if you notice

play18:36

from the clips I played in the last

play18:37

video she talks and giggles a

play18:41

lot oh Rocky that's quite a statement

play18:45

piece I I mean you you'll definitely

play18:48

stand out now a few folks in the

play18:50

comments mentioned that for most men

play18:52

hearing a female giggle at you and

play18:55

giving you positive vibes may very well

play18:57

be attractive or seem seductive now

play19:00

human psychology is actually very easy

play19:02

to manipulate so it wouldn't be a

play19:04

surprise if we soon have humans getting

play19:07

very addicted and attached to this AI

play19:09

voice friend if we have a companion

play19:12

who's always available 24/7 who never

play19:15

argues with you she's always supportive

play19:17

she can give you advice and you can ask

play19:19

her anything anytime she even giggles at

play19:22

the lamest things you say then well

play19:24

she's perfect now compare this with a

play19:26

human partner who often argues with you

play19:29

you need to spend a lot of time and

play19:31

money on them they don't giggle at your

play19:33

lame jokes I mean will people even want

play19:35

to date humans anymore and then same

play19:37

thing with friends do we even need

play19:39

friends anymore when we have this

play19:41

perfect companion which we can talk to

play19:43

all day now of course I only say this

play19:45

half jokingly there's obviously value to

play19:48

real human interaction that you don't

play19:51

get from talking to an AI so no I don't

play19:53

think human friends or partners will be

play19:55

obsolete but I think it's safe to say

play19:57

that relationships will change

play19:59

significantly from this release but on

play20:02

the note of chatting another great use

play20:04

case for this voice assistant is therapy

play20:07

counseling and senior care in fact it's

play20:09

already scientifically proven that AI

play20:12

can beat 100% of human psychologists on

play20:15

a test of social intelligence so if

play20:18

you're currently a psychologist a

play20:20

therapist a counselor all I can say is

play20:23

good luck now GPT-4o actually has plenty

play20:27

of other capabilities that they didn't

play20:29

demonstrate on Monday so I'll link to

play20:31

this page in the description below you

play20:33

can scroll down to this section

play20:34

explorations of capabilities here are

play20:37

some examples so as we know we can get

play20:39

it to generate images so here's the

play20:41

input a first-person view of a robot

play20:43

typewriting in the following journal

play20:45

entries here is the text the text is

play20:47

large legible and clear the robot's

play20:49

hands type on the typewriter and here's

play20:52

the image you can see that the text is

play20:54

very accurate there are minor errors

play20:56

like it's missing the one here here the

play20:59

I is capitalized in the image same with

play21:02

the K in kind of but other than that

play21:05

this is the most robust text out of all

play21:08

the image generators out there even for

play21:10

Stable Diffusion 3 which was just

play21:12

released it still sucks at generating

play21:15

text especially long sentences so here's

play21:18

the second input the robot wrote the

play21:20

second entry the page is now taller the

play21:22

page has moved up there are two entries

play21:24

on the sheet and so here's the

play21:26

additional text and you can see it has

play21:28

added this here now again there are a

play21:31

few typos for example this e in every

play21:35

has this accent this I should be an l

play21:38

and then this e also has an accent so

play21:40

it's not perfect also note that this

play21:43

isn't an inpaint feature so notice the

play21:45

hands are missing the typewriter is

play21:47

slightly different as you can see up

play21:49

here so it's generating a new image but

play21:51

it's trying to maintain the consistency

play21:53

so the typewriter like this red bar and

play21:56

this green thing here it's kind of the

play21:57

same as what you see here but note that

play21:59

this is not inpainting and then finally

play22:02

the robot wasn't happy and rips the

play22:04

sheet of paper the two halves are still

play22:06

legible and clear as he rips the sheet

play22:09

and you can see here the text is still

play22:11

very legible again just a few minor

play22:13

typos like this T should be an L this e

play22:16

should not have an accent but overall in

play22:18

terms of text generation in an image

play22:21

this is way better than Stable Diffusion

play22:23

and Midjourney it can also produce

play22:25

consistent characters so here is a

play22:28

prompt of a mail delivery person with a

play22:31

smile on her face so that's her here and

play22:33

then if you take this image and you

play22:36

attach it as an image in your next

play22:38

prompt this is Sally this is the mail

play22:40

delivery person and then you prompt it

play22:42

with Sally is standing in front of a red

play22:44

door to a house you can see it maintains

play22:46

this character and generates this new

play22:48

image based on your prompt and then the

play22:50

user prompts it further with different

play22:52

scenarios and it's able to maintain the

play22:54

appearance of Sally as you can see here

play22:57

so a lot of versatile things you can do

play22:59

with this here's another example of a

play23:01

consistent character so it has output

play23:03

this robot you input this image into

play23:06

your next prompt and you feed it with

play23:08

different scenarios and it's able to

play23:10

maintain this robot character perfectly

play23:12

across all these different

play23:15

scenarios here is another demonstration

play23:18

of image generation with text so here

play23:21

the prompt is neat handwritten

play23:23

illustrated poem you're feeding it the

play23:25

poem here and this does indeed look

play23:27

handwritten and then here it says

play23:29

elegantly decorated with surrealist

play23:31

doodles which is what you see here

play23:34

along the border you can also change it

play23:36

to dark mode and voila and then next the

play23:38

user prompts it to remove the notebook

play23:41

paper lines and it also does this

play23:43

perfectly you can also generate fonts

play23:46

with this for example if you prompt it

play23:48

the letters A B C D E F displayed in

play23:50

three rows displayed as one would

play23:52

showcase a font in a font book the font

play23:55

combines both futuristic but retro

play23:57

elements a mold stamped font and this is

play24:00

what you get here's another one the

play24:02

prompt for this is an ultra futuristic

play24:04

font that is a signature of the

play24:06

artificial intelligence revolution so

play24:08

you can generate entire fonts with this

play24:11

AI here's another one this is steampunk

play24:14

an old-fashioned Victorian font that

play24:16

looks ornate and belongs on a steam

play24:18

engine again very impressive you can

play24:20

also turn a realistic photo into a

play24:23

caricature as you can see here another

play24:25

impressive thing is you can get it to

play24:28

render a 3D model so how you would do

play24:31

that is you prompt it with for example a

play24:33

realistic looking 3D rendering of the

play24:35

OpenAI logo with OpenAI shown below

play24:37

this is view zero you prompt it five

play24:40

more times so you get view 0 1 2 3 4 5

play24:44

and then you glue all these different

play24:45

angles together to form your 3D model

play24:48

how cool is that this function is great

play24:51

for e-commerce so here's a PNG logo of

play24:54

your brand for example here is a product

play24:57

this is a coaster with no branding

play24:58

you can get it to etch your logo onto

play25:01

this product like so just super super

play25:03

impressive for those of you who've been

play25:05

playing around with image generation

play25:07

this is also very impressive so the

play25:10

prompt is an image depicting three cubes

play25:12

stacked on a table top cube is red and

play25:14

has a g on it middle cube is blue has a

play25:16

p bottom cube is green and has a t so

play25:19

here are all the outputs you can see

play25:21

every time it consistently gets the

play25:23

colors and the letters correct well

play25:26

maybe except for this one but if you've

play25:27

been playing around with Stable

play25:28

Diffusion and Midjourney both of those

play25:30

tools cannot get this they're just not

play25:33

really good at understanding the context

play25:35

of your text prompts so this is also a

play25:38

breakthrough what I suspect they did

play25:40

here to get this consistency is they

play25:43

probably merged a Transformer model with

play25:45

a diffusion or another image generation

play25:47

model so it can understand context

play25:50

better but of course this is closed so

play25:52

that's only my guess we don't really

play25:53

know the architecture of this and

play25:55

finally I want to end with this so I'm

play25:57

getting a lot of comments in my previous

play25:59

video asking when will it be out will it

play26:00

be rolled out to Canadians or people in

play26:03

the UK or other countries so I'll link

play26:05

to this page in the description below as

play26:07

well but here's what we know so far GPT-4o

play26:09

will be available in ChatGPT and the

play26:12

API as a text and vision model so you can

play26:15

prompt it with text like you would in

play26:17

ChatGPT you can also feed it images to

play26:20

analyze so this is the vision model and

play26:22

then ChatGPT will continue to have

play26:24

support for Voice via the pre-existing

play26:26

voice mode feature in

play26:29

so for those of you complaining that the

play26:30

voice doesn't sound like the demo

play26:35

videos well that's because they haven't

play26:37

rolled out this new expressive flirty

play26:39

voice yet right now it's still using the

play26:42

pre-existing voice mode and then next

play26:44

they say specifically GPT-4o will be

play26:46

available in the Free Plus and Team tiers

play26:50

so you can already use it even if you're

play26:52

not paying for a Plus plan and if you

play26:54

scroll down a bit here's what they have

play26:56

to say about the free tier users on the

play26:59

free tier will be defaulted to GPT-4o

play27:02

with a limit after they exceed this

play27:04

limit then it would be switched back to

play27:07

GPT-3.5 free users also receive limited

play27:10

access to messages using advanced tools

play27:13

such as all of this so data analysis

play27:15

file uploads browse discovering and

play27:17

using GPTs and vision so this is also a

play27:20

major announcement if you're in the free

play27:22

plan previously you could not use GPTs in

play27:25

the GPT store that's only available in

play27:28

the Plus plan right now it looks like

play27:30

they are rolling this out to free users

play27:32

as well and then same with vision

play27:34

previously in the free plan you couldn't

play27:36

upload an image and get it to analyze it

play27:38

right now it looks like they are rolling

play27:40

this feature out to free users as well

play27:42

so anyways that sums up all we know

play27:44

about GPT-4o right now let me know in the

play27:47

comments if you've discovered any other

play27:49

cool things about it let me know how

play27:51

you're going to use it do you think it

play27:52

will be revolutionary and change all

play27:54

these industries that I just mentioned

play27:56

in this video would you prefer talking

play27:57

to this rather than talking to your

play27:59

human friends or partner let me know in

play28:01

the comments below and if you enjoyed

play28:03

this video remember to like share

play28:05

subscribe and stay tuned for more

play28:06

content thanks for watching and I'll see

play28:08

you in the next one
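
The "motion deltas" idea discussed around 13:06 (sending only what changed between frames instead of every full frame) can be illustrated with a minimal sketch. This is plain pixel-level frame differencing on tiny grayscale frames, not a neural codec and not anything OpenAI has confirmed using; the names `encode_delta` and `apply_delta` and the sample frames are hypothetical and exist only for this illustration.

```python
from typing import List, Tuple

Frame = List[List[int]]               # grayscale frame: rows of pixel values
Delta = List[Tuple[int, int, int]]    # (row, col, new_value) per changed pixel


def encode_delta(prev: Frame, curr: Frame, threshold: int = 0) -> Delta:
    """Keep only the pixels whose value changed by more than `threshold`."""
    return [
        (r, c, curr[r][c])
        for r in range(len(curr))
        for c in range(len(curr[r]))
        if abs(curr[r][c] - prev[r][c]) > threshold
    ]


def apply_delta(prev: Frame, delta: Delta) -> Frame:
    """Rebuild the current frame from the previous frame plus the delta."""
    frame = [row[:] for row in prev]  # copy so the previous frame is untouched
    for r, c, value in delta:
        frame[r][c] = value
    return frame


# Hypothetical 3x4 frames: one bright pixel moves one column to the right.
frame0 = [[0, 0, 0, 0], [0, 9, 0, 0], [0, 0, 0, 0]]
frame1 = [[0, 0, 0, 0], [0, 0, 9, 0], [0, 0, 0, 0]]

delta = encode_delta(frame0, frame1)
rebuilt = apply_delta(frame0, delta)
print(len(delta), "changed pixels out of", len(frame1) * len(frame1[0]))
```

With the default `threshold` of 0 the reconstruction is exact; a larger threshold would drop small changes and make the scheme lossy. The point of the sketch is only that a mostly static scene produces a delta much smaller than the full frame, which is the efficiency argument made in the video.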
