GPT 4o - Deep Dive Review - AGI? - ChatGPT massive improvements

Olivio Sarikas
18 May 2024, 25:44

Summary

TLDR: The video explores the capabilities of GPT-4o in ChatGPT, a multimodal AI model that handles voice and image input alongside text. It demonstrates the AI's ability to hold emotionally expressive conversations, assist with math problems, translate between languages in real time, and analyze visual content. It also covers advanced features such as creating consistent character images, summarizing lectures, and generating meeting notes from audio recordings. Throughout, the presenter stresses the model's potential as an educational tool and digital assistant.

Takeaways

  • 📈 **Multimodal Capabilities**: GPT-4o (the "o" stands for "omni") is a multimodal model that accepts several forms of input and produces several forms of output, such as text, voice, and images.
  • 🆓 **Free Access**: A limited version of GPT-4o is available in the free tier of ChatGPT, so users can try its features at no cost.
  • 📱 **Voice Interaction**: The app allows for voice interaction with emotional expressiveness, making conversations more human-like.
  • 📖 **Storytelling**: Users can request stories, like a bedtime story about robots and love, showcasing the model's ability to generate narratives.
  • 🔍 **Translation Services**: Chat GPT can function as a translator between different languages, facilitating real-time communication.
  • 📷 **Camera Integration**: Although not available in the free model, the upcoming integration of camera input will allow the AI to assist with visual information.
  • 🧮 **Math Problem Assistance**: The AI can help solve math problems by providing hints and guidance without revealing the solution, enhancing the learning process.
  • 💻 **Code Understanding**: Chat GPT can analyze and explain code snippets, offering insights into what the code is designed to do.
  • 📊 **Data Visualization**: The AI can interpret and explain data visualizations, such as charts, providing summaries and identifying key trends.
  • 🎨 **Creative Design**: Chat GPT can generate images and designs based on textual descriptions, including consistent character creation and 3D object synthesis.
  • ✍️ **Writing Assistance**: The model can assist in writing tasks, such as creating software or simulating interfaces like Facebook Chat, demonstrating its versatility in content creation.

Q & A

  • What does the 'O' in Chat GPT 4 O stand for?

    -The 'O' in Chat GPT 4 O stands for Omni, indicating that it is a multimodal model capable of handling various forms of input and output.

  • Is Chat GPT 4 O available in the free version?

    -Yes, Chat GPT 4 O is included in a limited capacity in the free version, allowing users to try out some of its features.

  • How can the emotional state of Chat GPT's voice be influenced?

    -The emotional state of Chat GPT's voice can be influenced by user instructions, as demonstrated when the user asked for 'maximal emotion' in the bedtime story.

  • What is the significance of Chat GPT's ability to understand and replicate emotions in voice?

    -The ability to understand and replicate emotions in voice is significant as it helps in conveying information more effectively and makes the interaction feel more human-like, which can be beneficial in various scenarios including live translation and educational support.

  • Can Chat GPT function as a translator between different languages?

    -Yes, Chat GPT can function as a translator, as shown in the script where it was asked to translate between English and Italian in real-time.

  • How does Chat GPT assist with solving math problems without giving away the solution?

    -Chat GPT assists with solving math problems by providing hints and guiding the user through the steps, rather than directly giving away the solution, which is helpful for educational purposes.

  • What is the purpose of the camera feature in Chat GPT that is mentioned in the script?

    -The camera feature in Chat GPT, though not available in the free model, is intended to allow the AI to see and analyze visual information, such as a math problem written on paper, to assist the user further.

  • How does Chat GPT handle code and provide assistance with coding problems?

    -Chat GPT can receive highlighted code from the user and provide a brief explanation or guidance on the code's functionality, as demonstrated with the weather data code example.

  • What is the capability of Chat GPT in terms of image generation and manipulation?

    -Chat GPT has the capability to generate images based on text descriptions and manipulate images to create consistent visual elements, as shown in the examples of character creation and commemorative coin design.

  • Can Chat GPT create software or simulate interfaces like a Facebook chat?

    -Yes, Chat GPT can create software or simulate interfaces, such as a Facebook chat, as shown in the script where it created an HTML file to simulate a Facebook Chat interface.

  • What are some of the advanced capabilities of Chat GPT 4 O mentioned in the script?

    -Some advanced capabilities of Chat GPT 4 O mentioned in the script include multimodal input and output, emotional voice interaction, real-time translation, math problem assistance, code analysis, image generation, and simulation of software interfaces.

Outlines

00:00

🤖 Exploring Chat GPT 4's Multimodal Capabilities

The video begins with an introduction to GPT-4o, an "omni" model that processes various forms of input and output. The model is included in a limited capacity in the free version of ChatGPT, so users can try its features firsthand. The presenter then walks through the official demonstrations, starting with the ability to converse with ChatGPT by voice as well as text. The app's speech mode is designed around conversation rather than text, with an emphasis on emotional expressiveness. The video also shows live translation between English and Italian and discusses the model's potential for real-time translation and for recognizing emotional states in speech.

05:01

📈 Interactive Learning with AI: Solving Math Problems

This section of the script delves into the AI's educational capabilities, specifically its ability to assist with solving a math problem without giving away the solution. The presenter writes a linear equation and asks for hints to solve it, demonstrating the AI's understanding of mathematical concepts and its role in facilitating the learning process. The interaction is noted for its human-like qualities, such as the ability to interrupt and the AI's supportive responses, which are crucial for an engaging and effective educational experience.
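The hint-first tutoring flow described above can be sketched in code. The example uses the equation from the demo, 3x + 1 = 4. The function name `hint_steps` and the wording of the hints are illustrative assumptions, not something shown in the video.

```python
from fractions import Fraction


def hint_steps(a, b, c):
    """Yield (hint, equation) pairs for solving a*x + b = c, one step at a time.

    Hypothetical helper for illustration only; the hint wording is assumed.
    """
    if a == 0:
        raise ValueError("a must be non-zero for a linear equation in x")
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    # Step 1: move the constant term to the right-hand side.
    rhs = c - b
    yield (f"What happens if you subtract {b} from both sides?", f"{a}x = {rhs}")
    # Step 2: divide both sides by the coefficient of x.
    x = rhs / a
    yield (f"Now divide both sides by {a}.", f"x = {x}")


# Walk through the equation from the demo: 3x + 1 = 4.
for hint, result in hint_steps(3, 1, 4):
    print(hint, "->", result)
```

Yielding one step at a time mirrors the instruction given in the demo: offer the next hint, not the full solution.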

10:03

🖼️ AI's Artistic Endeavors: Image and Code Analysis

The script moves on to showcase the AI's artistic and analytical skills. It describes the AI's ability to interpret and generate images based on text descriptions, maintain character consistency, and create designs like commemorative coins. Additionally, the AI's capacity to analyze and describe code is highlighted, as well as its potential integration with desktop applications for a seamless user experience.
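During the plot discussion in this part of the video, the AI quotes maximum temperatures of roughly 25° to 30° Celsius as 77°F to 86°F. A minimal sketch for checking that conversion follows; the function name `celsius_to_fahrenheit` is chosen here for illustration.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# The two bounds quoted in the demo.
for c in (25, 30):
    print(f"{c} °C = {celsius_to_fahrenheit(c):.0f} °F")
```

Both bounds come out at the Fahrenheit values the AI states in the demo.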

15:05

🎨 AI's Creative Visualizations and Summarization Skills

This part of the script focuses on the AI's creative visualization capabilities, such as 3D object synthesis from images and rendering multi-line text inside generated images (for example, a robot's phone conversation with text in speech bubbles). It also touches on the AI's ability to summarize lectures, create posters, and produce meeting notes from audio recordings, showcasing its utility in various professional and creative contexts.

20:07

📊 Data Interpretation and AI-Generated Software

The script discusses the AI's proficiency in interpreting data, such as creating charts from an Excel sheet and providing key insights. It also describes an instance where the AI was asked to write software akin to Microsoft Paint, demonstrating its coding capabilities and the potential to interact with vintage operating systems. Additionally, the AI's ability to create interactive HTML simulations, like a Facebook chat, is highlighted.

25:08

🌟 The Future of AI Interaction and Learning

The final paragraph of the script reflects on the potential of AI to transform teaching and learning through emotional interaction and support. It emphasizes the significance of AI's camera capabilities in understanding and assisting with complex, hard-to-describe concepts. The presenter expresses excitement about the rapid advancements in AI and hopes for the open-source community to harness these capabilities for broader applications.

👋 Closing Thoughts and Encouragement for Engagement

In the closing segment, the presenter invites viewers to share their thoughts in the comments and encourages them to like, subscribe, or watch more content. There's a light-hearted moment suggesting that the end screen offers additional content to explore, reinforcing the interactive nature of the video.

Keywords

💡Multimodal Model

A multimodal model is an AI system capable of processing and understanding multiple forms of input, such as text, images, and voice, and generating various forms of output. In the video, it is central to the theme as it describes the capabilities of Chat GPT 4 O, emphasizing its ability to interact with users through different modes of communication, enhancing the user experience.

💡Emotion in AI

Emotion in AI refers to the ability of an AI system to convey and respond to emotional cues in its interactions. The video highlights this feature by demonstrating how Chat GPT 4 can adjust its emotional tone when telling a story or conversing, which is crucial for making interactions feel more human-like and engaging.

💡Live Translation

Live translation is the real-time conversion of one spoken language into another. The video showcases this capability by describing a scenario where Chat GPT 4 can instantly translate between English and Italian, facilitating communication between speakers of different languages, which is significant for its application in cross-cultural communication.

💡Camera Input

Camera input is the ability of an AI system to receive and interpret visual data captured by a camera. Although not yet available in the free model, the video anticipates that this feature will be part of ChatGPT 4o, allowing the AI to assist with visual tasks, such as solving math problems by viewing an equation written on paper.

💡Educational AI

Educational AI refers to the use of AI in educational contexts to enhance learning. The video illustrates this through the example of Chat GPT 4 helping a user solve a math problem by providing hints, rather than giving away the solution. This approach aligns with educational principles of fostering understanding and critical thinking.

💡Human-like Interaction

Human-like interaction is the AI's ability to engage with users in a way that mimics natural human conversation. The video emphasizes this by showing how Chat GPT 4 can respond with human quirks and emotions, making the interaction more relatable and less mechanical, which is vital for user engagement and acceptance of AI.

💡Code Understanding

Code understanding is the AI's capability to interpret and explain code snippets. In the video, Chat GPT 4 demonstrates this by providing a brief description of a shared code's functionality, which involves fetching and processing weather data. This feature is beneficial for developers and users seeking to understand complex code or software behavior.
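The script from the demo is not shown in the source, so the following is only a minimal sketch of the smoothing step the AI describes (a rolling average over daily temperatures). The data values, the window size, and the function name `rolling_mean` are assumptions made for illustration; fetching the weather data and plotting are left out.

```python
def rolling_mean(values, window):
    """Return the trailing rolling mean of `values`.

    Each output averages the current value and up to `window - 1`
    preceding values, so the result has the same length as the input.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        # Drop the value that just fell out of the window.
        if i >= window:
            running_sum -= values[i - window]
        count = min(i + 1, window)
        smoothed.append(running_sum / count)
    return smoothed


# Made-up daily maximum temperatures in Celsius (not data from the video).
daily_max_c = [20.0, 22.0, 27.0, 25.0, 30.0, 24.0]
print(rolling_mean(daily_max_c, window=3))
```

A trailing window is used here for simplicity; a centered window would be an equally plausible reading of "rolling average", and the source does not say which one the demo code used.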

💡AI-generated Content

AI-generated content refers to material created by AI systems, such as images, text, or videos. The video script mentions various examples of AI-generated content, including images with text, character designs, and commemorative coin designs. This highlights the creative potential of AI and its ability to assist in artistic and design tasks.

💡3D Object Synthesis

3D object synthesis is the process of creating three-dimensional models from two-dimensional images. The video describes how Chat GPT 4 can generate a 3D representation of a rotating object from multiple images, showcasing the AI's advanced capabilities in spatial understanding and visual processing.

💡Lecture Summarization

Lecture summarization is the AI's ability to condense and summarize lengthy lectures or presentations. The video script discusses how Chat GPT 4 can create summaries from a 45-minute video, providing key points and saving time for the user. This feature is particularly useful for educational purposes and efficient information consumption.

💡Poster Creation

Poster creation is the AI's capability to design and generate posters based on provided information or descriptions. The video mentions an example where Chat GPT 4 creates a movie poster using provided photos and design descriptions. This demonstrates the AI's utility in graphic design and its ability to understand and visualize complex concepts.

Highlights

Chat GPT 4 O (Omni) is a multimodal model that can take various forms of input and generate diverse forms of output.

GPT-4o is included in the free version of ChatGPT in a limited capacity, so users can try it out.

The model can be used to tell stories with emotional expressiveness, adjustable for maximal emotion.

The app allows for voice interaction and can be used on Android devices.

The app features an interesting design with animated speaking visuals and an emphasis on conversation over text.

Chat GPT can function as a translator between English and Italian in real-time.

The AI can perform live translation between two speakers; the presenter wonders whether it would also work with more than two people and more than two languages.

The AI can help solve math problems by providing hints without giving away the solution, aiding in the learning process.

Chat GPT can understand and analyze images from a camera, although this feature is not yet available in the free model.

The AI can identify and discuss elements of code when provided with a text snippet, offering insights into its functionality.

Chat GPT can analyze and describe data plots, including identifying trends and significant events.

The AI can generate images from text descriptions, including complex scenes and maintaining consistency in character design.

3D object synthesis is possible from multiple 2D images, creating a rotating 3D model or video.

Lecture summarization can condense a 45-minute presentation into key points, potentially with timestamps.

Chat GPT can create meeting notes from audio recordings, identifying speakers and transcribing the conversation.

The AI can generate creative content such as movie posters, software applications, and interactive chat simulations.

The emotional and interactive capabilities of Chat GPT can enhance learning and teaching experiences.

The camera functionality is a significant step forward for the AI, allowing it to see and assist with visual information.

Transcripts

play00:00

hello my friends how are you doing today

play00:02

we are going to check out chat GPT 4 o

play00:06

together the O stands for Omni and this

play00:09

means it's a multimodal model that takes

play00:12

a lot of different forms of input but

play00:15

also generates a lot of different forms

play00:17

of output that's maybe even the most

play00:20

important one and the cool thing about

play00:22

that is that chat GPT 4 in a limited

play00:27

capacity is also included in the free

play00:29

version

play00:30

so you can try out that stuff yourself

play00:33

today let's get started so I'm going to

play00:36

start and jump into the presentation

play00:39

they do here now this is about the

play00:41

ability to talk not just to chat GPT but

play00:45

with chat GPT let's listen to this hey

play00:49

chat GPT how are you doing I'm doing

play00:51

fantastic thanks for asking how about

play00:54

you pretty good what's up so my friend

play00:56

um Barrett here he's been having trouble

play00:58

sleeping lately and uh I want you to tell him a

play01:00

bedtime story about robots and love oh a

play01:04

bedtime story about robots and love I

play01:07

you can see on the top right here this

play01:09

is the phone app you can use it also on

play01:12

Android I already tried it now for me

play01:15

the voice wasn't that kind of animated

play01:17

but still a very good voice that sounds

play01:20

very humanlike when you go into the app

play01:23

on the lower right side you have an icon

play01:26

that you can tap on and then it goes

play01:28

into this kind of speech mode now the

play01:30

design is interesting because you see

play01:33

actual animation of the speaking with

play01:35

these four dots I would have preferred that

play01:38

I can actually also see the text but I

play01:40

think they did this kind of design to

play01:42

really pronounce to you that this is

play01:45

about an actual conversation so you're

play01:47

not distracted by the text you can also

play01:50

see here that the voice is very

play01:53

emotional you can actually influence the

play01:55

emotional state let's go on listening

play01:58

gather around Barrett

play02:00

once upon a time in a world not too

play02:03

different from ours there was a robot

play02:05

named bite bite was a curious robot no

play02:08

no no chat GPT I really want maximal emotion

play02:10

like maximal expressiveness much more

play02:12

than you were doing before understood

play02:16

let's amplify the drama once upon a time

play02:20

in a world not too different from ours

play02:23

there was a robot named bite it's

play02:26

interesting that you can do that it's

play02:27

also funny that it does it not just in

play02:30

the story but also as the chat GPT voice

play02:33

outside of the story now the interesting

play02:36

thing here is that this is not just to

play02:38

be goofy but an important thing to

play02:40

remember is that the informational part

play02:45

of text is also the intonation is the

play02:48

voice is the emotion in there so a lot

play02:50

of information actually gets lost in

play02:54

written text and now it's brought back

play02:56

through these kind of emotional

play02:58

Expressions now this can also help in a

play03:01

lot of other situations like for example

play03:04

live translation and here's the crazy thing

play03:08

this can actually also do that I would

play03:11

like you to function as a translator I

play03:12

have a friend here who only speaks

play03:14

Italian and I only speak English and uh

play03:17

every time you hear English I want you

play03:18

to translate it to Italian and if you

play03:20

hear Italian I want you to translate it

play03:21

back to English is that good

play03:34

Mike she wonders if whales could talk

play03:37

what would they tell us um they might

play03:39

ask uh how do we solve linear

play03:48

equations I would be interested if this

play03:50

would work with more than two people and

play03:53

more than two languages because the

play03:55

ability you can see here could also have

play03:57

been done by Google translate I have

play04:00

used that on my recent trip to Thailand

play04:02

to talk with people live because they

play04:05

also can understand which kind of

play04:08

language is spoken now here it would

play04:10

also be interesting for the AI to figure

play04:12

out the emotional state of what you're

play04:14

saying and then try to replicate that

play04:17

however of course this is not just about

play04:20

voice input it can also use the camera

play04:23

now the camera is not a function you can

play04:25

use right now with the free model but it

play04:27

will be part of chat GPT 4 o today I'd

play04:30

really like your help solving a math

play04:32

problem I'm all ears what math problem

play04:35

can I help you tackle today so I'm going

play04:37

to write down a linear equation on a

play04:38

sheet of paper and I'll show you and

play04:40

then I'd love your help working me through

play04:41

it but importantly don't tell me the

play04:43

solution just help give me hints along

play04:44

the

play04:45

way got it

play04:48

oh okay I see it h no I didn't show you

play04:52

yet just give me help along the way one

play04:54

second it's kind of interesting that

play04:55

that is happening now the question is is

play04:58

it picking up on the image of the camera

play05:01

when the camera is lying on the table or

play05:04

is it hallucinating something because

play05:06

there is no input information however

play05:09

you will see that it works quite well

play05:12

when the AI actually sees the content

play05:14

you want it to see whoops I got too

play05:17

excited I'm ready when you are what

play05:20

equation did I write down there ah I see

play05:23

it now you wrote down 3x + 1 = 4 yep

play05:29

exactly so what's the first step I

play05:30

should take to try to solve this the

play05:33

first step is to get all the terms with

play05:36

X on one side and the constants on the

play05:38

other side so what do you think we

play05:40

should do with that plus one in that

play05:42

small sample there are actually a lot of

play05:45

interesting things happening so one of

play05:46

the things that's happening here is

play05:48

these kind of human quirks now while

play05:50

they might seem goofy or maybe even a

play05:52

little bit uncanny valley it is

play05:55

interesting to have them because this

play05:57

makes us feel more welcome more

play05:59

integrated into the conversation now if

play06:02

you have ai as a digital assistant this

play06:05

might be a very important element so

play06:07

that you feel an actual connection of

play06:10

course the other thing here is that you

play06:12

can see the AI can clearly understand

play06:15

the math equation what you want from it

play06:17

and also the task of not giving away the

play06:19

solution but just helping you along

play06:21

which of course is a very important

play06:23

function for teaching and helping you

play06:26

understand and the learning process now

play06:28

in that case the example is really

play06:30

simple so everybody in the crowd

play06:32

understands what is going on but imagine

play06:34

this about something you don't

play06:37

understand and actually need help with

play06:39

understanding or where it is so complex

play06:43

or so different that you need an

play06:44

explanation from the AI to actually

play06:47

know what is happening there plus one

play06:50

equals 4 yep exactly so what's the first

play06:53

step I should take to try to solve this

play06:55

now another thing that's happening here

play06:56

is that he can interrupt the AI with his

play06:59

voice which is really useful for me on

play07:01

my version I had to tap the screen which

play07:04

might be a little bit more complicated

play07:06

especially if the screen turns off in a

play07:08

longer conversation and it's also

play07:11

interesting here to see that the AI is

play07:14

following his instructions to not give

play07:16

the complete answer but educating him

play07:19

just on the next step which can be

play07:22

really helpful especially in an

play07:24

educational scenario so chat GPT this is

play07:27

what I ended up with how does this look

play07:30

spoton now you've isolated the term with

play07:33

X on one side and it looks like 3x equal

play07:36

3 what do you think we should do next to

play07:38

solve for x in this example you can also

play07:41

see that you have emotional

play07:43

reinforcement when the AI is supporting

play07:46

you now this is an important part of the

play07:48

learning process so again this is why

play07:51

the emotion but also these kind of human

play07:54

behaviors are important in an AI to not

play07:57

just give you answers not just give you

play08:00

information because for us the

play08:03

subjective emotional interaction is a

play08:05

very important part not just for

play08:08

understanding information better but

play08:10

also to stay focused on the action feel

play08:14

motivated feel supported and because of

play08:16

that the learning process and learning

play08:18

progress is happening much faster and

play08:21

much better so this is actually really

play08:23

positive so I'm going to open the chat GPT

play08:26

desktop app like Mira was talking about

play08:28

before and to give a bit of background

play08:29

what's going on so here we have um a

play08:31

computer and on the screen we have some

play08:33

code and then the chat GPT voice app is

play08:35

on the right so chat GPT will be able to

play08:37

hear me but it can't see anything on the

play08:38

screen so I'm going to highlight the

play08:40

code command C it and then that will

play08:42

send it to chat GPT and then I'm going

play08:44

to talk about the code to chat GPT it's

play08:46

really interesting that he points out

play08:47

that chat GPT can't see anything on the

play08:51

screen because I guess everybody would

play08:53

be super worried about having an AI

play08:55

given access to your desktop

play08:58

information you can also see that you

play09:01

have these dots here that he clicked on

play09:02

so this is for muting the AI it's

play09:05

interesting that he can highlight the

play09:08

text and then only that text is going to

play09:11

be sent to the AI so the AI understands

play09:14

what is going on what's the coding

play09:16

problem you're dealing with okay I'm

play09:18

going to share with you some code one

play09:19

second sure thing take your

play09:23

time you see the highlighting okay so I

play09:26

just shared some code with you could you

play09:28

give me a really brief one sentence

play09:29

description of what's going on in the

play09:30

code this code fetches daily weather

play09:33

data for a specific location and time

play09:35

period Smooths the temperature data

play09:38

using a rolling average annotates a

play09:40

significant weather event on the

play09:42

resulting plot and then displays the

play09:44

plot with the average minimum and

play09:46

maximum temperatures over the year it's

play09:48

really amazing to get that kind of

play09:50

information and help especially when

play09:52

it's integrated into your desktop now I

play09:55

can really imagine when for example

play09:57

Gemini is integrated into Google Chrome

play10:00

that you have a Flawless experience of

play10:03

getting help with all kinds of things

play10:05

that you're trying to do and have it as

play10:07

an assistant it is also connected to

play10:09

your other applications like the

play10:11

calendar Google keep your email Google

play10:14

Docs and so on but this on its own is a

play10:17

really fantastic ability here one thing

play10:20

that I would like to see here is that it

play10:22

actually highlights the part on the screen

play10:25

that it is talking about of course this

play10:28

can't happen right now because because

play10:29

as he said the AI doesn't actually see

play10:31

the screen he only sent the highlighted

play10:35

text to the AI but even so it is

play10:38

really good to understand that kind of

play10:40

code as an explanation from the

play10:45

AI here you can see that he clicked the

play10:48

screen icon and now there's a live

play10:51

stream of the screen area to chat GPT

play10:55

okay chat GPT I'm sharing with you the

play10:57

plot now I'm wondering if you can give

play10:58

me a really brief one- sentence overview

play11:00

of what you see plot displays smoothed

play11:03

average minimum and maximum temperatures

play11:05

throughout 2018 with a notable

play11:08

annotation marking a big rainfall event

play11:11

in late September which months do you

play11:13

see the hottest temperatures and roughly

play11:14

what temperatures do those correspond to

play11:16

the hottest temperatures occur around

play11:18

July and August the maximum temperature

play11:22

during these months is roughly between

play11:24

25° and 30° 77° F to 86° F what's going

play11:31

on with the y-axis is this in Celsius or

play11:35

Fahrenheit the y-axis is in Celsius the

play11:38

temperatures are labeled accordingly

play11:40

such as average temperature minimum

play11:43

temperature and maximum temperature in

play11:45

this case to be honest I'm kind of

play11:47

curious how the AI knows that the

play11:50

temperature would be in Celsius and not

play11:53

in Fahrenheit it might figure it out

play11:55

from the context of the temperatures

play11:58

being that high or being that low but

play12:00

how does it actually know but again you

play12:02

can see how useful this is to understand

play12:05

information not just to analyze the

play12:07

chart for you in a faster way that maybe

play12:11

you can do it yourself but also

play12:13

explaining charts to you explaining

play12:15

information to you where you might lack

play12:18

the knowledge or experience to actually

play12:22

understand the information that's going

play12:23

on here so basically the AI can bridge this

play12:27

lack of knowledge lack of experience for

play12:29

us and give us the kind of information

play12:32

we can work with so this is kind of

play12:34

other form of translation of information

play12:37

for you and that is really helpful now

play12:40

let's go back to the chat GPT

play12:43

announcement now let's go back to the

play12:45

chat GPT 4o announcement website when you

play12:48

scroll down a little bit you will see

play12:50

here a lot of examples of its

play12:53

capabilities and when you pop down that

play12:56

list there's a lot of stuff in here now

play12:58

the first one is that you put text into

play13:02

an image and when you look closely here

play13:04

you can see that the text is very good

play13:07

very correct in these kind of images

play13:10

very very good work it also follows the

play13:12

prompt of the POV of the personal

play13:16

perspective so looking at it like you're

play13:19

sitting there down there's also an

play13:21

example where the paper is ripped apart

play13:24

and the text is also kind of ripped

play13:27

apart in that case it doesn't work 100%

play13:30

so some of the things on the right side

play13:32

are not supposed to be there and also

play13:34

you can see the same text in a solid

play13:37

page on the background but still this is

play13:41

absolutely stunning how that works in

play13:43

the next example we see here a character

play13:46

created by DALL-E 3 and now we have

play13:50

consistent images of that character in

play13:53

different actions with the same

play13:56

attributes also this kind of white dot

play13:58

is also always on the cap the skin

play14:02

color the hair the hairdo the hair

play14:04

color the clothing and even the bag

play14:07

with the bag color all of that is very

play14:10

consistent works very well and also even

play14:12

the story elements you can see here the

play14:14

dog and a little twig here on the right

play14:16

side and then we have the same dog with

play14:18

the twig in the mouth this was one of

play14:21

the biggest struggles with image

play14:22

creation with AI and of course it's

play14:24

very very important to create any kind

play14:26

of story with AI another example that

play14:29

really stands out to me is the

play14:31

commemorative coin design so they give

play14:34

here the logo of ChatGPT and it creates

play14:37

a coin and then there is some iteration

play14:40

here about design and this is the final

play14:44

design which I think is absolutely

play14:47

stunning and it's crazy how consistent

play14:51

it is look at these different elements

play14:54

that are created around the logo also

play14:57

the text in here is correct very very

play15:00

nice you can see here we have a brush we

play15:02

have a camera with headphones a brain

play15:05

smiling person we even have here a little

play15:08

brain neuron so lots of information here

play15:12

that is really useful for this

play15:14

commemorative coin design amazing here's

play15:18

another thing that's really stunning

play15:19

it's called 3D object synthesis so here

play15:22

you have two input images of a rotating

play15:26

object fairly complex as the logo of

play15:29

OpenAI and then here you have the 3D

play15:34

reconstruction from six generated images

play15:37

so you can see the rotating logo I'm not

play15:41

quite sure if this is actually creating

play15:43

a 3D model or just a rotation video but

play15:48

regardless of that the result looks

play15:51

really amazing and even if it wouldn't

play15:53

be a 3D model you could use this video

play15:56

information to create a 3D model

play15:59

from that down here we have a sea lion

play16:02

sitting on this round shape and again

play16:06

the result is absolutely stunning we

play16:09

have here it's sticking very clearly to

play16:12

the shapes of the original sea lion it

play16:15

looks very much like a sea lion everything

play16:18

here works amazingly here we have another

play16:20

example it's pretty amazing this is a

play16:23

multi-line rendering of a robot texting so

play16:27

here you can basically see this kind of

play16:29

phone conversation with the robot hands

play16:33

and the text actually in these speech

play16:35

bubbles in the right position with the

play16:38

right text even the text overlapped by

play16:41

the fingers that is absolutely stunning

play16:43

and down here we have our keyboard

play16:47

really really amazing another stunning

play16:50

example here is lecture summarization so

play16:53

here we have a 45 minute video and this

play16:58

is creating a summary here of everything

play17:02

that is said in this kind of

play17:04

presentation making actually points for

play17:07

you listing everything out to you that

play17:11

of course saves a lot of time on actually

play17:14

watching the full video now one thing I

play17:16

would like to see here is a time code

play17:18

behind these points so I actually know

play17:20

where they are in the video so I can

play17:22

quote from the video if I want to and

play17:25

again in this example the question

play17:27

remains if this is actually done from

play17:29

the video or from the audio of the video

play17:31

or it is done from the subtitles that

play17:34

have been created from the video but

play17:37

because ChatGPT understands actual audio

play17:40

input it might actually be from the

play17:43

audio taken from this video another

play17:45

thing I found quite stunning is the

play17:48

poster creation for the movie Detective

play17:51

now here two photos are provided of

play17:55

actual people with their names in here

play17:57

together with this

play17:58

description of the design of the poster

play18:02

and here you can see these characters

play18:04

this doesn't look 100% good so the faces

play18:06

look a little bit strange but it's

play18:08

actually a poster design which is very

play18:10

good with the title of the film also in

play18:13

here and then there are some

play18:14

readjustments here through prompt and

play18:17

here you can have the final version and

play18:20

even though these kinds of faces are a

play18:23

little bit skewed you can actually

play18:25

recognize the original characters you

play18:27

can see the names up here of the

play18:30

characters the movie title down here you

play18:32

can also see that different fonts have

play18:35

been used for that and also that the

play18:38

characters are actually in different

play18:39

colors separating them from each other

play18:41

adding to the story that is expressed now

play18:43

this is not the best design of a movie

play18:46

poster I've ever seen but it is a very

play18:49

good start and actually also

play18:51

recognizable characters from these

play18:53

images which is stunning another example

play18:55

we can find here is meeting notes from

play18:58

multiple speakers that is really amazing

play19:01

so you have here an audio recording we

play19:03

can listen to that for a little

play19:06

bit okay good morning here's our first

play19:10

morning morning I'll be your project

play19:12

manager for today this project my name

play19:14

is Mark will be giving this presentation

play19:16

for you to kick the project

play19:19

off it's my that's the agenda for today

play19:22

of course you can see that the audio is

play19:25

not super good there is some background

play19:27

noise there's some noise from the

play19:28

recording device but still the AI could

play19:32

figure out what is said in the summary

play19:35

but then also make a transcription with

play19:38

identifying who is speaking at that

play19:41

moment that is absolutely interesting

play19:43

you can also see here with the names

play19:45

Mark Durk and Mark again so down here we

play19:49

have even Xavier so there are different

play19:51

characters and the AI can understand who

play19:54

these different characters are based on

play19:56

the sound of the voice that's really

play19:58

incredible
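
As a rough illustration of what a speaker-attributed transcript with time codes might look like as data, here is a minimal Python sketch. It is not how ChatGPT produces its meeting notes, which the announcement page does not explain. The `Segment` type, the `merge_turns`, `format_timestamp` and `render` helpers, the timings and the sample lines are all hypothetical and only loosely modelled on the recording above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One recognized stretch of speech (hypothetical structure)."""
    speaker: str   # label such as "Mark" or "Speaker 2"
    start: float   # start time in seconds from the beginning of the recording
    text: str      # the words recognized in this stretch


def merge_turns(segments: List[Segment]) -> List[Segment]:
    """Merge consecutive segments from the same speaker into one turn."""
    turns: List[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        if turns and turns[-1].speaker == seg.speaker:
            last = turns[-1]
            # Same speaker keeps talking: extend the turn, keep its start time.
            turns[-1] = Segment(last.speaker, last.start, f"{last.text} {seg.text}")
        else:
            turns.append(Segment(seg.speaker, seg.start, seg.text))
    return turns


def format_timestamp(seconds: float) -> str:
    """Format seconds as MM:SS, the kind of time code you could quote from."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def render(turns: List[Segment]) -> str:
    """Render turns as '[MM:SS] Speaker: text' lines."""
    return "\n".join(
        f"[{format_timestamp(t.start)}] {t.speaker}: {t.text}" for t in turns
    )


if __name__ == "__main__":
    # Made-up sample data; labels and timings are assumptions for illustration.
    sample = [
        Segment("Mark", 0.0, "Good morning."),
        Segment("Mark", 3.5, "I'll be your project manager for this project."),
        Segment("Speaker 2", 65.0, "Morning."),
        Segment("Mark", 70.2, "That's the agenda for today."),
    ]
    print(render(merge_turns(sample)))
```

Keeping the start time on every turn is a deliberate choice here: it is what would make it possible to attach a time code to each point in a summary and jump back to that moment in the recording.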

play19:59

there are also other things that ChatGPT 4o

play20:02

can do like creating these kind of

play20:04

amazing images now this one has made

play20:07

quite the rounds on social media because

play20:10

it looks very stunning and actually like

play20:12

a frame from an actual presentation or

play20:15

lecture but this is AI generated it has

play20:18

some very very interesting details in

play20:21

here first of all the text is very good

play20:23

and correct it looks like it's written on a

play20:25

chalkboard it's also overlaid in parts

play20:29

by the hand and the head of the

play20:31

character you also have these moving

play20:34

chalkboards like you would have inside

play20:37

of a lecture hall or university so

play20:40

those are also very interesting details now

play20:43

there are some small mistakes here for

play20:45

example you can see that where this

play20:48

cleaning line is there is also the line

play20:51

here on the neck this might also be from

play20:53

the microphone so you could excuse that

play20:56

the hand looks a little bit strange but

play20:58

still it's pretty good so overall

play21:01

this looks stunningly good and

play21:03

surprising that all of the information

play21:05

is in there in a correct way another

play21:07

example from hayon is this where he

play21:11

uploaded a

play21:12

CSV style Excel sheet so that the AI can

play21:18

create from that you can see here it's

play21:20

analyzing it's working really fast and

play21:23

then actually creating these charts from

play21:25

the information in multiple ways to

play21:28

explain what the information means and

play21:31

then also analyzing the information for

play21:35

you as you can see here with these key

play21:38

insights here and you can actually use

play21:40

this as a presentation send it to other

play21:42

people and of course this will help you

play21:45

especially in cases of information where

play21:47

you don't even know how to read or what

play21:50

to do with this information this can

play21:53

help you figure out what the information

play21:55

means but also the next steps to take

play21:59

with that information to help you

play22:01

strategically in another example Sawyer

play22:04

Hood asked ChatGPT to write software

play22:08

for him that is like Microsoft Paint so

play22:12

here you can see he has this kind of

play22:14

exe file he's starting it it makes

play22:17

this kind of weird motion I think this

play22:19

is what he set up himself and then as

play22:22

you can see he can click he has

play22:24

different functions like a brush you can

play22:27

make rectangles you can make circles

play22:30

with that this is software written by

play22:33

ChatGPT running on his desktop and you

play22:36

can also see that he writes here that he

play22:39

plugged ChatGPT 4o into his Windows 9x

play22:44

simulator to create the paint exe file for

play22:49

him which is interesting because it

play22:51

means that this can also write code for

play22:55

vintage operating systems another example also

play22:59

here by Sawyer Hood is that he had ChatGPT

play23:03

create an HTML file that simulates a

play23:07

Facebook chat as you can see he's

play23:09

opening this up here with an API this is

play23:12

locally hosted interestingly enough of

play23:14

course because it's an HTML file that

play23:17

he's running here and then when it's

play23:19

loading he can interact with this kind

play23:22

of text here now when this is loading up

play23:25

you can see that the AI is starting the

play23:27

conversation and he puts in some text

play23:30

and can then actively chat with the AI

play23:34

through this API in an interface that is

play23:36

designed to look like the Facebook

play23:38

Messenger did you get a little bit of a

play23:40

Terminator vibe from all of this because

play23:43

I sure did it was a little bit uncanny valley

play23:48

uh just because I have never seen

play23:49

anything like that before with this kind

play23:52

of emotional voice and interaction but

play23:55

you can also see how this can massively

play23:58

improve our future as a teacher as an

play24:02

assistant I'm so happy to see that

play24:05

different companies come up with these

play24:07

Solutions at the same time meaning this

play24:10

is moving massively forward in a very

play24:13

rapid movement now I hope of course that

play24:15

the open-source Community can catch up

play24:18

to these abilities they already have

play24:20

amazing models out there and it's just

play24:23

incredible what you can do here speaking

play24:26

to the AI adds a completely new layer

play24:29

especially if it is emotional especially

play24:32

if it's also talking to you in a

play24:35

psychological way where it is supporting

play24:37

you giving you hints pushing you forward

play24:40

to do better to help you learn things

play24:43

like that so all of that is amazing and

play24:46

the camera ability is of course the most

play24:50

important part for me because a lot of

play24:53

information out there a lot of things we

play24:56

try to learn have to do not just with

play24:59

things that we see but more importantly

play25:02

with things that we find very hard to

play25:05

describe so having the AI being able to

play25:08

see what we see and then help us with

play25:10

this information is a massive step

play25:14

forward into the ability to interact

play25:17

with the AI get the AI actually to do

play25:20

what we want it to do let me know what

play25:23

you think about that in the comments

play25:25

thanks for watching leave a like or

play25:27

subscribe to my channel if you want

play25:28

to see more like that see you soon bye

play25:31

oh you're still here so uh this is the

play25:33

end screen there's other stuff you can

play25:35

watch like this or that's really cool

play25:38

and yeah I hope I see you soon uh leave

play25:41

a like if you haven't yet and well um

play25:43

yeah
