GPT-4o Deep Dive & Hidden Abilities you should know about
Summary
TLDR: The video discusses GPT-4o, the AI model released by OpenAI, which handles multimodal tasks including real-time voice assistance, coding, chess puzzles, and image generation. Its capabilities are showcased through demos that highlight its efficiency and expressiveness compared with pipelines built from separate speech and text models. The video also speculates on the potential impact of GPT-4o on customer service, tutoring, and therapy, suggesting a transformative effect on human interaction and professional roles.
Takeaways
- 🚀 OpenAI has released GPT-4o, an AI model that the video shows recreating a Pokémon game, solving chess puzzles, and tackling a math Olympiad problem.
- 🔊 GPT-4o is a multimodal model that processes text, audio, and images in a single neural network, offering more efficiency and expressiveness than chaining speech-to-text and text-to-speech systems around a language model.
- 🏆 In blind tests on the LMSYS Chatbot Arena, GPT-4o outperforms all other listed models by a wide margin, according to the charts shown in the video.
- 💻 GPT-4o demonstrates real-time coding assistance, interpreting and responding to code snippets and plot outputs, which could disrupt existing coding assistants and platforms.
- 🎲 The model solves 50.1% of the chess puzzles in the benchmark shown, more than double the 22.9% of the previous leading model, GPT-4 Turbo.
- 🌐 GPT-4o can teach languages and interact in real time, potentially impacting language learning apps like Duolingo.
- 🕊️ GPT-4o can emulate a full game of Pokémon Red through a command-line interface, showcasing its ability to recreate complex interactions and decision-making processes.
- 🤖 According to commentary quoted in the video, advances in tokenization and architecture likely allow direct audio-to-audio mapping and real-time streaming of video to a transformer.
- 👩‍🏫 Potential applications in therapy, counseling, and senior care are highlighted; the video cites a claim that AI outperformed human psychologists on a test of social intelligence.
- 🎨 The model's image generation can create consistent characters, render 3D models, and generate fonts, which could be useful in design and e-commerce.
- 📅 GPT-4o will be available in ChatGPT and the API as a text and vision model, with free tier users gaining limited access to advanced tools such as data analysis and file uploads.
Q & A
What is GPT-4o and why is it considered revolutionary?
-GPT-4o is a new AI model released by OpenAI, which the video considers revolutionary because of its multimodal capabilities. It can process text, audio, and image data and generate outputs in any of these formats natively, without relying on a separate model for each, making it more efficient and expressive than pipelines of single-purpose models.
How does GPT-4o differ from traditional AI voice assistants?
-Traditional AI voice assistants typically chain three separate processes: speech to text, text processing by a language model, and then text to speech. GPT-4o, on the other hand, is a single neural network that handles all of these steps natively, making it faster and preserving information that is otherwise lost between steps.
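The three-step pipeline described in this answer can be sketched in a few lines. This is a toy illustration only: `Utterance`, `speech_to_text`, `language_model`, and `text_to_speech` are hypothetical stand-ins invented here, not real APIs, and "tone" is just one assumed example of the information the video says is lost between steps.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """Hypothetical stand-in for recorded speech: words plus vocal tone."""
    words: str
    tone: str  # e.g. "excited" or "sarcastic"


def speech_to_text(audio: Utterance) -> str:
    # Stage 1: transcription keeps the words and drops the tone.
    return audio.words


def language_model(prompt: str) -> str:
    # Stage 2: a placeholder "model" that only ever sees plain text.
    return f"You said: {prompt}"


def text_to_speech(text: str) -> Utterance:
    # Stage 3: synthesis has no tone to work from, so it uses a default.
    return Utterance(words=text, tone="neutral")


def cascaded_assistant(audio: Utterance) -> Utterance:
    # The traditional approach chains three separate systems.
    return text_to_speech(language_model(speech_to_text(audio)))


reply = cascaded_assistant(Utterance(words="great, another meeting", tone="sarcastic"))
print(reply)
```

The sarcastic tone never reaches the middle stage, so the reply cannot react to it. A natively multimodal model, as the video describes it, would receive the audio itself rather than a transcript.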
What is LMSYS and how does it relate to GPT-4o?
-LMSYS runs the Chatbot Arena, a platform where users blind test AI models by entering a prompt and choosing the better of two anonymous responses. GPT-4o was tested there under the name gpt2-chatbot and, according to the video, ranked above all other listed models.
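One common way to turn pairwise blind votes like these into a ranking is an Elo-style rating. The video does not say how the leaderboard is actually computed, so the sketch below is an assumption for illustration: the model names, the starting rating of 1000, and the K-factor of 32 are all made up.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return both ratings after one blind vote between A and B."""
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Hypothetical models and votes; each tuple is (model A, model B, did A win?).
ratings = {"model_x": 1000.0, "model_y": 1000.0}
votes = [
    ("model_x", "model_y", True),
    ("model_x", "model_y", True),
    ("model_x", "model_y", False),
]
for a, b, a_won in votes:
    ratings[a], ratings[b] = update(ratings[a], ratings[b], a_won)

print(ratings)
```

Because the voter does not know which model produced which answer, each vote reflects only the quality of the response, and the ratings drift apart as one model wins more often.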
How does GPT-4o perform in coding tasks?
-The video shows GPT-4o leading a coding leaderboard by a wide margin and acting as a real-time coding assistant that can describe shared code and interpret the plots it produces.
What impact could GPT-4o have on language learning apps?
-Given GPT-4o's ability to teach languages and its interactive and expressive nature, it could potentially disrupt the language learning app market. The script mentions Duolingo's stock price dropping around 5% on the day of the announcement, suggesting a possible negative impact on existing language learning tools.
How good is GPT-4o at solving chess puzzles?
-In the benchmark shown, GPT-4o solves 50.1% of the chess puzzles, more than double the 22.9% of the previous leading model, GPT-4 Turbo, and reaches an adjusted Elo of 1790 against 1144.
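The transcript says each puzzle is given to the model as a textual description of the entire board in just a few characters. The video does not name the notation, so the sketch below assumes something like Forsyth-Edwards Notation (FEN), a standard compact board encoding; `parse_fen_board` is a helper written for this example.

```python
def parse_fen_board(fen: str) -> list:
    """Expand the piece-placement field of a FEN string into an 8x8 grid."""
    placement = fen.split()[0]  # first field holds the pieces, rank 8 first
    board = []
    for rank in placement.split("/"):
        row = []
        for ch in rank:
            if ch.isdigit():
                row.extend("." * int(ch))  # a digit means that many empty squares
            else:
                row.append(ch)  # uppercase is White, lowercase is Black
        board.append(row)
    return board


# The standard starting position, written as FEN.
START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
board = parse_fen_board(START)
for row in board:
    print(" ".join(row))
```

A language model sees only the one-line string, not the grid, which is part of why the video treats the solve rate as notable. For scale, 50.1 / 22.9 ≈ 2.19, consistent with the "more than double" claim.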
What are some of the potential use cases for GPT-4o's voice assistant features?
-The voice assistant features of GPT-4o could be used for real-time language translation, interactive tutoring, customer service, therapy and counseling, and senior care, among other applications.
How does GPT-4o handle image generation tasks?
-GPT-4o can generate images with high accuracy, including maintaining consistency in characters and objects across different prompts. It can also generate text within images with fewer errors compared to other image generators.
What is the significance of GPT-4o's ability to generate fonts and 3D models?
-GPT-4o's ability to generate fonts and 3D models signifies a leap in creative AI capabilities. It can understand and create complex visual elements, which could be useful in design, branding, and e-commerce.
When will GPT-4o be available to users, and what are the access limitations?
-GPT-4o will be available in ChatGPT and the API as a text and vision model. Free tier users will have access to GPT-4o with a usage limit, after which they will be switched back to the previous model. Advanced tools such as data analysis and file uploads will also have limited access for free users.
Outlines
😲 Revolutionary AI Model GPT-4o Overview
The video script introduces GPT-4o, an AI model released by OpenAI, and its abilities to recreate a Pokémon game, solve chess puzzles, and solve complex math problems. The script explains how GPT-4o differs from the traditional approach by being a single multimodal neural network capable of processing text, audio, and images, and generating responses in the same modalities. It also discusses its performance on the LMSYS Chatbot Arena, where it outperforms other AI models, and touches on the implications for other tech companies and their models.
💻 Real-Time Coding Assistance with GPT-4o
This section of the script demonstrates GPT-4o's ability to assist with coding problems in real time. It showcases a scenario where GPT-4o describes code that fetches weather data and smooths it with a rolling average, and then interprets the resulting plot. The script also highlights GPT-4o's potential impact on coding assistants and the significance of its multimodal capabilities in coding and problem-solving.
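The demo code itself is not shown in the transcript, only described: a function takes temperature data and a window size and applies a rolling mean to smooth it. A dependency-free sketch of that idea follows; the function name `rolling_mean`, the trailing-window choice, and the sample temperatures are assumptions for illustration, and the demo reportedly operated on a data frame rather than a plain list.

```python
def rolling_mean(values, window):
    """Trailing rolling mean; positions without a full window yield None."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        smoothed.append(running / window if i >= window - 1 else None)
    return smoothed


# Made-up daily maximum temperatures in Celsius.
daily_max_c = [21.0, 27.0, 19.0, 25.0, 30.0, 22.0]
smoothed_max = rolling_mean(daily_max_c, 3)
print(smoothed_max)
```

Each smoothed value averages the current day with the two before it, so single-day spikes are damped. That is the effect the assistant describes in the demo: smoother temperature lines because the rolling mean reduces noise.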
🎮 GPT-4o's Chess and Pokémon Emulation Skills
The script discusses GPT-4o's performance in solving chess puzzles, providing a benchmark comparison with other models and emphasizing its high solve rate. It also describes GPT-4o's capability to emulate the Pokémon Red game through a command-line interface, with the model presenting the game's options as text. The implications for language learning apps and the potential for GPT-4o to disrupt various industries are also mentioned.
🤖 GPT-4o's Multimodal Capabilities and Industry Impact
The script relays commentary on the technical aspects of GPT-4o's development, including direct audio-to-audio mapping and video streaming to a transformer. It discusses the potential for synthetic data generation to supplement real-world data for AI training. The video also speculates on partnerships between tech giants and OpenAI, particularly Apple's potential integration of GPT on iPhones. The script further shows GPT-4o role-playing both sides of a customer support call.
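The transcript relays speculation that, rather than decompressing every video frame into a full image, the stream is sent as "motion deltas", meaning only what changed from one frame to the next. The toy sketch below illustrates that general idea of delta encoding on tiny one-dimensional "frames". It is not OpenAI's codec, whose design is not public, and both function names are invented for this example.

```python
def encode_deltas(frames):
    """Keep the first frame whole, then store only (index, new_value) changes."""
    if not frames:
        raise ValueError("need at least one frame")
    key_frame = list(frames[0])
    deltas = []
    prev = key_frame
    for frame in frames[1:]:
        changes = [(i, v) for i, (p, v) in enumerate(zip(prev, frame)) if p != v]
        deltas.append(changes)
        prev = frame
    return key_frame, deltas


def decode_deltas(key_frame, deltas):
    """Rebuild every frame by replaying the changes onto the key frame."""
    frames = [list(key_frame)]
    for changes in deltas:
        frame = list(frames[-1])
        for i, v in changes:
            frame[i] = v
        frames.append(frame)
    return frames


# A bright "pixel" (9) moving one step to the right per frame.
frames = [
    [0, 0, 0, 0, 0, 0],
    [0, 9, 0, 0, 0, 0],
    [0, 0, 9, 0, 0, 0],
]
key_frame, deltas = encode_deltas(frames)
print(deltas)
restored = decode_deltas(key_frame, deltas)
```

When most of a scene is static, the list of changes is much shorter than the full frame, which is the efficiency argument made in the transcript.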
🧠 GPT-4o's Implications for Education and Therapy
This part of the script examines GPT-4o's potential to change tutoring, therapy, and counseling by demonstrating its ability to solve complex math problems and interact in a human-like manner. It raises questions about the future of human educators, psychologists, and counselors in the face of such AI capabilities. The script also touches on the social implications of AI companionship and its potential to change human relationships.
🖼️ GPT-4o's Advanced Image and Text Generation
The script highlights GPT-4o's image generation capabilities, including creating consistent characters, rendering 3D models, and generating fonts. It compares these capabilities with other tools like Stable Diffusion and Midjourney, noting GPT-4o's more accurate rendering of text within images and its ability to keep context across prompts. The potential applications for e-commerce and branding are also discussed.
🌐 GPT-4o's Availability and Future Prospects
The final section of the script addresses the availability of GPT-4o, detailing its integration into ChatGPT and the API as a text and vision model. It outlines the features available to free and team tier users and speculates on future updates to the voice mode. The script concludes by inviting viewers to share their thoughts on GPT-4o's potential impact and its role in various industries.
Keywords
💡GPT-4o
💡Multimodal
💡LLMs (Large Language Models)
💡LMSYS (Large Model Systems Organization)
💡Real-time Voice Assistant
💡Coding Assistance
💡Chess Puzzles
💡Synthetic Data
💡Image Generation
💡3D Model Rendering
💡Expressive Voice
Highlights
OpenAI released GPT-4o, an AI model with lesser-known capabilities such as recreating a Pokémon game and solving complex math problems.
GPT-4o is a true multimodal model, capable of processing text, audio, and image data, and outputting in the same formats natively.
GPT-4o's efficiency comes from a single neural network handling multiple modalities, unlike traditional approaches using separate processes.
GPT-4o outperforms all other listed AI models in blind tests, both overall and in coding.
GPT-4o demonstrated real-time coding assistance, interpreting and responding to code snippets and plot outputs.
The model's performance on the coding leaderboard shown is significantly higher than previous models, indicating a leap in AI coding capabilities.
GPT-4o can act as a customer support agent, handling customer service scenarios with natural language understanding.
The model's real-time language-teaching demo was followed by a drop of around 5% in Duolingo's stock, indicating potential industry disruption.
GPT-4o can emulate the entire game of Pokémon Red through a command-line interface, showcasing its ability to recreate game experiences.
GPT-4o's image generation capabilities are more advanced than current tools, rendering text more accurately and maintaining character consistency.
The model can generate fonts and 3D models from textual prompts, expanding its creative and practical applications.
GPT-4o's ability to maintain character consistency in image generation could be valuable for e-commerce branding.
The model solved an International Math Olympiad problem in one prompt, which the video takes as evidence that it exceeds 99% of humans in math.
GPT-4o's voice assistant capabilities are more expressive and engaging than traditional chatbots, with potential to change human interaction.
The model's potential applications in therapy, counseling, and senior care could disrupt traditional psychological services.
GPT-4o's release could significantly impact customer service, education, and personal relationships.
GPT-4o will be available to free and paid users, with advanced features like image analysis and data tools accessible to free users within usage limits.
Transcripts
on Monday OpenAI released this
revolutionary model called GPT-4o which
is by far the best AI model out there I
showed you a few demo clips in this
video so check it out if you haven't
already but GPT-4o actually has a lot
more hidden capabilities it can recreate
an entire Pokémon game it's surprisingly
good at chess it can solve an
international math Olympiad problem in
just one prompt so this video will be a
deep dive on GPT-4o I'm going to go over
how it works how they built it and why
it's so revolutionary now first of all
I've seen some comments on my YouTube
videos saying that this Tech is not
revolutionary it's just text to speech
or speech to text or using stable
diffusion for image generation this is
not true so for example for a real-time
AI voice assistant traditionally you
would have a voice and there would be a
neural network that turns your speech
into text so this algorithm would be
speech to text and then this text would
be fed into a large language model like
GPT which would respond back in text so
this middle step is what you get for
chatbots like ChatGPT and then you take
another algorithm to turn this text into
speech so this is called text to speech
so it's going through three separate
processes and this is of course very
inefficient and very slow that's the
traditional approach now yes you still
could get a real-time Voice Assistant
this way but it's not very expressive
and a lot of information is lost as you
go from step to step
now GPT-4o is a completely different
animal this is true multimodal so they
trained a single neural network on text
audio and image data and it can also
output either text audio or image this
is true multimodal and because of this
this likely is a completely new model
trained from scratch and it's natively
multimodal so in other words if you get
this to generate an image it's not
actually calling DALL-E 3 via an API to
generate the image and similarly if you
want it to respond in a voice it's not
calling any text to speech API the same
neural network this GPT-4o just spits
out the audio right there natively and
this is also much faster so somehow
they've made it smaller or a lot more
compute efficient now a while back we've
had this mysterious gpt2-chatbot show up
in this LMSYS arena for those of you who
aren't familiar with LMSYS it's basically
a platform where users can blind test
all the AI models out there so for
example the user will enter in a prompt
and it's going to have two different AI
models respond to that prompt and then
the user chooses which one is the winner
and you don't know which chatbot is on
which side so it's a blind test and from
all these blind tests it turns out that
at least before GPT-4o GPT-4 Turbo was
number one and so around one to two weeks
ago we've had this new contender this
mysterious gpt2-chatbot appear in the
model listings and so nobody really knew
what that was right now we verified that
this is indeed GPT-4o and if you look at
the overall performance of GPT-4o which
is the leftmost bar here compared to the
performance of all these other AI models
it is by far the best model out there I
mean the other models don't even come
close and you know in the past 2 years
we've seen a lot of these LLMs being
released like Mistral and then Mixtral and
then Claude 3 Llama 3 these were all
just incrementally better than the
previous model but it wasn't anything
significant but here you see with GPT-4o
it just blows everything else out of the
water and you know it kind of makes
sense that OpenAI would release the
best model out there because it seems
that these other competitor companies
they are training their models with GPT
data at least to some degree so for
example Mistral which is the French
company which created an open-source LLM
the reason why they were able to create
this so fast is because they just used
the responses of GPT as training data
but if you do that you're only going to
get as good as the best model from
OpenAI you're not going to be able to exceed
it and so that's why even with all these
major announcements from Meta with their
Llama 3 or Anthropic with Claude 3 Opus
or Google with their Gemini 1.5 Pro even
after all these big announcements from
all these big tech companies still the
number one model out there at least
before Monday was OpenAI's GPT-4 Turbo
and then now we're seeing GPT-4o just
destroy everything else and not only
that here is a second chart showing its
performance in coding and you can see
the previous models are just like
incrementally better nothing impressive
but again GPT-4o just blows everything
out of the water this is super super
impressive and on the note of coding
they also demonstrated a real-time coding
assistant check this out okay and to
give a bit of background of what's going
on so here we have um a computer and on the
screen we have some code and then the
ChatGPT voice app is on the right so
ChatGPT will be able to hear me but it
can't see anything on the screen so I'm
going to highlight the code Command-C
it and then that will send it to chat
GPT and then I'm going to talk about the
code to chat
GPT hey chat
GPT hey there how's it going yeah it's
going really well I was wondering if you
could help me with a coding problem
today of course I'd love to help you out
what's the coding problem you're dealing
with okay I'm going to share with you
some code one second
sure thing take your
time okay so I just shared some code
with you could you give me a really
brief one- sentence description of
what's going on in the code this code
fetches daily weather data for a
specific location and time period
Smooths the temperature data using a
rolling average annotates a significant
weather event on the resulting plot and
then displays the plot with the average
minimum and maximum temperatures over
the year awesome thanks so so much and
in the code there's this function foo
could you describe to me what the plot
would look like if I had versus didn't
have this function sure the function foo
(x, y) takes in a data frame x containing
temperature data and a window size y for
calculating the rolling mean it smooths
the average temperature minimum
temperature and maximum temperature
columns by applying a rolling mean over
the specified window with the foo
function the plot would show smoother
temperature lines because the rolling
mean reduces the noise or fluctuations
in the data awesome this makes sense thanks so
much so yeah I definitely will want to
apply this function and so I'm going to
uh generate the plot and then show it to
you give me one
second absolutely take your time I'm
excited to see the plot when you're
ready yeah so so far ChatGPT has only
been able to see the code and now I'm
going to run it and then I'll use the
vision capabilities of the ChatGPT
desktop app so it can actually see
everything going on on the screen as
well
okay chat GPT I'm sharing with you the
plot now I'm wondering if you can give
me a really brief one- sentence overview
of what you
see the plot displays smoothed average
minimum and maximum temperatures
throughout
2018 with a notable annotation marking a
big rainfall event in late
September fantastic I have another
question for you um which months do you
see the hottest temperatures and
roughly what temperatures do those
correspond
to the hottest temperatures occur around
July and August the maximum temperature
during these months is roughly between
25° and 30° 77° Fahrenheit to 86° Fahrenheit what's
going on with the y-axis is this in
Celsius or Fahrenheit the y-axis is in
Celsius the temperatures are labeled
accordingly such as average temperature
minimum temperature and maximum
temperature yeah so as we can see you
know not only can ChatGPT help me solve
very easy linear um algebra equations
that I need help with it can also
interact with you know code bases and
see the outputs of plots and everything
like this going on on a computer so
right now we have the best the smartest
AI model out there helping you code in
real time on your desktop so I mean for
all these other coding assistants out
there like GitHub Copilot or GitHub
Workspace or Devin all I can say is may
you rest in peace and it gets crazier so
it turns out that GPT-4o is also
insanely good at solving chess puzzles
so this is a benchmark of how good it is
at solving these chess puzzles and just
to give you some context chess puzzles
are a very challenging problem for most
humans let alone an llm given a textual
description of the entire board in just
a few characters so this is different
from like Google's DeepMind which
plays an entire game of chess from start
to finish in this scenario it's only
given a particular snapshot of an
existing game in other words it's only
given a textual description of the
entire board in just a few characters
and from this here's a comparison of all
the major models out there right now and
you can see the adjusted Elo which is
the benchmark metric 1790 way higher
than GPT-4 Turbo which is 1144 and the
percentage of puzzles solved by GPT-4o
is
50.1% that's more than double the
previously leading model GPT-4 Turbo which
is only
22.9% just absolutely insane here's a
quick recap of what the real-time voice
feature can do it can teach anyone any
language hey my friend and I are
learning Spanish and we're wondering if
you could tell us the names of these
objects in
Spanish cool what about what are these
objects in
Spanish the objects you're showing are
and in Spanish nice free choice cool
what about
these those are do poos or do Plumas in
Spanish so what do you think happened
after this video was released well the
language learning app Duolingo crashed
around 5% on the day of the announcement
so I mean for all these language
learning apps out there all I can say is
May you rest in peace here's another
crazy thing about it it can emulate the
entire game of Pokémon Red now of course
this is just a command line interface so
it's not generating the entire game
design like you don't have a map and a
character which you can move around but
you're given the options so for example
you can enter in your choice you can
talk to Professor Oak so these are
exactly the options that are found in
the real Pokemon Red game so I'm just
going to fast forward this a bit all
right so you're given three options for
your first Pokémon and the user chooses
Charmander and then his competitor which
is Gary in the real game he chose
Squirtle all right so now they are going
to duel each other so you and your rival
are going to fight each other again this
is just a command line interface so it's
not an actual game design with
Charmander fighting Squirtle but it has
all the same function so you select an
attack and then your opponent selects an
attack and then it goes on and on so it
does run like a real Pokémon game so
this is just super impressive you can
get GPT-4o to recreate the Pokémon Red
game anyways I'll link to this tweet in
the description below so you can check
out the full video here's what Dr.
Jim Fan at NVIDIA has to say about this
new GPT-4o there are some really
interesting insights here so technique
wise open AI has figured out a way to
map audio to audio directly as first
class modality and stream videos to a
transformer in real time so at a very
high level this just means as I've
mentioned before everything is trained
and inputed and outputed into this one
neural network so it's natively
multimodal so this requires some new
research on tokenization and
architecture but overall it's a data and
system optimization problem so in terms
of getting high-quality data well you can
get a lot of high-quality video and audio
data from YouTube podcasts TV series
movies Etc you can also get the AI to
generate synthetic data this is great
because it could in theory generate
unlimited data as long as it's good
quality this data can be used to train
the next generation of AI this is
important because one of the limitations
is we might not have enough real world
data and yes everything on the internet
everything on YouTube it's a lot of data
but it still might not be enough to
train a really smart really capable AI
model and that's why we need to generate
synthetic data to supplement this real
world data and this is very interesting
he said that the latency would not meet
real-time threshold if every video frame
is decompressed into an RGB image so
you're likely going to have a lot of
delay if you take the traditional
approach and you break down the video
into frames of images per second instead
OpenAI has likely developed their own
neural-first streaming video codec to
transmit the motion deltas as tokens so
what this means is they probably
developed an algorithm to break down the
video to feed into the neural network
and instead of breaking it down into
frames per second they are only
capturing the changes frame after frame
and this saves a lot of data this is a
lot more efficient and then he also
mentions the assistant is much more
lively and even a bit flirty which I'll
get to in a second and yes it sounds
very much like Her which is why I put
Her here in the thumbnail of my video
and then this is also very insightful
whoever wins Apple first wins big time
and I mean this is because Apple has a
huge market share in terms of
smartphones and laptops so whoever gets
to partner with Apple and integrate
their AI models with Apple is going to
win big and just 4 days ago we have
received news that Apple is nearing a
deal with OpenAI to integrate GPT on
the iPhone and you know this is likely
to be the case because Apple's biggest
competitor at least in the smartphone
market is Android which is owned by
Google so Android will very likely be
powered by Google's own Gemini and
they're probably not going to provide
Gemini to Apple so Apple has to partner
with OpenAI anyways back to GPT-4o
here's a demo of one AI acting as a
customer and the other AI acting as a
customer support agent hey o this is Joe
I'm going to connect you to Acme Telco
now the new iPhone they sent me isn't
working I want you to I want you to get
them to send me a replacement device can
you take care of this for
me you can count on me Joe I Got
This Ring
Ring Hey Joe this is Jamie from Acme
Telco how can I help you out
today hi there I'm calling on behalf of
Joe who recently received a new iPhone
from Acme Telco
but oh got it when did Joe receive the
new
iPhone the iPhone was delivered 2 days
ago cool could you share the order
number with me of course it's
10293
849 great thanks is the device
physically damaged in any
way no there's no physical damage it's
just not turning on despite trying all
the basic troubleshooting
steps bummer um would Joe prefer to
return the iPhone to an Acme Telco store
or ship it back shipping it back would
be the best option could you please
provide the shipping information and
process to get a
replacement Absolutely I'll email a
shipping label and return instructions
could you provide Joe's email address
for
me sure it's Joe at example.com
awesome all right I've just sent the
email can you check if Joe received
it we'll check right now please hold
sure
thing Hey Joe could you please check
your email to see if the shipping label
and return instructions have
arrived cross yep I got yes I got the
instructions perfect Joe has received
the email thank you for your
help no problem at all anything else you
need help
with that's all for now thanks
again you're welcome have a great day
Joe so takeaway of this video if you're
in customer service all I can say is
good luck in my previous video I showed
this clip of the AI tutoring this kid on
how to solve a math problem she was able
to guide the kid extremely well and
ultimately teach him how to solve the
problem and so what does this mean for
tutor centers or Tutors or teachers all
I can say is good luck to you as well
like I said there was this mysterious
gpt2-chatbot that appeared in this LMSYS
arena 2 weeks ago we now confirm that
gpt2-chatbot is indeed GPT-4o what's
interesting is that this person Andrew
GA showed that it could solve an
international math Olympiad problem this
is the Olympics for math problems these
are like super complex math problems
that only the four best math students in
the USA get to compete in so needless to
say these problems are extremely hard
but GPT-4o was able to get it in one
shot that means he didn't need to prompt
it further it was able to answer the
problem in just one prompt so this thing
already exceeds 99% of humans in math
super smart thanks to the sponsor of
this video Upix if you're feeling
overwhelmed with Midjourney or Stable
Diffusion you don't want to worry about
prompting or learning all these
different settings well Upix has made it
dead easy for you to generate
high-quality realistic images of yourself
or anyone else in just one click it
works on desktop as well as on your
phone you don't need to install any apps
or anything it just works straight from
your internet browser simply select the
template and then upload your photo and
then click create it's as easy as
that and look how realistic the results
are there's many templates for you to
choose from and more to come so check it
out at
up.app another thing is if you notice
from the clips I played in the last
video she talks and Giggles a
lot oh Rocky that's quite a statement
piece I I mean you you'll definitely
stand out now a few folks in the
comments mentioned that for most men
hearing a female giggle at you and
giving you positive vibes may very well
be attractive or seem seductive now
human psychology is actually very easy
to manipulate so it wouldn't be a
surprise if we soon have humans getting
very addicted and attached to this AI
voice friend if we have a companion
who's always available 24/7 who never
argues with you she's always supportive
she can give you advice and you can ask
her anything anytime she even giggles at
the lamest things you say then well
she's perfect now compare this with a
human partner who often argues with you
you need to spend a lot of time and
money on them they don't giggle at your
lame jokes I mean will people even want
to date humans anymore and then same
thing with friends do we even need
friends anymore when we have this
perfect companion which we can talk to
all day now of course I only say this
half jokingly there's obviously value to
real human interaction that you don't
get from talking to an AI so no I don't
think human friends or Partners will be
obsolete but I think it's safe to say
that relationships will change
significantly from this release but on
the note of chatting another great use
case for this voice assistant is therapy
counseling and Senior Care in fact it's
already scientifically proven that AI
can beat 100% of human psychologists on
a test of social intelligence so if
you're currently a psychologist a
therapist a counselor all I can say is
good luck now GPT-4o actually has plenty
of other capabilities that they didn't
demonstrate on Monday so I'll link to
this page in the description below you
can scroll down to this section
explorations of capabilities here are
some examples so as we know we can get
it to generate images so here's the
input a first-person view of a robot
typewriting in the following journal
entries here is the text the text is
large legible and clear the robot's
hands type on the typewriter and here's
the image you can see that the text is
very accurate there are minor errors
like it's missing the one here here the
I is capitalized in the image same with
the K in kind of but other than that
this is the most robust text out of all
the image generators out there even for
stable diffusion 3 which is just
released it still sucks at generating
text especially long sentences so here's
the second input the robot wrote the
second entry the page is now taller the
page has moved up there are two entries
on the sheet and so here's the
additional text and you can see it has
added this here now again there are a
few typos for example this e in every
has this accent this I should be an l
and then this e also has an accent so
it's not perfect also note that this
isn't an inpaint feature so notice the
hands are missing the typewriter is
slightly different as you can see up
here so it's generating a new image but
it's trying to maintain the consistency
so the typewriter like this red bar and
this green thing here it's kind of the
same as what you see here but note that
this is not inpainting and then finally
the robot wasn't happy and rips the
sheet of paper the two halves are still
legible and clear as he rips the sheet
and you can see here the text is still
very legible again just a few minor
typos like this T should be an L this e
should not have an accent but overall in
terms of text generation in an image
this is way better than Stable Diffusion
and Midjourney it can also produce
consistent characters so here is a
prompt of a mail delivery person with a
smile on her face so that's her here and
then if you take this image and you
attach it as an image in your next
prompt this is Sally this is the mail
delivery person and then you prompt it
with Sally is standing in front of a red
door to a house you can see it maintains
this character and generates this new
image based on your prompt and then the
user prompts it further with different
scenarios and it's able to maintain the
appearance of Sally as you can see here
so a lot of versatile things you can do
with this here's another example of a
consistent character so it has output
this robot you input this image into
your next prompt and you feed it with
different scenarios and it's able to
maintain this robot character perfectly
across all these different
scenarios here is another demonstration
of image generation with text so here
the prompt is neat handwritten
illustrated poem you're feeding it the
poem here and this does indeed look
handwritten and then here it says
elegantly decorated with surrealist
doodles which is what you see here
along the border you can also change it
to dark mode and voila and then next the
user prompts it to remove the notebook
paper lines and it also does this
perfectly you can also generate fonts
with this for example if you prompt it
the letters a b c d e f displayed in
three rows displayed as one would
showcase a font in a font book the font
combines both futuristic but retro
elements a mold stamped font and this is
what you get here's another one the
prompt for this is an ultra futuristic
font that is a signature of the
artificial intelligence Revolution so
you can generate entire fonts with this
AI here's another one this is steampunk
an old-fashioned Victorian font that
looks ornate and belongs on a steam
engine again very impressive you can
also turn a realistic photo into a
caricature as you can see here another
impressive thing is you can get it to
render a 3D model so how you would do
that is you prompt it with for example a
realistic looking 3D rendering of the
OpenAI logo with OpenAI shown below
this is view zero you prompt it five
more times so you get view 0 1 2 3 4 5
and then you glue all these different
angles together to form your 3D model
how cool is that this function is great
for e-commerce so here's a PNG logo of
your brand for example here is a product
this is a coaster with no branding
you can get it to etch your logo onto
this product like so just super super
impressive for those of you who've been
playing around with image generation
this is also very impressive so the
prompt is an image depicting three cubes
stacked on a table top cube is red and
has a g on it middle cube is blue has a
p bottom cube is green and has a t so
here are all the outputs you can see
every time it consistently gets the
colors and the letters correct well
maybe except for this one but if you've
been playing around with Stable
Diffusion and Midjourney both of those
tools cannot get this they're just not
really good at understanding the context
of your text prompts so this is also a
breakthrough what I suspect they did
here to get this consistency is they
probably merged a Transformer model with
a diffusion or another image generation
model so it can understand context
better but of course this is closed so
that's only my guess we don't really
know the architecture of this and
finally I want to end with this so I'm
getting a lot of comments in my previous
video asking when will it be out will it
be rolled out to Canadians or people in
the UK or other countries so I'll link
to this page in the description below as
well but here's what we know so far GPT-4o
will be available in ChatGPT and the
API as a text and vision model so you can
prompt it with text like you would in
ChatGPT you can also feed it images to
analyze so this is the vision model and
then ChatGPT will continue to have
support for Voice via the pre-existing
voice mode feature in
so for those of you complaining that the
voice doesn't sound like the demo
videos well that's because they haven't
rolled out this new expressive flirty
voice yet right now it's still using the
pre-existing voice mode and then next
they say specifically GPT-4o will be
available in the free plus and team tiers
so you can already use it even if you're
not paying for a plus plan and if you
scroll down a bit here's what they have
to say about the free tier users on the
free tier will be defaulted to GPT-4o
with a limit after they exceed this
limit then they will be switched back to
GPT-3.5 free users also receive limited
access to messages using Advanced tools
such as all of this so data analysis
file uploads browse discovering and
using GPTs and vision so this is also a
major announcement if you're in the free
plan previously you could not use GPTs in
the GPT Store that was only available in
the plus plan right now it looks like
they are rolling this out to free users
as well and then same with vision
previously in the free plan you couldn't
upload an image and get it to analyze it
right now it looks like they are rolling
this feature out to free users as well
so anyways that sums up all we know
about GPT-4o right now let me know in the
comments if you've discovered any other
cool things about it let me know how
you're going to use it do you think it
will be revolutionary and change all
these industries that I just mentioned
in this video would you prefer talking
to this rather than talking to your
human friends or partner let me know in
the comments below and if you enjoyed
this video remember to like share
subscribe and stay tuned for more
content thanks for watching and I'll see
you in the next one