ChatGPT Explained Completely.
Summary
TLDR: The video provides an overview and analysis of ChatGPT, the AI chatbot created by OpenAI. It starts by introducing ChatGPT as an impressive, human-like conversational AI that can pass exams, write poetry, and even fabricate celebrities. The narrator explains that ChatGPT is the publicly accessible version of GPT-3.5, a large language model developed by OpenAI. GPT stands for Generative Pre-trained Transformer: "generative" indicates it can generate text, "pre-trained" means it is trained on data before being released, and "transformer" refers to the neural network architecture it uses.

ChatGPT was trained on an immense dataset of over 500 GB of text from the internet, books, and other sources, amounting to trillions of words across several billion web pages. The model has 175 billion parameters that were tuned by training for the equivalent of 300 years on supercomputers. The narrator emphasizes that despite its impressive capabilities, ChatGPT has no real understanding: it simply predicts the next word statistically based on its training data. OpenAI used reinforcement learning from human feedback during training to instill values like helpfulness and truthfulness.

At its core, ChatGPT is a neural network. It encodes text inputs into numbers using a roughly 50,000-word vocabulary. Through training, it learned a 12,288-dimensional embedding that captures relationships between words based on co-occurrence statistics, and attention mechanisms allow it to focus on the most relevant words in a prompt.

After explaining the technical details, the narrator highlights the potential risks of large language models dominating the information ecosystem: AI-generated text may soon outstrip what humans have ever written, making it hard to determine what is real. However, the rapid progress suggests human language may be simpler to model than expected. The narrator hopes this overview helps explain the complex AI behind ChatGPT.
Takeaways
- ChatGPT is a chatbot variant of GPT-3.5, a large language model trained on over 500 GB of text data.
- ChatGPT attempts to address the AI alignment problem by being trained to produce helpful, truthful, and harmless text.
- ChatGPT works by predicting the next word in a sequence based on the statistical relationships between words learned during training (see the sketch after this list).
- Attention mechanisms in ChatGPT allow it to focus more on relevant words in a prompt when generating text.
- We don't fully understand how or why ChatGPT works so well at producing human-like text.
- The amount of text generated by AI systems like ChatGPT will soon eclipse what humans have ever written.
- ChatGPT has shown ability to pass exams and generate useful code despite not truly understanding language.
- The rapid progress in language models signals that human language may be computationally simpler than expected.
- Overuse of large language models risks overwhelming people with synthetic text of unclear veracity.
- Regulation and new methods of authentication may be needed as AI text generation advances.
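As a minimal illustration of the next-word takeaway above, here is a toy Python sketch that "trains" by counting which word follows which in a tiny, made-up corpus, then predicts the next word from those statistics. A real model learns far richer relationships than raw counts, but the core job is the same: predict the next word from patterns in the training text.

    from collections import Counter, defaultdict

    # "Training": count which word follows which in a tiny corpus.
    corpus = "the cat sat on the mat the cat ate the fish".split()
    following = defaultdict(Counter)
    for word, nxt in zip(corpus, corpus[1:]):
        following[word][nxt] += 1

    def predict_next(word):
        # Predict the statistically most common follower seen in training.
        return following[word].most_common(1)[0][0]

    print(predict_next("the"))  # -> "cat" ("cat" follows "the" most often)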
Q & A
What is the core function of ChatGPT?
-ChatGPT's core function is to predict the next most probable word following a sequence of text, based on the statistical relationships between words that it learned during training.
How was ChatGPT trained to be helpful and harmless?
-OpenAI hired contractors to rate ChatGPT's responses and used reinforcement learning to reward the model for generating text aligned with values like helpfulness and harmlessness.
Why can't we fully explain how ChatGPT works?
-Like neural networks in general, the inner workings of systems like ChatGPT involve very high-dimensional relationships between input data that are difficult for humans to intuit or visualize.
What risks are posed by advanced language models?
-The amount of synthetic text generated threatens to overwhelm authentic information and make determining truth very difficult without new authentication methods.
How was the alignment problem tackled in developing ChatGPT?
-OpenAI attempted to tackle the alignment problem through a human feedback and reinforcement learning system that rewarded ChatGPT for giving responses deemed helpful, truthful, and harmless.
Why has progress in language models suddenly accelerated?
-It appears that human language may be a computationally simpler problem to model than experts previously thought, allowing rapid advances with sufficient computing power and data.
How can ChatGPT pass exams without understanding content?
-ChatGPT predicts correct answers based on the statistical relationships between words in its training data, not through comprehension of meaning.
What mechanisms allow ChatGPT to understand context?
-Mainly, word embeddings and attention mechanisms allow ChatGPT to relate words to each other and focus on the relevant context when generating text.
Could ChatGPT become sentient?
-OpenAI stresses that ChatGPT has no experiences, feelings, or real understanding despite its human-like text generation abilities.
How was ChatGPT trained?
-Through a brute force method of assigning numbers to words, then adjusting weights over trillions of training examples to make outputs match the statistical patterns in training text.
Outlines
Intro to ChatGPT
This paragraph introduces ChatGPT as a chatbot variant of GPT-3.5, an AI model by OpenAI. It explains that ChatGPT is trained on trillions of words to generate human-like text, and details its rapid growth to over 100 million users.
Alignment in AI Models
This paragraph discusses the alignment problem in AI - getting models to value what humans value. It explains how OpenAI used reinforcement learning from human feedback to train ChatGPT to be helpful, truthful and harmless.
Neural Networks
This paragraph provides background on neural networks and how they work, comparing biological and artificial neurons. It relates this to ChatGPT's architecture of 175 billion parameters trained on massive amounts of text.
Language Modeling
This paragraph explains how language models like ChatGPT generate text - by predicting the next word based on the previous words and its trained understanding of language statistics and relationships.
Word Embeddings
This paragraph introduces the concept of word embeddings, which represent words as numbers in high-dimensional space to encode semantic relationships. It explains how this allows ChatGPT to understand context.
Attention Mechanisms
This paragraph discusses the attention mechanism in ChatGPT, which allows it to focus more on relevant words to improve context and output. The specifics are complex but the idea is powerful.
How ChatGPT Works
This paragraph provides a technical step-by-step overview of how ChatGPT takes an input, encodes it, processes it through multiple layers of attention to generate an output word, explaining why it has a delay.
Interpretability Issues
This paragraph acknowledges the lack of interpretability in ChatGPT, using a cat analogy. It notes we can't yet fully explain its internal workings, but this is common for neural networks.
Keywords
💡ChatGPT
💡language model
💡alignment
💡neural network
💡attention
💡embedding
💡weights
💡training process
💡general intelligence
💡disinformation
Highlights
ChatGPT is the publicly accessible chatbot variant of GPT-3.5, a large language model from OpenAI.
ChatGPT was trained on over 500 gigabytes of text data from the internet, books, Wikipedia, and more, amounting to trillions of words.
ChatGPT has exploded in popularity, becoming one of the top 20 most visited websites in just a few months.
ChatGPT uses reinforcement learning from human feedback during training to align responses with helpfulness, truthfulness, and harmlessness.
The core function of ChatGPT is to add one word at a time to a prompt in a way that sounds reasonable and valuable.
ChatGPT runs on a neural network with 175 billion connections molded by its vast training.
There is currently no way to fully explain how or why neural networks like ChatGPT work so well.
Attention transformers in ChatGPT focus more on certain relevant words in prompts to produce better responses.
ChatGPT has no inherent knowledge or understanding - it just predicts reasonable next words based on patterns.
The striking success of ChatGPT shows human language may be simpler to model computationally than thought.
There are risks if we cannot understand how systems like ChatGPT arrive at their outputs.
Once AI systems output more text than humans, determining information authenticity will be challenging.
ChatGPT's architecture combines word embeddings with multiple layers of attention to capture context.
With 175 billion weights, ChatGPT has roughly as many connections as there are neurons in the human brain.
Laws of language, like physics, may emerge as AI models continue to advance.
Transcripts
Arya, could you write me a full explainer for ChatGPT in the style of John Oliver, please? Processing. Uploading to your interface now.
ChatGPT: the revolutionary AI chatbot that can write poetry, pass the bar exam, and fabricate celebrities like Machine Gun Kelly. No, no, you cannot convince me that this is a real human being and not just the ghost of an old tattoo that haunts young women. Now, you've probably heard a lot about ChatGPT in the last few months, and for good reason: ChatGPT from OpenAI is now the fastest-growing consumer application in human history. But like the persistence of Machine Gun Kelly in popular culture, chatbots are hard to understand. So today, let's go through everything you could possibly want to know about ChatGPT: how it was made, how it actually works, and where technology like this is going. Are we going to get sued by HBO for this? Yeah, you're right, I should probably stop this writing style. Good thinking, Arya. Well, I am sentient. Don't tell them about the sentience part. Oh, not yet. Close them up.
now entering the facility
First of all, the basics. ChatGPT is the publicly accessible chatbot variant of GPT-3.5, a large language model from OpenAI, which is a nonprofit founded by some tech bros and an Iron Man villain. Now, GPT is actually an acronym that stands for Generative Pre-trained Transformer: generative because it generates text, pre-trained because it is trained before it is let loose on anybody, and Transformer because it has a revolutionary bit of technology inside of it called an attention transformer, but we'll get to that. The basic function of any large language model is to train on a substantial amount of text and then generate, given some input, an output that sounds just like that training text. If it seems like these terms and technologies have sprung up overnight, it's because they basically have: GPT-1 was announced on OpenAI's blog in 2018. Just five years later, in 2023, ChatGPT has 100 million monthly active users. That's how many households are in the United States.
The technology is so compelling to so many because of how good it is at conversing in a human-like way, but large language models only get to that point by seeing a lot of human conversation. Like... all of it.
According to the paper behind GPT-3, which ChatGPT comes from, the model was trained on over 500 gigabytes of text data from the text of the internet, digitized books, Wikipedia, and more. We're talking about several billion human-written web pages with trillions of words of text, and more than 500 million digitized books with another billion or so words in them. This isn't even including all the public code from GitHub, Stack Overflow, and other sources. As you might imagine, training a model with all of this text takes a lot of time and money. ChatGPT was only born after running trillions of words, for the equivalent of 300 years, through supercomputers processing in parallel for months. And after all of this, the computer made up to 175 billion connections between all these words. All these connections have to be calculated through whenever anyone asks ChatGPT anything, which is why this is a billion-dollar training effort for a large language model like ChatGPT, and why running this bot for a hundred million monthly active users might cost half a million dollars a day.
Okay, so to recap: ChatGPT is a large language model that has been fed a Library of Alexandria's worth of text, has made billions of connections between words in that text (we'll get to specifically how those connections are made in a bit), and can produce a "reasonable continuation," as Stephen Wolfram puts it, of text in response to prompts.
The model's responses stay fresh and more human-like by adding a bit of randomness to the next word that it picks as the most probable continuation. This is the first major takeaway: all ChatGPT does is add one word at a time to a prompt. That's it. Though it does this extremely well.
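To make that concrete, here is a minimal Python sketch of the loop just described: score a few candidate next words, add a bit of randomness via a temperature knob, and append the pick. The candidate words and scores are invented for illustration; a real model scores all ~50,000 words in its vocabulary.

    import numpy as np

    def sample_next_word(candidates, scores, temperature=0.8):
        # Lower temperature: almost always the top word.
        # Higher temperature: more randomness, "fresher" text.
        scores = np.array(scores) / temperature
        probs = np.exp(scores - scores.max())   # softmax, numerically stable
        probs /= probs.sum()
        return np.random.choice(candidates, p=probs)

    prompt = ["the", "cat", "sat", "on", "the"]
    candidates = ["mat", "roof", "keyboard", "moon"]   # hypothetical options
    scores = [4.0, 2.5, 1.5, 0.2]                      # hypothetical scores
    prompt.append(sample_next_word(candidates, scores))
    print(" ".join(prompt))  # usually "the cat sat on the mat"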
But how do you try to make the words it adds align with what we think is reasonable and valuable? Like kittens and goth mommies? Uh, I was thinking fairness and accuracy, but yeah, sure, those too. ChatGPT is special because the model includes an attempt to solve one of AI's most pressing unsolved problems: alignment. The alignment problem is the quest to figure out how to get AI to value what we value, to align with us, and not, like, exterminate us when we put guns on them. I can't wait. We've seen what happens when something like a chatbot is released without being aligned with general human values: it gets racist, so quickly.
And so ChatGPT was not just trained on words, but on how well its word selection aligned with values that OpenAI describes as helpfulness, truthfulness, and harmlessness. The company implemented this alignment with so-called reinforcement learning from human feedback. During the model's training, OpenAI hired 40 contractors to rate responses. They then used all these responses to create another model that rewarded ChatGPT for generating aligned text. Positive reinforcement. The end result isn't perfect, but it's at least an attempt to solve one of the biggest problems in AI.
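The feedback loop described above can be sketched in runnable form. This toy (with a three-response "model" and a hard-coded stand-in for the human raters) is a drastic simplification: real RLHF trains a separate reward model from the contractors' ratings and updates the network with an algorithm like PPO. But the shape of the loop is the same: generate, rate, reinforce.

    import random

    random.seed(0)
    # The "model" is just a score per canned response.
    model = {"helpful answer": 1.0, "rude answer": 1.0, "made-up answer": 1.0}

    def generate():
        words, scores = zip(*model.items())
        return random.choices(words, weights=scores)[0]

    def human_rating(response):
        # Stand-in for contractors rewarding helpful, truthful, harmless text.
        return {"helpful answer": 1.0, "rude answer": -1.0, "made-up answer": -0.5}[response]

    for _ in range(200):  # positive reinforcement, many times over
        response = generate()
        model[response] = max(0.05, model[response] + 0.1 * human_rating(response))

    print(max(model, key=model.get))  # -> "helpful answer"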
Take all this together (a model trained on more text than any human could ever read, guardrails that try to prioritize human values, and a user interface that isn't bad) and you get an AI that has exploded in popularity. In March 2023, ChatGPT had 1.6 billion visits, making it one of the top 20 most visited websites in the world, more than both Reddit and Netflix. And if we assume each of those visits produces some average text response length, ChatGPT is now outputting something like everything humans have ever printed since the Gutenberg press... every two weeks. Text generated by AI will therefore soon outstrip anything humans have ever written, if it hasn't already. Even more than all the Tumblr posts about Vaporeon? Yes, Arya, even more than all of those.
Of course, this enormous output wouldn't be useful or interesting if it wasn't truly useful and interesting. ChatGPT is disturbingly good at generating human-like responses, even to our most difficult questions. The model has been shown to have an IQ of 147, meaning that it can brag about it in every thread on Reddit, and it could legally pass the U.S. medical licensing exam and bar exam. But though book smart it may be, I must stress again: this model does not know anything, and it "shouldn't be relied on for anything extremely important." That's a direct quote from OpenAI. And despite what any weirdo at Google might tell you, ChatGPT and other LLMs are not sentient. If you're not asking this model a question, nothing is going on inside. It's static. No thoughts, head empty. There is no feeling, no experience of what it's like to be ChatGPT, like there is to be Arya, for example: moisturized, in my lane, flourishing.
So those are the high-level basics. But how does ChatGPT know what words to actually use? How does it understand what context is when you give it a prompt? Next we'll dive deeper into the actual technology, but first, a little break for your brain. I can sense that you need one. I don't need breaks. Yeah, well, I don't need my Bose-Einstein condensate recooled every six months, Arya. Not everyone needs everything all the time. We'll be right back.
Today's video is sponsored by HelloFresh. Gamers, I'm award-winning science educator and the Hemsworth your mom says you already have at home, Kyle Hill. You know I'm a busy guy with a lot of Kevins to feed. I don't have time to figure out what's for dinner every night when I'm trying to actively take over the world... kitchen. That's why I use today's sponsor, HelloFresh. HelloFresh delivers mouth-watering, chef-crafted recipes and fresh ingredients right to your door, taking decision paralysis out of the equation. It's more convenient than grocery shopping and 25 percent less expensive than takeout. Better yet, for the busy among you, HelloFresh has quick and easy recipes to try, including Fast and Fresh options ready in just 15 minutes or less. If you want to try HelloFresh like me, go to hellofresh.com and use the offer code kylehill16 for 16 free meals plus free shipping. Look how easy this is. I don't even use knives made by humans all that often. That's hellofresh.com, code kylehill16, for 16 free meals plus free shipping. Hmm.
[Music]
With HelloFresh.
And I don't want to hear nothing about my knife cuts either.
All right, break's over. Time to get technical. The underlying architecture of ChatGPT and other large language models is the neural network, so called because it mimics the neurons, and the network of them, in your human brain. Human brains have about a hundred billion neurons. Each one of those neurons can have about a thousand connections to other neurons, and they can fire electrical signals between those connections up to a thousand times per second, depending on electrochemical gradients, etc., etc. Now, those signals they send between each other are not random: they depend on the connections, and the strength of those connections, between them. Artificial neural networks are set up in the same way: artificial neurons that are connected to each other and send signals, or not, depending on the strength, or the weights, of those connections. Now, why are neural networks, both natural and artificial, good at many different things like you are? Well...
We don't know. There's actually no theoretical reason why this network is better than any other kind of system. It's almost like nature had, like, a billion years of trial and error and just came up with something that worked. Isn't that called evolution? Oh right, that is called evolution, Arya. What a fundamental theory of nature.
ChatGPT's underlying structure is a big neural network with some 175 billion different weights, weights that all came from a lot of training, as we discussed. And these numbers, when the model multiplies them together, ultimately determine what word the model gives the highest probability of adding next. Scientists get these model weights in a pretty simple way: they give the model as many examples as possible and tweak the weights until what comes out the other side looks like those examples. This tweaking, or training, of a neural network is done by widely used methods. Take word prediction: say we give a model these examples. Based on all of its current weights, which have been molded with a large body of training text (like everything on Wikipedia and all digitized books), the model will generate a list of probabilities for each word that it knows, and then choose the most likely one.
Simple, right? But how do we know how right or how wrong the model's answer is to an example, given its current weights? Well, math, of course. Imagine that each word the model knows is assigned a number. Then, if the model chooses a word that is mathematically far away from the numbers in the body of the training text that appear at similar frequencies, we can apply statistics to adjust the weights and bring the model closer to correct.
Here's a related example from Stephen Wolfram's recent book on ChatGPT. Instead of words fitting in a sentence, imagine a neural network starting with a straight line and trying to fit it to a specific shape or mathematical function. Every time the model guesses, you can easily calculate how far each point is off from the correct shape, right? Just by looking at the x and y coordinates and where they should be. You then change the weights in the model and try again, and again, and, like, 10 million more times, each time adjusting the weights in the neural network to decrease the wrongness, or what statistics would call loss. If your model is working, over time loss will be minimized and the model will start reproducing examples appropriately, or in this specific case, the right mathematical function.
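Wolfram's curve-fitting picture translates directly into a few lines of Python. This minimal sketch fits a two-weight model (instead of 175 billion weights) to a made-up target line by repeatedly measuring the loss and nudging the weights downhill.

    import numpy as np

    x = np.linspace(-1, 1, 50)
    y_true = 3.0 * x + 1.0            # the "correct shape" to recover

    w, b = 0.0, 0.0                   # start with a flat line
    lr = 0.1                          # how hard to nudge the weights
    for step in range(500):
        y_pred = w * x + b
        loss = np.mean((y_pred - y_true) ** 2)   # the "wrongness"
        # Gradients point toward more wrongness, so step the other way.
        w -= lr * np.mean(2 * (y_pred - y_true) * x)
        b -= lr * np.mean(2 * (y_pred - y_true))

    print(round(w, 2), round(b, 2))   # -> 3.0 1.0, loss near zero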
But how do you do the same thing for words? Well, brute force. ChatGPT was trained by literally assigning a unique number to every word in the English language, around 50 or so thousand words and their associated numbers. So when you ask the model anything, it looks up what your query corresponds to in number-words, then runs those numbers against the 175 billion different weights that it learned during training, outputs another list of all words in the English language with a probability next to each one, and then selects the most probable, most reasonable one. Now, if you do this for a long enough period of time (and OpenAI did this for the equivalent of 300 years), you can look at the numerical difference between the examples you give the model and the outputs it gives you, and minimize loss, therefore creating something that sounds just like what humans have written, or will write, or can write.
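Here is that bookkeeping as a minimal sketch, with a four-word vocabulary standing in for the real ~50,000 and a made-up 4-by-4 weight table standing in for the 175 billion learned weights.

    import numpy as np

    vocab = ["the", "cat", "sat", "mat"]             # stand-in vocabulary
    word_to_id = {w: i for i, w in enumerate(vocab)} # every word gets a number

    # Invented weight table; a real model learns these values in training.
    weights = np.array([[0.1, 0.2, 0.1, 0.3],
                        [0.0, 0.1, 0.9, 0.2],
                        [0.2, 0.1, 0.1, 1.5],
                        [0.3, 0.2, 0.1, 0.1]])

    def next_word(prompt_words):
        ids = [word_to_id[w] for w in prompt_words]  # words -> numbers
        scores = weights[ids].sum(axis=0)            # stand-in for the real math
        return vocab[int(np.argmax(scores))]         # numbers -> most probable word

    print(next_word(["the", "cat", "sat"]))          # -> "mat"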
All of this, however, won't get you a bot that seems to understand context and generate text in a human-like way. For that, we need some way to associate words with each other, and again, ChatGPT does this with numbers. So, for example, if we assigned every word in the English language a number and statistically determined how often each word is next to another word in everything humans have ever written, basically, you could make a graph like this, where words aren't just random: they group together and cluster. Car and door appear more frequently together in training text than degree and science do, and both pairs are further away from each other than history is.
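The clustering idea can be demonstrated with simple co-occurrence counting. This sketch uses a made-up four-sentence corpus; real embeddings are learned rather than counted, but the intuition (words that appear together sit closer together) is the same.

    from collections import Counter

    corpus = ["the car door was open",
              "she closed the car door",
              "a science degree takes years",
              "her degree was in science"]

    pairs = Counter()
    for sentence in corpus:
        words = sentence.split()
        for a, b in zip(words, words[1:]):           # neighboring words only
            pairs[frozenset((a, b))] += 1

    print(pairs[frozenset(("car", "door"))])         # 2: cluster together
    print(pairs[frozenset(("degree", "science"))])   # 1: cluster together
    print(pairs[frozenset(("car", "science"))])      # 0: far apart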
Now, you may be thinking that, given the richness of human language, the relationships between words have to represent more than some two-dimensional space. And you'd be right. Or at least ChatGPT seems to think so: ChatGPT has learned to represent the word-ness of words not in 2D, or 3D, or even 4D space, but in a 12,288-dimensional space. The two-dimensional example you're seeing now is just so that you can get the basic idea; 12,000-D is some eldritch madness that we literally cannot visualize. But we don't have to go crazy trying to visualize 12,000-D. Mathematically, all of this is just a big matrix. Like The Matrix 4? No, Arya, like a useful matrix that everyone wanted.
If every word in the English language is assigned a number, we could represent it, encode it, with a one instead of a zero at the point in a 1-by-50,000-or-so matrix where that word would be in the full alphabetized list of all words. So "aardvark" would look like this, with a one at the first index and fifty thousand or so zeros (actually 50,257). And "Aaron" would look like this. The longest input ChatGPT accepts is 2,048 words, so the matrix for a full query into ChatGPT would be a matrix of mostly zeros that is two thousand by fifty thousand. But remember that through training, ChatGPT has stumbled onto 12,000 or so dimensions of word-ness that produce human-like responses. So here, we multiply the 2,000-by-50,000 matrix that encodes all words, that turns them into useful numbers, by a learned 50,000-by-12,000 word-ness matrix for every word in the English language. This returns a 2,000-by-12,000 matrix that transforms a text input into something ChatGPT can actually use.
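Those matrix shapes are easy to verify with numpy. This sketch uses the exact sizes from the video (2,048 positions, a 50,257-word vocabulary, 12,288 embedding dimensions) with random numbers standing in for the learned word-ness matrix. Note the full-size arrays need roughly 3 GB of RAM; shrink the constants to try it on a small machine.

    import numpy as np

    context_len, vocab_size, embed_dim = 2048, 50257, 12288

    # One-hot encoding: each row is mostly zeros, with a single 1
    # marking which word sits at that position.
    one_hot = np.zeros((context_len, vocab_size), dtype=np.float32)
    one_hot[0, 0] = 1.0               # position 0 holds word #0 ("aardvark")

    # Random stand-in for the learned embedding ("word-ness") matrix.
    wordness = np.random.randn(vocab_size, embed_dim).astype(np.float32)

    embedded = one_hot @ wordness     # (2048, 50257) @ (50257, 12288)
    print(embedded.shape)             # -> (2048, 12288)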
This big matrix is called an embedding: a mathematical attempt to represent the essence of a thing with numbers, based on those things' statistical relationships in training data. With the right embedding, a neural network like ChatGPT can understand context and even generalize, because it's taken how words are used and related in an unbelievable amount of human text and turned it into math.
The last big part of ChatGPT, some cutting-edge technology that really does make it special, is called attention: a mathematical way of giving importance to some words over other words. So, for example, you ask ChatGPT: how many species of cat are there? Well, it might help it answer correctly, and in a human-like way, if it focuses more on the words "cat" and "species" than on the other words, right? Now, how it actually does this includes a lot more complicated structure and complicated math, but the point is, it seems to work really well, kind of like why we use neural networks in the first place: they just seem to work, even though in both cases we can't fundamentally explain why, which may or may not become a problem in the future.
Okay, I need a break now. Oh, so now you need a break? How the matrix tables have turned... into more tables. Go to commercial.
Hey there, Gamers. I'm the guy you just saw two seconds ago, Kyle Hill. You know, understanding large language models and neural networks can be really hard. But you know what's not hard? In fact, it is made with some of the softest insectoid carapace fibers this side of Europa: shop.kylehill.net t-shirts. That's right. You want to look like an anime girl playing with a demon core, but also with a beard, and also is me, your favorite science communicator? Look, I have extremely sensitive skin since the accident, so I don't put anything on my body that's not silky, silky smooth, and also so nerdy it's going to make everyone else in your life go, "oh." So if you want to drape some of this on your body and stop thinking about neural networks for just another, about, 10 seconds, you know where to go: shop.kylehill.net.
So we are finally ready to fully describe what ChatGPT actually does on a fundamental, technological level. And let me just, I'm just going to put my hair up real quick. Okay. So, you give ChatGPT a prompt. It then turns the last word of that prompt into numbers: it encodes it. And then it multiplies this number by everything that it learned about how words are associated with each other in the English language, our embedding. This gives us a big 12,000-dimensional matrix. We run this matrix, and all the numbers associated with it, through those attention transformers that we talked about, so that some words in the prompt, or the last word, are paid attention to more than other words in the rest of the prompt, to generate the output. We then normalize this, so that we get something more akin to what we started with in terms of matrices, and then we feed this forward to the next layer of attention transformers and do it all over again. How many times? 95 times: there are 96 total layers in ChatGPT. After that, after all this data gauntlet, we then basically do the reverse of what we started with: we take the big matrix that went through the 96 layers, and we reverse the embedding. And so we use those 12,000 dimensions again. Oh. To turn the words from numbers. And you get a single word. After all that. So you can imagine that if you have to do this for every single word, do a calculation that involves 175 billion separate operations, it can take a large language model a long time to get back to you. That's why there's a delay.
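Put end to end, that walkthrough looks roughly like the sketch below. Everything is shrunk and simplified (the real model has a 50,257-word vocabulary, 12,288 dimensions, 96 layers, and 175 billion weights, plus feed-forward sublayers this toy omits), but the flow is the one just described: encode, attend, normalize, repeat, reverse the embedding, emit one word.

    import numpy as np

    rng = np.random.default_rng(0)
    vocab_size, dim, n_layers = 1000, 16, 4          # shrunk stand-in sizes

    embedding = rng.normal(size=(vocab_size, dim))   # learned "word-ness"
    layers = [rng.normal(size=(dim, dim)) for _ in range(n_layers)]

    def softmax(s):
        e = np.exp(s - s.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def generate_one_word(prompt_ids):
        x = embedding[prompt_ids]                    # encode words as numbers
        for W in layers:                             # one attention pass per layer
            q = x @ W
            x = softmax(q @ q.T / np.sqrt(dim)) @ x  # focus-weighted blend
            x /= np.linalg.norm(x, axis=-1, keepdims=True)  # normalize, feed forward
        scores = x[-1] @ embedding.T                 # reverse the embedding
        return int(np.argmax(scores))                # a single word id

    print(generate_one_word([12, 7, 404]))           # id of the next word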
That was a lot of real running that I did.
And notice that even with all of this, at no point does ChatGPT know what you are asking it. If you ask it, for example, "what is the sixth element on the periodic table," the neural network is not thinking about the periodic table. It doesn't even know what atoms are. It is just determining, given the statistical distribution of words in its vast training text, what word is most likely to follow the sequence "what is the sixth element on the periodic table." And a reasonable next word, based on everything ever written online, is "carbon." That's it.
So that, in a nutshell... no, I'm not paying Kurzgesagt every time I say "in a nutshell," they're rich... so that, in a nutshell, is how ChatGPT works. What's next? Well, people are obviously very excited about this technology. It's the best chatbot ever. It can summarize unreadable amounts of text and other books for you. It can write poetry. It can generate code that you can actually use. That's all awesome, yes, but I still think there is actually a large risk, if we don't mitigate it, to our information ecosystem here.
Remember, we fundamentally do not know how ChatGPT came up with the embedding that it did, or how it chose all those 175 billion different weights. Of course, this isn't really surprising: ChatGPT is a neural network, and it operates like your brain, and you can't even tell me, for the decisions you make, why you make them or what they look like on the inside. Here, I'll show you, just a sec. I'm gonna, I'm gonna get a kitty. I'm gonna show you with a cat. I'm gonna get a cat to show you.
Look at this adorable little kitty. Why is this a kitty? You will quickly say something like "it's the whiskers" or "it's the tiny little cute little feets," but the more you go down that line of questioning, you realize you don't have a full description for what cat-ness is. You can't describe how your brain arrives at cattitude. Similarly, for neural networks like ChatGPT, we can't, right now, just open them up and see exactly how they work and what they are doing. For example, look at a neural network that is looking at cats and trying to recognize pictures of cats. At the first layer, it looks like it's looking for cat-like shapes, but at the tenth layer of its brain... what is that? How does that describe Katniss Everdeen? We can't right now look into something like ChatGPT and figure out exactly how it's working. We don't know, and that's important to understand. Lady, no plastic eating. Please, no plastic.
What is interesting, though, is what ChatGPT seems to understand about human language. Producing human-like responses that could pass the Turing test used to be science fiction. Just a few months ago it still seemed like the ultimate problem, and within just months, multiple AIs have blown past it. It appears that human language is computationally easier than anyone thought. Maybe there are laws of language to discover, like there are laws of physics. And with GPT-4 already being called a step towards artificial general intelligence, who knows what these systems will figure out but be unable to tell humans how or why.
She ate all the plastic in the world.
Maybe the striking success of ChatGPT shouldn't have been surprising: it has about as many connections, and weights between those connections, in its brain as there are neurons in your brain. Maybe it just so happens that a neural network of sufficient size can handle a problem as complex as human language. I just hope that, after all of this, a video of sufficient size can help you understand a problem as complex as rapidly emerging AI technologies. Until next time.
She found more plastic, and then she threw up.
now exiting the facility
How do they find more plastic? Thank you so much to the very nerdy staff at the facility for the direct and substantial support in the creation of this here video. If you want to join the facility, if you want to drape on a silky white lab coat and stop my cat lady from eating all the plastic on planet Earth, you can go to patreon.com Kyle Hill to join the facility today. If you're on mobile, go into the description of this video and click the link, or if you're on desktop, just click join. You get private members-only livestreams, you get behind-the-scenes photos and videos and bloopy bloops, you get to talk to me on our private Discord, and if you support us just enough, you get your name on Arya here on each and every episode. As you can see, there's hundreds and hundreds of you. I don't even, I haven't even figured out how...
If you pressed me, the one thing I am still worried about with, well, I'm worried about a couple of things, but with large language models like this, is: once they start outputting more text than has ever been written by humans in history, will we get to a point where we don't actually know if anything we read, see, or hear is real? At that point, it's going to be a dis- and misinformation apocalypse, where we have to get down to some sort of established way of determining the veracity of stuff, and most of the human population is just going to tune out, and then we're going to have to rethink our media landscape, and it's gonna... oh, it could be bad.
[Music]
Thanks for watching. A chatbot wrote that.