Geoffrey Hinton | On working with Ilya, choosing problems, and the power of intuition
Summary
TLDR: The script features an interview with a renowned scientist discussing his journey from England to Carnegie Mellon, his early disappointment with traditional studies of the brain, and his shift to artificial intelligence (AI). He reflects on his experiences and collaborations in AI research, highlighting the development of neural networks and backpropagation. The discussion covers the evolution of AI, the importance of large models and multimodal data, and the potential societal impacts of AI advancements. The scientist emphasizes the importance of intuition, selecting talented students, and the future direction of AI research, including potential risks and ethical considerations.
Takeaways
- 🧠 The importance of environment: Moving from England to Carnegie Mellon, the speaker noticed a stark difference in work culture and environment, with students working late into the night believing their work would shape the future of computer science.
- 🔬 Disappointment in initial studies: The speaker found the study of the brain at Cambridge disappointing because it only covered basic neuron functions, leading him to switch to philosophy and eventually AI.
- 📘 Influential readings: Books by Donald Hebb and John von Neumann were pivotal in shaping the speaker's understanding and interest in how the brain learns and computes.
- 🤖 Collaborations: Key collaborations with Terry Sejnowski and Peter Brown were significant in the speaker's research, providing valuable insights and advancements in neural networks and speech recognition.
- 👨🎓 Notable students: Ilya, a standout student, impressed the speaker with his intuitive understanding of AI and neural networks, leading to impactful collaborations.
- 💡 Scale and innovation: Ilya's belief in the importance of scaling models was initially met with skepticism but proved crucial as larger models demonstrated significant improvements.
- 🧩 Understanding through prediction: The speaker believes that predicting the next symbol in language models forces a deeper understanding of context and reasoning.
- 🌐 Multimodal learning: Integrating multiple forms of data (text, images, video) into AI models will enhance their understanding and reasoning capabilities.
- ⚡ GPU revolution: The transition to using GPUs for training neural networks significantly accelerated AI research and development.
- 🔄 Digital immortality: Digital systems can share weights and knowledge efficiently, unlike human brains, leading to superior learning and knowledge dissemination.
Q & A
What was the speaker's initial impression of the academic environment at Carnegie Mellon compared to England?
-The speaker found the environment at Carnegie Mellon to be very different and refreshing compared to England. Students at Carnegie Mellon were working late into the night because they believed their work was shaping the future of computer science, which was a stark contrast to the pub-going culture after 6:00 PM in England.
Why was the speaker disappointed with his initial studies in physiology and philosophy?
-The speaker was disappointed because his studies in physiology only taught him about how neurons conduct action potentials, which didn't explain how the brain works as a whole. Similarly, philosophy didn't provide insights into how the mind worked, which was his ultimate interest.
What inspired the speaker to pursue AI research?
-The speaker was inspired to pursue AI research after reading books by Donald Hebb and John von Neumann. Hebb's interest in learning connection strengths in neural nets and von Neumann's interest in brain computation intrigued the speaker and led him to Edinburgh to study AI.
How did the speaker's collaboration with Terry Sejnowski come about?
-The speaker's main collaboration at Carnegie Mellon was with Terry Sejnowski, who was at Johns Hopkins in Baltimore, about 250 miles from Pittsburgh. About once a month one of them would drive to the other's city and they would spend a weekend working on Boltzmann machines, sharing a conviction that this was how the brain worked.
What was the significance of the speaker's collaboration with Peter Brown?
-The collaboration with Peter Brown was significant because Peter, a statistician, taught the speaker about speech recognition and hidden Markov models. This collaboration was fruitful, with the speaker feeling that he learned more from Peter than Peter did from him.
How did Ilya Sutskever's initial interaction with the speaker influence their future collaboration?
-Ilya Sutskever's first visit showed his eagerness (when told to make an appointment, he asked "How about now?") and his intuition for AI. After reading the Nature paper on backpropagation, he said he didn't understand it: he had understood the chain rule, but not why the gradient wasn't given to a sensible function optimizer. That question took them years to think about and set the pattern for a productive collaboration.
What was the speaker's view on the importance of scale in AI models?
-The speaker believed that while new ideas like Transformers helped, the real shift in AI performance was due to the scale of data and computation. He mentioned that they didn't anticipate computers becoming a billion times faster, and with larger scale, models could achieve more without needing as many new ideas.
How does the speaker perceive the process of predicting the next word in language models?
-The speaker believes that predicting the next word in language models is not just a mechanical process. It requires understanding the context, similar to how humans comprehend and generate language, which involves reasoning.
What role does the speaker see for multimodality in the future of AI models?
-The speaker sees multimodality as a significant advancement for AI models. By incorporating images, video, and sound, models will improve in understanding spatial relationships and concepts that are difficult to grasp from language alone.
What was the speaker's intuition about the use of GPUs for training neural networks?
-The speaker's intuition about using GPUs for training neural networks was based on their efficiency in performing matrix multiplications, which are fundamental to neural network computations. This led to significant speed improvements in training times.
How does the speaker view the relationship between language and cognition?
-The speaker views language as a tool for cognition, where symbols are converted into embeddings that interact to predict subsequent symbols. This process of converting and interacting symbols is seen as central to both understanding and generating language.
What is the speaker's perspective on the potential of AI in healthcare?
-The speaker sees AI in healthcare as a promising application, with the potential to significantly increase the availability and quality of medical care. AI could assist or replace doctors, leading to a situation where everyone could have personalized medical attention.
What is the speaker's approach to selecting research problems?
-The speaker selects research problems based on intuition and a sense that a widely accepted idea might be wrong. He looks for opportunities to challenge conventional wisdom with simple demonstrations that can show why the prevailing view may not be accurate.
What does the speaker consider as the most promising direction in AI research today?
-The speaker believes that training large models on multimodal data is a very promising direction. Even if the models are initially used for simple tasks like predicting the next word, the approach has great potential for future development.
What is the speaker's view on the importance of learning algorithms in achieving human-level intelligence?
-The speaker believes that while backpropagation is a fundamentally correct and successful approach for learning, there may be alternative learning algorithms that could also achieve human-level intelligence. However, he acknowledges that backpropagation has proven to be highly effective.
What achievement from the speaker's career is he most proud of?
-The speaker is most proud of the development of the learning algorithm for Boltzmann machines. He considers it elegant, even if it may not be practical, and it was a project he greatly enjoyed working on.
Outlines
🌟 Early Inspirations and AI Explorations
The speaker reminisces about their early experiences at Carnegie Mellon and the inspiring work ethic of students there. They express disappointment with their initial studies in physiology and philosophy, which led them to pursue AI at Edinburgh. The influence of Donald Hebb and John von Neumann on their interest in neural networks and brain function is highlighted. The speaker also shares their early conviction that the brain learns without pre-programmed logical rules, a belief that motivated their research in AI.
🤝 Collaborations and Intuitions in AI Development
This paragraph delves into the speaker's collaborations, particularly with Terry Sejnowski and Peter Brown, and the insights gained from them. The speaker reflects on the importance of collaborations in developing their understanding of AI, the adoption of the term 'hidden layers' from hidden Markov models, and the evolution of their ideas on how the brain might work. The paragraph also touches on the speaker's initial disappointment with a student, Ilya, who later proved to have profound intuitions about AI, challenging conventional approaches to optimization.
🚀 The Advent of Backpropagation and Scaling in AI
The speaker recounts the early days of backpropagation and their work on neural networks. They discuss the evolution of AI from clever ideas to the realization that scale in data and computation is crucial. The introduction of character-level prediction using large datasets like Wikipedia is highlighted, along with the surprising effectiveness of such models. The speaker also addresses the misconception that AI models are merely predicting the next word, arguing that understanding and reasoning are inherent in the process.
🧠 The Intersection of Neuroscience and AI
In this section, the speaker explores the relationship between neuroscience and AI, discussing the brain's learning processes and the inspiration drawn from it for developing AI algorithms. They emphasize the importance of embeddings and backpropagation in creating language models that can generalize and understand context. The speaker also speculates on the potential for AI to develop creativity and reasoning abilities beyond human levels as models grow larger and more complex.
🔄 The Role of Reinforcement Learning and Analogies in AI
The speaker discusses the impact of reinforcement learning, as exemplified by AlphaGo's move 37, and the potential for AI to develop creative solutions within limited domains. They also touch on the concept of 'fast weights' and the brain's ability to adapt quickly, suggesting that AI could benefit from incorporating similar mechanisms. The speaker further elaborates on the potential of multimodal models to enhance understanding and reasoning in AI.
🌐 The Future of AI and Its Impact on Society
In this paragraph, the speaker contemplates the future applications of AI, particularly in healthcare and materials engineering, while also expressing concern about the potential misuse of AI by bad actors. They acknowledge the balance between the positive and negative impacts of AI and the importance of international cooperation in advancing the field responsibly.
🤖 The Evolution of AI and Lessons from Neuroscience
The speaker reflects on the evolution of AI and the lessons learned from neuroscience, emphasizing the importance of multiple timescales in learning and the potential for AI to incorporate 'fast weights' similar to the brain. They also discuss the impact of their understanding of the brain on AI development and the philosophical implications of AI's ability to reason and understand.
💡 The Intuitive Process of Selecting Talent in AI Research
In this section, the speaker shares insights into their intuitive process of selecting talent for AI research, highlighting the importance of recognizing deep insights and creativity in potential collaborators. They discuss the value of having a variety of student types in a lab and trust their gut intuition when identifying promising individuals.
🛠 The Importance of Intuition and Diverse Approaches in AI
The speaker emphasizes the role of intuition in developing good ideas and the importance of not accepting everything they are told. They advocate for having a strong framework for understanding reality and being able to reject information that doesn't fit. The speaker also discusses the potential for a variety of learning algorithms to achieve human-level intelligence.
🏆 Reflecting on Achievements and the Journey of AI Research
In the concluding paragraph, the speaker reflects on their proudest achievement—the development of the learning algorithm for Boltzmann machines—and their current musings, including the question of whether the brain uses backpropagation. They express a lifelong curiosity about the brain's learning capabilities and acknowledge the unexpected positive outcomes of their research journey.
Keywords
💡Intuition
💡Backpropagation
💡Neural Networks
💡AI
💡Embeddings
💡Multimodal Data
💡Boltzmann Machines
💡Cognitive Dissonance
💡Hidden Markov Models
💡Reinforcement Learning
💡Analogies
Highlights
The importance of selecting talent based on intuition and capability, as demonstrated by the story of Ilya's recruitment.
The cultural difference in work ethic and motivation between England and Carnegie Mellon, highlighting the dedication to future advancements in computer science.
Disappointment with traditional education in physiology and philosophy, leading to a switch to AI and the excitement of simulating and testing theories.
Influence of Donald Hebb's work on learning in neural networks, and of John von Neumann's interest in how the brain computes differently from traditional computers.
The early conviction that the brain learns without pre-programmed logical rules, and the pursuit to understand neural net modifications for complex tasks.
The collaboration with Terry Sejnowski and the excitement of working on what was believed to be the future of understanding brain function.
Learning from Peter Brown about speech recognition and the origin of the term 'hidden layers' in neural networks.
Ilya's unique intuition for AI, and his question about why the gradient wasn't handed to a sensible function optimizer, which took years to think through.
The fun and productive collaboration between the speaker and Ilya, including the story of writing an interface for Matlab in a morning.
The evolution of understanding in AI, from the belief in the necessity of new ideas to the realization of the importance of scale in data and computation.
The development and success of character-level prediction using Wikipedia, showcasing the capabilities of neural networks.
The debate on whether AI models are simply predicting the next word or actually understanding and reasoning like humans.
The potential for large language models to discover analogies and be creative beyond human capabilities.
The impact of multimodality on AI models, suggesting improvements in understanding spatial relationships and reasoning.
The philosophical implications of language in cognition and the evolution of views on how symbols are processed in the brain.
The early adoption and advocacy for using GPUs in training neural networks, highlighting the significant increase in computational efficiency.
The exploration of analog computation and the potential for AI to mimic the brain's efficiency in power usage.
The potential for AI in healthcare, emphasizing the vast need and opportunity for improvement in medical services.
Concerns about the misuse of AI by bad actors for harmful purposes, such as killer robots, public opinion manipulation, and mass surveillance.
Reflections on the process of selecting talent in AI research and the importance of intuition and technical strength.
Transcripts
Have you reflected a lot on how to select talent, or has that mostly been intuitive to you? Ilya just shows up and you're like, "This is a clever guy, let's work together." Or have you thought a lot about that?
Can we... are we recording? Should we roll this? Yeah, let's roll this. Okay, we're good. Yeah. Okay, it's working.
So I remember when I first got to Carnegie Mellon from England. In England, at a research unit, it would get to be 6:00 and you'd all go for a drink in the pub. At Carnegie Mellon, I remember after I'd been there a few weeks, it was Saturday night, I didn't have any friends yet and I didn't know what to do, so I decided I'd go into the lab and do some programming, because I had a Lisp machine and you couldn't program it from home. So I went into the lab at about 9:00 on a Saturday night and it was swarming. All the students were there, and they were all there because what they were working on was the future. They all believed that what they did next was going to change the course of computer science. It was just so different from England, and so that was very refreshing.
Take me back to the very beginning, Geoff, at Cambridge, trying to understand the brain. What was that like?
It was very disappointing. So I did physiology, and in the summer term they were going to teach us how the brain worked, and all they taught us was how neurons conduct action potentials, which is very interesting, but it doesn't tell you how the brain works. So that was extremely disappointing. I switched to philosophy then; I thought maybe they'd tell us how the mind worked. That was very disappointing. I eventually ended up going to Edinburgh to do AI, and that was more interesting. At least you could simulate things, so you could test out theories.
And do you remember what intrigued you about AI? Was it a paper? Was it any particular person that exposed you to those ideas?
I guess it was a book I read by Donald Hebb that influenced me a lot. He was very interested in how you learn the connection strengths in neural nets. I also read a book by John von Neumann early on, who was very interested in how the brain computes and how it's different from normal computers.
And did you get that conviction that these ideas would work out at that point, or what was your intuition back in the Edinburgh days?
It seemed to me there has to be a way that the brain learns, and it's clearly not by having all sorts of things programmed into it and then using logical rules of inference. That just seemed to me crazy from the outset. So we had to figure out how the brain learned to modify connections in a neural net so that it could do complicated things. Von Neumann believed that, Turing believed that. So von Neumann and Turing were both pretty good at logic, but they didn't believe in this logical approach.
And what was your split between studying the ideas from neuroscience and just doing what seemed to be good algorithms for AI? How much inspiration did you take early on?
So I never did that much study of neuroscience. I was always inspired by what I'd learned about how the brain works: that there's a bunch of neurons, they perform relatively simple operations, they're nonlinear, but they collect inputs, they weight them, and then they give an output that depends on that weighted input. And the question is, how do you change those weights to make the whole thing do something good? It seems like a fairly simple question.
What collaborations do you remember from that time?
The main collaboration I had at Carnegie Mellon was with someone who wasn't at Carnegie Mellon. I was interacting a lot with Terry Sejnowski, who was in Baltimore at Johns Hopkins, and about once a month either he would drive to Pittsburgh or I'd drive to Baltimore. It's 250 miles away, and we would spend a weekend together working on Boltzmann machines. That was a wonderful collaboration. We were both convinced it was how the brain worked. That was the most exciting research I've ever done, and a lot of technical results came out that were very interesting, but I think it's not how the brain works.
I also had a very good collaboration with Peter Brown, who was a very good statistician, and he worked on speech recognition at IBM. Then he came as a more mature student to Carnegie Mellon just to get a PhD, but he already knew a lot. He taught me a lot about speech, and in fact he taught me about hidden Markov models. I think I learned more from him than he learned from me. That's the kind of student you want. And when he taught me about hidden Markov models, I was doing backprop with hidden layers, only they weren't called hidden layers then. And I decided that the name they use in hidden Markov models is a great name for variables that you don't know what they're up to. And so that's where the name "hidden" in neural nets came from. Me and Peter decided that was a great name for the hidden layers in neural nets. But I learned a lot from Peter about speech.
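The neuron Hinton describes earlier in this transcript (collect inputs, weight them, produce a nonlinear output, then ask how to change the weights) can be sketched in a few lines. This is a minimal illustration, not code from the interview: the logistic nonlinearity, the squared-error loss, the learning rate, and the logical-OR training data are all assumptions chosen for clarity. The weight change is plain gradient descent, with the gradient obtained by the chain rule.

```python
import math

def sigmoid(z):
    """Logistic nonlinearity: squashes the weighted input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(weights, bias, inputs):
    """Collect inputs, weight them, and output a nonlinear function of the weighted sum."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

def update_weights(weights, bias, inputs, target, lr=0.5):
    """One gradient-descent step on the squared error 0.5 * (y - target) ** 2.

    Chain rule: dE/dw_i = (y - target) * y * (1 - y) * x_i.
    """
    y = neuron_output(weights, bias, inputs)
    delta = (y - target) * y * (1.0 - y)
    new_weights = [w - lr * delta * x for w, x in zip(weights, inputs)]
    return new_weights, bias - lr * delta

# Hypothetical task: learn logical OR from its four input patterns.
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
weights, bias = [0.0, 0.0], 0.0
for _ in range(2000):
    for inputs, target in data:
        weights, bias = update_weights(weights, bias, inputs, target)

for inputs, target in data:
    print(inputs, "target", target, "output", round(neuron_output(weights, bias, inputs), 3))
```

A single neuron like this can only learn linearly separable mappings such as OR; that limitation is what makes hidden layers, and a way to train them, necessary.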
Take us back to when Ilya showed up at your office.
I was in my office, probably on a Sunday, and I was programming, I think, and there was a knock on the door. Not just any knock, but it was sort of an urgent knock. So I went and answered the door, and there was this young student there, and he said he was cooking fries over the summer but he'd rather be working in my lab. And so I said, "Well, why don't you make an appointment and we'll talk?" And so Ilya said, "How about now?" And that sort of was Ilya's character. So we talked for a bit, and I gave him a paper to read, which was the Nature paper on backpropagation. We made another meeting for a week later, and he came back and he said, "I didn't understand it." And I was very disappointed. I thought he seemed like a bright guy, but it's only the chain rule; it's not that hard to understand. And he said, "Oh no, no, I understood that. I just don't understand why you don't give the gradient to a sensible function optimizer." Which took us quite a few years to think about. And it kept on like that with Ilya. His raw intuitions about things were always very good.
What do you think had enabled those intuitions for Ilya?
I don't know. I think he always thought for himself. He was always interested in AI from a young age. He's obviously good at math, but it's very hard to know.
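Ilya's question separates two steps: backpropagation only computes the gradient (by the chain rule), and something else decides how to use it. The sketch below is a hypothetical example: it derives the gradient of a tiny two-stage model by hand and hands it to gradient descent with a backtracking (Armijo) line search, one simple choice of "sensible" optimizer. The model, the data, and the optimizer settings are assumptions; the interview does not say which optimizer Ilya had in mind.

```python
import math

# Hypothetical toy data generated by y = 2 * tanh(0.5 * x).
XS = [-2.0, -1.0, 0.0, 1.0, 2.0]
YS = [2.0 * math.tanh(0.5 * x) for x in XS]

def loss_and_grad(params):
    """Mean squared error of y_hat = w2 * tanh(w1 * x) and its gradient.

    The gradient comes from applying the chain rule backwards through
    the two stages of the model, which is what backpropagation does.
    """
    w1, w2 = params
    n = len(XS)
    loss, g1, g2 = 0.0, 0.0, 0.0
    for x, y in zip(XS, YS):
        h = math.tanh(w1 * x)                 # hidden unit
        err = w2 * h - y                      # output error
        loss += err * err / n
        d_out = 2.0 * err / n                 # dL/dy_hat
        g2 += d_out * h                       # dL/dw2
        g1 += d_out * w2 * (1.0 - h * h) * x  # dL/dw1, using tanh' = 1 - tanh^2
    return loss, [g1, g2]

def minimize_with_line_search(f, params, steps=50, c=1e-4):
    """Gradient descent whose step size is chosen by backtracking (Armijo) line search.

    f returns (loss, gradient). Each accepted step satisfies
    f(new) <= f(old) - c * step * |grad|^2, so the loss never goes up.
    """
    history = []
    for _ in range(steps):
        loss, grad = f(params)
        history.append(loss)
        g_sq = sum(g * g for g in grad)
        if g_sq < 1e-20:
            break  # gradient has vanished: treat as converged
        step = 1.0
        while f([p - step * g for p, g in zip(params, grad)])[0] > loss - c * step * g_sq:
            step *= 0.5
            if step < 1e-12:
                return params, history  # no acceptable step found
        params = [p - step * g for p, g in zip(params, grad)]
    return params, history

params, history = minimize_with_line_search(loss_and_grad, [0.1, 0.1])
print(f"loss {history[0]:.3f} -> {history[-1]:.2e}, params {params}")
```

The design point is the division of labour: `loss_and_grad` knows nothing about step sizes, and the optimizer knows nothing about the model, so a stronger optimizer can be swapped in without touching the backprop code.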
And what was that collaboration between the two of you like? What part would you play and what part would Ilya play?
It was a lot of fun. I remember one occasion when we were trying to do a complicated thing with producing maps of data, where I had a kind of mixture model, so you could take the same bunch of similarities and make two maps, so that in one map "bank" could be close to "greed" and in another map "bank" could be close to "river". Because in one map you can't have it close to both, right? Because "river" and "greed" are a long way apart. So we'd have a mixture of maps, and we were doing it in MATLAB, and this involved a lot of reorganization of the code to do the right matrix multiplies. And Ilya got fed up with that, so he came one day and said, "I'm going to write an interface for MATLAB, so I program in this different language and then I have something that just converts it into MATLAB." And I said, "No, Ilya, that'll take you a month to do. We've got to get on with this project. Don't get diverted by that." And Ilya said, "It's okay, I did it this morning."
That's quite incredible. And throughout those years, the biggest shift wasn't necessarily just the algorithms but also the scale. How did you view that scale over the years?
Ilya got that intuition very early. So Ilya was always preaching that you just make it bigger and it'll work better, and I always thought that was a bit of a cop-out; you're going to have to have new ideas too. It turns out Ilya was basically right. New ideas help. Things like Transformers helped a lot, but it was really the scale of the data and the scale of the computation. And back then we had no idea computers would get like a billion times faster. We thought maybe they'd get a hundred times faster. We were trying to do things by coming up with clever ideas that would have just solved themselves if we had had bigger scale of the data and computation.
In about 2011, Ilya and another graduate student called James Martens and I had a paper using character-level prediction. So we took Wikipedia and we tried to predict the next HTML character, and that worked remarkably well, and we were always amazed at how well it worked. That was using a fancy optimizer on GPUs. And we could never quite believe that it understood anything, but it looked as though it understood, and that just seemed incredible.
Can you take us through how these models are trained to predict the next word, and why it is the wrong way of thinking about them?
Okay, I don't actually believe it is the wrong way. So in fact, I think I made the first neural net language model that used embeddings and backpropagation. So it's very simple data, just triples, and it was turning each symbol into an embedding, then having the embeddings interact to predict the embedding of the next symbol, and from that predicting the next symbol. And then it was backpropagating through that whole process to learn these triples, and I showed it could generalize. About 10 years later, Yoshua Bengio used a very similar network and showed it worked with real text, and about 10 years after that, linguists started believing in embeddings.
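The early embedding model described here (turn each symbol of a triple into an embedding, let the embeddings interact to predict the embedding of the next symbol, score candidates from that prediction, and backpropagate through the whole process) can be sketched in pure Python. This is an illustrative toy, not a reconstruction of the original network: the arithmetic-mod-5 triples, the embedding size `D`, the tanh interaction layer, the dot-product scoring with a softmax, and the learning rate are all assumptions.

```python
import math
import random

# Hypothetical toy triples: (number, relation) -> number, arithmetic mod 5.
NUMS = ["0", "1", "2", "3", "4"]
VOCAB = NUMS + ["succ", "pred"]
IDX = {s: i for i, s in enumerate(VOCAB)}
TRIPLES = ([(n, "succ", NUMS[(i + 1) % 5]) for i, n in enumerate(NUMS)]
           + [(n, "pred", NUMS[(i - 1) % 5]) for i, n in enumerate(NUMS)])

D = 8  # embedding size (assumed)
rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

E = rand_matrix(len(VOCAB), D)  # input embedding for each symbol
A = rand_matrix(D, D)           # contribution of the first symbol's embedding
B = rand_matrix(D, D)           # contribution of the second symbol's embedding
O = rand_matrix(len(VOCAB), D)  # output embeddings used to score candidates

def forward(a, b):
    ea, eb = E[IDX[a]][:], E[IDX[b]][:]
    # The two embeddings interact to produce a predicted embedding p.
    p = [math.tanh(sum(A[i][j] * ea[j] + B[i][j] * eb[j] for j in range(D))) for i in range(D)]
    # Score every symbol against p, then normalise with a softmax.
    z = [sum(O[k][i] * p[i] for i in range(D)) for k in range(len(VOCAB))]
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return ea, eb, p, [v / total for v in exps]

def train_step(a, b, c, lr=0.1):
    """One SGD step on cross-entropy, backpropagating through every stage."""
    ea, eb, p, probs = forward(a, b)
    dz = [probs[k] - (1.0 if k == IDX[c] else 0.0) for k in range(len(VOCAB))]
    dp = [sum(dz[k] * O[k][i] for k in range(len(VOCAB))) for i in range(D)]
    dh = [dp[i] * (1.0 - p[i] ** 2) for i in range(D)]  # through tanh
    dea = [sum(A[i][j] * dh[i] for i in range(D)) for j in range(D)]
    deb = [sum(B[i][j] * dh[i] for i in range(D)) for j in range(D)]
    for k in range(len(VOCAB)):
        for i in range(D):
            O[k][i] -= lr * dz[k] * p[i]
    for i in range(D):
        for j in range(D):
            A[i][j] -= lr * dh[i] * ea[j]
            B[i][j] -= lr * dh[i] * eb[j]
    for j in range(D):
        E[IDX[a]][j] -= lr * dea[j]  # the embeddings themselves are learned
        E[IDX[b]][j] -= lr * deb[j]

def mean_loss():
    return sum(-math.log(forward(a, b)[3][IDX[c]]) for a, b, c in TRIPLES) / len(TRIPLES)

def predict(a, b):
    probs = forward(a, b)[3]
    return VOCAB[max(range(len(VOCAB)), key=probs.__getitem__)]

initial = mean_loss()
for _ in range(1000):
    for a, b, c in TRIPLES:
        train_step(a, b, c)
final = mean_loss()
accuracy = sum(predict(a, b) == c for a, b, c in TRIPLES) / len(TRIPLES)
print(f"loss {initial:.3f} -> {final:.3f}, training accuracy {accuracy:.0%}")
```

The nonlinearity matters in this toy: with a purely additive combination of the two embeddings, "successor" and "predecessor" of the same number cannot both be ranked correctly on a cycle of five.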
It was a slow process. The reason I think it's not just predicting the next symbol is, if you ask, well, what does it take to predict the next symbol? Particularly if you ask me a question and then the first word of the answer is the next symbol, you have to understand the question. So I think predicting the next symbol is very unlike old-fashioned autocomplete. In old-fashioned autocomplete, you'd store sort of triples of words, and then if you saw a pair of words, you'd see how often different words came third, and that way you can predict the next symbol.
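For contrast, the old-fashioned autocomplete described here is only a table of counts. A minimal sketch, assuming whitespace tokenisation and a made-up three-sentence corpus: it stores every word triple, and for a given pair of words returns whichever third word was seen most often. It has no notion of meaning, and a pair it has never seen gets no prediction at all.

```python
from collections import Counter, defaultdict

def build_trigram_table(text):
    """Count, for every pair of consecutive words, which words came third."""
    words = text.lower().split()
    table = defaultdict(Counter)
    for w1, w2, w3 in zip(words, words[1:], words[2:]):
        table[(w1, w2)][w3] += 1
    return table

def autocomplete(table, w1, w2):
    """Return the most frequent third word after (w1, w2), or None if the pair is unseen."""
    counts = table.get((w1.lower(), w2.lower()))
    if not counts:
        return None  # pure lookup has nothing to say about an unseen pair
    return counts.most_common(1)[0][0]

# Hypothetical toy corpus.
corpus = ("the cat sat on the mat . the cat sat on the sofa . "
          "the dog sat on the mat .")
table = build_trigram_table(corpus)
print(autocomplete(table, "sat", "on"))    # the
print(autocomplete(table, "on", "the"))    # mat (seen twice, sofa once)
print(autocomplete(table, "the", "bird"))  # None
```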
And that's what most people think autocomplete is like. It's no longer at all like that. To predict the next symbol, you have to understand what's been said. So I think you're forcing it to understand by making it predict the next symbol, and I think it's understanding in much the same way we are. So a lot of people will tell you these things aren't like us, they're just predicting the next symbol, they're not reasoning like us. But actually, in order to predict the next symbol, it's going to have to do some reasoning. And we've seen now that if you make big ones, without putting in any special stuff to do reasoning, they can already do some reasoning. And I think as you make them bigger, they're going to be able to do more and more reasoning.
Do you think I'm doing anything other than predicting the next symbol right now?
I think that's how you're learning. I think you're predicting the next video frame, you're predicting the next sound. But I think that's a pretty plausible theory of how the brain's learning.
What enables these models to learn such a wide variety of fields?
What these big language models are doing is they're looking for common structure, and by finding common structure they can encode things using the common structure, and that's more efficient. So let me give you an example. If you ask GPT-4, "Why is a compost heap like an atom bomb?", most people can't answer that. Most people haven't thought; they think atom bombs and compost heaps are very different things. But GPT-4 will tell you, well, the energy scales are very different and the time scales are very different, but the thing that's the same is that when the compost heap gets hotter, it generates heat faster, and when the atom bomb produces more neutrons, it produces more neutrons faster. And so it gets the idea of a chain reaction. And I believe it's understood they're both forms of chain reaction.
It's using that understanding to compress all that information into its weights. And if it's doing that, then it's going to be doing that for hundreds of things where we haven't seen the analogies yet but it has. And that's where you get creativity from: from seeing these analogies between apparently very different things. And so I think GPT-4 is going to end up, when it gets bigger, being very creative. I think this idea that it's just regurgitating what it's learned, just pasting together text it's learned already, that's completely wrong. It's going to be even more creative than people.
I think you'd argue that it won't just repeat the human knowledge we've developed so far but could also progress beyond that. I think that's something we haven't quite seen yet. We've started seeing some examples of it, but to a large extent we're sort of still at the current level of science. What do you think will enable it to go beyond that?
Well, we've seen that in more limited contexts. Like if you take AlphaGo in that famous competition with Lee Sedol, there was move 37, where AlphaGo made a move that all the experts said must have been a mistake, but actually later they realized it was a brilliant move. So that was creative within that limited domain. I think we'll see a lot more of that as these things get bigger.
The difference with AlphaGo as well was that it was using reinforcement learning that subsequently sort of enabled it to go beyond the current state. So it started with imitation learning, watching how humans play the game, and then through self-play it would develop way beyond that. Do you think that's the missing component of the...
I think that may well be a missing component, yes. The self-play in AlphaGo and AlphaZero is a large part of why it could make these creative moves. But I don't think it's entirely necessary.
So there's a little experiment I did a long time ago where you're training a neural net to recognize handwritten digits.
I love that example, the MNIST example.
And you give it training data where half the answers are wrong. And the question is, how well will it learn? And you make half the answers wrong once and keep them like that, so it can't average away the wrongness by just seeing the same example but with the right answer sometimes and the wrong answer sometimes. For half of the examples, whenever it sees the example, the answer is always wrong. And so the training data has 50% error. But if you train it up with backpropagation, it gets down to 5% error or less. In other words, from badly labeled data it can get much better results. It can see that the training data is wrong. And that's how smart students can be smarter than their advisor. Their advisor tells them all this stuff, and for half of what their advisor tells them, they think, "No, rubbish," and they listen to the other half, and then they end up smarter than the advisor. So these big neural nets can actually do much better than their training data, and most people don't realize that.
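The mechanism behind this result can be illustrated without MNIST or backpropagation. This hypothetical sketch swaps in synthetic 2-D clusters and a k-nearest-neighbour majority vote, a different technique from the backprop-trained network in Hinton's experiment. Exactly half the training labels are corrupted once and kept fixed, spread over the nine wrong classes, so in any region of input space the correct label is still the most common one. The data, the cluster layout, and `k=25` are assumptions; the error rates it prints come from this toy, not from the original experiment.

```python
import math
import random
from collections import Counter

rng = random.Random(0)
NUM_CLASSES = 10

# Hypothetical stand-in for digit images: 2-D points around 10 class centres on a circle.
CENTRES = [(10 * math.cos(2 * math.pi * c / NUM_CLASSES),
            10 * math.sin(2 * math.pi * c / NUM_CLASSES)) for c in range(NUM_CLASSES)]

def sample(n):
    """Draw n (point, true_label) pairs."""
    points = []
    for _ in range(n):
        c = rng.randrange(NUM_CLASSES)
        cx, cy = CENTRES[c]
        points.append(((cx + rng.gauss(0, 1), cy + rng.gauss(0, 1)), c))
    return points

clean_train = sample(1000)
held_out = sample(200)

# Corrupt exactly half of the training labels once and keep them fixed, so the
# learner never sees the same example with its correct label.
noisy_train = list(clean_train)
for i in rng.sample(range(len(clean_train)), len(clean_train) // 2):
    point, c = clean_train[i]
    wrong = rng.choice([k for k in range(NUM_CLASSES) if k != c])
    noisy_train[i] = (point, wrong)

label_error = sum(n[1] != c[1] for n, c in zip(noisy_train, clean_train)) / len(clean_train)

def knn_predict(point, data, k=25):
    """Majority vote among the k nearest labelled points (stand-in for a trained net)."""
    nearest = sorted(data, key=lambda item: math.dist(point, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

held_out_error = sum(knn_predict(p, noisy_train) != c for p, c in held_out) / len(held_out)
print(f"training-label error: {label_error:.0%}, held-out error: {held_out_error:.1%}")
```

The toy only works because the wrong labels are scattered: if all the corrupted examples of a class shared one particular wrong label, that label would tie with the correct one.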

So how do you expect these models to add reasoning into them? I mean, one approach is you add sort of heuristics on top of them, which a lot of the research is doing now, where you have chain of thought: you just feed its reasoning back into itself. And another way would be in the model itself, as you scale it up. What's your intuition around that?

My intuition is that as we scale up these models, they get better at reasoning. And if you ask how people work, roughly speaking, we have these intuitions and we can do reasoning, and we use the reasoning to correct our intuitions. Of course, we use the intuitions during the reasoning to do the reasoning, but if the conclusion of the reasoning conflicts with our intuitions, we realize the intuitions need to be changed. That's much like in AlphaGo or AlphaZero, where you have an evaluation function that just looks at a board and says, how good is that for me? But then you do the Monte Carlo rollout, and now you get a more accurate idea, and you can revise your evaluation function. So you can train it by getting it to agree with the results of reasoning. And I think these large language models have to start doing that. They have to start training their raw intuitions about what should come next by doing reasoning and realizing that's not right. And that way they can get more training data than just mimicking what people did. That's exactly why AlphaGo could do this creative move 37: it had much more training data, because it was using reasoning to check out what the right next move should have been.
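
The loop described here, where a quick evaluation gets corrected by the outcome of slower search, can be sketched in a few lines. This is a minimal sketch under stated assumptions: the "game" is a hypothetical fair random walk rather than Go, a lookup table stands in for the evaluation network, and plain Monte Carlo rollouts stand in for AlphaGo's tree search. The names `rollout` and `revise_evaluation` are illustrative only.

```python
import random


def rollout(square, n_squares, rng):
    """Play one random continuation from `square` until the game ends.

    Hypothetical toy game (not Go): a fair random walk on squares
    0..n_squares-1, worth 1.0 if it ends on the last square and
    0.0 if it ends on square 0.
    """
    while 0 < square < n_squares - 1:
        square += rng.choice((-1, 1))
    return 1.0 if square == n_squares - 1 else 0.0


def revise_evaluation(n_squares=7, sweeps=10, rollouts=500, step=0.3, seed=0):
    """Nudge a table of 'intuitive' values toward what rollouts actually find."""
    rng = random.Random(seed)
    # Raw intuition: every non-terminal square looks like a coin flip.
    value = {s: 0.5 for s in range(1, n_squares - 1)}
    for _ in range(sweeps):
        for s in value:
            # Slow "reasoning": average the outcomes of many simulated continuations.
            outcome = sum(rollout(s, n_squares, rng) for _ in range(rollouts)) / rollouts
            # Train the fast evaluation to agree with the result of that reasoning.
            value[s] += step * (outcome - value[s])
    return value


revised = revise_evaluation()
for square in sorted(revised):
    print(square, round(revised[square], 2))
```

For this fair walk the exact value of square s is s / 6, so the printed numbers should land near 0.17, 0.33, 0.5, 0.67 and 0.83 instead of the flat 0.5 the table started with. `step` controls how far each round of "reasoning" moves the intuition.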

And what do you think about multimodality? So we spoke about these analogies, and often the analogies are way beyond what we could see. It's discovering analogies that are far beyond humans, at maybe abstraction levels that we'll never be able to understand. Now, when we introduce images to that, and video and sound, how do you think that will change the models, and how do you think it will change the analogies that it will be able to make?

I think it'll change it a lot. I think it'll make it much better at understanding spatial things, for example. From language alone, it's quite hard to understand some spatial things, although remarkably GPT-4 can do that even before it was multimodal. But when you make it multimodal, if you have it both doing vision and reaching out and grabbing things, it'll understand objects much better if it can pick them up and turn them over and so on. So although you can learn an awful lot from language, it's easier to learn if you're multimodal, and in fact you then need less language. And there's an awful lot of YouTube video for predicting the next frame, or something like that. So I think these multimodal models are clearly going to take over. You can get more data that way, and they need less language. So there's really a philosophical point: you could learn a very good model from language alone, but it's much easier to learn it from a multimodal system.

And how do you think it will impact the model's reasoning?

I think it'll make it much better at reasoning about space, for example, reasoning about what happens if you pick objects up. If you actually try picking objects up, you're going to get all sorts of training data that's going to help.

Do you think the human brain evolved to work well with language, or do you think language evolved to work well with the human brain?

I think the question of whether language evolved to work with the brain or the brain evolved to work with language is a very good question. I think both happened.
I used to think we would do a lot of cognition without needing language at all. Now I've changed my mind a bit. So let me give you three different views of language and how it relates to cognition.

There's the old-fashioned symbolic view, which is that cognition consists of having strings of symbols in some kind of cleaned-up logical language where there's no ambiguity, and applying rules of inference. That's what cognition is: it's just these symbolic manipulations on things that are like strings of language symbols. So that's one extreme view.

An opposite extreme view is: no, no, once you get inside the head, it's all vectors. Symbols come in, you convert those symbols into big vectors, and all the stuff inside is done with big vectors, and then if you want to produce output, you produce symbols again. So there was a point in machine translation, in about 2014, when people were using recurrent neural nets. Words would keep coming in, and they'd have a hidden state, and they'd keep accumulating information in this hidden state. So when they got to the end of a sentence, they'd have a big hidden vector that captures the meaning of that sentence, which could then be used for producing the sentence in another language. That was called a thought vector, and that's a sort of second view of language: you convert the language into a big vector that's nothing like language, and that's what cognition is all about.

But then there's a third view, which is what I believe now, which is that you take these symbols and you convert the symbols into embeddings, and you use multiple layers of that, so you get these very rich embeddings. But the embeddings are still tied to the symbols, in the sense that you've got a big vector for this symbol and a big vector for that symbol, and these vectors interact to produce the vector for the symbol for the next word. And that's what understanding is.
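
The second and third views can be contrasted in a toy sketch. Everything here is an assumption for illustration: the five-word vocabulary, the dimensions, the random untrained weights, and a single similarity-weighted mixing step that stands in for the many learned layers described above. Because the weights are untrained, the predicted probabilities are meaningless. The point is only the shape of the computation: `thought_vector` squeezes a whole sentence of any length into one fixed-size hidden vector, while `next_symbol_probs` keeps one vector per symbol and lets those vectors interact to score the next symbol.

```python
import math
import random

DIM = 8
VOCAB = ["the", "cat", "sat", "on", "mat"]  # hypothetical toy vocabulary
rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 0.3) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


# Untrained, randomly initialised parameters: one embedding per symbol, two RNN matrices.
EMBED = {word: [rng.gauss(0.0, 1.0) for _ in range(DIM)] for word in VOCAB}
W_HIDDEN = rand_matrix(DIM, DIM)
W_INPUT = rand_matrix(DIM, DIM)


def thought_vector(words):
    """Second view: a recurrent net folds the whole sentence into one hidden vector."""
    h = [0.0] * DIM
    for word in words:
        pre = [a + b for a, b in zip(matvec(W_HIDDEN, h), matvec(W_INPUT, EMBED[word]))]
        h = [math.tanh(x) for x in pre]
    return h


def softmax_dict(scores):
    top = max(scores.values())
    exps = {k: math.exp(v - top) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def next_symbol_probs(context):
    """Third view: keep one vector per symbol and let the vectors interact."""
    vectors = [EMBED[w] for w in context]
    query = vectors[-1]
    # Crude interaction (an assumption): weight each symbol's vector by its similarity to the last one.
    weights = softmax_dict(
        {i: sum(a * b for a, b in zip(v, query)) / math.sqrt(DIM) for i, v in enumerate(vectors)}
    )
    mixed = [sum(weights[i] * v[d] for i, v in enumerate(vectors)) for d in range(DIM)]
    # Score every vocabulary symbol by how well its embedding matches the mixed vector.
    logits = {word: sum(a * b for a, b in zip(mixed, vec)) for word, vec in EMBED.items()}
    return softmax_dict(logits)


print(len(thought_vector(["the", "cat", "sat", "on", "the", "mat"])))  # always DIM
print(next_symbol_probs(["the", "cat", "sat"]))
```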

Understanding is knowing how to convert the symbols into these vectors and knowing how the elements of the vector should interact to predict the vector for the next symbol. That's what understanding is, both in these big language models and in our brains. And that's an example which is sort of in between: you're staying with the symbols, but you're interpreting them as these big vectors, and that's where all the work is. All the knowledge is in what vectors you use and how the elements of those vectors interact, not in symbolic rules. But it's not saying that you get away from the symbols altogether; it's saying you turn the symbols into big vectors, but you stay with that surface structure of the symbols. And that's how these models are working, and that seems to me a more plausible model of human thought too.

You were one of the first folks to get the idea of using GPUs, and I know Jensen loves you for that. Back in 2009, you mentioned that you told Jensen that this could be quite a good idea for training neural nets. Take us back to that early intuition of using GPUs for training neural nets.

So actually,
I think in about 2006 I had a former graduate student called Rick Szeliski, who's a very good computer vision guy, and I talked to him at a meeting, and he said, you know, you ought to think about using graphics processing cards, because they're very good at matrix multiplies, and what you're doing is basically all matrix multiplies. So I thought about that for a bit, and then we learned about these Tesla systems that had four GPUs in them. Initially we just got gaming GPUs and discovered they made things go 30 times faster, and then we bought one of these Tesla systems with four GPUs, and we did speech on that, and it worked very well. Then in 2009 I gave a talk at NIPS, and I told a thousand machine learning researchers, you should all go and buy Nvidia GPUs, they're the future, you need them for doing machine learning. And I actually then sent mail to Nvidia saying, I told a thousand machine learning researchers to buy your boards, could you give me a free one? And they said no. Actually, they didn't say no; they just didn't reply. But when I told Jensen this story later on, he gave me a free one.

That's very, very good. I think what's interesting as well is how GPUs have evolved alongside the field. So where do you think we should go next in compute?

So my last couple
of years at Google, I was thinking about ways of trying to make analog computation, so that instead of using like a megawatt, we could use like 30 watts, like the brain, and we could run these big language models in analog hardware. I never made it work, but I started really appreciating digital computation. If you're going to use that low-power analog computation, every piece of hardware is going to be a bit different, and the idea is that the learning is going to make use of the specific properties of that hardware. And that's what happens with people. All our brains are different, so we can't then take the weights in your brain and put them in my brain. The hardware is different; the precise properties of the individual neurons are different; the learning has learned to make use of all that. And so we're mortal, in the sense that the weights in my brain are no good for any other brain. When I die, those weights are useless. We can get information from one to another rather inefficiently: I produce sentences, and you figure out how to change your weights so you would have said the same thing. That's called distillation, but it's a very inefficient way of communicating knowledge. And digital systems are immortal, because once you've got some weights, you can throw away the computer, just store the weights on a tape somewhere, and now build another computer and put those same weights in, and if it's digital, it can compute exactly the same thing as the other system did.
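
Distillation as described here, where the learner only ever sees what the teacher says and never its weights, can be sketched for a single context. Assumptions: a hypothetical three-word vocabulary, made-up teacher logits, and a student whose only parameters are three logits. The update is the standard softmax cross-entropy gradient, applied to one sampled word at a time.

```python
import math
import random


def softmax(logits):
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distill_from_samples(teacher_logits, n_samples=60_000, lr=0.002, seed=0):
    """Train a student only from words the teacher 'says', never from its weights."""
    rng = random.Random(seed)
    teacher_probs = softmax(teacher_logits)
    word_ids = list(range(len(teacher_logits)))
    student_logits = [0.0] * len(teacher_logits)
    for _ in range(n_samples):
        # The teacher produces one word for this (single, fixed) context.
        said = rng.choices(word_ids, weights=teacher_probs)[0]
        probs = softmax(student_logits)
        # Gradient of -log p(said) with respect to the student's logits is probs - one_hot(said).
        student_logits = [
            z - lr * (p - (1.0 if i == said else 0.0))
            for i, (z, p) in enumerate(zip(student_logits, probs))
        ]
    return softmax(student_logits)


teacher = [2.0, 1.0, 0.1]  # hypothetical teacher logits for one context
print([round(p, 3) for p in softmax(teacher)])
print([round(p, 3) for p in distill_from_samples(teacher)])
```

Tens of thousands of sampled words only bring the student to within a few hundredths of the teacher's probabilities for this one context, whereas copying the three teacher logits would transfer them exactly. That asymmetry is the contrast drawn here between distillation and sharing weights.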
So digital systems can share weights, and that's incredibly much more efficient. Suppose you've got a whole bunch of digital systems: they start with the same weights, they each go and do a tiny bit of learning, and then they share their weights again.
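
This share-and-learn loop can be sketched with copies of a tiny linear model. Assumptions: the task (fitting y = 2x + 1), the three data shards, the learning rate, and averaging as the way the copies "share their weights". Averaging resembles data-parallel or federated training, but it is our choice here, not something specified in the interview. Because every copy starts each round from the same weights, averaging the weights is the same as averaging what each copy learned.

```python
def local_learning(w, b, shard, lr=0.1, steps=5):
    """A tiny bit of gradient descent on one copy's own data (mean squared error)."""
    for _ in range(steps):
        grad_w = sum(((w * x + b) - y) * x for x, y in shard) / len(shard)
        grad_b = sum(((w * x + b) - y) for x, y in shard) / len(shard)
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


def share_and_learn(shards, rounds=100):
    w, b = 0.0, 0.0  # every copy starts from exactly the same weights
    for _ in range(rounds):
        learned = [local_learning(w, b, shard) for shard in shards]
        # Share: every copy adopts the average, so each inherits what the others learned.
        w = sum(lw for lw, _ in learned) / len(learned)
        b = sum(lb for _, lb in learned) / len(learned)
    return w, b


# Each copy sees different inputs drawn from the same hypothetical rule y = 2x + 1.
shards = [
    [(x, 2 * x + 1) for x in (-1.0, -0.5)],
    [(x, 2 * x + 1) for x in (0.0, 0.5)],
    [(x, 2 * x + 1) for x in (1.0, 1.5)],
]
w, b = share_and_learn(shards)
print(round(w, 3), round(b, 3))  # values very close to 2.0 and 1.0
```

No single copy sees inputs outside its own narrow range, yet after sharing they all hold the same weights that fit the whole range.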
Then they all know what all the others learned. We can't do that, and so they're far superior to us in being able to share knowledge.

A lot of the ideas that have been deployed in the field are very old-school ideas; they're ideas that have been around in neuroscience forever. What do you think is left to apply to the systems that we develop?

So one big thing that we still have to catch up with neuroscience on is the time scales for changes. In nearly all the neural nets, there's a fast time scale for changing activities: input comes in, and the activities, the embedding vectors, all change. And then there's a slow time scale, which is changing the weights, and that's long-term learning. And you just have those two time scales. In the brain, there are many time scales at which weights change. For example, if I say an unexpected word like "cucumber", and now 5 minutes later you put headphones on, and there's a lot of noise and there are very faint words, you'll be much better at recognizing the word "cucumber" because I said it 5 minutes ago. So where is that knowledge in the brain? That knowledge is obviously in temporary changes to synapses. It's not neurons going "cucumber, cucumber, cucumber"; you don't have enough neurons for that. It's in temporary changes to the weights.
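
A temporary weight change of this kind (Hinton calls these fast weights) can be pictured with a decaying Hebbian outer-product update, a common simple model of fast weights. The interview does not specify a rule, so the update, the decay of 0.9 per step, the 64-dimensional random word codes, and the `familiarity` read-out are all assumptions. Hearing "cucumber" adds its outer product to a fast-weight matrix, which then responds more strongly to "cucumber" than to an unrelated word, and that trace fades as time passes.

```python
import random

DIM = 64
rng = random.Random(1)


def random_word_vector():
    """Hypothetical unit-length code for one word: random +/-1 entries scaled by 1/sqrt(DIM)."""
    scale = DIM ** -0.5
    return [scale * rng.choice((-1.0, 1.0)) for _ in range(DIM)]


def store(fast, v, decay=0.9, lr=1.0):
    """Hebbian update: decay the old fast weights, then add the outer product v v^T."""
    return [[decay * fast[i][j] + lr * v[i] * v[j] for j in range(DIM)] for i in range(DIM)]


def tick(fast, decay=0.9):
    """Time passes with no new input, so the temporary changes fade."""
    return [[decay * w for w in row] for row in fast]


def familiarity(fast, v):
    """How strongly the fast weights respond to v, computed as v^T F v."""
    return sum(v[i] * sum(fast[i][j] * v[j] for j in range(DIM)) for i in range(DIM))


cucumber = random_word_vector()
other = random_word_vector()
fast = store([[0.0] * DIM for _ in range(DIM)], cucumber)  # hear "cucumber" once
right_after = familiarity(fast, cucumber)  # 1.0, since the word code has unit length
for _ in range(5):  # five quiet time steps
    fast = tick(fast)
five_steps_later = familiarity(fast, cucumber)
unrelated = familiarity(fast, other)
print(right_after, five_steps_later, unrelated)
```

Because a matrix like `fast` depends on one particular input stream, each sequence would need its own copy, which is why, as Hinton goes on to explain, fast weights clash with processing many sequences in one batch.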
And you can do a lot of things with temporary weight changes, what I call fast weights. We don't do that in these neural models, and the reason we don't do it is that if you have temporary changes to the weights that depend on the input data, then you can't process a whole bunch of different cases at the same time. At present, we take a whole bunch of different strings, we stack them together, and we process them all in parallel, because then we can do matrix-matrix multiplies, which is much more efficient. And just that efficiency is stopping us using fast weights. But the brain clearly uses fast weights for temporary memory, and there are all sorts of things you can do that way that we don't do at present. I think that's one of the biggest things we have to learn. I was very hopeful that things like Graphcore, if they went sequential and did just online learning, could use fast weights, but that hasn't worked out yet. I think it'll work out eventually, when people are using conductances for weights.

How has knowing how these models work and knowing how the brain works impacted the way you think?

I think there's been one big impact, at a fairly abstract level, which is that for many years people were very scornful about the idea of having a big random neural net and just giving it a lot of training data, and it would learn to do complicated things. If you talk to statisticians or linguists or most people in AI, they say that's just a pipe dream; there's no way you're going to learn really complicated things without some kind of innate knowledge, without a lot of architectural restrictions. It turns out that's completely wrong. You can take a big random neural network and you can learn a whole bunch of stuff just from data. So the idea that stochastic gradient descent, repeatedly adjusting the weights using a gradient, will learn things, and will learn big, complicated things, has been validated by these big models. And that's a very important thing to know about the brain: it doesn't have to have all this innate structure. Now, obviously it's got a lot of innate structure, but it certainly doesn't need innate structure for things that are easily learned. And so the idea coming from Chomsky, that you won't learn anything complicated like language unless it's all kind of wired in already and just matures, is now clearly nonsense.

I'm sure Chomsky would appreciate you calling his ideas nonsense.

Well, actually, I think a lot of Chomsky's political ideas are very sensible, and I was struck by how someone with such sensible ideas about the Middle East could be so wrong about linguistics.

What do you think would make
these models simulate the consciousness of humans more effectively? Imagine you had the AI assistant that you've spoken to your entire life, and instead of that being, you know, like ChatGPT today, which sort of deletes the memory of the conversation so you start fresh all the time, it had self-reflection. At some point you pass away, and you tell that to the assistant... I mean, not you, somebody else tells that to the assistant; it would be difficult for you to tell that to the assistant. Do you think that assistant would feel at that point?

Yes, I think they can have feelings too. Just as we have this inner theater model for perception, we have an inner theater model for feelings: they're things that I can experience but other people can't. I think that model is equally wrong. So suppose I say, I feel like punching Gary on the nose, which I often do. Let's try and abstract that away from the idea of an inner theater. What I'm really saying to you is, if it weren't for the inhibition coming from my frontal lobes, I would perform an action. So when we talk about feelings, we're really talking about actions we would perform if it weren't for constraints. That's really what feelings are: the actions we would do if it weren't for constraints. So I think you can give the same kind of explanation for feelings, and there's no reason why these things can't have feelings. In fact, in 1973 I saw a robot having an emotion. In Edinburgh they had a robot with two grippers like this that could assemble a toy car if you put the pieces separately on a piece of green felt. But if you put them in a pile, its vision wasn't good enough to figure out what was going on, so it went whack with its gripper and knocked them so they were scattered, and then it could put them together. If you saw that in a person, you'd say it was cross with the situation because it didn't understand it, so it destroyed it.

That's profound. We spoke previously, and you
described humans and LLMs as analogy machines. What do you think have been the most powerful analogies that you've found throughout your life?

Oh, throughout my life? Woo. I guess probably a sort of weak analogy that's influenced me a lot is the analogy between religious belief and belief in symbol processing. When I was very young, I came from an atheist family, and I went to school and was confronted with religious belief, and it just seemed nonsense to me. It still seems nonsense to me. And when I saw symbol processing as an explanation of how people worked, I thought it was just the same nonsense. I don't think it's quite so much nonsense now, because I think actually we do do symbol processing; it's just that we do it by giving these big embedding vectors to the symbols. We are actually symbol processing, but not at all in the way people thought, where you match symbols and the only thing a symbol has is that it's identical to another symbol or it's not identical. That's the only property a symbol has. We don't do that at all. We use the context to give embedding vectors to symbols, and then use the interactions between the components of these embedding vectors to do thinking. But there's a very good researcher at Google called Fernando Pereira who said, yes, we do have symbolic reasoning, and the only symbolic language we have is natural language. Natural language is a symbolic language, and we reason with it. And I believe that now.

You've done some of the most meaningful research in the history of computer science. Can you walk us through how you select the right problems to work on?

Well, first let me correct you: me and my students have done a lot of the most meaningful things, and it's mainly been a very good collaboration with students and my ability to select very good students. That came from the fact that there were very few people doing neural nets in the 70s and 80s and 90s and 2000s, and so the few people doing neural nets got to pick the very best students. So that was a piece of luck. But my way of selecting problems is basically... well, you know, when scientists talk about how they work, they have theories about how they work, which probably don't have much to do with the truth. But my theory is that I look for something where everybody's agreed about something and it feels wrong. There's just a slight intuition that there's something wrong about it. And then I work on that and see if I can elaborate why it is I think it's wrong, and maybe I can make a little demo with a small computer program that shows that it doesn't work the way you might expect. So let me take one example. Most people think that if you add noise to a neural net, it's going to work worse. If, for example, each time you put a training example through, you make half of the neurons be silent, it'll work worse. Actually, we know it'll generalize better if you do that.
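
Making half of the neurons silent on each training example is dropout. Below is a minimal sketch of the commonly used "inverted" variant: during training each activation is zeroed with probability `p_drop` (0.5 here, matching "half") and survivors are scaled by 1 / (1 - p_drop), so nothing needs rescaling at test time. The original formulation instead scaled the weights down at test time; the two agree in expectation. The function name and signature are illustrative.

```python
import random


def dropout(activations, p_drop=0.5, training=True, rng=random):
    """Inverted dropout over a list of activations.

    Training: each unit is silenced (set to 0.0) with probability p_drop, and
    surviving units are divided by (1 - p_drop) so their expected value is unchanged.
    Testing: the activations pass through untouched.
    """
    if not training or p_drop == 0.0:
        return list(activations)
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)
hidden = [1.0, -2.0, 3.0, 0.5]
print(dropout(hidden, rng=rng))         # a fresh random mask on every call
print(dropout(hidden, training=False))  # unchanged at test time
```

Because a different random half is silenced on every example, no unit can count on specific partners being present, which is one way to read the point about stopping "big elaborate co-adaptations".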
And you can demonstrate that in a simple example. That's what's nice about computer simulation: you can show that this idea you had, that adding noise is going to make it worse and that dropping out half the neurons will make it work worse, which it will in the short term, is wrong, because if you train it like that, in the end it'll work better. You can demonstrate that with a small computer program, and then you can think hard about why that is and how it stops big, elaborate co-adaptations. But I think that's my method of working: find something that sounds suspicious, work on it, and see if you can give a simple demonstration of why it's wrong.

What sounds suspicious to you now?

Well, that we don't use fast weights sounds suspicious. That we only have these two time scales, that's just wrong; that's not at all like the brain. In the long run, I think we're going to have to have many more time scales. So that's an example there.

And if you had your group of students today, and they came to you with the Hamming question that we talked about previously, you know, what's the most important problem in your field, what would you suggest that they take on and work on next? We spoke about reasoning and time scales. What would be the highest-priority problem that you'd give them?

For me right now, it's the same question I've had for the last 30 years or so, which is: does the brain do backpropagation? I believe the brain is getting gradients. If you don't get gradients, your learning is just much worse than if you do get gradients. But how is the brain getting gradients, and is it somehow implementing some approximate version of backpropagation, or is it some completely different technique? That's a big open question, and if I kept on doing research, that's what I would be doing research on.

And when you look back
at your career now, you've been right about so many things, but what were you wrong about, where you wish you'd spent less time pursuing a certain direction?

Okay, those are two separate questions. One is, what were you wrong about, and two, do you wish you'd spent less time on it? I think I was wrong about Boltzmann machines, and I'm glad I spent a long time on it. It's a much more beautiful theory of how you get gradients than backpropagation. Backpropagation is just ordinary and sensible; it's just the chain rule. Boltzmann machines are clever, and it's a very interesting way to get gradients, and I would love for that to be how the brain works, but I think it isn't.

Did you spend much time imagining what would happen once these systems developed? Did you have an idea that, okay, if we could make these systems work really well, we could, you know, democratize education, we could make knowledge way more accessible, we could solve some tough problems in medicine? Or was it more about understanding the brain, for you?

Yes, I sort of feel scientists ought to be doing things that are going to help society, but actually that's not how you do your best research. You do your best research when it's driven by curiosity; you just have to understand something. Much more recently, I've realized these things could do a lot of harm as well as a lot of good, and I've become much more concerned about the effects they're going to have on society. But that's not what was motivating me. I just wanted to understand how on earth the brain can learn to do things. That's what I want to know, and I sort of failed. As a side effect of that failure, we got some nice engineering.

Yeah, it was a good failure for the world. If you take the lens of the things that could go really right, what do you think are the most promising applications?

I think health care is
clearly a big one. With health care, there's almost no end to how much health care society can absorb. If you take someone old, they could use five doctors full-time. So when AI gets better than people at doing things, you'd like it to get better in areas where we could do with a lot more of that stuff, and we could do with a lot more doctors. If everybody had three doctors of their own, that would be great, and we're going to get to that point. So that's one reason why health care is good. There's also just new engineering: developing new materials, for example, for better solar panels or for superconductivity, or just understanding how the body works. There are going to be huge impacts there, and those are all going to be good things. What I worry about is bad actors using them for bad things. We've facilitated people like Putin or Xi or Trump using AI for killer robots, or for manipulating public opinion, or for mass surveillance, and those are all very worrying things.

Are you ever concerned that slowing down the field could also slow down the positives?

Oh, absolutely. And I think there's not much chance that the field will slow down, partly because it's international, and if one country slows down, the other countries aren't going to slow down. So there's a race, clearly, between China and the US, and neither is going to slow down. So yeah, I don't... I mean, there was this petition saying we should slow down for six months. I didn't sign it, just because I thought it was never going to happen. Maybe I should have signed it, because even though it was never going to happen, it made a political point. It's often good to ask for things you know you can't get, just to make a point. But I didn't think we were going to slow down.

And how do you think it will impact the AI research process, having these assistants?

I think it'll make it a lot more efficient. AI research will get a lot more efficient when you've got these assistants that help you program, but also help you think through things, and probably help you a lot with equations too.

Have you reflected much on the
process of selecting talent? Has that been mostly intuitive to you? Like when Ilya shows up at the door, you feel, this is a smart guy, let's work together?

So for selecting talent, sometimes you just know. After talking to Ilya for not very long, he seemed very smart, and then talking to him a bit more, he clearly was very smart and had very good intuitions as well as being good at math. So that was a no-brainer. There's another case where I was at a NIPS conference. We had a poster, and someone came up and started asking questions about the poster, and every question he asked was a sort of deep insight into what we'd done wrong. After 5 minutes, I offered him a postdoc position. That guy was David MacKay, who was just brilliant, and it's very sad he died, but it was very obvious you'd want him. Other times it's not so obvious. And one thing I did learn was that people are different; there's not just one type of good student. There are some students who aren't that creative but are technically extremely strong and will make anything work. There are other students who aren't technically strong but are very creative. Of course you want the ones who are both, but you don't always get that. I think actually in the lab you need a variety of different kinds of graduate student. But I still go with my gut intuition that sometimes you talk to somebody and they just get it, and those are the ones you want.

What do you think is the reason for some folks having better intuition? Do they just have better training data than others, or how can you develop your intuition?

I think it's partly that they don't
stand for nonsense. So here's a way to get bad intuitions: believe everything you're told. That's fatal. You have to be able to... I think here's what some people do. They have a whole framework for understanding reality, and when someone tells them something, they try and figure out how that fits into their framework, and if it doesn't, they just reject it. And that's a very good strategy. People who try and incorporate whatever they're told end up with a framework that's very fuzzy and can believe everything, and that's useless. So I think actually having a strong view of the world and trying to manipulate incoming facts to fit in with your view (obviously it can lead you into deep religious belief and fatal flaws and so on, like my belief in Boltzmann machines) is the way to go. If you've got good intuitions, you should trust them. If you've got bad intuitions, it doesn't matter what you do, so you might as well trust them.

A very good point. When you look at the types of research being done today, do you think we're putting all of our eggs in one basket and we should diversify our ideas a bit more in the field, or do you think this is the most promising direction, so let's go all in on it?

I think having big models and training them on multimodal data, even if it's only to predict the next word, is such a promising approach that we should go pretty much all in on it. Obviously there are lots and lots of people doing it now, and there are lots of people doing apparently crazy things, and that's good. But I think it's fine for most of the people to be following this path, because it's working very well.

Do you think that the learning algorithms matter that much, or is it just scale? Are there basically millions of ways that we could get to human-level intelligence, or are there a select few that we need to discover?

Yes, so on this issue of whether particular learning algorithms are very important, or whether there's a great variety of learning algorithms that'll do the job, I don't know the answer. It seems to me, though, that there's a sense in which backpropagation is the correct thing to do. Getting the gradient so that you change a parameter to make it work better seems like the right thing to do, and it's been amazingly successful. There may well be other learning algorithms that are alternative ways of getting that same gradient, or that are getting the gradient of something else, and that also work.
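
What "getting the gradient so that you change a parameter to make it work better" means, and why backpropagation is "just the chain rule", can be made concrete with a two-parameter toy network. The network, its starting weights, the target, and the learning rate are arbitrary choices for illustration; the finite-difference check is only there to confirm the hand-derived chain-rule gradients.

```python
import math


def forward(w1, w2, x):
    """A two-weight toy network: h = tanh(w1 * x), y = w2 * h."""
    h = math.tanh(w1 * x)
    return h, w2 * h


def loss_and_grads(w1, w2, x, target):
    h, y = forward(w1, w2, x)
    loss = 0.5 * (y - target) ** 2
    # Backward pass: apply the chain rule one step at a time, output to input.
    dloss_dy = y - target
    dloss_dw2 = dloss_dy * h
    dloss_dh = dloss_dy * w2
    dloss_dw1 = dloss_dh * (1.0 - h * h) * x  # d tanh(u)/du = 1 - tanh(u)^2
    return loss, dloss_dw1, dloss_dw2


def numeric_grads(w1, w2, x, target, eps=1e-6):
    """Central finite differences, used only to check the chain-rule gradients."""
    def loss(a, b):
        return 0.5 * (forward(a, b, x)[1] - target) ** 2
    g1 = (loss(w1 + eps, w2) - loss(w1 - eps, w2)) / (2 * eps)
    g2 = (loss(w1, w2 + eps) - loss(w1, w2 - eps)) / (2 * eps)
    return g1, g2


def train(w1=0.5, w2=-0.3, x=1.5, target=0.8, lr=0.5, steps=200):
    """Repeatedly move each parameter against its gradient so the output gets better."""
    for _ in range(steps):
        _, g1, g2 = loss_and_grads(w1, w2, x, target)
        w1, w2 = w1 - lr * g1, w2 - lr * g2
    return loss_and_grads(w1, w2, x, target)[0]


print(loss_and_grads(0.5, -0.3, 1.5, 0.8)[1:], numeric_grads(0.5, -0.3, 1.5, 0.8))
print(train())  # the loss after training is far below the starting loss
```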
I think that's all open, and a very interesting issue now is whether there are other things you can try and maximize that will give you good systems. Maybe the brain's doing that because it's easier. But backprop is, in a sense, the right thing to do, and we know that doing it works really well.

And one last question. When you look back at your decades of research, what are you most proud of? Is it the students? Is it the research? What makes you most proud when you look back at your life's work?

The learning algorithm for Boltzmann machines. The learning algorithm for Boltzmann machines is beautifully elegant. It's maybe hopeless in practice, but it's the thing I enjoyed most developing, with Terry, and it's what I'm proudest of, even if it's wrong.

What questions do you spend most of your time thinking about now? Is it the...

What should I watch on Netflix.