Geoffrey Hinton | On working with Ilya, choosing problems, and the power of intuition

Sana
20 May 2024 · 45:46

Summary

TL;DR: The script features an interview with a renowned scientist discussing his journey from England to Carnegie Mellon, his early disappointment with traditional studies of the brain, and his shift to artificial intelligence (AI). He reflects on his experiences and collaborations in AI research, highlighting the development of neural networks and backpropagation. The discussion covers the evolution of AI, the importance of large models and multimodal data, and the potential societal impacts of AI advancements. The scientist emphasizes the importance of intuition, selecting talented students, and the future direction of AI research, including potential risks and ethical considerations.

Takeaways

  • 🧠 The importance of environment: Moving from England to Carnegie Mellon, the speaker noticed a stark difference in work culture and environment, with students working late into the night believing their work would shape the future of computer science.
  • 🔬 Disappointment in initial studies: The speaker found the study of the brain at Cambridge disappointing because it only covered basic neuron functions, leading him to switch to philosophy and eventually AI.
  • 📘 Influential readings: Books by Donald Hebb and John von Neumann were pivotal in shaping the speaker's understanding and interest in how the brain learns and computes.
  • 🤖 Collaborations: Key collaborations with Terry Sejnowski and Peter Brown were significant in the speaker's research, providing valuable insights and advancements in neural networks and speech recognition.
  • 👨‍🎓 Notable students: Ilya, a standout student, impressed the speaker with his intuitive understanding of AI and neural networks, leading to impactful collaborations.
  • 💡 Scale and innovation: Ilya's belief in the importance of scaling models was initially met with skepticism but proved crucial as larger models demonstrated significant improvements.
  • 🧩 Understanding through prediction: The speaker believes that predicting the next symbol in language models forces a deeper understanding of context and reasoning.
  • 🌐 Multimodal learning: Integrating multiple forms of data (text, images, video) into AI models will enhance their understanding and reasoning capabilities.
  • ⚑ GPU revolution: The transition to using GPUs for training neural networks significantly accelerated AI research and development.
  • 🔄 Digital immortality: Digital systems can share weights and knowledge efficiently, unlike human brains, leading to superior learning and knowledge dissemination.

Q & A

  • What was the speaker's initial impression of the academic environment at Carnegie Mellon compared to England?

    -The speaker found the environment at Carnegie Mellon to be very different and refreshing compared to England. Students at Carnegie Mellon were working late into the night because they believed their work was shaping the future of computer science, which was a stark contrast to the pub-going culture after 6:00 PM in England.

  • Why was the speaker disappointed with his initial studies in physiology and philosophy?

    -The speaker was disappointed because his studies in physiology only taught him about how neurons conduct action potentials, which didn't explain how the brain works as a whole. Similarly, philosophy didn't provide insights into how the mind worked, which was his ultimate interest.

  • What inspired the speaker to pursue AI research?

    -The speaker was inspired to pursue AI research after reading books by Donald Hebb and John von Neumann. Hebb's interest in learning connection strengths in neural nets and von Neumann's interest in brain computation intrigued the speaker and led him to Edinburgh to study AI.

  • How did the speaker's collaboration with Terry Sejnowski come about?

    -The speaker's collaboration with Terry Sejnowski began when they interacted frequently despite the distance between Pittsburgh and Baltimore. They would take turns visiting each other's city about once a month to work on Boltzmann machines, sharing a conviction that this was how the brain worked.

  • What was the significance of the speaker's collaboration with Peter Brown?

    -The collaboration with Peter Brown was significant because Peter, a statistician, taught the speaker about speech recognition and hidden Markov models. This collaboration was fruitful, with the speaker feeling that he learned more from Peter than Peter did from him.

  • How did Ilya Sutskever's initial interaction with the speaker influence their future collaboration?

    -Ilya Sutskever's initial interaction with the speaker demonstrated his eagerness and intuition for AI. Despite not understanding the paper on backpropagation initially, Ilya's question about why the gradient wasn't given to a sensible function optimizer showed his deep thinking, which led to a productive collaboration.

  • What was the speaker's view on the importance of scale in AI models?

    -The speaker believed that while new ideas like Transformers helped, the real shift in AI performance was due to the scale of data and computation. He mentioned that they didn't anticipate computers becoming a billion times faster, and with larger scale, models could achieve more without needing as many new ideas.

  • How does the speaker perceive the process of predicting the next word in language models?

    -The speaker believes that predicting the next word in language models is not just a mechanical process. It requires understanding the context, similar to how humans comprehend and generate language, which involves reasoning.

  • What role does the speaker see for multimodality in the future of AI models?

    -The speaker sees multimodality as a significant advancement for AI models. By incorporating images, video, and sound, models will improve in understanding spatial relationships and concepts that are difficult to grasp from language alone.

  • What was the speaker's intuition about the use of GPUs for training neural networks?

    -The speaker's intuition about using GPUs for training neural networks was based on their efficiency in performing matrix multiplications, which are fundamental to neural network computations. This led to significant speed improvements in training times.

  • How does the speaker view the relationship between language and cognition?

    -The speaker views language as a tool for cognition, where symbols are converted into embeddings that interact to predict subsequent symbols. This process of turning symbols into vectors and letting the vectors interact is seen as central to both understanding and generating language.

  • What is the speaker's perspective on the potential of AI in healthcare?

    -The speaker sees AI in healthcare as a promising application, with the potential to significantly increase the availability and quality of medical care. AI could assist or replace doctors, leading to a situation where everyone could have personalized medical attention.

  • What is the speaker's approach to selecting research problems?

    -The speaker selects research problems based on intuition and a sense that a widely accepted idea might be wrong. He looks for opportunities to challenge conventional wisdom with simple demonstrations that can show why the prevailing view may not be accurate.

  • What does the speaker consider as the most promising direction in AI research today?

    -The speaker believes that training large models on multimodal data is a very promising direction. Even if the models are initially used for simple tasks like predicting the next word, the approach has great potential for future development.

  • What is the speaker's view on the importance of learning algorithms in achieving human-level intelligence?

    -The speaker believes that while backpropagation is a fundamentally correct and successful approach for learning, there may be alternative learning algorithms that could also achieve human-level intelligence. However, he acknowledges that backpropagation has proven to be highly effective.

  • What achievement from the speaker's career is he most proud of?

    -The speaker is most proud of the development of the learning algorithm for Boltzmann machines. He considers it elegant, even if it may not be practical, and it was a project he greatly enjoyed working on.

Outlines

00:00

🌟 Early Inspirations and AI Explorations

The speaker reminisces about their early experiences at Carnegie Mellon and the inspiring work ethic of students there. They express disappointment with their initial studies in physiology and philosophy, which led them to pursue AI at Edinburgh. The influence of Donald Hebb and John von Neumann on their interest in neural networks and brain function is highlighted. The speaker also shares their early conviction that the brain learns without pre-programmed logical rules, a belief that motivated their research in AI.

05:04

🤝 Collaborations and Intuitions in AI Development

This paragraph delves into the speaker's collaborations, particularly with Terry Sejnowski and Peter Brown, and the insights gained from them. The speaker reflects on the importance of collaborations in developing their understanding of AI, the adoption of the term 'hidden layers' from hidden Markov models, and the evolution of their ideas on how the brain might work. The paragraph also touches on the speaker's initial disappointment with a student, Ilya, who later proved to have profound intuitions about AI, challenging conventional approaches to optimization.

10:06

🚀 The Advent of Backpropagation and Scaling in AI

The speaker recounts the early days of backpropagation and their work on neural networks. They discuss the evolution of AI from clever ideas to the realization that scale in data and computation is crucial. The introduction of character-level prediction using large datasets like Wikipedia is highlighted, along with the surprising effectiveness of such models. The speaker also addresses the misconception that AI models are merely predicting the next word, arguing that understanding and reasoning are inherent in the process.

15:07

🧠 The Intersection of Neuroscience and AI

In this section, the speaker explores the relationship between neuroscience and AI, discussing the brain's learning processes and the inspiration drawn from it for developing AI algorithms. They emphasize the importance of embeddings and backpropagation in creating language models that can generalize and understand context. The speaker also speculates on the potential for AI to develop creativity and reasoning abilities beyond human levels as models grow larger and more complex.

20:08

🔄 The Role of Reinforcement Learning and Analogies in AI

The speaker discusses the impact of reinforcement learning, as exemplified by AlphaGo's move 37, and the potential for AI to develop creative solutions within limited domains. They also touch on the concept of 'fast weights' and the brain's ability to adapt quickly, suggesting that AI could benefit from incorporating similar mechanisms. The speaker further elaborates on the potential of multimodal models to enhance understanding and reasoning in AI.

25:10

🌐 The Future of AI and Its Impact on Society

In this paragraph, the speaker contemplates the future applications of AI, particularly in healthcare and materials engineering, while also expressing concern about the potential misuse of AI by bad actors. They acknowledge the balance between the positive and negative impacts of AI and the importance of international cooperation in advancing the field responsibly.

30:11

🤖 The Evolution of AI and Lessons from Neuroscience

The speaker reflects on the evolution of AI and the lessons learned from neuroscience, emphasizing the importance of multiple timescales in learning and the potential for AI to incorporate 'fast weights' similar to the brain. They also discuss the impact of their understanding of the brain on AI development and the philosophical implications of AI's ability to reason and understand.

35:11

💡 The Intuitive Process of Selecting Talent in AI Research

In this section, the speaker shares insights into their intuitive process of selecting talent for AI research, highlighting the importance of recognizing deep insights and creativity in potential collaborators. They discuss the value of having a variety of student types in a lab and trusting their gut intuition when identifying promising individuals.

40:11

🛠 The Importance of Intuition and Diverse Approaches in AI

The speaker emphasizes the role of intuition in developing good ideas and the importance of not accepting everything they are told. They advocate for having a strong framework for understanding reality and being able to reject information that doesn't fit. The speaker also discusses the potential for a variety of learning algorithms to achieve human-level intelligence.

45:12

πŸ† Reflecting on Achievements and the Journey of AI Research

In the concluding paragraph, the speaker reflects on their proudest achievementβ€”the development of the learning algorithm for Boltzmann machinesβ€”and their current musings, including the question of whether the brain uses backpropagation. They express a lifelong curiosity about the brain's learning capabilities and acknowledge the unexpected positive outcomes of their research journey.

Keywords

💡Intuition

Intuition refers to the ability to understand or know something immediately, without the need for conscious reasoning. In the context of the video, it is highlighted as a key trait in selecting talent, where the speaker mentions that some individuals, like Ilya, have a strong intuitive grasp of complex concepts. The video suggests that intuition can play a significant role in the field of AI, as it often guides researchers towards innovative ideas and solutions.

💡Backpropagation

Backpropagation is a method used in artificial neural networks to calculate the gradient of the loss function with respect to the weights. It is central to the training process of many AI models. The script discusses the significance of backpropagation in understanding how neural networks learn and the speculation about its biological counterpart in the human brain, suggesting that it might be a fundamental process for learning in both AI and neural systems.
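
To make the mechanics concrete, here is a minimal sketch of backpropagation in plain NumPy: a tiny two-layer network where the chain rule is applied by hand to get the gradient of the loss with respect to each weight matrix. The toy data and network sizes are illustrative assumptions, not anything from the interview.

```python
import numpy as np

# A minimal backpropagation sketch: forward pass, then chain rule by hand.
rng = np.random.default_rng(0)
X = rng.standard_normal((64, 3))               # toy inputs
y = (X.sum(axis=1, keepdims=True) > 0) * 1.0   # toy binary targets

W1 = rng.standard_normal((3, 8)) * 0.1
W2 = rng.standard_normal((8, 1)) * 0.1

for step in range(500):
    # Forward pass.
    h = np.tanh(X @ W1)                        # hidden activations
    p = 1.0 / (1.0 + np.exp(-(h @ W2)))        # sigmoid output
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

    # Backward pass: the chain rule gives each weight gradient.
    dlogits = (p - y) / len(X)                 # d loss / d pre-sigmoid
    dW2 = h.T @ dlogits
    dh = dlogits @ W2.T * (1 - h ** 2)         # back through tanh
    dW1 = X.T @ dh

    # Gradient descent step.
    W2 -= 1.0 * dW2
    W1 -= 1.0 * dW1

print(f"final loss: {loss:.3f}")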

💡Neural Networks

Neural networks are a set of algorithms designed to recognize patterns. They are inspired by the human brain and are composed of interconnected nodes or 'neurons'. The video script discusses the evolution of neural networks, from their initial disappointing results in simulating the brain to their current state as powerful tools in AI, capable of complex tasks like language processing and pattern recognition.

💡AI

AI, or artificial intelligence, is the field of study focused on creating machines that can perform tasks that would typically require human intelligence, such as problem-solving, learning, and understanding language. The video script frequently references AI, discussing its development, the influence of individuals in the field, and the philosophical implications of creating intelligent machines.

💡Embeddings

In the context of AI, embeddings are vector representations of words or symbols in a reduced dimensional space. They capture semantic meaning and are used in various NLP tasks. The script mentions embeddings as a crucial part of language models, allowing AI to understand and predict text based on the context provided by these vector representations.
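
As a toy illustration of the idea, geometric closeness between embedding vectors stands in for semantic similarity, echoing the bank/river/greed example from the interview. The vectors below are invented for the example; real models learn them.

```python
import numpy as np

# Made-up embedding vectors: closeness in the space stands in for
# semantic similarity ("bank" near "river" in one sense of the word).
emb = {
    "bank":  np.array([0.9, 0.1, 0.4]),
    "river": np.array([0.8, 0.0, 0.5]),
    "greed": np.array([0.1, 0.9, 0.2]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(emb["bank"], emb["river"]))  # relatively high
print(cosine(emb["bank"], emb["greed"]))  # relatively low
```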

💡Multimodal Data

Multimodal data refers to information that is derived from multiple different types of input, such as text, images, and sound. The script discusses the potential of training AI models on multimodal data, suggesting that this approach could lead to more robust and versatile AI systems capable of understanding and predicting a wider range of human experiences and interactions.

💡Boltzmann Machines

Boltzmann machines are a class of stochastic artificial neural networks capable of learning from data. The script mentions the speaker's past work on Boltzmann machines, highlighting the elegance of their learning algorithm, even though it may not be how the brain actually learns or the most practical approach in AI.
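
The learning rule the speaker calls elegant can be sketched compactly: raise each weight in proportion to how often two units are on together when the network is clamped to data, and lower it in proportion to how often they are on together when the network runs free. The sketch below is a toy illustration with crude Gibbs sampling and random data, not a faithful reproduction of the original work.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
W = np.zeros((n, n))
data = rng.integers(0, 2, size=(100, n)) * 2 - 1    # toy ±1 patterns

def sample_model(W, steps=200):
    # Very crude Gibbs sampling from the model's own distribution.
    s = rng.integers(0, 2, size=n) * 2 - 1
    samples = []
    for _ in range(steps):
        i = rng.integers(n)
        p_on = 1 / (1 + np.exp(-2 * (W[i] @ s)))    # diag(W) is kept 0
        s[i] = 1 if rng.random() < p_on else -1
        samples.append(s.copy())
    return np.array(samples[steps // 2:])           # discard burn-in

eps = 0.01
for epoch in range(20):
    data_corr = data.T @ data / len(data)           # <s_i s_j> clamped
    model = sample_model(W)
    model_corr = model.T @ model / len(model)       # <s_i s_j> free-running
    W += eps * (data_corr - model_corr)             # the Boltzmann rule
    np.fill_diagonal(W, 0.0)
```

The striking property is locality: each weight update needs only the statistics of the two units it connects, which is part of why the rule seemed biologically appealing.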

💡Cognitive Dissonance

Cognitive dissonance is the mental discomfort experienced by a person who holds two or more contradictory beliefs. In the script, the concept is alluded to when discussing the process of selecting talent and developing intuition, suggesting that individuals who can reconcile conflicting information effectively may have better intuition.

💡Hidden Markov Models

Hidden Markov Models (HMMs) are statistical models in which an observed sequence, such as the words in a speech signal, is generated by a sequence of unobserved (hidden) states. The script mentions HMMs as an example of a technical concept that the speaker learned from a student, demonstrating the collaborative and educational nature of scientific research.
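
A minimal sketch of the forward algorithm, the standard way an HMM scores an observation sequence by summing over all hidden-state paths. All probabilities here are made up for illustration.

```python
import numpy as np

A = np.array([[0.7, 0.3],     # state-transition probabilities
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],     # emission probabilities per state
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])     # initial state distribution
obs = [0, 1, 1, 0]            # observed symbol indices

alpha = pi * B[:, obs[0]]     # joint prob. of first obs and each state
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]   # recurse over time

print(f"P(observations) = {alpha.sum():.4f}")
```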

💡Reinforcement Learning

Reinforcement learning is a type of machine learning where an agent learns to make decisions by performing actions in an environment to achieve a goal. The script refers to AlphaGo, a program that uses reinforcement learning to play the game of Go, as an example of how AI can surpass human levels of play through self-play and learning from its mistakes.

💡Analogies

Analogies are comparisons between two things that are different in most respects but alike in some particular way. The script discusses the ability of AI models to make analogies as a sign of advanced understanding and creativity, suggesting that this capacity could lead to AI systems that are capable of original thought and innovation.

Highlights

The importance of selecting talent based on intuition and capability, as demonstrated by the story of Ilya's recruitment.

The cultural difference in work ethic and motivation between England and Carnegie Mellon, highlighting the dedication to future advancements in computer science.

Disappointment with traditional education in physiology and philosophy, leading to a switch to AI and the excitement of simulating and testing theories.

Influence of Donald Hebb's work on learning in neural networks and the early interest in how the brain computes differently from traditional computers.

The early conviction that the brain learns without pre-programmed logical rules, and the pursuit to understand neural net modifications for complex tasks.

The collaboration with Terry Sejnowski and the excitement of working on what was believed to be the future of understanding brain function.

Learning from Peter Brown about speech recognition and the origin of the term 'hidden layers' in neural networks.

Ilya's unique intuition for AI and the significant discussion about the use of the gradient in optimization, leading to years of contemplation.

The fun and productive collaboration between the speaker and Ilya, including the story of writing an interface for MATLAB in a morning.

The evolution of understanding in AI, from the belief in the necessity of new ideas to the realization of the importance of scale in data and computation.

The development and success of character-level prediction using Wikipedia, showcasing the capabilities of neural networks.

The debate on whether AI models are simply predicting the next word or actually understanding and reasoning like humans.

The potential for large language models to discover analogies and be creative beyond human capabilities.

The impact of multimodality on AI models, suggesting improvements in understanding spatial relationships and reasoning.

The philosophical implications of language in cognition and the evolution of views on how symbols are processed in the brain.

The early adoption and advocacy for using GPUs in training neural networks, highlighting the significant increase in computational efficiency.

The exploration of analog computation and the potential for AI to mimic the brain's efficiency in power usage.

The potential for AI in healthcare, emphasizing the vast need and opportunity for improvement in medical services.

Concerns about the misuse of AI by bad actors for harmful purposes, such as killer robots, public opinion manipulation, and mass surveillance.

Reflections on the process of selecting talent in AI research and the importance of intuition and technical strength.

Transcripts

00:00

Interviewer: Have you reflected a lot on how to select talent, or has that mostly been intuitive to you? Ilya just shows up and you're like, "this is a clever guy, let's work together." Or have you thought a lot about that? Can we... are we recording? Should we roll this? Yeah, let's roll this. Okay, we're good. Okay, sound is working.

00:30

Hinton: I remember when I first got to Carnegie Mellon from England. In England, at a research unit, it would get to be 6:00 and you'd all go for a drink in the pub. At Carnegie Mellon, I remember after I'd been there a few weeks, it was Saturday night. I didn't have any friends yet and I didn't know what to do, so I decided I'd go into the lab and do some programming, because I had a Lisp machine and you couldn't program it from home. So I went into the lab at about 9:00 on a Saturday night, and it was swarming: all the students were there, and they were all there because what they were working on was the future. They all believed that what they did next was going to change the course of computer science. It was just so different from England, and that was very refreshing.

01:12

Interviewer: Take me back to the very beginning, Geoff, at Cambridge, trying to understand the brain. What was that like?

Hinton: It was very disappointing. I did physiology, and in the summer term they were going to teach us how the brain worked. All they taught us was how neurons conduct action potentials, which is very interesting but doesn't tell you how the brain works. So that was extremely disappointing. I switched to philosophy; I thought maybe they'd tell us how the mind worked. That was very disappointing too. I eventually ended up going to Edinburgh to do AI, and that was more interesting: at least you could simulate things, so you could test out theories.

01:50

Interviewer: Do you remember what intrigued you about AI? Was it a paper, or any particular person who exposed you to those ideas?

Hinton: I guess it was a book I read by Donald Hebb that influenced me a lot. He was very interested in how you learn the connection strengths in neural nets. I also read a book by John von Neumann early on, who was very interested in how the brain computes and how it's different from normal computers.

02:19

Interviewer: Did you get the conviction that these ideas would work out at that point? What was your intuition back in the Edinburgh days?

Hinton: It seemed to me there has to be a way that the brain learns, and it's clearly not by having all sorts of things programmed into it and then using logical rules of inference; that just seemed to me crazy from the outset. So we had to figure out how the brain learned to modify connections in a neural net so that it could do complicated things. Von Neumann believed that, and Turing believed that. Von Neumann and Turing were both pretty good at logic, but they didn't believe in this logical approach.

03:01

Interviewer: What was your split between studying the ideas from neuroscience and just doing what seemed to be good algorithms for AI? How much inspiration did you take early on?

Hinton: I never did that much study of neuroscience. I was always inspired by what I'd learned about how the brain works: there's a bunch of neurons, they perform relatively simple operations, they're nonlinear, but they collect inputs, they weight them, and they produce an output that depends on that weighted input. The question is: how do you change those weights to make the whole thing do something good? It seems like a fairly simple question.

03:38

Interviewer: What collaborations do you remember from that time?

Hinton: The main collaboration I had at Carnegie Mellon was with someone who wasn't at Carnegie Mellon. I was interacting a lot with Terry Sejnowski, who was in Baltimore at Johns Hopkins, and about once a month either he would drive to Pittsburgh or I'd drive to Baltimore; it's 250 miles away. We would spend a weekend together working on Boltzmann machines. That was a wonderful collaboration; we were both convinced it was how the brain worked. It was the most exciting research I've ever done, and a lot of very interesting technical results came out of it, but I think it's not how the brain works. I also had a very good collaboration with Peter Brown, who was a very good statistician. He worked on speech recognition at IBM and then came as a more mature student to Carnegie Mellon, just to get a PhD, but he already knew a lot. He taught me a lot about speech, and he in fact taught me about hidden Markov models. I think I learned more from him than he learned from me; that's the kind of student you want. When he taught me about hidden Markov models, I was doing backpropagation with hidden layers, only they weren't called hidden layers then, and I decided that the name they use in hidden Markov models is a great name for variables that you don't know what they're up to. So that's where the name "hidden" in neural nets came from: Peter and I decided it was a great name for the hidden layers in neural nets. But I learned a lot from Peter about speech.

05:05

Interviewer: Take us back to when Ilya showed up at your office.

Hinton: I was in my office, probably on a Sunday, and I was programming, I think, and there was a knock on the door. Not just any knock: it was sort of an urgent knock. So I went and answered the door, and there was this young student. He said he was cooking fries over the summer, but he'd rather be working in my lab. So I said, well, why don't you make an appointment and we'll talk? And Ilya said, how about now? That sort of was Ilya's character. So we talked for a bit, and I gave him a paper to read, which was the Nature paper on backpropagation. We made another meeting for a week later, and he came back and said, "I didn't understand it." I was very disappointed; I thought he seemed like a bright guy, but it's only the chain rule, it's not that hard to understand. And he said, "Oh no, no, I understood that. I just don't understand why you don't give the gradient to a sensible function optimizer." Which took us quite a few years to think about. And it kept on like that with him: his raw intuitions about things were always very good.

06:14

Interviewer: What do you think had enabled those intuitions for Ilya?

Hinton: I don't know. I think he always thought for himself. He was always interested in AI from a young age, and he's obviously good at math. But it's very hard to know.

06:29

Interviewer: And what was the collaboration between the two of you like? What part would you play, and what part would Ilya play?

Hinton: It was a lot of fun. I remember one occasion when we were trying to do a complicated thing producing maps of data, where I had a kind of mixture model, so you could take the same bunch of similarities and make two maps: in one map "bank" could be close to "greed", and in another map "bank" could be close to "river", because in one map you can't have it close to both, since "river" and "greed" are a long way apart. So we'd have a mixture of maps. We were doing it in MATLAB, and this involved a lot of reorganization of the code to do the right matrix multiplies, and Ilya got fed up with that. He came in one day and said, "I'm going to write an interface for MATLAB, so I program in this different language and then have something that just converts it into MATLAB." And I said, "No, Ilya, that'll take you a month to do. We've got to get on with this project; don't get diverted by that." And he said, "It's okay, I did it this morning."

07:34

Interviewer: That's quite incredible. And throughout those years, the biggest shift wasn't necessarily just the algorithms but also the scale. How did you view that scale over the years?

Hinton: Ilya got that intuition very early. Ilya was always preaching that you just make it bigger and it'll work better, and I always thought that was a bit of a cop-out: you're going to have to have new ideas too. It turns out Ilya was basically right. New ideas help; things like Transformers helped a lot. But it was really the scale of the data and the scale of the computation. Back then we had no idea computers would get, like, a billion times faster; we thought maybe they'd get a hundred times faster. We were trying to do things by coming up with clever ideas that would have just solved themselves if we had had bigger scale of data and computation. In about 2011, Ilya and another graduate student called James Martens and I had a paper using character-level prediction. We took Wikipedia and tried to predict the next HTML character, and that worked remarkably well. We were always amazed at how well it worked. That was using a fancy optimizer on GPUs, and we could never quite believe that it understood anything, but it looked as though it understood, and that just seemed incredible.
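
The 2011 model was a multiplicative RNN trained with a second-order optimizer on Wikipedia; as a rough, much smaller stand-in, here is a sketch of character-level next-character prediction with a plain GRU and Adam on a toy string. Everything below (architecture, sizes, data) is an illustrative assumption, not the original setup.

```python
import torch
import torch.nn as nn

# Toy corpus and character vocabulary.
text = "the quick brown fox jumps over the lazy dog " * 20
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
ids = torch.tensor([stoi[c] for c in text])

class CharLM(nn.Module):
    def __init__(self, vocab, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)     # character embeddings
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)        # logits over next char

    def forward(self, x):
        h, _ = self.rnn(self.emb(x))
        return self.out(h)

model = CharLM(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = ids[:-1].unsqueeze(0), ids[1:].unsqueeze(0)  # predict next char

for step in range(200):
    logits = model(x)
    loss = nn.functional.cross_entropy(logits.view(-1, len(chars)),
                                       y.view(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"loss after training: {loss.item():.3f}")
```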

play08:58

are do models trained to predict the

play09:01

next word and why is it the wrong way of

play09:06

of thinking about them okay I don't

play09:08

actually believe it is the wrong way so

play09:12

in fact I think I made the first

play09:13

neuronet language model that used

play09:15

embeddings and back propagation so it's

play09:18

very simple data just

play09:19

triples and it was turning each symbol

play09:23

into an embedding then having the

play09:25

embeddings interact to predict the

play09:27

embedding of the next symbol and from

play09:29

that predic the next symbol and then it

play09:31

was back propagating through that whole

play09:32

process to learn these triples and I

play09:35

showed it could generalize um about 10

play09:38

years later Yoshua Benji used a very

play09:40

similar Network and showed it work with

play09:41

real text and about 10 years after that

play09:44

linguist started believing in embeddings

play09:46

it was a slow process the reason I think

play09:49

it's not just predicting the next symbol

play09:52

is if you ask well what does it take to

play09:54

predict the next symbol particularly if

play09:56

you ask me a question and then the first

play09:59

word of the answer is the next symbol um

play10:03

you have to understand the question so I

play10:06

think by predicting the next

play10:08

symbol it's very unlike oldfashioned

play10:11

autocomplete oldfashioned autocomplete

play10:13

you'd store sort of triples of words and

play10:16

then if you sort a pair of words you see

play10:18

how often different words came third and

play10:20

that way you can predict the next symbol

play10:22

and that's what most people think auto

play10:23

complete is like it's no longer at all

play10:26

like that um to predict the next symbol

play10:28

you have to understand what's been said

play10:30

so I think you're forcing it to

play10:31

understand by making it predict the next

play10:33

symbol and I think it's understanding in

play10:36

much the same way we are so a lot of

play10:38

people will tell you these things aren't

play10:40

like us um they're just predicting the

play10:42

next symbol they're not reasoning like

play10:44

us but actually in order to predict the

play10:47

next symbol it's have going to have to

play10:48

do some reasoning and we've seen now

play10:50

that if you make big ones without

play10:52

putting in any special stuff to do

play10:53

reasoning they can already do some

play10:55

reasoning and I think as you make them

play10:57

bigger they're going to be able to do

play10:58

more and more reasoning do you think I'm

play11:00

doing anything else than predicting the

play11:01

next symbol right now I think that's how

play11:04

you're learning I think you're

play11:06

predicting the next video frame um

play11:08

you're predicting the next sound um but

play11:11

I think that's a pretty plausible theory

play11:13

of how the brain's learning what enables
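
The "old-fashioned autocomplete" described above is easy to write down, which is part of the point: it is pure table lookup over stored triples, with no understanding anywhere. A minimal sketch, with a toy corpus assumed for illustration:

```python
from collections import Counter, defaultdict

# Store word triples; given a pair, predict whichever third word
# followed it most often. This is the count-based scheme the speaker
# contrasts with modern next-symbol prediction.
corpus = "the cat sat on the mat and the cat ate the fish".split()

table = defaultdict(Counter)
for w1, w2, w3 in zip(corpus, corpus[1:], corpus[2:]):
    table[(w1, w2)][w3] += 1

def predict_next(w1, w2):
    counts = table[(w1, w2)]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the", "cat"))  # 'sat' or 'ate', by raw count alone
```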

11:16

Interviewer: What enables these models to learn such a wide variety of fields?

Hinton: What these big language models are doing is looking for common structure, and by finding common structure they can encode things using the common structure, and that's more efficient. Let me give you an example. If you ask GPT-4 "why is a compost heap like an atom bomb?", most people can't answer that; most people haven't thought about it, and they think atom bombs and compost heaps are very different things. But GPT-4 will tell you: well, the energy scales are very different and the time scales are very different, but the thing that's the same is that when the compost heap gets hotter, it generates heat faster, and when the atom bomb produces more neutrons, it produces more neutrons faster. So it gets the idea of a chain reaction, and I believe it's understood that they're both forms of chain reaction. It's using that understanding to compress all that information into its weights. And if it's doing that, then it's going to be doing that for hundreds of things where we haven't seen the analogies yet but it has. That's where you get creativity from: seeing analogies between apparently very different things. So I think GPT-4 is going to end up, when it gets bigger, being very creative. This idea that it's just regurgitating what it's learned, just pasting together text it's learned already, is completely wrong. It's going to be even more creative than people.

12:37

Interviewer: I think you'd argue that it won't just repeat the human knowledge we've developed so far but could also progress beyond that. That's something we haven't quite seen yet; we've started seeing some examples of it, but to a large extent we're still at the current level of science. What do you think will enable it to go beyond that?

Hinton: Well, we've seen that in more limited contexts. If you take AlphaGo, in that famous competition with Lee Sedol, there was move 37, where AlphaGo made a move that all the experts said must have been a mistake, but later they realized it was a brilliant move. So that was creative within that limited domain. I think we'll see a lot more of that as these things get bigger.

13:25

Interviewer: The difference with AlphaGo, though, was that it was using reinforcement learning, which subsequently enabled it to go beyond the current state of the art. It started with imitation learning, watching how humans play the game, and then through self-play it developed way beyond that. Do you think that's the missing component?

Hinton: That may well be a missing component, yes. The self-play in AlphaGo and AlphaZero is a large part of why they could make these creative moves, but I don't think it's entirely necessary. There's a little experiment I did a long time ago where you're training a neural net to recognize handwritten digits...

Interviewer: I love that example, the MNIST example.

Hinton: ...and you give it training data where half the answers are wrong. The question is how well it will learn. You make half the answers wrong once and keep them like that, so it can't average away the wrongness by seeing the same example sometimes with the right answer and sometimes with the wrong answer: for half the examples, whenever it sees the example, the answer is always wrong. So the training data has 50% error. But if you train it up with backpropagation, it gets down to 5% error or less. In other words, from badly labeled data it can get much better results; it can see that the training data is wrong. That's how smart students can be smarter than their advisor: their advisor tells them all this stuff, and for half of what their advisor tells them they think "no, rubbish", and they listen to the other half, and then they end up smarter than the advisor. So these big neural nets can actually do much better than their training data, and most people don't realize that.
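
A rough sketch of the corrupted-labels experiment described above, using synthetic 10-class data as a stand-in for MNIST: half the labels are made wrong once and kept that way, yet the trained network ends up far more accurate than its 50% label error, because the true label still holds a plurality at each input.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
W_true = torch.randn(20, 10)              # hidden "true rule"
X = torch.randn(5000, 20)
y_true = (X @ W_true).argmax(dim=1)

# Corrupt half the labels once and keep them that way, as described.
y_noisy = y_true.clone()
flip = torch.rand(len(y_noisy)) < 0.5
n_bad = int(flip.sum())
y_noisy[flip] = (y_true[flip] + torch.randint(1, 10, (n_bad,))) % 10

model = nn.Sequential(nn.Linear(20, 128), nn.ReLU(), nn.Linear(128, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(1000):
    loss = nn.functional.cross_entropy(model(X), y_noisy)
    opt.zero_grad()
    loss.backward()
    opt.step()

acc = (model(X).argmax(dim=1) == y_true).float().mean()
print(f"accuracy against true labels: {acc:.2f}")  # well above 50%
```

The reason it works with 10 classes is that the wrong labels scatter across nine alternatives, so the correct class still dominates the statistics the network fits.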

15:13

Interviewer: So how do you expect these models to add reasoning? One approach is to add heuristics on top of them, which a lot of the research is doing now, where you have chain of thought and you just feed the model's reasoning back into itself. Another way would be in the model itself, as you scale it up. What's your intuition around that?

Hinton: My intuition is that as we scale up these models, they get better at reasoning. If you ask how people work, roughly speaking, we have these intuitions, and we can do reasoning, and we use the reasoning to correct our intuitions. Of course, we use the intuitions during the reasoning to do the reasoning, but if the conclusion of the reasoning conflicts with our intuitions, we realize the intuitions need to be changed. That's much like AlphaGo or AlphaZero, where you have an evaluation function that just looks at a board and says how good it is for you, but then you do the Monte Carlo rollout and get a more accurate idea, and you can revise your evaluation function: you train it by getting it to agree with the results of reasoning. I think these large language models have to start doing that. They have to start training their raw intuitions about what should come next by doing reasoning and realizing that's not right. That way they can get more training data than just mimicking what people did. That's exactly why AlphaGo could make this creative move 37: it had much more training data, because it was using reasoning to check out what the right next move should have been.

16:49

Interviewer: What do you think about multimodality? We spoke about these analogies, and often the analogies are way beyond what we could see: the models are discovering analogies far beyond humans, maybe at abstraction levels we'll never be able to understand. When we introduce images, video, and sound, how do you think that will change the models, and how will it change the analogies they can make?

Hinton: I think it'll change things a lot. I think it'll make them much better at understanding spatial things, for example. From language alone it's quite hard to understand some spatial things, although remarkably GPT-4 can do that even before it was multimodal. But when you make it multimodal, if you have it both doing vision and reaching out and grabbing things, it'll understand objects much better if it can pick them up and turn them over and so on. Although you can learn an awful lot from language, it's easier to learn if you're multimodal, and in fact you then need less language. There's an awful lot of YouTube video for predicting the next frame, or something like that. So I think these multimodal models are clearly going to take over: you can get more data that way, and they need less language. There's really a philosophical point here: you could learn a very good model from language alone, but it's much easier to learn it from a multimodal system.

18:16

Interviewer: How do you think it will impact the models' reasoning?

Hinton: I think it'll make them much better at reasoning about space, for example: reasoning about what happens if you pick objects up. If you actually try picking objects up, you're going to get all sorts of training data that's going to help.

18:29

Interviewer: Do you think the human brain evolved to work well with language, or do you think language evolved to work well with the human brain?

Hinton: Whether language evolved to work with the brain or the brain evolved to work with language, I think that's a very good question. I think both happened. I used to think we would do a lot of cognition without needing language at all; now I've changed my mind a bit. Let me give you three different views of language and how it relates to cognition. There's the old-fashioned symbolic view, which is that cognition consists of having strings of symbols in some kind of cleaned-up logical language where there's no ambiguity, and applying rules of inference. That's what cognition is on this view: just symbolic manipulations on things that are like strings of language symbols. That's one extreme view. An opposite extreme view is: no, no, once you get inside the head it's all vectors. Symbols come in, you convert those symbols into big vectors, all the stuff inside is done with big vectors, and then if you want to produce output you produce symbols again. There was a point in machine translation, in about 2014, when people were using recurrent neural nets: words would keep coming in, and there's a hidden state that keeps accumulating information, so when they got to the end of a sentence they had a big hidden vector that captured the meaning of that sentence, which could then be used for producing the sentence in another language. That was called a thought vector, and that's a second view of language: you convert the language into a big vector that's nothing like language, and that's what cognition is all about. But there's a third view, which is what I believe now: you take these symbols and convert them into embeddings, and you use multiple layers of that, so you get these very rich embeddings. But the embeddings are still tied to the symbols, in the sense that you've got a big vector for this symbol and a big vector for that symbol, and these vectors interact to produce the vector for the symbol for the next word. That's what understanding is: knowing how to convert the symbols into these vectors, and knowing how the elements of the vectors should interact to predict the vector for the next symbol. That's what understanding is, both in these big language models and in our brains. It's an example that's sort of in between: you're staying with the symbols, but you're interpreting them as these big vectors, and that's where all the work is. All the knowledge is in what vectors you use and how the elements of those vectors interact, not in symbolic rules. It's not saying you get away from the symbols altogether; it's saying you turn the symbols into big vectors but stay with the surface structure of the symbols. That seems to me a more plausible model of human thought too.

21:26

Interviewer: You were one of the first folks to get the idea of using GPUs, and I know Jensen loves you for that. Back in 2009 you mentioned that you told Jensen this could be quite a good idea for training neural nets. Take us back to that early intuition about using GPUs for training neural nets.

Hinton: Actually, I think in about 2006 I had a former graduate student called Rick Szeliski, who's a very good computer vision guy, and I talked to him at a meeting, and he said, "You know, you ought to think about using graphics processing cards, because they're very good at matrix multiplies, and what you're doing is basically all matrix multiplies." So I thought about that for a bit. Then we learned about these Tesla systems that had four GPUs in them. Initially we just got gaming GPUs and discovered they made things go 30 times faster. Then we bought one of these Tesla systems with four GPUs, and we did speech on that, and it worked very well. Then in 2009 I gave a talk at NIPS and told a thousand machine learning researchers: you should all go and buy Nvidia GPUs, they're the future, you need them for doing machine learning. I then actually sent mail to Nvidia saying, "I told a thousand machine learning researchers to buy your boards; could you give me a free one?" And they said no. Actually, they didn't say no; they just didn't reply. But when I told Jensen this story later on, he gave me a free one.
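
The intuition is easy to check on modern hardware: neural-net training is dominated by matrix multiplies, which GPUs execute in massive parallel. Here is a small timing sketch; the speedup you see depends entirely on your machine, and the ~30x figure in the story was for gaming GPUs of that era.

```python
import time
import torch

a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

t0 = time.perf_counter()
_ = a @ b                                  # matmul on the CPU
cpu_s = time.perf_counter() - t0
print(f"CPU matmul: {cpu_s:.3f}s")

if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    _ = a_gpu @ b_gpu                      # warm-up (cuBLAS init)
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    _ = a_gpu @ b_gpu
    torch.cuda.synchronize()               # wait for the async kernel
    gpu_s = time.perf_counter() - t0
    print(f"GPU matmul: {gpu_s:.3f}s ({cpu_s / gpu_s:.0f}x faster)")
```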

23:00

Interviewer: That's very good. I think what's interesting as well is how GPUs have evolved alongside the field. Where do you think we should go next in compute?

Hinton: In my last couple of years at Google I was thinking about ways of trying to make analog computation, so that instead of using, say, a megawatt we could use, say, 30 watts, like the brain, and we could run these big language models in analog hardware. I never made it work, but I started really appreciating digital computation. If you're going to use that low-power analog computation, every piece of hardware is going to be a bit different, and the idea is that the learning is going to make use of the specific properties of that hardware. That's what happens with people: all our brains are different, so we can't take the weights in your brain and put them in my brain. The hardware is different, the precise properties of the individual neurons are different, and the learning has learned to make use of all that. So we're mortal, in the sense that the weights in my brain are no good for any other brain. When I die, those weights are useless. We can get information from one person to another rather inefficiently: I produce sentences, and you figure out how to change your weights so you would have said the same thing. That's called distillation, but it's a very inefficient way of communicating knowledge. Digital systems are immortal, because once you've got some weights, you can throw away the computer, just store the weights on a tape somewhere, build another computer, put those same weights in, and if it's digital it can compute exactly the same thing as the other system did. So digital systems can share weights, and that's incredibly much more efficient. If you've got a whole bunch of digital systems, and they each go and do a tiny bit of learning, starting with the same weights, and then they share their weights again, they all know what all the others learned. We can't do that, so they're far superior to us in being able to share knowledge.
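
A toy sketch of the weight-sharing idea: identical digital replicas each learn on different data, then pool what they learned by averaging their weights, so every copy "knows" what the others learned. Real large-scale training typically shares gradients rather than finished weights; the model and data here are invented for illustration.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
base = nn.Linear(10, 2)                       # shared starting weights
replicas = [copy.deepcopy(base) for _ in range(4)]

for model in replicas:                        # each sees different data
    X = torch.randn(256, 10)
    y = (X[:, 0] > 0).long()
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(50):
        loss = nn.functional.cross_entropy(model(X), y)
        opt.zero_grad()
        loss.backward()
        opt.step()

# Pool the learning into one set of weights any copy can now load.
with torch.no_grad():
    for name, p in base.named_parameters():
        p.copy_(torch.stack(
            [dict(m.named_parameters())[name] for m in replicas]).mean(0))
```

Because the replicas are digital copies, the averaged weights mean exactly the same thing on every one of them, which is precisely the property analog or biological hardware lacks.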

25:07

Interviewer: A lot of the ideas that have been deployed in the field are very old-school ideas; it's the ideas that have been around neuroscience forever. What do you think is left to apply to the systems we develop?

Hinton: One big thing we still have to catch up with neuroscience on is the timescales for changes. In nearly all the neural nets, there's a fast timescale for changing activities: input comes in, and the activities, the embedding vectors, all change. And then there's a slow timescale, which is changing the weights; that's long-term learning. You just have those two timescales. In the brain there are many timescales at which weights change. For example, if I say an unexpected word like "cucumber", and then five minutes later you put headphones on with a lot of noise and very faint words, you'll be much better at recognizing the word "cucumber", because I said it five minutes ago. So where is that knowledge in the brain? That knowledge is obviously in temporary changes to synapses. It's not that neurons are going "cucumber, cucumber, cucumber"; you don't have enough neurons for that. It's in temporary changes to the weights, and you can do a lot of things with fast, temporary weight changes, what I call fast weights. We don't do that in these neural models, and the reason is that if you have temporary changes to the weights that depend on the input data, then you can't process a whole bunch of different cases at the same time. At present we take a whole bunch of different strings, stack them together, and process them all in parallel, because then we can do matrix-matrix multiplies, which is much more efficient, and just that efficiency is stopping us using fast weights. But the brain clearly uses fast weights for temporary memory, and there are all sorts of things you can do that way that we don't do at present. I think that's one of the biggest things we have to learn. I was very hopeful that things like Graphcore, if they went sequential and did just online learning, could use fast weights, but that hasn't worked out yet. I think it'll work out eventually, when people are using conductances for weights.
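
One classic way to sketch "fast weights" is a Hebbian outer-product memory that decays quickly, sitting alongside the slow weights that ordinary training would change. Below, a stored pattern (the "cucumber" of the example) is recovered from a degraded cue a little later. The mechanism and sizes are illustrative assumptions, not the speaker's published scheme.

```python
import torch

torch.manual_seed(0)
dim = 64
fast_W = torch.zeros(dim, dim)   # temporary, input-dependent weights
decay, lr = 0.95, 0.5

def store(pattern):
    # Hebbian outer-product write; old memories decay away.
    global fast_W
    fast_W = decay * fast_W + lr * torch.outer(pattern, pattern)

def recall(cue):
    return torch.tanh(fast_W @ cue)

cucumber = torch.sign(torch.randn(dim))
store(cucumber)                  # a temporary synaptic change

noisy = cucumber.clone()
noisy[:20] = 0                   # faint, degraded input later on
cleaned = torch.sign(recall(noisy))
overlap = (cleaned == cucumber).float().mean()
print(f"recovered {overlap:.0%} of the pattern")
```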

How has knowing how these models work, and knowing how the brain works, impacted the way you think?

I think there's been one big impact, at a fairly abstract level. For many years people were very scornful about the idea of having a big random neural net, giving it a lot of training data, and having it learn to do complicated things. If you talked to statisticians or linguists or most people in AI, they'd say that's just a pipe dream: there's no way you're going to learn really complicated things without some kind of innate knowledge, without a lot of architectural restrictions. It turns out that's completely wrong. You can take a big random neural network and learn a whole bunch of stuff just from data. The idea that stochastic gradient descent, repeatedly adjusting the weights using a gradient, will learn things, and will learn big complicated things, has been validated by these big models. And that's a very important thing to know about the brain: it doesn't have to have all this innate structure. Obviously it's got a lot of innate structure, but it certainly doesn't need innate structure for things that are easily learned.
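A minimal sketch of that claim, assuming nothing beyond NumPy: a randomly initialized two-layer net trained by stochastic gradient descent on an XOR-like task it could not solve at initialization. The architecture, task, and constants are illustrative.

```python
import numpy as np

# A random net plus SGD learns an XOR-like function purely from data,
# with no task-specific structure built in.

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, (256, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)       # XOR-like labels

W1 = rng.normal(0.0, 1.0, (2, 16)); b1 = np.zeros(16)
W2 = rng.normal(0.0, 1.0, (16, 1)); b2 = np.zeros(1)
lr = 0.5

for step in range(2000):
    i = rng.integers(0, 256, 32)                # random minibatch
    h = np.tanh(X[i] @ W1 + b1)                 # hidden layer
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))    # sigmoid output
    dz = (p - y[i, None]) / 32                  # dL/dlogits (cross-entropy)
    dW2 = h.T @ dz; db2 = dz.sum(0)             # chain rule, layer 2
    dh = dz @ W2.T * (1 - h ** 2)               # chain rule, through tanh
    dW1 = X[i].T @ dh; db1 = dh.sum(0)          # chain rule, layer 1
    W1 -= lr * dW1; b1 -= lr * db1              # the SGD weight update
    W2 -= lr * dW2; b2 -= lr * db2

p_all = 1.0 / (1.0 + np.exp(-(np.tanh(X @ W1 + b1) @ W2 + b2)))
acc = ((p_all[:, 0] > 0.5) == y).mean()         # well above chance
```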

So the idea coming from Chomsky, that you won't learn anything complicated like language unless it's all wired in already and just matures, is now clearly nonsense.

I'm sure Chomsky would appreciate you calling his ideas nonsense.

Well, I actually think a lot of Chomsky's political ideas are very sensible, and I was struck by how someone with such sensible ideas about the Middle East could be so wrong about linguistics.

What do you think would make these models simulate the consciousness of humans more effectively? Imagine you had an AI assistant that you'd spoken to your entire life, and instead of being like chat today, which deletes the memory of the conversation and starts fresh all the time, it had self-reflection. At some point you pass away, and you tell that to the assistant...

Me? Not me.

Somebody else tells that to the assistant, yes; it would be difficult for you to tell that to the assistant. Do you think that assistant would feel at that point?

Yes, I think they can have feelings too. Just as we have this inner-theatre model for perception, we have an inner-theatre model for feelings: they're things that I can experience but other people can't. I think that model is equally wrong. Suppose I say, "I feel like punching Gary on the nose," which I often do. Let's try to abstract that away from the idea of an inner theatre. What I'm really saying to you is that if it weren't for the inhibition coming from my frontal lobes, I would perform an action. When we talk about feelings, we're really talking about actions we would perform if it weren't for constraints. That's really what feelings are: the actions we would do if it weren't for constraints. So I think you can give the same kind of explanation for feelings, and there's no reason why these things can't have feelings. In fact, in 1973 I saw a robot having an emotion. In Edinburgh they had a robot with two grippers that could assemble a toy car if you put the pieces separately on a piece of green felt. But if you put them in a pile, its vision wasn't good enough to figure out what was going on, so it gave them a whack with its gripper and knocked them so they were scattered, and then it could put them together. If you saw that in a person, you'd say it was cross with the situation: because it didn't understand it, it destroyed it.

That's profound. We spoke previously and you described humans and LLMs as analogy machines. What do you think are the most powerful analogies you've found throughout your life?

Oh, throughout my life... I guess probably a sort of weak analogy that's influenced me a lot is the analogy between religious belief and belief in symbol processing. When I was very young, I came from an atheist family, went to school, and was confronted with religious belief, and it just seemed nonsense to me; it still seems nonsense to me. And when I saw symbol processing as an explanation of how people worked, I thought it was just the same nonsense. I don't think it's quite so much nonsense now, because I think we actually do do symbol processing; it's just that we do it by giving big embedding vectors to the symbols. We are actually doing symbol processing, but not at all in the way people thought, where you match symbols and the only property a symbol has is that it's identical to another symbol or it isn't. We don't do that at all. We use the context to give embedding vectors to symbols, and then use the interactions between the components of those embedding vectors to do thinking. But there's a very good researcher at Google called Fernando Pereira who said yes, we do have symbolic reasoning, and the only symbolic language we have is natural language: natural language is a symbolic language and we reason with it. I believe that now.
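A minimal sketch of the contrast drawn here, with an invented vocabulary and random vectors standing in for learned embeddings. Classical symbol matching exposes only identity; context-dependent embeddings give the same symbol different effective vectors in different contexts (a stripped-down, attention-like mixing step):

```python
import numpy as np

# Classical symbol matching vs. embedding-based processing.
# Random vectors stand in for learned embeddings; vocabulary is made up.

rng = np.random.default_rng(2)
vocab = {"bank": 0, "river": 1, "money": 2}
E = rng.normal(0.0, 1.0, (len(vocab), 8))    # one vector per symbol

def classical_match(a, b):
    """A symbol's only property: identical to another symbol or not."""
    return a == b

def contextual(tokens):
    """Mix each token's vector with its context using dot-product
    weights: a stripped-down, attention-like step."""
    H = E[[vocab[t] for t in tokens]]
    scores = H @ H.T / np.sqrt(H.shape[1])
    attn = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    return attn @ H                           # context-adjusted vectors

# "bank" ends up with different effective vectors in the two contexts,
# something identity-based matching cannot express.
v_river = contextual(["river", "bank"])[1]
v_money = contextual(["money", "bank"])[1]
```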

You've done some of the most meaningful research in the history of computer science. Can you walk us through how you select the right problems to work on?

Well, first let me correct you: my students and I have done a lot of the most meaningful things, and it's mainly been a very good collaboration with students, and my ability to select very good students. That came from the fact that there were very few people doing neural nets in the 70s and 80s and 90s and 2000s, so the few people doing neural nets got to pick the very best students. So that was a piece of luck. But my way of selecting problems is basically... well, you know, when scientists talk about how they work, they have theories about how they work which probably don't have much to do with the truth. But my theory is that I look for something where everybody's agreed about something and it feels wrong; there's a slight intuition that there's something wrong about it. Then I work on that and see if I can elaborate why I think it's wrong, and maybe I can make a little demo with a small computer program that shows it doesn't work the way you might expect. So let me take one example. Most people think that if you add noise to a neural net, it's going to work worse; if, for example, each time you put a training example through, you make half of the neurons be silent, it'll work worse. Actually, we know it'll generalize better if you do that, and you can demonstrate that in a simple example. That's what's nice about computer simulation: you can show that this idea you had, that adding noise and dropping out half the neurons will make it work worse, is true in the short term, but if you train it like that, in the end it works better. You can demonstrate that with a small computer program, and then you can think hard about why that is, and how it stops big elaborate co-adaptations. I think that's my method of working: find something that sounds suspicious, work on it, and see if you can give a simple demonstration of why it's wrong.
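The technique described here is dropout, which Hinton and his students introduced. Below is a minimal sketch of the training-time trick, with an illustrative two-layer net; the rescaling by 1/(1 - p) is one common convention ("inverted dropout").

```python
import numpy as np

# Dropout at training time: silence roughly half the hidden units per
# example. The 1/(1-p) rescaling ("inverted dropout") keeps expected
# activations matched between training and test.

rng = np.random.default_rng(3)
W1 = rng.normal(0.0, 0.1, (8, 32))          # illustrative two-layer net
W2 = rng.normal(0.0, 0.1, (32, 1))

def forward(x, train=True, p=0.5):
    h = np.maximum(0.0, x @ W1)             # ReLU hidden layer
    if train:
        mask = rng.random(h.shape) > p      # drop each unit with prob p
        h = h * mask / (1.0 - p)
    return h @ W2

x = rng.normal(0.0, 1.0, (4, 8))
y_train = forward(x, train=True)            # a random "thinned" network
y_test = forward(x, train=False)            # the full averaged network
```

At test time nothing is dropped, so the full net acts roughly as an average over the many "thinned" networks seen during training, which is one way of seeing why it breaks up the elaborate co-adaptations mentioned above.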

What sounds suspicious to you now?

Well, that we don't use fast weights sounds suspicious: that we only have these two time scales. That's just wrong; that's not at all like the brain, and in the long run I think we're going to have to have many more time scales. So that's an example.

And if you had your group of students today, and they came to you and asked the Hamming question that we talked about previously (you know, what's the most important problem in your field?), what would you suggest they take on and work on next? We spoke about reasoning and time scales; what would be the highest-priority problem you'd give them?

For me, right now, it's the same question I've had for the last 30 years or so, which is: does the brain do backpropagation? I believe the brain is getting gradients. If you don't get gradients, your learning is just much worse than if you do get gradients. But how is the brain getting gradients? Is it somehow implementing some approximate version of backpropagation, or is it some completely different technique? That's a big open question, and if I kept on doing research, that's what I would be doing research on.

And when you look back at your career now, you've been right about so many things, but what were you wrong about, that you wish you'd spent less time pursuing?

Okay, those are two separate questions: one, what were you wrong about, and two, do you wish you'd spent less time on it. I think I was wrong about Boltzmann machines, and I'm glad I spent a long time on it. It's a much more beautiful theory of how you get gradients than backpropagation. Backpropagation is just ordinary and sensible; it's just the chain rule. Boltzmann machines are clever, and a very interesting way to get gradients, and I would love for that to be how the brain works, but I think it isn't.
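For reference, the Boltzmann machine learning rule being contrasted with backprop here is a simple difference of two measured statistics: pairwise co-activation with the data clamped versus in the model's free-running phase. A minimal sketch, with random binary arrays standing in for states that would really come from Gibbs sampling:

```python
import numpy as np

# Boltzmann machine learning rule: the weight change is the difference
# between pairwise co-activation statistics in the clamped (data) phase
# and the free-running (model) phase.

rng = np.random.default_rng(5)

def boltzmann_update(W, s_data, s_model, lr=0.01):
    """dW[i, j] = lr * (<s_i s_j>_data - <s_i s_j>_model)."""
    corr_data = s_data.T @ s_data / len(s_data)
    corr_model = s_model.T @ s_model / len(s_model)
    return W + lr * (corr_data - corr_model)

units = 6
W = np.zeros((units, units))
s_data = rng.integers(0, 2, (100, units))    # clamped-phase samples
s_model = rng.integers(0, 2, (100, units))   # free-running samples
W = boltzmann_update(W, s_data, s_model)
```

The elegance is that the gradient of the log-likelihood reduces to this purely local difference of correlations; the practical difficulty alluded to is that the free-running statistics require long sampling runs.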

Did you spend much time imagining what would happen once these systems developed? Did you have an idea that if we could make these systems work really well, we could democratize education, make knowledge far more accessible, solve some tough problems in medicine? Or was it more about understanding the brain?

Yes. I sort of feel scientists ought to be doing things that are going to help society, but actually that's not how you do your best research. You do your best research when it's driven by curiosity: you just have to understand something. Much more recently, I've realized these things could do a lot of harm as well as a lot of good, and I've become much more concerned about the effects they're going to have on society. But that's not what was motivating me. I just wanted to understand how on earth the brain can learn to do things. That's what I want to know, and I sort of failed; as a side effect of that failure, we got some nice engineering.

But yes, it was a good failure for the world. If you take the lens of the things that could go really right, what do you think are the most promising applications?

I think healthcare is clearly a big one. With healthcare, there's almost no end to how much healthcare society can absorb: if you take someone old, they could use five doctors full-time. So when AI gets better than people at doing things, you'd like it to get better in areas where we could do with a lot more of that stuff, and we could do with a lot more doctors. If everybody had three doctors of their own, that would be great, and we're going to get to that point. So that's one reason why healthcare is good. There's also just new engineering: developing new materials, for example, for better solar panels, or for superconductivity, or for just understanding how the body works. There are going to be huge impacts there. Those are all going to be good things. What I worry about is bad actors using them for bad things. We've facilitated people like Putin or Xi or Trump using AI for killer robots, or for manipulating public opinion, or for mass surveillance, and those are all very worrying things.

Are you ever concerned that slowing down the field could also slow down the positives?

Oh, absolutely. And I think there's not much chance that the field will slow down, partly because it's international, and if one country slows down, the other countries aren't going to slow down. There's clearly a race between China and the US, and neither is going to slow down. There was this petition saying we should slow down for six months. I didn't sign it, just because I thought it was never going to happen. Maybe I should have signed it, because even though it was never going to happen, it made a political point. It's often good to ask for things you know you can't get, just to make a point. But I didn't think we were going to slow down.

And how do you think having these assistants will impact the AI research process?

I think it'll make it a lot more efficient. Research will get a lot more efficient when you've got these assistants that help you program, but also help you think through things, and probably help you a lot with equations too.

Have you reflected much on the process of selecting talent? Has that been mostly intuitive to you? Like when Ilya shows up at the door, you feel, this is a smart guy, let's work together?

For selecting talent, sometimes you just know. After talking to Ilya for not very long, he seemed very smart, and then, talking to him a bit more, he clearly was very smart and had very good intuitions, as well as being good at math, so that was a no-brainer. There's another case where I was at a NIPS conference. We had a poster, and someone came up and started asking questions about it, and every question he asked was a sort of deep insight into what we'd done wrong. After five minutes, I offered him a postdoc position. That guy was David MacKay, who was just brilliant; it's very sad he died, but it was very obvious you'd want him. Other times it's not so obvious, and one thing I did learn was that people are different: there's not just one type of good student. There are some students who aren't that creative but are technically extremely strong and will make anything work, and there are other students who aren't technically strong but are very creative. Of course you want the ones who are both, but you don't always get that, and I think, actually, in a lab you need a variety of different kinds of graduate student. But I still go with my gut intuition: sometimes you talk to somebody and they just get it, and those are the ones you want.

What do you think is the reason for some folks having better intuition? Do they just have better training data than others? How can you develop your intuition?

I think it's partly that they don't stand for nonsense. Here's a way to get bad intuitions: believe everything you're told. That's fatal. I think here's what some people do: they have a whole framework for understanding reality, and when someone tells them something, they try to figure out how it fits into their framework, and if it doesn't, they just reject it. That's a very good strategy. People who try to incorporate whatever they're told end up with a framework that's very fuzzy and can believe everything, and that's useless. So I think actually having a strong view of the world, and trying to manipulate incoming facts to fit in with your view, is the way to go. Obviously it can lead you into deep religious belief and fatal flaws and so on, like my belief in Boltzmann machines, but I think that's the way to go. If you've got good intuitions you can trust, you should trust them. If you've got bad intuitions, it doesn't matter what you do, so you might as well trust them.

That's a very good point. When you look at the types of research being done today, do you think we're putting all of our eggs in one basket and should diversify our ideas a bit more in the field? Or do you think this is the most promising direction, so let's go all in on it?

I think having big models and training them on multimodal data, even if it's only to predict the next word, is such a promising approach that we should go pretty much all in on it. Obviously there are lots and lots of people doing it now, and there are lots of people doing apparently crazy things, and that's good. But I think it's fine for most people to be following this path, because it's working very well.

Do you think the learning algorithms matter that much, or is it just scale? Are there basically millions of ways that we could get to human-level intelligence, or are there a select few that we need to discover?

Yes, so on this issue of whether particular learning algorithms are very important, or whether there's a great variety of learning algorithms that'll do the job, I don't know the answer. It seems to me, though, that there's a sense in which backpropagation is the correct thing to do: getting the gradient so that you change a parameter to make it work better seems like the right thing to do, and it's been amazingly successful. There may well be other learning algorithms that are alternative ways of getting that same gradient, or that are getting the gradient of something else, and that also work. I think that's all open, and it's a very interesting issue now whether there are other things you can try to maximize that will give you good systems; maybe the brain's doing that because it's easier. But backprop is, in a sense, the right thing to do, and we know that doing it works really well.
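A minimal sketch of what "getting the gradient" means in this exact sense, assuming only NumPy: the chain-rule (backprop) gradient of a tiny loss matches a finite-difference estimate, parameter by parameter. The function and sizes are illustrative.

```python
import numpy as np

# "Getting the gradient" is well defined and checkable: the chain-rule
# (backprop) gradient of a tiny loss matches a finite-difference
# estimate for every parameter.

rng = np.random.default_rng(4)
w = rng.normal(0.0, 1.0, 5)
x = rng.normal(0.0, 1.0, 5)

def loss(w):
    return np.tanh(w @ x) ** 2

# Chain rule: d/dw_i tanh(w.x)^2 = 2 tanh(w.x) (1 - tanh(w.x)^2) x_i
y = np.tanh(w @ x)
grad_backprop = 2.0 * y * (1.0 - y ** 2) * x

# Central finite differences, one parameter at a time.
eps = 1e-6
grad_numeric = np.array([
    (loss(w + eps * np.eye(5)[i]) - loss(w - eps * np.eye(5)[i])) / (2 * eps)
    for i in range(5)
])
assert np.allclose(grad_backprop, grad_numeric, atol=1e-6)
```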

And one last question: when you look back at your decades of research, what are you most proud of? Is it the students? Is it the research? What makes you most proud when you look back at your life's work?

The learning algorithm for Boltzmann machines. The learning algorithm for Boltzmann machines is beautifully elegant. It's maybe hopeless in practice, but it's the thing I enjoyed developing most, with Terry, and it's what I'm proudest of, even if it's wrong.

What questions do you spend most of your time thinking about now? Is it, what should I watch on Netflix?
