Geoffrey Hinton is a genius | Jay McClelland and Lex Fridman

Lex Clips
27 Sept 2021 · 18:51

Summary

TLDR: The video recounts Geoffrey (Jeff) Hinton's contributions to neural networks and deep learning. Hinton, together with colleagues, founded the PDP research group, building on work on the interactive activation model. They realized that to improve learning in neural networks, they should move from biological inspiration toward defining a problem and adjusting connection weights to solve it. This led to gradient descent and the backpropagation algorithm, which adjusts weights by propagating error signals backwards from the output layer to the hidden layers. Hinton's intuitive style of thinking, his visual explanations of how deep learning works, and his deep understanding of computational intelligence and the mechanisms of cognition had a profound influence on the speaker. The video also explores the relationship between human and computational intelligence, and the possibility of extending human intelligence by building computational intelligences that go beyond biological limits.

Takeaways

  • 👨‍🔬 Geoffrey (Jeff) Hinton is a pioneer of deep learning and neural networks who made important contributions to the development of the backpropagation algorithm.
  • 🔄 The core idea behind backpropagation is to define an objective function and adjust connection weights by gradient descent so as to minimize the error, or loss.
  • 📈 Gradient descent was already familiar in engineering, where an algorithm called the delta rule had been invented at Stanford by Widrow and Hoff.
  • 🧠 David Rumelhart generalized the delta rule so that it could be applied to multilayer networks, producing the generalized delta rule, later known as backpropagation.
  • 🤔 Hinton was skeptical at the time that backpropagation would work well and had an alternative algorithm based on the Boltzmann machine, but backpropagation ultimately proved more effective.
  • 🌟 Hinton wrote two groundbreaking papers in 1981, one containing essentially the idea of transformers and the other an early study of semantic cognition, both of which had a lasting influence on cognitive science.
  • 📚 In an unpublished 1977 paper, Hinton described how a neural network could perform recursive computation, an idea he has continued to explore for many years.
  • 💡 Hinton's way of thinking profoundly shaped the speaker's understanding of human mathematical cognition; he prefers to explain how deep learning works with intuitive pictures rather than dense equations.
  • 🧵 Hinton's work connects Boolean logic with theoretical physics, particularly his mid-1980s research on Boltzmann machines, which brought the Boolean tradition into the more continuous realm of probability and graded constraint satisfaction.
  • 🧬 Hinton wanted to contribute to the understanding of reasoning itself, not just human reasoning but the mechanisms of reasoning more broadly.
  • 🧠 Human intelligence is just one manifestation of intelligence; the computational intelligence that Hinton and other researchers are exploring may go beyond our own biological limits.
  • 🚀 Deep learning has shown enormous potential in projects at Google Brain, OpenAI, and DeepMind, where large-scale computation has cracked problems such as Go and protein folding.

Q & A

  • What contributions has Geoffrey Hinton made to the field of neural networks?

    -Geoffrey Hinton is one of the pioneers of deep learning and neural networks. He was central to the development of backpropagation, a supervised learning technique for training neural networks, and he also studied the Boltzmann machine, a stochastic recurrent neural network that can learn the probability distribution of its data. Hinton's work has had a lasting influence on how we understand human cognition and reasoning.

  • What is the backpropagation algorithm, and what role does it play in neural networks?

    -Backpropagation is a supervised learning technique for training multilayer neural networks. It computes the gradient of the loss function with respect to the network's parameters and uses those gradients to update the weights. It involves two passes: a forward pass, in which the input is propagated through the network to produce an output, and a backward pass, in which the error between the output and the target is computed and propagated back through the network to adjust the weights and reduce the error.
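
    As a rough illustration of the two passes described above, here is a minimal NumPy sketch (a toy example of ours, not code from the video) of repeated training steps for a tiny two-layer network; the layer sizes, learning rate, and sigmoid activation are arbitrary choices for the sketch.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy data: 4 input patterns with 3 features, 2 target outputs each (arbitrary).
    x = rng.normal(size=(4, 3))
    t = rng.normal(size=(4, 2))

    # One hidden layer of 5 units (sizes chosen arbitrarily for the sketch).
    W1 = rng.normal(scale=0.1, size=(3, 5))
    W2 = rng.normal(scale=0.1, size=(5, 2))
    lr = 0.1

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    for step in range(100):
        # Forward pass: propagate the input through the network.
        h = sigmoid(x @ W1)                              # hidden activations
        y = h @ W2                                       # linear output layer
        loss = 0.5 * np.mean(np.sum((y - t) ** 2, axis=1))

        # Backward pass: propagate error signals from the output layer
        # back through the hidden-to-output weights to get hidden-layer errors.
        delta_out = (y - t) / x.shape[0]                 # dL/dy
        delta_hidden = (delta_out @ W2.T) * h * (1 - h)  # chain rule through the sigmoid

        # Gradient-descent weight updates.
        W2 -= lr * h.T @ delta_out
        W1 -= lr * x.T @ delta_hidden

    print(f"final loss: {loss:.4f}")
    ```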

  • How does the Boltzmann machine combine Boolean logic with probability theory?

    -The Boltzmann machine is a stochastic neural network that ties together Boolean logic and probability. Boolean logic is the mathematics of propositions and inference, while probability theory deals with uncertainty and randomness. In a Boltzmann machine the on/off state of each unit is determined stochastically, which brings in the probability, while the network as a whole settles into configurations and learns a probability distribution over its data, which connects it to logic-like constraint satisfaction and reasoning.
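
    Below is a minimal sketch of the stochastic unit update that gives the Boltzmann machine its probabilistic character, assuming binary units, a symmetric weight matrix with zero diagonal, and a temperature of 1; all sizes and values are made up for illustration.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def gibbs_step(s, W, b, T=1.0):
        """One sweep of stochastic updates over all binary units.

        s: current binary state vector, W: symmetric weights (zero diagonal),
        b: biases, T: temperature. Each unit turns on with probability
        sigmoid(energy_gap / T), the standard Boltzmann machine update.
        """
        s = s.copy()
        for i in rng.permutation(len(s)):
            energy_gap = W[i] @ s + b[i]                  # net input to unit i
            p_on = 1.0 / (1.0 + np.exp(-energy_gap / T))
            s[i] = 1.0 if rng.random() < p_on else 0.0
        return s

    # Tiny example: 4 units with random symmetric weights (arbitrary numbers).
    n = 4
    W = rng.normal(size=(n, n))
    W = (W + W.T) / 2
    np.fill_diagonal(W, 0.0)
    b = rng.normal(size=n)

    state = rng.integers(0, 2, size=n).astype(float)
    for _ in range(10):
        state = gibbs_step(state, W, b)
    print(state)
    ```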

  • What forward-looking ideas did Geoffrey Hinton put forward in his two 1981 papers?

    -In 1981 Geoffrey Hinton wrote two prescient papers. One contained essentially the idea of transformers, a kind of model that later became central to processing sequence data in natural language processing. The other was an early paper on semantic cognition, which inspired subsequent research on the semantic aspects of cognition.

  • Why does deep learning need a defined objective function?

    -In deep learning, an objective function quantifies how well the model is doing, that is, the gap between the model's outputs and the desired values. It is usually called the loss or error function. By minimizing the objective function, the network's weights can be adjusted so that its predictions come as close as possible to the true data, improving the model's accuracy.
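
    As one concrete example of such an objective (mean squared error, chosen here purely for illustration), the loss scores the network outputs y against targets t over N training cases:

    ```latex
    L(\mathbf{w}) \;=\; \frac{1}{2N}\sum_{n=1}^{N}\bigl\lVert \mathbf{y}(\mathbf{x}_n;\mathbf{w}) - \mathbf{t}_n \bigr\rVert^{2}
    ```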

  • What role does gradient descent play in training neural networks?

    -Gradient descent is an optimization algorithm used to minimize a neural network's loss function. By computing the gradient of the loss with respect to the weights, it adjusts the weights step by step to reduce the prediction error. It is a key step in training, because it tells the network how to improve its performance from the data it learns on.
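
    In symbols, the basic update is the one below; the second equation is the single-layer delta rule (the Widrow-Hoff form mentioned in the video), with learning rate eta, input x_i, output y_j, and target t_j:

    ```latex
    \mathbf{w} \;\leftarrow\; \mathbf{w} - \eta\,\nabla_{\mathbf{w}} L(\mathbf{w}),
    \qquad
    \Delta w_{ij} \;=\; \eta\,(t_j - y_j)\,x_i
    ```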

  • How has Geoffrey Hinton's research influenced our understanding of human cognition?

    -Hinton's research has had a major influence on the understanding of human cognition. His work not only propelled neural networks and deep learning forward, it also offered new insights into how the brain might process information, reason, and make decisions. In particular, his work on Boltzmann machines, which ties logic to probability, gave a new perspective on how the brain might work.

  • Why is backpropagation regarded as a key technique in deep learning?

    -Backpropagation is a key technique because it makes it possible to train multilayer networks efficiently. In a deep network, working out directly how every weight in every layer should change is extremely complicated. Backpropagation uses the chain rule, starting from the output layer and working backwards layer by layer to compute the error gradients, which simplifies the process and makes training complex deep networks feasible.
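
    Written out for a generic feed-forward network with activation f (the notation is ours, not the video's), the recursion computes the output-layer error directly and then pushes each layer's error back through the weights of the layer above:

    ```latex
    \delta^{(L)} = \nabla_{\mathbf{y}} L \odot f'\!\bigl(\mathbf{z}^{(L)}\bigr),
    \qquad
    \delta^{(l)} = \bigl(W^{(l+1)\top}\delta^{(l+1)}\bigr) \odot f'\!\bigl(\mathbf{z}^{(l)}\bigr),
    \qquad
    \frac{\partial L}{\partial W^{(l)}} = \delta^{(l)}\,\mathbf{a}^{(l-1)\top}
    ```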

  • Why does the scale of computation matter in deep learning?

    -Scale matters because a model's performance tends to improve as the model and the training data grow. Larger models can capture more complex features, and more data carries richer information that helps the model generalize. Large-scale computation can also exploit parallel and distributed processing to speed up training.

  • Which aspects of Geoffrey Hinton's way of thinking influenced you?

    -Hinton's intuitive, visual way of thinking had a profound influence on the speaker. Hinton prefers to explain deep learning concepts with pictures and geometric metaphors rather than relying only on equations. That intuitive style makes complex ideas easier to understand deeply and to remember.

  • How should we understand the idea of recursive computation in neural networks that Hinton described in his unpublished 1977 paper?

    -In the unpublished 1977 paper, Hinton proposed a way for a neural network to carry out recursive computation. The idea is that when a subroutine is called, the current state has to be saved so it can be restored once the subroutine finishes and the earlier computation can continue. Hinton suggested that the state could be saved and reinstated through fast, temporary changes to connection weights, an idea related to later work on handling context and sequences in deep learning.
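
    Since the 1977 paper is unpublished, the sketch below is only a loose illustration of the general idea as described here: the caller's activity pattern is written into a temporary "fast" weight matrix by a Hebbian outer product, the subroutine overwrites the activity, and the stored pattern is read back to reinstate the earlier context. Every name and number in it is invented for the example.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n = 8

    # "Slow" activity: the state of the calling routine before the subroutine runs.
    caller_state = rng.normal(size=n)
    caller_state /= np.linalg.norm(caller_state)

    # Save the state with a fast, temporary change to connection weights
    # (a simple Hebbian outer product serves as the illustration here).
    fast_W = np.outer(caller_state, caller_state)

    # The subroutine now takes over the units and overwrites the activity.
    activity = rng.normal(size=n)  # whatever the subroutine computes

    # When the subroutine finishes, a partial cue plus the fast weights
    # reinstate (approximately) the previous context.
    cue = caller_state + 0.3 * rng.normal(size=n)   # noisy reminder of the old state
    restored = fast_W @ cue
    restored /= np.linalg.norm(restored)

    print("overlap with saved state:", float(caller_state @ restored))
    ```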

  • Why is the development of deep learning tied to cross-disciplinary collaboration?

    -Deep learning has grown out of collaboration across disciplines, combining theory and methods from computer science, neuroscience, cognitive science, mathematics, and physics. That mixing of fields has driven innovation in algorithms and models; the Boltzmann machine, for example, is a product of logic, probability theory, and physics coming together.

Outlines

00:00

🔬 The origins of neural networks and backpropagation

The first segment covers Geoffrey Hinton's connection to the backpropagation algorithm. Hinton returned to San Diego on sabbatical in the early 1980s and co-founded the PDP (Parallel Distributed Processing) research group, which focused on how learning happens in neural networks. Hinton proposed a new way of approaching the problem: rather than looking to biology for inspiration, define the problem and work out how adjusting the connection weights will solve it. This led to gradient descent, and Rumelhart and Hinton developed the approach into the backpropagation algorithm, which adjusts the weights into the hidden layers by propagating error signals backwards from the output layer.

05:01

🤖 Hinton's intuitive thinking and algorithmic innovations

The second segment looks at Hinton's way of thinking and his contributions to artificial intelligence. Hinton is known for intuition-driven thinking and inventive algorithms, including a 1981 paper containing essentially the idea of transformers and an early paper on semantic cognition. In an unpublished 1977 paper he proposed a way for neural networks to perform recursive computation, saving and restoring the state of a computation through fast changes to connection weights. His way of thinking also deeply shaped the speaker's understanding of human mathematical cognition: he prefers to explain the principles of deep learning with pictures and intuitive metaphors rather than dense equations.

10:02

🧠 Hinton's contributions to computational intelligence

The third segment explores Hinton's contributions to computational intelligence, particularly his early-to-mid-1980s work on the Boltzmann machine. Hinton sought to bring the Boolean logic tradition into the realm of continuous probability and graded constraint satisfaction, as a way of getting at the nature of reasoning itself. His work links logic with theoretical physics and has inspired a deeper understanding of cognition and intelligence. His thinking and his contributions are held up as a model of cross-disciplinary research, pushing the exploration of intelligence beyond our own biological limits.

15:04

🌟 The future of computational intelligence and extending human intelligence

The final segment discusses the future of computational intelligence and how building computational intelligences free of biological constraints could extend human intelligence. It mentions achievements in large-scale computation at Google Brain, OpenAI, and DeepMind, such as breakthroughs in the game of Go and in protein folding. Although today's systems still have fewer neurons and synapses than the human brain, they are developing very quickly and catching up fast. The segment emphasizes the potential of computational intelligence to augment and extend human intelligence.

Keywords

💡Backpropagation

Backpropagation is the algorithm used to train neural network models. It computes the gradient of the loss function with respect to the network's parameters and uses those gradients to update the weights. In the video it is also referred to as the "generalized delta rule", an extension of gradient descent to multilayer networks.

💡Gradient Descent

Gradient descent is an optimization algorithm for minimizing an objective function. In deep learning it is used to adjust the network's weights so as to reduce the error between predictions and targets. The video notes that gradient descent in continuous neural networks was already understood, and that it is closely tied to backpropagation.

💡Geoffrey Hinton (Jeff Hinton)

Geoffrey Hinton is one of the pioneers of artificial intelligence, known for his contributions to neural networks and deep learning. The video mentions that in 1981 he wrote seminal papers containing essentially the idea of transformers and early work on semantic cognition, which had a lasting influence on the development of the field.

💡Boltzmann Machine

The Boltzmann machine is a stochastic neural network model that can learn the probability distribution of its input data. It combines ideas from logic with statistical mechanics, the physics of thermodynamic systems. The video notes that Hinton's work on Boltzmann machines in the early to mid 1980s was a way of connecting the Boolean logic tradition with probability and continuous constraint satisfaction.

💡Deep Learning

Deep learning is a branch of machine learning that uses multilayer neural networks, loosely modeled on human learning, to solve complex pattern recognition problems. The video touches on applications such as image classification and language processing and on the central role of gradient descent and backpropagation.

💡PDP Research Group

The PDP research group was founded by Geoffrey Hinton, David Rumelhart, and James (Jay) McClelland, and it drove the development of the theory of Parallel Distributed Processing. The video describes the group's founding as an important milestone in the history of neural network and deep learning research.

💡Activation Model

An activation model describes how the units in a neural network process incoming signals and produce outputs. The interactive activation model mentioned in the video was an early piece of work by the PDP researchers, showing how a neural network could handle complex cognitive tasks.

💡Loss Function

The loss function measures the difference between a neural network's predictions and the true values and guides how the weights are adjusted. In the video, the loss function is tied to gradient descent and backpropagation: the network's performance is optimized by minimizing it.

💡Objective Function

The objective function is the quantity an optimization problem seeks to maximize or minimize. In deep learning the objective is usually a loss function that guides learning. The video emphasizes Hinton's point that one should define an objective function and then adjust the connection weights to solve the problem at hand.

💡Computational Intelligence

Computational intelligence is a broad field spanning machine learning, artificial intelligence, and neural networks that aims to model or extend human intelligence. The video notes that the field is concerned not only with human intelligence but also with the possibility of going beyond human cognitive limits.

💡Recursive Computation

Recursive computation is the technique of having a procedure call itself or other subroutines to handle problems that require repeated processing. The video mentions Hinton's unpublished 1977 paper describing how a neural network could perform recursive computation, part of his contribution to deep learning and cognitive computation.

Highlights

Geoffrey Hinton had been at San Diego as a postdoc and helped organize a conference there; he later became an assistant professor and then had the opportunity to return to San Diego on sabbatical.

Hinton, Rumelhart, and the speaker founded the PDP research group to focus on neural network research.

Hinton argued that the right starting point is the problem to be solved, not biological inspiration, when deciding how connection weights should change.

Hinton, Rumelhart, and the speaker began thinking about defining an objective function and using gradient descent to minimize the error, or loss.

Gradient descent in continuous neural networks was already familiar to engineers; the delta rule had been invented at Stanford by Bernard Widrow and his collaborator Hoff.

Rumelhart generalized the delta rule so that it could change the connections from earlier layers into the hidden units.

At the time, Hinton thought the generalized delta rule would not work very well and had his own alternative algorithm.

The generalized delta algorithm ended up being called backpropagation, because it adjusts the weights by propagating error signals backwards from the output layer to the hidden layers.

Hinton introduced the framing of defining an objective function and doing gradient descent, so that the connection weights can be adjusted to make progress toward the goal.

In 1981 Hinton wrote two groundbreaking papers, one containing essentially the idea of transformers, the other an early study of semantic cognition.

In an unpublished 1977 paper Hinton described how a neural network could perform recursive computation.

Hinton's way of thinking profoundly influenced the speaker's understanding of human mathematical cognition; he prefers intuitive pictures to equations when explaining the principles of deep learning.

Hinton's way of thinking, like Feynman's, combines visual and intuitive elements; Lex mentions the chip designer Jim Keller as another thinker whose unique way of thinking transforms the people around him.

Hinton has wanted to contribute to the understanding of reasoning itself, not just human reasoning, bringing the Boolean logic tradition into the realm of continuous probability and graded constraint satisfaction.

Hinton's early-to-mid-1980s work on the Boltzmann machine was his way of connecting with the Boolean tradition, and it also joined logic with theoretical physics.

The speaker sees humans as one manifestation of intelligence; the mechanisms of cognition and intelligence are bigger than humans themselves.

The speaker started using the phrase "computational intelligence" for the field in which researchers like Hinton and the people at DeepMind work.

Part of the excitement of deep learning is that we may be able to build computational intelligences that are not limited by the human nervous system, extending the very mechanisms of human intelligence.

The large-scale computation under way at Google Brain, OpenAI, and DeepMind shows the progress AI has made on problems such as the game of Go and protein folding.

Transcripts

00:03

But just to say something more about the scientist and the backpropagation idea that you mentioned. So, in nineteen... Hinton had been there as a postdoc and organized that conference. He'd actually gone away and gotten an assistant professorship, and then there was this opportunity to bring him back. So Geoff Hinton was back on a sabbatical in San Diego, and Rumelhart and I had decided we wanted to do this; we thought it was really exciting. The papers on the interactive activation model that I was telling you about had just been published, and we both saw a huge potential for this work. And Jeff was there, so the three of us started a research group, which we called the PDP research group. Several other people came. Francis Crick, who was at the Salk Institute, heard about it from Jeff, because Jeff was known among Brits to be brilliant and Francis was well connected with his British friends. So Francis Crick came; a heck of a group of people. Paul Smolensky was one of the other postdocs, he was still there as a postdoc, and a few other people. But anyway.

01:40

Jeff talked to us about learning, and how we should think about how learning occurs in a neural network. And he said: the problem with the way you guys have been approaching this is that you've been looking for inspiration from biology to tell you what the rules should be for how the synapses should change the strengths of their connections, how the connections should form. He said, that's the wrong way to go about it. What you should do is think in terms of how you can adjust connection weights to solve a problem. So you define your problem, and then you figure out how the adjustment of the connection weights will solve the problem.

02:46

And Rumelhart heard that and said to himself, okay, I'm going to start thinking about it that way. I'm going to essentially imagine that I have some objective function, some goal of the computation: I want my machine to correctly classify all of these images. And I can score that, I can measure how well it's doing on each image, and I get some measure of error, or loss as it's typically called in deep learning. And I'm going to figure out how to adjust the connection weights so as to minimize my loss, or reduce the error. That's called gradient descent, and engineers were already familiar with the concept of gradient descent. In fact there was an algorithm called the delta rule that had been invented by a professor in the electrical engineering department at Stanford, Bernard Widrow, and a collaborator named Hoff, whom I never met. So gradient descent in continuous neural networks with multiple neuron-like processing units was already understood for a single layer of connection weights: we have some inputs over a set of neurons, we want the output to produce a certain pattern, we can define the difference between our target and what the neural network is producing, and we can figure out how to change the connection weights to reduce that error.

04:34

So what Rumelhart did was to generalize that, so as to be able to change the connections from earlier layers of units to the ones at a hidden layer between the input and the output. He first called the algorithm the generalized delta rule, because it's just an extension of the gradient descent idea. And interestingly enough, Hinton was thinking that this wasn't going to work very well. Hinton had his own alternative algorithm at the time, based on the concept of the Boltzmann machine that he was pursuing; the paper on learning in Boltzmann machines came out in 1985. But it turned out that backprop worked better than the Boltzmann machine learning algorithm. So this generalized delta algorithm ended up being called backpropagation, back prop, as you say.

05:37

That name is probably opaque; what does it mean? What it meant was that in order to figure out the changes you needed to make to the connections from the input to the hidden layer, you had to back-propagate the error signals from the output layer through the connections from the hidden layer to the output, to get the signals that would be the error signals for the hidden layer. That's how Rumelhart formulated it: we know what the error signals are at the output layer; let's see if we can get a signal at the hidden layer that tells each hidden unit what its error signal is, essentially. So it's back-propagating through the connections from the hidden layer to the output to get the signals that tell the hidden units how to change their weights from the input, and that's why it's called backprop. But it came from Hinton having introduced the concept: define your objective function, figure out how to take the derivative, so that you can adjust the connections so that they make progress towards your goal.

06:55

So, stop thinking about biology for a second and start thinking about optimization and computation a little bit more. What about Geoff Hinton? You've gotten a chance to work with him. The set of people involved there is quite incredible, the small set of people under the PDP flag, given the amount of impact those ideas have had over the years; it's kind of incredible to think about. But just like you said, like yourself, Geoffrey Hinton is seen as not just a seminal figure in AI but a brilliant person; the horsepower of the mind is pretty high up there for him, because he's just a great thinker. So what kind of ideas have you learned from him, have you influenced each other on, have you debated over? What stands out to you in the full space of ideas here, at the intersection of computation and cognition?

07:59

Well, Jeff has said many things to me that had a profound impact on my thinking, and he's written several articles which were way ahead of their time. He had two papers in 1981, just to give one example, one of which was essentially the idea of transformers, and another of which was an early paper on semantic cognition, which inspired him and Rumelhart and me throughout the 80s and, I think, still grounds my own thinking about the semantic aspects of cognition.

09:07

He also, in a small paper that was never published, which he wrote in 1977, before he actually arrived at UCSD, or maybe a couple of years even before that, I don't know, when he was a PhD student, described how a neural network could do recursive computation. It was a very clever idea that he's continued to explore over time. The idea was that when you call a subroutine, you need to save the state you had when you called it, so you can get back to where you were when you're finished with the subroutine. And the idea was that you would save the state of the calling routine by making fast changes to connection weights, and then when you finished with the subroutine call, those fast changes in the connection weights would allow you to go back to where you had been before and reinstate the previous context, so that you could continue on with the top level of the computation. Anyway, that was part of the idea, and I always thought, okay, he just had extremely creative ideas that were quite a lot ahead of his time, many of them in the 1970s and early 1980s.

10:44

Another thing about Geoff Hinton's way of thinking, which has profoundly influenced my effort to understand human mathematical cognition, is that he doesn't write too many equations. People tell stories like, oh, in the Hinton lab meetings you don't get up at the board and write equations like you do in everybody else's machine learning lab; what you do is you draw a picture. He explains aspects of the way deep learning works by putting his hands together and showing you the shape of a ravine, and using that as a geometrical metaphor for what's happening in this gradient descent process: you're coming down the wall of a ravine, and if you take too big a jump, you're going to jump to the other side, and that's why we have to turn down the learning rate, for example. It speaks to me of the fundamentally intuitive character of deep insight, together with a commitment to really understanding in a way that's ultimately explicit and clear, but also intuitive.

12:21

Yeah, there are certain people like that, some kind of weird mix of the visual and the intuitive and all those kinds of things. Feynman is another example, a different style of thinking, but very unique. And when you're around those people... for me, in the engineering realm, there's a guy named Jim Keller, who's a chip designer, an engineer. Every time I talk to him, it doesn't matter what we're talking about; just having experienced that unique way of thinking transforms you and makes your work much better. And that's the magic. You look at Daniel Kahneman, you look at the great collaborations throughout the history of science; that's the magic of it. It's not always the exact ideas that you talk about, but the process of generating those ideas, being around that, spending time with that human being. You can come up with some brilliant work, especially when it's cross-disciplinary, as it was a little bit in your case with Jeff.

13:20

Yeah. Jeff is a descendant of the logician Boole; he comes from a long line of English... And together with the deeply intuitive thinking ability that he has, he also, and it's been clear, he's described this to me, and I think he's mentioned it from time to time in other interviews he's had, he's wanted to be able to think of himself as contributing to the understanding of reasoning itself, not just human reasoning. Like Boole: it's about logic, right? It's about what we can conclude from what else, and how we formalize that. As a computer scientist, logician, philosopher, the goal is to understand how we derive truths from givens and things like this. And the work that Jeff was doing in the early to mid 80s on something called the Boltzmann machine was his way of connecting with that Boolean tradition and bringing it into the more continuous, probabilistic, graded constraint-satisfaction realm. It was a beautiful set of ideas, linked with theoretical physics as well as with logic. I've always been inspired by the Boltzmann machine too. It's like, well, if the neurons are probabilistic rather than deterministic in their computations, then maybe this is somehow part of the serendipity, the adventitiousness, of the moment of insight. It might not have occurred at that particular instant; it might be partially the result of a stochastic process. And that too is part of the magic of the emergence of some of these things.

15:56

Well, with the Boolean lineage and the dream of computer science... I certainly think of humans this way, that humans are one particular manifestation of intelligence, that there's something bigger going on, and you're hoping to figure that out: the mechanisms of intelligence, the mechanisms of cognition, are much bigger than just humans.

16:19

Yeah. So I started using the phrase "computational intelligence" at some point to characterize the field that I thought people like Geoff Hinton, and many of the people I know at DeepMind, are working in, and where I feel like I'm a kind of human-oriented computational intelligence researcher, in that I'm actually kind of interested in the human solution. But at the same time I feel that that's where a huge amount of the excitement of deep learning actually lies: in the idea that we may be able to go even beyond what we can achieve with our own nervous systems when we build computational intelligences that are not limited in the ways that we are by our own biology, perhaps allowing us to scale the very mechanisms of human intelligence, just increase its power through scale.

17:45

Yes. And I think that, obviously, that's being played out massively at Google Brain, at OpenAI, and to some extent at DeepMind as well; I guess I shouldn't say "to some extent". The massive scale of the computations that are used to succeed at games like Go, or to solve the protein folding problems they've been solving, and so on: still not as many synapses and neurons as the human brain, so we're still beating them on that, we humans are beating the AIs, but they're catching up pretty quickly.
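
To make the ravine metaphor concrete, here is a tiny sketch (our own construction, not something from the conversation) of gradient descent on an elongated quadratic bowl: with a small learning rate the iterate settles down the ravine, while a learning rate that is too large makes it jump back and forth across the steep wall and diverge.

```python
import numpy as np

def descend(lr, steps=50):
    # A "ravine": very steep in the first coordinate, shallow in the second.
    curvature = np.array([50.0, 1.0])
    p = np.array([1.0, 1.0])
    for _ in range(steps):
        grad = curvature * p          # gradient of 0.5 * (50*x**2 + y**2)
        p = p - lr * grad
    return p

print("small learning rate    :", descend(lr=0.01))  # settles toward the bottom
print("too-large learning rate:", descend(lr=0.05))  # oscillates across the steep wall and blows up
```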

Related Tags
Deep learning, Cognitive science, Neural networks, Learning algorithms, Backpropagation, Gradient descent, Geoffrey Hinton, Computational intelligence, Artificial intelligence, Mechanisms of cognition, Boolean logic