Geoffrey Hinton | On working with Ilya, choosing problems, and the power of intuition

Sana
20 May 2024 · 45:46

Summary

TLDR: This transcript records an in-depth conversation with Geoffrey Hinton about artificial intelligence and how the brain works. He shares his experience at Carnegie Mellon University and how he was energized by the students' enthusiasm and their belief in the future. He discusses his disappointment with neuroscience and philosophy, and how he eventually moved into the field of AI. He mentions the influence of Donald Hebb and John von Neumann, and his collaborations with Terry Sejnowski and Peter Brown. He also shares his views on intuition, selecting talent, and large language models — including how they may come to exceed human creativity — recounts the early idea of using GPUs to accelerate neural network training, and offers his views on current research directions in AI.

Takeaways

  • 🤖 Selecting talent: choosing people is sometimes intuitive — Ilya simply showed up, seemed clever, and was taken on as a collaborator.
  • 🧠 Brain and AI: early disappointment with explanations of how the brain works, and the path through philosophy into artificial intelligence (AI).
  • 📚 Influential books: books by Donald Hebb and John von Neumann shaped his thinking, especially on neural networks and how the brain computes.
  • 🔗 Neural networks and learning: the importance of changing connection weights in a neural network, and how that lets a machine perform complex tasks.
  • 🤝 Key collaborations: important collaborations with Terry Sejnowski and Peter Brown, and what he learned from them.
  • 🚀 Innovation and intuition: Ilya's intuition played a major role in the research; his intuitive grasp of algorithms pushed the work forward.
  • 🛠️ Algorithms and scale: algorithms matter, but the growth of data and computation has driven much of AI's progress.
  • 🗣️ Language models: predicting the next word is not mere prediction — it requires understanding the whole context.
  • 🎨 Creativity and AI: large language models may become creative by discovering structure shared across very different domains and using it to encode information.
  • 🔄 GPUs for neural network training: the early intuition and experience of training neural networks on GPUs, and the enormous boost this gave the field.

Q & A

  • How did the students' enthusiasm for their work at Carnegie Mellon affect you?

    - The students at Carnegie Mellon were full of enthusiasm; they believed the work they were doing would change the future of computer science. That enthusiasm contrasted sharply with Hinton's experience in England and felt fresh and invigorating.

  • Why was Hinton initially disappointed studying physiology at Cambridge?

    - Because all they taught was how neurons conduct action potentials, which does not explain how the brain works. That pushed him toward philosophy, hoping for answers about how the mind works.

  • What sparked his interest in artificial intelligence?

    - A book by Donald Hebb, who was deeply interested in how connection strengths in neural networks are learned, and a book by John von Neumann, who was interested in how the brain computes and how it differs from ordinary computers.

  • In the Edinburgh days, what did he believe about how the brain learns?

    - He was convinced the brain certainly does not learn by being programmed in advance and applying logical rules of inference. There had to be a way the brain learns by modifying the connections in a neural network so it can do complicated things.

  • How would he describe the collaboration with Terry Sejnowski?

    - It was wonderful: although Sejnowski was at Johns Hopkins in Baltimore, they met for a weekend about once a month to work on Boltzmann machines. They were convinced it was how the brain worked, and the collaboration produced many interesting technical results.

  • What did the collaboration with Peter Brown bring?

    - Brown was an excellent statistician who had worked on speech recognition at IBM. He taught Hinton a great deal about speech and introduced him to hidden Markov models, which inspired the name "hidden" layers.

  • How did Ilya first appear at his office?

    - Ilya turned up on a Sunday with an urgent knock on the door. He said he would rather work in the lab than spend the summer cooking fries. That conversation, and the meeting that followed, made Ilya a collaborator.

  • What was Ilya's intuition about gradient descent?

    - Ilya didn't understand why people didn't just give the gradient to a sensible function optimizer. That intuition took them years to think through, but it proved to be a very good one.

  • What impact has the growth of data and computation had on AI's development?

    - Scaling up data and computation made models work far better without requiring new ideas. New ideas such as Transformers helped a lot, but the growth of scale showed just how much the data and the computation matter.

  • Why is predicting the next word not as simple as mere prediction?

    - Predicting the next word requires understanding the question being asked. Unlike old-style autocomplete, which relied on word-frequency statistics, predicting the next word requires understanding the context, and that demands some degree of reasoning.

  • How can large language models learn such a broad range of domains?

    - They encode things by finding common structure, which lets them learn more efficiently. By discovering analogies between different things, they compress information into their weights, and that is a source of creativity.

  • How will multimodal models change AI's future?

    - Multimodal models will make AI much better at understanding space — for example, understanding objects through vision and touch. That makes learning easier and requires less language, and such models may eventually take over more domains.

  • How did language and the brain influence each other's evolution?

    - They probably co-evolved: language may have evolved to fit the way the brain works, while the brain may also have evolved to process language effectively.

  • How does he view the early intuition of using GPUs to train neural networks?

    - Around 2006, his former graduate student Rick Szeliski suggested using graphics cards because they excel at matrix multiplication. That led them to experiment with GPUs, which turned out to speed up training dramatically.

  • What should the next step in computing for AI be?

    - In his last years at Google, he thought about analog computation as a way to cut energy use so large language models could run on analog hardware. He never made it work, and it left him with a real appreciation of the advantages of digital computation.

  • What effect would fast weight changes have on neural network models?

    - Fast weights — temporary changes to connection strengths — are a mechanism the brain uses for temporary memory. Current neural network models don't use fast weights, but he believes they are one of the most important things the field still has to learn.

  • How does he view multiple time scales of change in neural networks?

    - The brain changes its weights on many time scales, whereas today's neural networks have only two: fast-changing activities and slow-changing weights. He believes we need more time scales to better model how the brain learns.

  • Could neural networks simulate human consciousness?

    - If an AI assistant interacted with a person continuously over a lifetime and had the capacity for self-reflection, it could have experiences resembling human ones. He believes such AIs can have feelings, much as people do.

  • Could neural networks develop human-like feelings?

    - If feelings are understood as the actions we would take were it not for constraints, then neural networks can perfectly well have feelings of that kind.

  • Could neural networks simulate human reasoning?

    - Through training, neural networks develop intuitions, and reasoning can be used to correct those intuitions. As models scale up, he expects them to become much better at reasoning.

  • In which areas do neural networks have the most promising applications?

    - Healthcare is a field of enormous potential, because AI can make far more medical care available. The development of new materials and the understanding of the human body will also be significantly affected by AI.

  • How does he view AI's possible negative impacts on society?

    - He worries that AI could be used for bad purposes — building killer robots, manipulating public opinion, or conducting mass surveillance. He finds these potential harms deeply worrying.

  • How should ethical issues in AI research be handled?

    - Slowing the field down might help address ethical issues, but because AI is an international race, a slowdown is unlikely. Ethics therefore has to be addressed while development continues.

  • How does he describe his process for selecting talent?

    - It is sometimes purely intuitive: when Ilya showed up at his office, he quickly recognized a clever person worth working with. At other times it takes deeper conversation to judge someone's potential.

  • How can one cultivate good intuitions?

    - Don't accept everything you are told; instead, try to fit new information into your own framework for understanding the world. If your framework is sound, you can trust your intuitions.

  • Is current AI research too concentrated in particular directions?

    - Although many people are working on large models and multimodal data, he thinks it is a very promising direction that deserves deep investigation. At the same time, plenty of people are trying different approaches, which is a good thing.

  • Which learning algorithms does he consider most important?

    - Backpropagation is "the right thing" in the sense that it obtains a gradient so that changing the parameters makes the model work better. Other learning algorithms might achieve the same result, but backpropagation has proven extraordinarily successful.

  • Looking back on his research career, what is he proudest of?

    - Developing the learning algorithm for Boltzmann machines. It may not be ideal in practice, but it is very elegant, and it was the thing he most enjoyed developing.

  • What questions occupy him most now?

    - Whether the brain uses backpropagation to obtain gradients, whether there are other ways to obtain gradients, or whether the brain uses an entirely different technique.

Outlines

00:00

🤖 AI and neuroscience: from Cambridge to Carnegie Mellon

This part traces the researcher's academic journey from Cambridge to Carnegie Mellon and his passion for artificial intelligence and neuroscience. He shares his disappointment with how little physiology explained about the brain, his detour through philosophy, and his decision to move into AI. He mentions the influence of Donald Hebb and John von Neumann, early thinking about adjusting the weights in neural networks, and collaborations with Terry Sejnowski and Peter Brown, including their contributions to neural networks and hidden Markov models.

05:04

👨‍💻 Ilya's arrival: a spark of intuition and collaboration

This section recounts how a young student named Ilya joined the researcher's lab through intuition and directness. Ilya's deep question about backpropagation — why not hand the gradient to a sensible optimizer — opened a productive collaboration in which they worked on complex machine learning models and techniques for mapping data. The episode showcases Ilya's intuition and talent, and reflects the researcher's view that intuition alone is sometimes enough to spot potential.

10:06

🧠 Neural networks and language models: the frontier of understanding and innovation

Here the discussion turns to how neural language models work and how they go beyond simple symbol prediction to deeper understanding. The researcher argues that predicting the next symbol forces a model to reason, much as humans do when understanding language. He also stresses the impact of data and compute scale on model performance, and the striking results they obtained by training models to predict characters. The section also explores how models become more efficient and more creative by finding structure shared across domains.

15:07

🔮 Looking ahead: AI creativity and cross-domain applications

The researcher shares his outlook on AI's future, particularly its potential for application and innovation across domains. He argues that large language models can become more efficient and more creative by discovering what very different things have in common. He also discusses AI's ability to surpass existing knowledge in specific domains such as Go, and how multimodal models will strengthen the understanding of space and objects, improving reasoning.

20:08

🚀 Compute and hardware: from GPUs to analog computation

In this part the researcher recalls his early intuition about using GPUs to accelerate neural network training, and how that decision coincided with Nvidia's trajectory. He also explores future directions for computation, including attempts to develop analog hardware that runs at brain-like low power, and compares digital systems with biological brains in how they share knowledge and store weights.

25:10

🌐 Neuroscience and AI: cross-disciplinary lessons

The researcher shares insights from neuroscience and how they shape his view of AI. He contrasts the many time scales of weight change in the brain with current neural network models, and argues that introducing fast weight changes — temporary memory — is an important missing ingredient. He also mentions how new hardware such as Graphcore's could matter for online learning.

30:11

🧐 Intuition and innovation: skepticism and discovery in science

This section emphasizes the role of skepticism in scientific exploration. The researcher explains how he picks research problems: look for something everyone agrees on that feels wrong. He tests such beliefs by writing small computer programs and thinking hard about the underlying principles. He also reflects on Boltzmann machines and on choosing problems from intuition.

35:11

🛠️ AI's social responsibility and future applications

The researcher discusses AI's potential impact on society, including positive applications in healthcare, engineering new materials, and understanding the human body. He also voices concern about AI being used for bad purposes — killer robots, manipulating public opinion, mass surveillance — while acknowledging that, despite the risks, development is unlikely to slow because the field is an international race.

40:11

🤝 Selecting talent and building teams: intuition and diversity

Here the researcher shares his experience and intuition in selecting talent. Sometimes a short conversation is enough to sense someone's intelligence and intuition. He stresses that a group needs different kinds of students, including the technically strong and the highly creative, and advocates trusting one's own intuitions while recognizing that not every good intuition is obvious.

45:12

🎓 Legacy and reflection

In this final part, the researcher reflects on the achievement he is proudest of: developing the learning algorithm for Boltzmann machines. Although it may not be ideal in practice, he admires its elegance. He also shares the questions he now spends his time thinking about, and his views on research and selecting talent.

Keywords

💡 Artificial intelligence

Artificial intelligence (AI) refers to intelligence exhibited by man-made systems that can perform complex tasks normally requiring human intelligence. It is the core of the whole discussion: how neural networks and machine learning algorithms model the way the brain works, and how AI is applied to understanding language, prediction, and reasoning.

💡 Neural network

A computational model inspired by the structure of the brain, built from many interconnected nodes ("neurons"). The video notes that neural networks learn by adjusting the strengths of the connections between neurons — one of the key techniques in AI.

💡 Gradient descent

An optimization algorithm for minimizing a function; in machine learning and AI it is usually used to minimize a loss function and find good values for a model's parameters. The video touches on the importance of gradients in training large neural networks.

💡 Deep learning

A subfield of machine learning that uses multi-layer neural networks to model richer representations of data. The video discusses how deep learning enables machines to perform tasks such as language understanding and image recognition.

💡 Language model

A concept from natural language processing: a model that predicts the probability of words appearing in a text sequence. The video discusses how language models come to understand language by predicting the next symbol or word — an important application of AI.

💡 Multimodal learning

Learning in which a system processes and integrates information from multiple sources, such as text, images, and sound. The video highlights the importance of multimodal learning and how it helps AI models understand the world better.

💡 Hidden Markov model

A hidden Markov model (HMM) is a statistical model describing a Markov process with hidden, unknown parameters. In the video, HMMs are mentioned as a tool from speech recognition and natural language processing — and as the origin of the name "hidden" layers.

💡 Feature vector

In mathematics and computer science, a vector used to represent an object or data point. In the video, feature vectors (embeddings) describe how words are converted into a numerical form that a neural network can process.

💡 Cognitive science

The interdisciplinary study of human cognition and thought. The video connects cognitive science and AI, particularly in understanding how the brain works and how to model those processes.

💡 Innovation

Creating new, or improving existing, methods, products, or solutions. The video discusses how AI advances through innovative algorithms and models, and how those innovations may go beyond current scientific understanding.

Highlights

On a Saturday night in a Carnegie Mellon lab, all the students were in, working on the future of computer science — utterly unlike Hinton's experience in England.

Disappointed by what he was taught about the brain, he turned to philosophy and then to artificial intelligence.

Books by Donald Hebb and John von Neumann had a major influence on his AI research.

Studying AI in Edinburgh, he was convinced the brain does not learn by reasoning with logical rules.

The collaboration with Terry Sejnowski was the most exciting research of his life, even though he later concluded it wasn't how the brain works.

Peter Brown introduced him to hidden Markov models, which strongly influenced his work.

Ilya's arrival, and his challenge to how backpropagation was being used, pushed the development of deep learning algorithms forward.

Ilya's intuition and independent thinking were crucial to his success in AI.

By predicting the next symbol, a model is forced to understand — much as humans do.

With enough data and compute, simple algorithms can achieve remarkable results.

A character-level prediction model understood text far better than expected.

Large language models encode information by finding common structure, which lets them work creatively.

AlphaGo's self-play is part of why it could make creative moves.

Multimodal models will make AI far better at spatial understanding.

The relationship between language and cognition is complex; the brain and language probably co-evolved.

Using GPUs for neural network training was one of his early innovations and profoundly shaped the field.

The brain may use temporary weight changes for fast learning and memory.

Contrary to earlier conventional wisdom, a large random neural network can learn complex tasks from data alone.

He is proudest of the Boltzmann machine learning algorithm, even though it wasn't a practical success.

He suggests students investigate whether the brain gets gradients via backpropagation or some other technique.

Despite the risks of misuse, AI has enormous promise in areas like healthcare.

Transcripts

00:00

Have you reflected a lot on how to select talent, or has that mostly been intuitive to you? Ilya just shows up and you're like, "this is a clever guy, let's work together" — or have you thought a lot about that? Are we recording? Should we roll this?

Yeah, let's roll this. Okay, we're good. Okay, it's working.

So I remember when I first got to Carnegie Mellon from England. In England, at a research unit, it would get to be 6:00 and you'd all go for a drink in the pub. At Carnegie Mellon, I remember, after I'd been there a few weeks, it was Saturday night. I didn't have any friends yet and I didn't know what to do, so I decided I'd go into the lab and do some programming, because I had a Lisp machine and you couldn't program it from home. So I went into the lab at about 9:00 on a Saturday night, and it was swarming — all the students were there, and they were all there because what they were working on was the future. They all believed that what they did next was going to change the course of computer science. It was just so different from England, and that was very refreshing.

Take me back to the very beginning, Geoff — at Cambridge, trying to understand the brain. What was that like?

It was very disappointing. I did physiology, and in the summer term they were going to teach us how the brain worked, and all they taught us was how neurons conduct action potentials — which is very interesting, but it doesn't tell you how the brain works. So that was extremely disappointing. I switched to philosophy then; I thought maybe they'd tell us how the mind worked. That was very disappointing too. I eventually ended up going to Edinburgh to do AI, and that was more interesting — at least you could simulate things, so you could test out theories.

Do you remember what intrigued you about AI? Was it a paper, or any particular person who exposed you to those ideas?

I guess it was a book I read by Donald Hebb that influenced me a lot. He was very interested in how you learn the connection strengths in neural nets. I also read a book by John von Neumann early on, who was very interested in how the brain computes and how it's different from normal computers.

And did you get the conviction that these ideas would work out at that point? What was your intuition back in the Edinburgh days?

It seemed to me there has to be a way that the brain learns, and it's clearly not by having all sorts of things programmed into it and then using logical rules of inference — that just seemed to me crazy from the outset. So we had to figure out how the brain learned to modify connections in a neural net so that it could do complicated things. Von Neumann believed that; Turing believed that. Von Neumann and Turing were both pretty good at logic, but they didn't believe in this logical approach.

And what was your split between studying the ideas from neuroscience and just doing what seemed to be good algorithms for AI? How much inspiration did you take early on?

I never did that much study of neuroscience. I was always inspired by what I'd learned about how the brain works: there's a bunch of neurons, they perform relatively simple operations, they're nonlinear — they collect inputs, they weight them, and then they give an output that depends on that weighted input. And the question is: how do you change those weights to make the whole thing do something good? It seems like a fairly simple question.
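A minimal sketch of that picture — a layer of simple nonlinear units whose weights are nudged downhill on an error so "the whole thing does something good". The network shape, toy task, and learning rate are illustrative assumptions, not anything specified in the interview:

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny one-hidden-layer net: units collect inputs, weight them,
# and emit a nonlinear function of the weighted sum.
W1 = rng.normal(0, 0.1, (2, 8))   # input -> hidden weights (assumed sizes)
W2 = rng.normal(0, 0.1, (8, 1))   # hidden -> output weights

def forward(x):
    h = np.tanh(x @ W1)           # nonlinear hidden units
    return h, h @ W2              # linear output unit

# A toy task (XOR), standing in for "something good" the net should do.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0.], [1.], [1.], [0.]])

lr = 0.5
for step in range(2000):
    h, out = forward(X)
    err = out - y                              # how wrong the output is
    # Gradients of the squared error w.r.t. each weight matrix (chain rule).
    dW2 = h.T @ err / len(X)
    dW1 = X.T @ ((err @ W2.T) * (1 - h**2)) / len(X)
    # "Change those weights" a little, downhill on the error.
    W1 -= lr * dW1
    W2 -= lr * dW2

print(forward(X)[1].round(2))                  # approaches [0, 1, 1, 0]
```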

03:38

What collaborations do you remember from that time?

The main collaboration I had at Carnegie Mellon was with someone who wasn't at Carnegie Mellon. I was interacting a lot with Terry Sejnowski, who was in Baltimore at Johns Hopkins, and about once a month either he would drive to Pittsburgh or I'd drive to Baltimore — it's 250 miles away — and we would spend a weekend together working on Boltzmann machines. That was a wonderful collaboration. We were both convinced it was how the brain worked. That was the most exciting research I've ever done, and a lot of technical results came out that were very interesting, but I think it's not how the brain works.

I also had a very good collaboration with Peter Brown, who was a very good statistician. He worked on speech recognition at IBM, and then he came as a more mature student to Carnegie Mellon, just to get a PhD, but he already knew a lot. He taught me a lot about speech, and he in fact taught me about hidden Markov models. I think I learned more from him than he learned from me — that's the kind of student you want. When he taught me about hidden Markov models, I was doing backprop with hidden layers, only they weren't called hidden layers then, and I decided that the name they use in hidden Markov models is a great name for variables that you don't know what they're up to. So that's where the name "hidden" in neural nets came from — Peter and I decided that was a great name for the hidden layers of neural nets. But I learned a lot from Peter about speech.

05:05

Take us back to when Ilya showed up at your office.

I was in my office, probably on a Sunday, and I was programming, I think, and there was a knock on the door — not just any knock, but a sort of urgent knock. So I went and answered the door, and there was this young student there. He said he was cooking fries over the summer, but he'd rather be working in my lab. So I said, well, why don't you make an appointment and we'll talk? And Ilya said, how about now? That sort of was Ilya's character. So we talked for a bit, and I gave him a paper to read — the Nature paper on backpropagation. We made another meeting for a week later, and he came back and said, "I didn't understand it." I was very disappointed; I thought he seemed like a bright guy, but it's only the chain rule, it's not that hard to understand. And he said, "Oh no, no, I understood that. I just don't understand why you don't give the gradient to a sensible function optimizer." Which took us quite a few years to think about. And it kept on like that with him — his raw intuitions about things were always very good.

06:17

What do you think had enabled those intuitions for Ilya?

I don't know. I think he always thought for himself. He was always interested in AI from a young age, and he's obviously good at math. But it's very hard to know.

And what was the collaboration between the two of you like? What part would you play, and what part would Ilya play?

It was a lot of fun. I remember one occasion when we were trying to do a complicated thing producing maps of data, where I had a kind of mixture model, so you could take the same bunch of similarities and make two maps — so that in one map "bank" could be close to "greed", and in another map "bank" could be close to "river" — because in one map you can't have it close to both, right? "River" and "greed" are a long way apart. So we'd have a mixture of maps. We were doing it in Matlab, and this involved a lot of reorganization of the code to do the right matrix multiplies, and Ilya got fed up with that. He came in one day and said, "I'm going to write an interface for Matlab. I'll program in this different language, and then I have something that just converts it into Matlab." And I said, "No, Ilya, that'll take you a month to do. We've got to get on with this project; don't get diverted by that." And he said, "It's okay — I did it this morning."

That's quite incredible. And throughout those years, the biggest shift wasn't necessarily just the algorithms but also the scale. How did you view that scale over the years?

Ilya got that intuition very early. Ilya was always preaching that you just make it bigger and it'll work better, and I always thought that was a bit of a cop-out — you're going to have to have new ideas too. It turns out he was basically right. New ideas help — things like Transformers helped a lot — but it was really the scale of the data and the scale of the computation. Back then we had no idea computers would get a billion times faster; we thought maybe they'd get a hundred times faster. We were trying to do things by coming up with clever ideas that would have just solved themselves if we had had bigger scale of data and computation.

In about 2011, Ilya and another graduate student called James Martens and I had a paper using character-level prediction. We took Wikipedia and tried to predict the next HTML character, and that worked remarkably well. We were always amazed at how well it worked — that was using a fancy optimizer on GPUs — and we could never quite believe that it understood anything, but it looked as though it understood, and that just seemed incredible.
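A toy version of character-level next-character prediction. Their 2011 paper used a much larger recurrent net and a second-order optimizer; this sketch substitutes a small GRU, a made-up corpus, and Adam, purely for illustration:

```python
import torch
import torch.nn as nn

text = "the quick brown fox jumps over the lazy dog. " * 50  # stand-in corpus
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
data = torch.tensor([stoi[c] for c in text])

class CharRNN(nn.Module):
    def __init__(self, vocab, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab, 32)
        self.rnn = nn.GRU(32, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, x):
        h, _ = self.rnn(self.emb(x))
        return self.out(h)            # logits for the next character

model = CharRNN(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=3e-3)

seq = 64
for step in range(200):
    i = torch.randint(0, len(data) - seq - 1, (16,))
    x = torch.stack([data[j:j + seq] for j in i])
    y = torch.stack([data[j + 1:j + seq + 1] for j in i])  # shifted by one
    logits = model(x)
    loss = nn.functional.cross_entropy(logits.reshape(-1, len(chars)),
                                       y.reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()
```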

08:55

Can you take us through how models are trained to predict the next word, and why that's the wrong way of thinking about them?

Okay — I don't actually believe it is the wrong way. In fact, I think I made the first neural net language model that used embeddings and backpropagation. It was very simple data, just triples, and it was turning each symbol into an embedding, then having the embeddings interact to predict the embedding of the next symbol, and from that predicting the next symbol. Then it was backpropagating through that whole process to learn these triples, and I showed it could generalize. About ten years later, Yoshua Bengio used a very similar network and showed it worked with real text, and about ten years after that, linguists started believing in embeddings. It was a slow process.

The reason I think it's not just predicting the next symbol is: ask what it takes to predict the next symbol — particularly if you ask me a question, and the first word of the answer is the next symbol — you have to understand the question. So predicting the next symbol is very unlike old-fashioned autocomplete. Old-fashioned autocomplete would store sort of triples of words; then, if you saw a pair of words, you'd see how often different words came third, and that way you could predict the next symbol. That's what most people think autocomplete is like. It's no longer at all like that. To predict the next symbol you have to understand what's been said. So I think you're forcing it to understand by making it predict the next symbol, and I think it's understanding in much the same way we are. A lot of people will tell you these things aren't like us — they're just predicting the next symbol, they're not reasoning like us. But actually, in order to predict the next symbol, it's going to have to do some reasoning. And we've seen now that if you make big ones, without putting in any special stuff to do reasoning, they can already do some reasoning, and I think as you make them bigger they're going to be able to do more and more reasoning.
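A rough sketch of the architecture he describes: each symbol becomes an embedding, the embeddings interact to predict the embedding of the next symbol, and everything is learned by backpropagation. The vocabulary size, dimensions, and the single hidden interaction layer here are assumptions; his original model was trained on family-relation triples:

```python
import torch
import torch.nn as nn

V, D = 100, 16            # assumed vocabulary size and embedding width

class TriplePredictor(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(V, D)
        # The two context embeddings "interact" through a hidden layer...
        self.interact = nn.Sequential(nn.Linear(2 * D, 64), nn.Tanh(),
                                      nn.Linear(64, D))
        self.out = nn.Linear(D, V)   # ...to predict the next symbol.

    def forward(self, a, b):
        pred_emb = self.interact(torch.cat([self.emb(a), self.emb(b)], -1))
        return self.out(pred_emb)    # logits over the third symbol

model = TriplePredictor()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# Random stand-in triples (symbol1, symbol2 -> symbol3).
a, b, c = (torch.randint(0, V, (256,)) for _ in range(3))
for _ in range(100):
    loss = nn.functional.cross_entropy(model(a, b), c)
    opt.zero_grad(); loss.backward(); opt.step()
```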

11:00

Do you think I'm doing anything else than predicting the next symbol right now?

I think that's how you're learning. I think you're predicting the next video frame, you're predicting the next sound. I think that's a pretty plausible theory of how the brain's learning.

What enables these models to learn such a wide variety of fields?

What these big language models are doing is looking for common structure, and by finding common structure they can encode things using the common structure, and that's more efficient. Let me give you an example. If you ask GPT-4 why a compost heap is like an atom bomb, most people can't answer that. Most people haven't thought about it; they think atom bombs and compost heaps are very different things. But GPT-4 will tell you: well, the energy scales are very different and the time scales are very different, but the thing that's the same is that when the compost heap gets hotter, it generates heat faster, and when the atom bomb produces more neutrons, it produces more neutrons faster. So it gets the idea of a chain reaction. I believe it's understood they're both forms of chain reaction, and it's using that understanding to compress all that information into its weights. And if it's doing that, then it's going to be doing that for hundreds of things where we haven't seen the analogies yet, but it has. That's where you get creativity from — from seeing these analogies between apparently very different things. So I think GPT-4 is going to end up, when it gets bigger, being very creative. This idea that it's just regurgitating what it's learned, just pasting together text it's learned already — that's completely wrong. It's going to be even more creative than people.

I think you'd argue that it won't just repeat the human knowledge we've developed so far, but could also progress beyond that. That's something we haven't quite seen yet — we've started seeing some examples of it, but to a large extent we're still at the current level of science. What do you think will enable it to go beyond that?

Well, we've seen that in more limited contexts. If you take AlphaGo, in that famous competition with Lee Sedol, there was move 37, where AlphaGo made a move that all the experts said must have been a mistake, but actually later they realized it was a brilliant move. So that was creative within that limited domain. I think we'll see a lot more of that as these things get bigger.

The difference with AlphaGo as well was that it was using reinforcement learning, which subsequently enabled it to go beyond the current state: it started with imitation learning, watching how humans play the game, and then through self-play it developed way beyond that. Do you think that's the missing component?

I think that may well be a missing component, yes — the self-play in AlphaGo and AlphaZero is a large part of why it could make these creative moves. But I don't think it's entirely necessary. There's a little experiment I did a long time ago where you're training a neural net to recognize handwritten digits —

I love that example — the MNIST example.

— and you give it training data where half the answers are wrong. The question is, how well will it learn? You make half the answers wrong once and keep them like that, so it can't average away the wrongness by seeing the same example sometimes with the right answer and sometimes with the wrong answer: for half of the examples, when it sees the example, the answer is always wrong. So the training data has 50% error. But if you train up backpropagation, it gets down to 5% error or less. In other words, from badly labeled data it can get much better results. It can see that the training data is wrong. That's how smart students can be smarter than their advisor: their advisor tells them all this stuff, and for half of what their advisor tells them they think, "no, rubbish," and they listen to the other half, and then they end up smarter than the advisor. So these big neural nets can actually do much better than their training data, and most people don't realize that.
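A hedged reconstruction of the experiment he describes: corrupt half of the MNIST training labels once and for all, train with backprop, and measure error on clean test data. The network, optimizer, and epoch count are guesses; the point is only that test error lands far below the 50% label noise:

```python
import torch
import torch.nn as nn
from torchvision import datasets, transforms

train = datasets.MNIST(".", train=True, download=True,
                       transform=transforms.ToTensor())
test = datasets.MNIST(".", train=False, download=True,
                      transform=transforms.ToTensor())

# Flip half the labels ONCE to a random wrong class, and keep them fixed,
# so the wrongness cannot be averaged away across epochs.
g = torch.Generator().manual_seed(0)
idx = torch.randperm(len(train), generator=g)[: len(train) // 2]
for i in idx.tolist():
    offset = torch.randint(1, 10, (1,), generator=g).item()  # never zero
    train.targets[i] = (train.targets[i] + offset) % 10

model = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(),
                      nn.Linear(256, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loader = torch.utils.data.DataLoader(train, batch_size=128, shuffle=True)

for epoch in range(5):
    for x, y in loader:
        loss = nn.functional.cross_entropy(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()

xt = test.data.float().div(255).flatten(1)
acc = (model(xt).argmax(1) == test.targets).float().mean()
print(f"test error: {1 - acc:.1%}")   # far below the 50% label noise
```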

15:13

So how do you expect these models to add reasoning? One approach is to add heuristics on top of them, which a lot of the research is doing now — chain of thought, where you feed its reasoning back into itself. Another way would be in the model itself, as you scale it up. What's your intuition around that?

My intuition is that as we scale up these models, they get better at reasoning. If you ask how people work, roughly speaking, we have these intuitions, and we can do reasoning, and we use the reasoning to correct our intuitions. Of course, we use the intuitions during the reasoning to do the reasoning, but if the conclusion of the reasoning conflicts with our intuitions, we realize the intuitions need to be changed. That's much like in AlphaGo or AlphaZero, where you have an evaluation function that just looks at a board and says how good that is for me, but then you do the Monte Carlo rollout and get a more accurate idea, and you can revise your evaluation function — you can train it by getting it to agree with the results of reasoning. I think these large language models have to start doing that: training their raw intuitions about what should come next by doing reasoning and realizing that's not right. That way they can get more training data than just mimicking what people did. That's exactly why AlphaGo could do this creative move 37 — it had much more training data, because it was using reasoning to check out what the right next move should have been.
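A stripped-down sketch of that loop: an evaluation network's quick judgment is pulled toward the more accurate value produced by rollouts. The environment, rollout rule, and all sizes here are stand-ins, not AlphaGo's actual machinery:

```python
import torch
import torch.nn as nn

value_net = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(value_net.parameters(), lr=1e-3)

def rollout_value(state, n=32):
    # Stand-in for Monte Carlo rollouts: average return of n noisy
    # playouts from this state ("game" = a fake scoring rule).
    returns = (state.sum(-1, keepdim=True) + torch.randn(n, 1)).tanh()
    return returns.mean(0)

for step in range(1000):
    state = torch.randn(16)                 # a random board encoding
    fast_guess = value_net(state)           # intuition: one glance
    slow_target = rollout_value(state)      # reasoning: play it out
    # Train the intuition to agree with the result of the reasoning.
    loss = (fast_guess - slow_target.detach()).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```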

16:49

And what do you think about multimodality? We spoke about these analogies, and often the analogies are way beyond what we could see — it's discovering analogies that are far beyond humans, maybe at abstraction levels we'll never be able to understand. When we introduce images and video and sound, how do you think that will change the models, and how do you think it will change the analogies they can make?

I think it'll change it a lot. I think it'll make them much better at understanding spatial things, for example. From language alone it's quite hard to understand some spatial things, although remarkably GPT-4 can do that even before it was multimodal. But when you make it multimodal — if you have it both doing vision and reaching out and grabbing things — it'll understand objects much better, if it can pick them up and turn them over and so on. So although you can learn an awful lot from language, it's easier to learn if you're multimodal, and in fact you then need less language. And there's an awful lot of YouTube video for predicting the next frame, or something like that. So I think these multimodal models are clearly going to take over. You can get more data that way, and they need less language. There's really a philosophical point that you could learn a very good model from language alone, but it's much easier to learn it from a multimodal system.

And how do you think it will impact the models' reasoning?

I think it'll make them much better at reasoning about space, for example — reasoning about what happens if you pick objects up. If you actually try picking objects up, you're going to get all sorts of training data that's going to help.

Do you think the human brain evolved to work well with language, or do you think language evolved to work well with the human brain?

I think that question — whether language evolved to work with the brain or the brain evolved to work with language — is a very good question. I think both happened. I used to think we would do a lot of cognition without needing language at all; now I've changed my mind a bit. Let me give you three different views of language and how it relates to cognition. There's the old-fashioned symbolic view: cognition consists of having strings of symbols in some kind of cleaned-up logical language where there's no ambiguity, and applying rules of inference. That's what cognition is — just symbolic manipulations on things that are like strings of language symbols. That's one extreme view. An opposite extreme view is: no, no — once you get inside the head, it's all vectors. Symbols come in, you convert those symbols into big vectors, all the stuff inside is done with big vectors, and then if you want to produce output, you produce symbols again. There was a point in machine translation, in about 2014, when people were using recurrent neural nets: words would keep coming in, there'd be a hidden state, and they'd keep accumulating information in this hidden state, so when they got to the end of a sentence they'd have a big hidden vector that captured the meaning of that sentence. That could then be used for producing the sentence in another language. It was called a thought vector, and that's a second view of language: you convert the language into a big vector that's nothing like language, and that's what cognition is all about.

But then there's a third view, which is what I believe now: you take these symbols and you convert the symbols into embeddings, and you use multiple layers of that, so you get these very rich embeddings — but the embeddings are still tied to the symbols, in the sense that you've got a big vector for this symbol and a big vector for that symbol, and these vectors interact to produce the vector for the symbol for the next word. And that's what understanding is: understanding is knowing how to convert the symbols into these vectors, and knowing how the elements of the vectors should interact to predict the vector for the next symbol. That's what understanding is, both in these big language models and in our brains. It's an example which is sort of in between: you're staying with the symbols, but you're interpreting them as these big vectors. That's where all the work is, and all the knowledge is in what vectors you use and how the elements of those vectors interact — not in symbolic rules. But it's not saying you get away from the symbols altogether; it's saying you turn the symbols into big vectors, but you stay with that surface structure of the symbols. And that's how these models are working, and it seems to me a more plausible model of human thought too.
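A minimal sketch of the 2014-style "thought vector" he describes: words stream into a recurrent net, which accumulates a hidden state whose final value stands for the sentence's meaning. Vocabulary and sizes are placeholders; real systems paired this encoder with a decoder for the other language:

```python
import torch
import torch.nn as nn

V, D, H = 1000, 64, 256          # assumed vocab, embedding, hidden sizes

emb = nn.Embedding(V, D)
encoder = nn.GRU(D, H, batch_first=True)

sentence = torch.randint(0, V, (1, 7))   # a stand-in token sequence
states, last = encoder(emb(sentence))    # hidden state accumulates word by word

thought_vector = last[0]                 # (1, H): the whole sentence's meaning,
print(thought_vector.shape)              # ready to condition a decoder on
```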

21:26

You were one of the first folks to get the idea of using GPUs — and I know Jensen loves you for that. Back in 2009 you mentioned that you told Jensen this could be a quite good idea for training neural nets. Take us back to that early intuition about using GPUs for training neural nets.

So actually, I think in about 2006, I had a former graduate student called Rick Szeliski, who's a very good computer vision guy, and I talked to him at a meeting, and he said: you know, you ought to think about using graphics processing cards, because they're very good at matrix multiplies, and what you're doing is basically all matrix multiplies. So I thought about that for a bit, and then we learned about these Tesla systems that had four GPUs in them. Initially we just got gaming GPUs and discovered they made things go 30 times faster, and then we bought one of these Tesla systems with four GPUs, and we did speech on that, and it worked very well. Then in 2009 I gave a talk at NIPS, and I told a thousand machine learning researchers: you should all go and buy Nvidia GPUs — they're the future, you need them for doing machine learning. And I actually then sent mail to Nvidia saying, "I told a thousand machine learning researchers to buy your boards — could you give me a free one?" And they said no. Actually, they didn't say no; they just didn't reply. But when I told Jensen this story later on, he gave me a free one.
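The observation that neural nets are "basically all matrix multiplies" is easy to check. A hedged timing sketch — the sizes are arbitrary, it requires a CUDA-capable GPU, and the measured ratio will vary by hardware:

```python
import time
import torch

A = torch.randn(4096, 4096)
B = torch.randn(4096, 4096)

t0 = time.perf_counter()
C_cpu = A @ B                                  # matrix multiply on the CPU
cpu_s = time.perf_counter() - t0

if torch.cuda.is_available():
    Ag, Bg = A.cuda(), B.cuda()
    Ag @ Bg                                    # warm-up, builds kernels
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    C_gpu = Ag @ Bg                            # the same multiply on the GPU
    torch.cuda.synchronize()                   # wait for the async kernel
    gpu_s = time.perf_counter() - t0
    print(f"CPU {cpu_s:.3f}s vs GPU {gpu_s:.3f}s "
          f"(~{cpu_s / gpu_s:.0f}x speedup on this hardware)")
```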

23:00

That's very good. I think what's interesting as well is how GPUs have evolved alongside the field. Where do you think we should go next in compute?

So in my last couple of years at Google, I was thinking about ways of trying to make analog computation, so that instead of using, like, a megawatt, we could use, like, 30 watts — like the brain — and we could run these big language models in analog hardware. I never made it work, but I started really appreciating digital computation. If you're going to use that low-power analog computation, every piece of hardware is going to be a bit different, and the idea is that the learning is going to make use of the specific properties of that hardware. That's what happens with people: all our brains are different, so we can't take the weights in your brain and put them in my brain. The hardware is different; the precise properties of the individual neurons are different; the learning has learned to make use of all that. So we're mortal, in the sense that the weights in my brain are no good for any other brain. When I die, those weights are useless. We can get information from one to another rather inefficiently: I produce sentences, and you figure out how to change your weights so you would have said the same thing. That's called distillation, but it's a very inefficient way of communicating knowledge.

With digital systems, they're immortal, because once you've got some weights, you can throw away the computer, just store the weights on a tape somewhere, build another computer, put those same weights in, and if it's digital, it can compute exactly the same thing as the other system did. So digital systems can share weights, and that's incredibly much more efficient. If you've got a whole bunch of digital systems, and they each go and do a tiny bit of learning, and they start with the same weights — they do a tiny bit of learning and then they share their weights again, and they all know what all the others learned. We can't do that, and so they're far superior to us in being able to share knowledge.
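A schematic of that sharing scheme: identical digital copies each take a gradient step on their own data, then average their weights so every copy knows what the others learned. The model, data split, and plain averaging are illustrative — this is essentially synchronized data-parallel training in miniature:

```python
import copy
import torch
import torch.nn as nn

base = nn.Linear(10, 1)
replicas = [copy.deepcopy(base) for _ in range(4)]  # same starting weights

# Each replica does "a tiny bit of learning" on its own shard of data.
for r in replicas:
    x, y = torch.randn(32, 10), torch.randn(32, 1)
    loss = nn.functional.mse_loss(r(x), y)
    loss.backward()
    with torch.no_grad():
        for p in r.parameters():
            p -= 0.01 * p.grad

# Then they share: average the weights so all copies hold what each learned.
with torch.no_grad():
    for params in zip(*(r.parameters() for r in replicas)):
        mean = torch.stack(params).mean(0)
        for p in params:
            p.copy_(mean)
```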

25:07

A lot of the ideas that have been deployed in the field are very old-school ideas — ideas that have been around neuroscience forever. What do you think is left to apply to the systems we develop?

One big thing we still have to catch up with neuroscience on is the time scales for changes. In nearly all the neural nets, there's a fast time scale for changing activities — input comes in, and the activities, the embedding vectors, all change — and then there's a slow time scale, which is changing the weights, and that's long-term learning. You just have those two time scales. In the brain, there are many time scales at which weights change. For example, if I say an unexpected word like "cucumber", and now, five minutes later, you put headphones on, there's a lot of noise, and there are very faint words — you'll be much better at recognizing the word "cucumber" because I said it five minutes ago. So where is that knowledge in the brain? That knowledge is obviously in temporary changes to synapses. It's not neurons going "cucumber, cucumber, cucumber" — you don't have enough neurons for that. It's in temporary changes to the weights, and you can do a lot of things with temporary weight changes — what I call fast weights. We don't do that in these neural models, and the reason we don't do it is that if you have temporary changes to the weights that depend on the input data, then you can't process a whole bunch of different cases at the same time. At present, we take a whole bunch of different strings, we stack them together, and we process them all in parallel, because then we can do matrix-matrix multiplies, which is much more efficient. It's just that efficiency that's stopping us using fast weights. But the brain clearly uses fast weights for temporary memory, and there are all sorts of things you can do that way that we don't do at present. I think that's one of the biggest things we have to learn. I was very hopeful that things like Graphcore, if they went sequential and did just online learning, could use fast weights, but that hasn't worked out yet. I think it'll work out eventually, when people are using conductances for weights.
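One way researchers have sketched fast weights — not a mechanism Hinton specifies in the interview, just an illustration of the idea: keep the ordinary slow weights, and add a rapidly decaying matrix updated from recent activity, a temporary, input-dependent memory. All rates and sizes below are assumptions:

```python
import torch

D = 32
slow_W = torch.randn(D, D) * 0.1   # ordinary weights: changed slowly by learning
fast_W = torch.zeros(D, D)         # fast weights: temporary, input-dependent
decay, write = 0.95, 0.5           # assumed decay and write rates

def step(h):
    global fast_W
    # Effective connection strength = slow weights + temporary changes.
    out = torch.tanh(h @ (slow_W + fast_W))
    # Hebbian-style temporary change: recent activity leaves a fading trace,
    # so a pattern seen minutes ago is easier to re-evoke ("cucumber").
    fast_W = decay * fast_W + write * torch.outer(h, out)
    return out

h = torch.randn(D)
for _ in range(10):
    h = step(h)        # the trace in fast_W now reflects recent history
```

Note his stated reason this isn't done in practice: input-dependent weight changes break the batching that makes matrix-matrix multiplies efficient.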

27:23

How has knowing how these models work, and knowing how the brain works, impacted the way you think?

I think there's been one big impact, at a fairly abstract level. For many years, people were very scornful about the idea of having a big random neural net and just giving it a lot of training data, and it would learn to do complicated things. If you talked to statisticians or linguists or most people in AI, they'd say that's just a pipe dream — there's no way you're going to learn really complicated things without some kind of innate knowledge, without a lot of architectural restrictions. It turns out that's completely wrong. You can take a big random neural network and learn a whole bunch of stuff just from data. So the idea that stochastic gradient descent — repeatedly adjusting the weights using a gradient — will learn things, and will learn big, complicated things, has been validated by these big models. That's a very important thing to know about the brain: it doesn't have to have all this innate structure. Now, obviously it's got a lot of innate structure, but it certainly doesn't need innate structure for things that are easily learned. So the idea coming from Chomsky — that you won't learn anything complicated like language unless it's all kind of wired in already and just matures — that idea is now clearly nonsense.

I'm sure Chomsky would appreciate you calling his ideas nonsense.

Well, I think a lot of Chomsky's political ideas are very sensible, and I was struck by how someone with such sensible ideas about the Middle East could be so wrong about linguistics.

What do you think would make these models simulate the consciousness of humans more effectively? Imagine you had the AI assistant that you've spoken to your entire life — instead of, like chat today, deleting the memory of the conversation so you start fresh all the time — and it had self-reflection. At some point you pass away, and you tell that to the assistant —

You mean somebody else tells the assistant — it would be difficult for me to tell it.

Right. Do you think the assistant would feel, at that point?

Yes, I think they can have feelings too. Just as we have this inner-theater model for perception, we have an inner-theater model for feelings: they're things that I can experience but other people can't. I think that model is equally wrong. Suppose I say, "I feel like punching Gary on the nose" — which I often do. Let's try and abstract that away from the idea of an inner theater. What I'm really saying to you is: if it weren't for the inhibition coming from my frontal lobes, I would perform an action. When we talk about feelings, we're really talking about actions we would perform if it weren't for constraints. That's really what feelings are: the actions we would do if it weren't for constraints. So I think you can give the same kind of explanation for feelings, and there's no reason why these things can't have feelings. In fact, in 1973, I saw a robot having an emotion. In Edinburgh, they had a robot with two grippers that could assemble a toy car if you put the pieces separately on a piece of green felt. But if you put them in a pile, its vision wasn't good enough to figure out what was going on, so it whacked them with its gripper so they were scattered, and then it could put them together. If you saw that in a person, you'd say it was cross with the situation because it didn't understand it, so it destroyed it.

31:19

That's profound. We spoke previously — you described humans and the LLMs as analogy machines. What have been the most powerful analogies you've found throughout your life?

Oh, throughout my life? Ooh. I guess a sort of weak analogy that's influenced me a lot is the analogy between religious belief and belief in symbol processing. When I was very young — I came from an atheist family — I went to school and was confronted with religious belief, and it just seemed nonsense to me. It still seems nonsense to me. And when I saw symbol processing as an explanation of how people worked, I thought it was just the same nonsense. I don't think it's quite so much nonsense now, because I think we actually do do symbol processing — it's just that we do it by giving these big embedding vectors to the symbols. We are actually symbol processing, but not at all in the way people thought, where you match symbols and the only thing a symbol has is that it's identical to another symbol or it's not identical — that's the only property a symbol has. We don't do that at all. We use the context to give embedding vectors to symbols, and then use the interactions between the components of these embedding vectors to do thinking. There's a very good researcher at Google called Fernando Pereira who said: yes, we do have symbolic reasoning, and the only symbolic language we have is natural language. Natural language is a symbolic language, and we reason with it. I believe that now.

You've done some of the most meaningful research in the history of computer science. Can you walk us through how you select the right problems to work on?

Well, first let me correct you: my students and I have done a lot of the most meaningful things, and it's mainly been a very good collaboration with students, and my ability to select very good students. That came from the fact that there were very few people doing neural nets in the 70s and 80s and 90s and 2000s, and so the few people doing neural nets got to pick the very best students. That was a piece of luck. But my way of selecting problems is basically — well, you know, when scientists talk about how they work, they have theories about how they work, which probably don't have much to do with the truth. But my theory is that I look for something where everybody's agreed about something and it feels wrong — there's a slight intuition that there's something wrong about it. Then I work on that and see if I can elaborate why I think it's wrong, and maybe I can make a little demo with a small computer program that shows it doesn't work the way you might expect.

Let me take one example. Most people think that if you add noise to a neural net, it's going to work worse — if, for example, each time you put a training example through, you make half of the neurons be silent, it'll work worse. Actually, we know it'll generalize better if you do that, and you can demonstrate that in a simple example. That's what's nice about computer simulation: you can show that this idea you had — that adding noise, that dropping out half the neurons, will make it work worse — is true in the short term, but if you train it like that, in the end it'll work better. You can demonstrate that with a small computer program, and then you can think hard about why that is, and how it stops big, elaborate co-adaptations. That, I think, is my method of working: find something that sounds suspicious and work on it, and see if you can give a simple demonstration of why it's wrong.
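"Make half the neurons silent" is dropout. A minimal sketch of the comparison he describes — the architecture, data, and rates are placeholders, and the claim is only that the noisy net generalizes better on held-out data, not that it fits the training set faster:

```python
import torch
import torch.nn as nn

def make_net(p_drop):
    return nn.Sequential(nn.Linear(20, 256), nn.ReLU(),
                         nn.Dropout(p=p_drop),     # silence neurons at random
                         nn.Linear(256, 2))

# Small noisy dataset, so the no-dropout net can overfit.
g = torch.Generator().manual_seed(0)
X = torch.randn(200, 20, generator=g)
y = (X[:, 0] > 0).long()
y[torch.randperm(200, generator=g)[:40]] ^= 1      # 20% label noise
Xtest = torch.randn(2000, 20, generator=g)
ytest = (Xtest[:, 0] > 0).long()

for p in (0.0, 0.5):
    torch.manual_seed(0)
    net = make_net(p)
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(500):
        loss = nn.functional.cross_entropy(net(X), y)
        opt.zero_grad(); loss.backward(); opt.step()
    net.eval()                        # dropout is switched off at test time
    acc = (net(Xtest).argmax(1) == ytest).float().mean()
    print(f"dropout={p}: test accuracy {acc:.2%}")
```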

35:07

What sounds suspicious to you now?

Well, that we don't use fast weights sounds suspicious — that we only have these two time scales. That's just wrong; that's not at all like the brain. In the long run, I think we're going to have to have many more time scales. So that's an example.

If you had your group of students today, and they came to you and asked the Hamming question we talked about previously — what's the most important problem in your field — what would you suggest they take on and work on next? We spoke about reasoning and time scales. What would be the highest-priority problem you'd give them?

For me, right now, it's the same question I've had for the last thirty years or so, which is: does the brain do backpropagation? I believe the brain is getting gradients. If you don't get gradients, your learning is just much worse than if you do get gradients. But how is the brain getting gradients? Is it somehow implementing some approximate version of backpropagation, or is it some completely different technique? That's a big open question, and if I kept on doing research, that's what I would be doing research on.

And when you look back at your career — you've been right about so many things — what were you wrong about that you wish you'd spent less time pursuing?

Okay, those are two separate questions: one, what were you wrong about, and two, do you wish you'd spent less time on it? I think I was wrong about Boltzmann machines, and I'm glad I spent a long time on it. They're a much more beautiful theory of how you get gradients than backpropagation. Backpropagation is just ordinary and sensible — it's just the chain rule. Boltzmann machines are clever, and a very interesting way to get gradients, and I would love for that to be how the brain works, but I think it isn't.
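For reference, the learning rule he calls elegant: in a Boltzmann machine, the gradient of the log-likelihood with respect to a weight is the difference between pairwise correlations measured with the data clamped on the visible units and correlations in the free-running model, so the weight update is

$$\Delta w_{ij} \;=\; \eta\,\big(\langle s_i s_j\rangle_{\text{data}} \;-\; \langle s_i s_j\rangle_{\text{model}}\big)$$

where $s_i, s_j$ are unit states, $\eta$ is a learning rate, and the two averages come from the "clamped" and "free" phases of sampling.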

36:52

Did you spend much time imagining what would happen once these systems developed? Did you have an idea that, okay, if we could make these systems work really well, we could democratize education, make knowledge way more accessible, solve some tough problems in medicine — or was it more, to you, about understanding the brain?

Yes — I sort of feel scientists ought to be doing things that are going to help society, but actually that's not how you do your best research. You do your best research when it's driven by curiosity: you just have to understand something. Much more recently, I've realized these things could do a lot of harm as well as a lot of good, and I've become much more concerned about the effects they're going to have on society. But that's not what was motivating me. I just wanted to understand: how on earth can the brain learn to do things? That's what I want to know, and I sort of failed. As a side effect of that failure, we got some nice engineering.

Yeah — it was a good failure for the world.

for the world if you take the lens of

play37:56

If you take the lens of the things that could go really right, what do you think are the most promising applications?

I think health care is clearly a big one. With health care, there's almost no end to how much health care society can absorb: if you take someone old, they could use five doctors full time. So when AI gets better than people at doing things, you'd like it to get better in areas where you could do with a lot more of that stuff, and we could do with a lot more doctors. If everybody had three doctors of their own, that would be great, and we're going to get to that point. So that's one reason why health care is good. There's also just new engineering: developing new materials, for example, for better solar panels, or for superconductivity, or for just understanding how the body works. There are going to be huge impacts there. Those are all going to be good things.

What I worry about is bad actors using them for bad things. We've facilitated people like Putin or Xi or Trump using AI for killer robots, or for manipulating public opinion, or for mass surveillance, and those are all very worrying things.

Are you ever concerned that slowing down the field could also slow down the positives?

Oh, absolutely. And I think there's not much chance that the field will slow down, partly because it's international, and if one country slows down, the other countries aren't going to slow down. There's clearly a race between China and the US, and neither is going to slow down. So yeah, I mean, there was this petition saying we should slow down for six months. I didn't sign it, just because I thought it was never going to happen. Maybe I should have signed it, because even though it was never going to happen, it made a political point. It's often good to ask for things you know you can't get, just to make a point. But I didn't think we were going to slow down.

And how do you think it will impact the AI research process, having these assistants?

I think it'll make research a lot more efficient. Research will get a lot more efficient when you've got these assistants that help you program, but also help you think through things, and probably help you a lot with equations too.

Have you reflected much on the process of selecting talent? Has that been mostly intuitive to you? Like when Ilya shows up at the door, you feel, this is a smart guy, let's work together.

So for selecting talent, sometimes you just know. After talking to Ilya for not very long, he seemed very smart, and then, talking to him a bit more, he clearly was very smart and had very good intuitions, as well as being good at math. So that was a no-brainer. There's another case where I was at a NIPS conference. We had a poster, and someone came up and started asking questions about the poster, and every question he asked was a sort of deep insight into what we'd done wrong. After five minutes, I offered him a postdoc position. That guy was David MacKay, who was just brilliant. It's very sad he died, but it was very obvious you'd want him. Other times it's not so obvious, and one thing I did learn was that people are different: there's not just one type of good student. There are some students who aren't that creative but are technically extremely strong and will make anything work, and there are other students who aren't technically strong but are very creative. Of course you want the ones who are both, but you don't always get that. I think, actually, in a lab you need a variety of different kinds of graduate student. But I still go with my gut intuition: sometimes you talk to somebody and they just get it, and those are the ones you want.

What do you think is the reason for some folks having better intuitions? Do they just have better training data than others? Or how can you develop your intuition?

I think it's partly that they don't stand for nonsense. Here's a way to get bad intuitions: believe everything you're told. That's fatal. I think here's what some people do: they have a whole framework for understanding reality, and when someone tells them something, they try and figure out how it fits into their framework, and if it doesn't, they just reject it. That's a very good strategy. People who try and incorporate whatever they're told end up with a framework that's very fuzzy and can sort of believe everything, and that's useless. So I think actually having a strong view of the world, and trying to manipulate incoming facts to fit in with your view, is the way to go, even though obviously it can lead you into deep religious belief and fatal flaws and so on, like my belief in Boltzmann machines. If you've got good intuitions you can trust, you should trust them. If you've got bad intuitions, it doesn't matter what you do, so you might as well trust them.

A very good point. When you look at the types of research being done today, do you think we're putting all of our eggs in one basket and we should diversify our ideas a bit more in the field, or do you think this is the most promising direction, so let's go all in on it?

I think having big models and training them on multimodal data, even if it's only to predict the next word, is such a promising approach that we should go pretty much all in on it. Obviously there are lots and lots of people doing it now, and there are lots of people doing apparently crazy things, and that's good. But I think it's fine for most of the people to be following this path, because it's working very well.
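As a rough illustration of the objective he means, predicting the next word just means minimizing cross-entropy against the token that actually came next. A minimal sketch follows; the vocabulary, logits, and context are hypothetical, and a real model would produce the logits from the preceding text.

```python
import numpy as np

# Toy next-token objective: the model emits logits over a vocabulary,
# and training minimizes cross-entropy against the actual next token.
vocab = ["the", "brain", "gets", "gradients"]
logits = np.array([0.2, 1.5, -0.3, 2.1])  # pretend model output for some context
next_token = vocab.index("gradients")     # the word that actually came next

# Numerically stable softmax over the vocabulary,
# then negative log-likelihood of the true token
probs = np.exp(logits - logits.max())
probs /= probs.sum()
loss = -np.log(probs[next_token])

print(f"p(next='gradients') = {probs[next_token]:.3f}, loss = {loss:.3f}")
```

Whatever understanding the trained model displays has to emerge from driving this single number down over enormous amounts of text.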

Do you think the learning algorithms matter that much, or is it just scale? Are there basically millions of ways that we could get to human-level intelligence, or are there sort of a select few that we need to discover?

Yes. So this issue of whether particular learning algorithms are very important, or whether there's a great variety of learning algorithms that'll do the job: I don't know the answer. It seems to me, though, that with back propagation there's a sense in which it's the correct thing to do: getting the gradient so that you change a parameter to make it work better. That seems like the right thing to do, and it's been amazingly successful. There may well be other learning algorithms that are alternative ways of getting that same gradient, or that are getting the gradient of something else, and that also work. I think that's all open, and it's a very interesting issue now whether there are other things you can try and maximize that will give you good systems, and maybe the brain's doing that because it's easier. But backprop is, in a sense, the right thing to do, and we know that doing it works really well.
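The "correct thing to do" he describes is the gradient step itself. In standard notation (background, not quoted from the interview), for parameters $\theta$, loss $\mathcal{L}$, and learning rate $\eta$:

```latex
\theta \leftarrow \theta - \eta \, \nabla_{\theta} \mathcal{L}(\theta)
```

Back propagation is then the chain-rule procedure that computes $\nabla_{\theta} \mathcal{L}$ for every parameter in a single backward pass.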

And one last question: when you look back at your decades of research, what are you most proud of? Is it the students? Is it the research? What makes you most proud when you look back at your life's work?

The learning algorithm for Boltzmann machines. The learning algorithm for Boltzmann machines is beautifully elegant. It's maybe hopeless in practice, but it's the thing I enjoyed most, developing that with Terry, and it's what I'm proudest of, even if it's [Music] wrong.

What questions do you spend most of your time thinking about now? Is it, um, what should I watch on Netflix?
