Prof. Chris Bishop's NEW Deep Learning Textbook!

Machine Learning Street Talk
10 Apr 2024 · 82:59

Summary

TLDR: In this interview we speak with Professor Chris Bishop, a luminary in artificial intelligence and machine learning. Chris is a Technical Fellow at Microsoft Research and Director of AI for Science in Cambridge, and is also Honorary Professor of Computer Science at the University of Edinburgh and a Fellow of Darwin College. He is co-author, with his son Hugh, of the new book Deep Learning: Foundations and Concepts. Chris shares insights from his research in deep learning, including his views on invariance and how to distill core concepts from the flood of published papers. He also discusses his PhD research in quantum field theory and his transition from theoretical physicist to full-time neural network researcher. He emphasizes the foundational role of probability theory in machine learning and shares his views on the future of neural networks and deep learning.

Takeaways

  • 📚 Professor Chris Bishop is a luminary of AI and machine learning, a Technical Fellow at Microsoft Research and Director of AI for Science in Cambridge.
  • 🎓 Chris Bishop co-authored a new book on the foundations and concepts of deep learning with his son Hugh.
  • 🌟 Chris was elected Fellow of the Royal Academy of Engineering in 2004, Fellow of the Royal Society of Edinburgh in 2007, and Fellow of the Royal Society in 2017.
  • 📈 At Microsoft Research he oversees a global portfolio of industrial research and development, with a strong focus on machine learning and the natural sciences.
  • 🔍 Chris explains how he chose what to include when writing the book, stressing the importance of distilling the core concepts.
  • 💡 He discusses enduring ideas in deep learning, such as probability and gradient-based methods, and their lasting value.
  • 🌐 Chris shares his views on the impact of neural networks and machine learning on the natural sciences, including physics.
  • 📖 He recounts his personal journey from theoretical physicist to neural network researcher.
  • 🤖 He discusses the future of AI, in particular how neural networks and machine learning can augment human creativity and cognition.
  • 🔑 He emphasizes the importance of the Bayesian framework in machine learning and discusses the approximations used in practice.
  • 🚀 Chris is optimistic about the future of AI, believing we are at the beginning of a new era.

Q & A

  • What are Professor Chris Bishop's main contributions to artificial intelligence and machine learning?

    - Chris Bishop is a luminary of machine learning. He is a Technical Fellow and director at Microsoft Research, focused on applying AI to scientific research, and is also Honorary Professor of Computer Science at the University of Edinburgh and a Fellow of Darwin College. His major contributions include the influential machine learning textbook Pattern Recognition and Machine Learning (PRML) and the new book Deep Learning: Foundations and Concepts, co-authored with his son Hugh.

  • What are Professor Bishop's academic background and research areas?

    - Chris Bishop earned a BA in physics from Oxford and a PhD in theoretical physics from the University of Edinburgh, with a thesis on quantum field theory. His research spans machine learning and the natural sciences, with particular focus on the foundations and concepts of deep learning.

  • What is Professor Bishop's role at Microsoft Research?

    - At Microsoft Research, Chris Bishop oversees a global portfolio of industrial research and development, with a strong focus on machine learning and the natural sciences. His team applies machine learning to scientific research to accelerate the process of discovery.

  • What aspect of the new book is Professor Bishop proud of?

    - Chris Bishop is proud of the production values of Deep Learning: Foundations and Concepts, written with his son. They worked closely with the publisher to ensure high physical quality, in particular using a printing technique called stitched signatures, which lets the book open flat, making it easy to read and durable.

  • How does Professor Bishop view deep learning in scientific discovery?

    - Chris Bishop regards the application of deep learning to scientific discovery as the most exciting frontier. He considers applying machine learning and AI to science one of the most important applications, because it will dramatically accelerate our ability to make scientific discoveries.

  • What is Professor Bishop's view of the future development of deep learning models?

    - Chris Bishop sees a broad future for deep learning. He stresses the importance of the Transformer architecture and attributes part of deep learning's success to its ability to extract rules and patterns from large amounts of data. He predicts that, successful as the Transformer is, new architectures will still emerge.

  • How does Professor Bishop view creativity in AI?

    - Chris Bishop argues that AI systems can be creative, even though they are created and designed by humans. Their creativity builds on human knowledge and experience, and through learning and practice AI can develop new ways of thinking and new capacities for innovation.

  • What is Professor Bishop's view on AI safety and ethics?

    - Chris Bishop stresses that we need to build technology that benefits humanity and to ensure AI systems are safe and ethical. Although challenges and mistakes will arise along the way, he believes we are broadly heading in the right direction, with growing attention and effort devoted to understanding the potential risks of AI and taking steps to mitigate them.

  • How does Professor Bishop view the role of neural networks in pattern recognition?

    - Chris Bishop sees neural networks as a powerful tool for pattern recognition. His first book, Neural Networks for Pattern Recognition, highlighted their importance in this area and helped popularize neural networks as a powerful tool for machine learning.

  • What is a concrete example of Professor Bishop using deep learning in scientific research?

    - One concrete example is his work in fusion research. He and his team used neural networks for real-time feedback control of the shape of a high-temperature plasma, by training a neural-network surrogate to predict the plasma shape from magnetic measurements.

  • What is Professor Bishop's view of the generalization ability of deep learning models?

    - Chris Bishop finds the generalization of deep learning models remarkable, but notes that it remains an open question why such apparently over-parameterized models generalize so well. Although we can describe these models and know a great deal about them, why they work so effectively still needs deeper study.

Outlines

00:00

📚 A conversation with Chris Bishop, a luminary of artificial intelligence

This segment introduces the conversation with Professor Chris Bishop, a renowned figure in AI. Chris is a Technical Fellow and Director of AI for Science at Microsoft Research in Cambridge, Honorary Professor of Computer Science at the University of Edinburgh, and a Fellow of Darwin College, Cambridge. He co-authored a new book on the foundations and concepts of deep learning with his son Hugh, and discusses the production values and physical quality they emphasized. Chris also reviews his major contributions to machine learning, including his textbook Pattern Recognition and Machine Learning (PRML), and his enthusiasm for neural networks.

05:03

🌟 Chris Bishop's career transition and research areas

Chris Bishop looks back on his career, from theoretical work in quantum field theory to an interest in nuclear fusion, and finally to neural networks and machine learning. He recounts how he was inspired by Geoff Hinton's backpropagation paper and began applying neural networks to data from the fusion program. He also discusses his excitement about the potential impact of deep learning on the natural sciences, including physics, shares personal experiences with GPT-4, and gives his views on the future of AI.

10:05

🤖 The convergence of connectionist and symbolic AI

This part discusses the convergence of connectionism and symbolism in AI. Chris notes the striking success of deep learning since 2012 and the long-running debate about combining the two approaches. He points out that models like GPT-4 can already perform higher-level reasoning, showing that the capabilities of neural networks are expanding. Chris argues that neural networks, much like the human brain, can handle many different kinds of reasoning and intelligence on a single substrate, and he stresses their capability and pace of progress.

15:07

📖 The motivation behind Pattern Recognition and Machine Learning

Chris Bishop shares the motivation and thinking behind PRML, which was intended as a comprehensive, accessible resource for learning machine learning. He stresses the foundational role of probability theory and discusses the practical limitations of the full Bayesian framework. He also notes the advantage of training much larger networks over Bayesian marginalization of smaller ones, and observes that mainstream machine learning now emphasizes scale and methods such as stochastic gradient descent, even though Bayesian methods retain applications in some domains.

20:10

🧠 The nature and capabilities of deep learning models

Chris explores the nature of deep learning models, especially large language models such as GPT-4. He emphasizes their versatility and how they transcend individual tasks, noting how different parts of a model activate in an input-sensitive way. He discusses whether a large language model should be viewed as a single model or a collection of models, shares his view of models as reasoning engines, and raises questions about capability, specialization, and the potential role of large language models in scientific discovery.

25:15

💡 The creativity and future of artificial intelligence

Chris Bishop discusses whether AI systems can create something new. He argues that although AI systems are created by humans, this does not make them inherently uncreative: their creativity builds on human creativity and prior work, and such systems add to the sum of human creativity. He also discusses how AI systems can act as assistive tools that amplify human creativity, and offers thoughts on AI's future development.

30:20

📖 Writing the new book Deep Learning: Foundations and Concepts

Chris talks about writing the new book with his son Hugh. He shares the thinking behind deciding what to include and omit, and how they focused on distilling the core concepts essential to understanding deep learning. He also mentions the challenges they faced and how they worked to keep the book compact while maintaining the quality and depth of its content.

35:20

🌐 The vision and practice of AI for Science

Chris introduces the AI for Science initiative he leads at Microsoft Research. He stresses the importance of scientific discovery to human progress and discusses how deep learning and AI can accelerate it. He highlights the potential of machine learning models as emulators that can dramatically speed up complex numerical simulations, and notes the diverse, multidisciplinary background of the AI for Science team and their enthusiasm for using AI to drive scientific discovery.

40:21

🧬 Deep learning in drug discovery

Chris discusses applications of deep learning in drug discovery, particularly in finding targets and candidate drug molecules for specific diseases. He describes the challenges of the process, including screening candidates from an enormous molecular space, and methods that use deep learning models to generate and screen candidate molecules, with a concrete case study of their work on tuberculosis treatment. He also emphasizes the importance of collaborating with domain experts and combining deep learning models with experimental data.

45:23

🌟 The future and challenges of deep learning

In the final part of the conversation, Chris and the host explore the future of deep learning, including ongoing research into model architectures and potential applications to control and planning problems. Chris mentions open questions in the field, such as why deep learning models generalize so well and how to better understand how they work. He also expresses optimism about the future of AI and the excitement of working in the field.

Keywords

💡Artificial Intelligence

Artificial intelligence refers to machines or software built by humans that simulate human intelligence, including learning, reasoning, and self-correction. In the video, Professor Chris Bishop discusses many aspects of the field, including deep learning, machine learning, and AI in scientific research.

💡Deep Learning

Deep learning is a subfield of machine learning that uses brain-inspired neural network architectures to learn representations and patterns from data. In the video, Chris Bishop mentions his new book Deep Learning: Foundations and Concepts, co-authored with his son, underlining deep learning's central place in modern AI.

💡Machine Learning

Machine learning is a set of techniques that enable computer systems to improve their performance automatically from data, analyzing data with algorithms to learn and to make decisions or predictions. Chris Bishop's career has revolved around research on and applications of machine learning.

💡Neural Networks

Neural networks are computational models inspired by the connectivity of neurons in the brain, used in machine learning and deep learning. The video covers their applications in pattern recognition and AI, as well as Chris Bishop's early research on them.

💡Pattern Recognition

Pattern recognition is the process by which computer systems identify regularities and patterns in data, a key task in machine learning and AI. Chris Bishop's book Pattern Recognition and Machine Learning (PRML) is one of the field's main textbooks.

💡Scientific Discovery

Scientific discovery is the process of gaining new knowledge or understanding of natural phenomena through the scientific method. In the video, Chris Bishop highlights the potential of AI and machine learning to accelerate discovery, especially in drug discovery and new materials.

💡Probability Theory

Probability theory is the branch of mathematics that studies random events and their likelihood. In machine learning and deep learning it provides the foundation for handling uncertainty and making predictions. Chris Bishop regards probability theory as the foundation of machine learning.

💡Natural Language Processing

Natural language processing is a field of AI that aims to enable computers to understand and generate human language. In the video, Chris Bishop mentions deep learning applications in NLP such as the GPT family of models.

💡Nuclear Fusion

Nuclear fusion is an energy-producing process that fuses light atomic nuclei to release energy. In the video, Chris Bishop describes applying machine learning to fusion research for real-time feedback control of high-temperature plasmas.

💡GPT

GPT (Generative Pre-trained Transformer) is a pre-trained language model based on the Transformer architecture, capable of generating coherent and relevant text. In the video, Chris Bishop discusses GPT's importance in natural language processing and its powerful text generation.

Highlights

Professor Chris Bishop discusses Deep Learning: Foundations and Concepts, the new book he co-authored with his son Hugh.

Chris discusses how he decided what to include when writing the book, filtering core concepts from the thousands of papers published every month.

He describes identifying content by examining key papers and focusing on techniques and ideas that will stand the test of time.

Chris highlights his favorite figures in the book, especially those produced by his son Hugh, including the diagram of GPT's transformer architecture.

He explains how he researched the book, studying key papers and identifying recent ideas in the field.

Chris looks back on his career: from an early interest in artificial intelligence, through research in quantum field theory, to committing fully to neural networks.

He shares his views on the impact of neural networks and machine learning on the natural sciences, including physics.

He explores the future of neural networks and deep learning and how they can drive scientific discovery.

He argues that we have taken the first real steps toward genuine artificial intelligence.

He explains how he sees neural networks as resembling the way the human brain works.

He believes we have not yet reached the limits of what neural networks and deep learning can do.

He discusses safety, fairness, and ethics in AI and machine learning, and how to ensure these technologies benefit humanity.

He discusses the interpretability of deep learning models and how we can understand the way they work.

He discusses applications of neural networks to control and prediction, and how these techniques can be used to solve practical problems.

He discusses future applications and directions for deep learning, and how these technologies will continue to change our world.

Transcripts

00:00
Today, we have the privilege of speaking with Professor Chris Bishop, a luminary in the field of artificial intelligence and machine learning. Chris is a Technical Fellow and director at Microsoft Research AI for Science in Cambridge. He's also Honorary Professor of Computer Science at the University of Edinburgh and Fellow of Darwin College, Cambridge.

00:23
Hi. Nice to meet you, Tim. This is the new book on deep learning foundations and concepts, published with my son Hugh.

What prop have you got?

Ethanol. I don't know whether I'll use it, but we're going to talk about invariance.

That's wonderful, because you ought to get a little bit techie at some point, don't you?

Oh yeah. Our audience loves that.

00:41
In 2004, he was elected Fellow of the Royal Academy of Engineering. In 2007, he was elected Fellow of the Royal Society of Edinburgh, and in 2017 he was elected Fellow of the Royal Society. Chris was a founding member of the UK AI Council, and in 2019 he was appointed to the Prime Minister's Council for Science and Technology. At Microsoft Research, Chris oversees a global portfolio of industrial research and development with a strong focus on machine learning and the natural sciences. Chris obtained a BA in physics from Oxford and a PhD in theoretical physics from the University of Edinburgh, with a thesis on quantum field theory. Chris's contributions to the field of machine learning have been truly remarkable. He's authored one of the main textbooks in the field, Pattern Recognition and Machine Learning, or PRML, and it has served as an essential reference for countless students and researchers around the world. Chris explained in the interview how it steered the field towards a more probabilistic perspective at the time, and he also mentioned his first textbook, Neural Networks for Pattern Recognition, and its role in promoting neural networks as a powerful tool for machine learning.

02:02
So this is the new textbook, Deep Learning: Foundations and Concepts. One of the things that we were proud of with this book is the production values. We really worked with the publisher to ensure the book would be produced to a high physical quality. In particular, it's produced with what are called stitched signatures. If you look down the edge there, you'll see the pages are not simply glued in. Instead, this uses an offset printing technique where 16 pages are printed on a big sheet of paper on both sides, so some of the pages are turned upside down, and then the sheet is folded, and folded, and folded again, and trimmed. The resulting set is called a signature, and they're actually stitched in with cord. The point about that is it allows the book to open flat, so the book is easy to read and it should last a long time.

02:47
What are your favorite figures in the book, Chris?

02:50
Well, the ones produced by my son, of course, are the best. Here's a nice picture of the transformer architecture, which is GPT, so you could say it's one of the most important figures in the book, I suppose. And I just love the way he's done this.

03:05
How did you do the research for this?

03:13
That's a great question. I think one of the big challenges with writing a book like this is knowing what to include and what not to include, and with literally thousands of papers being published every month, it can be overwhelming for the authors, never mind the readers. So I think the value we add in the book is trying to distill out what we think of as the core concepts. Part of this was really looking at key papers in the field, seeing what relatively recent ideas there are, but also trying to focus down on techniques and ideas that we believe will actually stand the test of time. We don't want this book to go out of date in a year or two; we want it to have lasting value. And of course, it's quite possible there'll be a breakthrough next week and that it will turn out to be a very important new architecture. But for the most part, many of the core concepts actually go back a long way. So what we've really done is taken some of the foundations of the field and brought them into the modern deep learning era. The ideas of probability, of gradient-based methods and so on: those have been around for decades, and they're just as applicable today as they ever were.

04:11
One of the things I really like, actually, is the chapter on convolutional networks. My son Hugh did a lot of this chapter. He works on using techniques like convolutional neural nets as part of his work on autonomous vehicles. And I think there's a really nice description here of convolutional networks, really from the ground up, explaining the basic concepts, but also motivating them: not just saying this is how a convolutional network is built, but why is it built this way? How do we actually motivate it? So that's one of my favorite chapters as well.

04:40
Yeah, it's been a very interesting career. At this stage of the career, I can now finally look back and make sense of it, but at the time it felt like a bit of a random walk. When I was a teenager, I went to see 2001: A Space Odyssey, and I was very inspired by that rather abstract concept of an artificial intelligence, very different from the usual sort of Hollywood portrayal of robots. So I was very interested in the idea of artificial intelligence from a young age, but I was very uninspired by the field of AI at the time, which was very much rule-based and didn't seem to be on a path to intelligence. And then I did a PhD in quantum field theory, which was a very hot field at the time, gauge field theory at Edinburgh University, and had a wonderful time. At the end of my PhD, though, I wanted to do something a bit more practical, a bit more useful, and so I entered the fusion program. I'm a big fan of nuclear fusion. It was sort of 30 years away then, and it's kind of still 30 years away now, but I'm still a big believer. I went to work on tokamak physics, essentially theoretical physics of plasmas, trying to understand the instabilities and control them. So I was working very happily as a theoretical physicist, having a great time. And after about ten years or so as a theoretical physicist, Geoff Hinton published the backprop paper, and it came to my attention. I found it very inspiring, because there I saw a very, very different approach towards intelligence. So I started by applying neural networks to data from the fusion program, because it was big data in its day. I was working near to the JET tokamak, and they had many high-resolution diagnostics, so I had lots of data to play with. I became more and more fascinated by neural networks, and then I did a sort of completely crazy thing: I walked away from a very respectable career as a theoretical physicist and went full time into the field of neural nets, which at the time was not really a respectable field. It wasn't mainstream computer science; it certainly wasn't physics; it wasn't really anything. But I just found it very inspiring, and I was particularly inspired by the work of Geoff Hinton. And so I've been in that field for three and a half decades now, and of course, recent history suggests that was probably a good career move. Most recently, I've brought the two ends of my career together, because I'm now very excited about the impact that neural nets and machine learning are having on the natural sciences, including physics.

06:51
Hinton is a famous connectionist,

so he believes that knowledge is subsymbolic. And I was speaking with Nick Chater the other week; he had a book called The Mind Is Flat, which talks about the inscrutability of our brains. How do you feel that things have changed? I mean, you were talking about a convergence of these different ideas in AI.

07:12
I think one thing that's very interesting is that there has been a lot of discussion, let's say from 2012 onwards, when deep learning was clearly being very successful, that it was missing the symbolic approach: that we somehow had to find a way to combine this connectionist approach, to use those rather dated terms, that neural net approach, with the more traditional symbolic approach. And I think what we've seen with models like GPT-4, for example, is that they're perfectly capable of reasoning at a more symbolic level. Not at the level of a human being, of course, but they can do that kind of more abstract, higher-level reasoning. And so I think what we're seeing with neural nets is rather like the human brain. The human brain doesn't have a connectionist neural net piece and then some other machinery that does symbolic reasoning; that same substrate is capable of all of these different kinds of reasoning and these different kinds of intelligence. And we're starting to see that emerge now with neural nets. So for me, the discussion of whether we should somehow combine symbolic reasoning with connectionism, that's a piece of history. It's about how we can expand on the capabilities of neural nets.

08:12
Yeah, that's so interesting. I remember there was a paper by Pylyshyn, I think it was the connectionist critique in 1988, and I was quite sold on this idea of systematicity and productivity and so on. And even now, folks from that school of thought think that our brains are Turing machines, with this ability to address potential infinity. And I guess what I'm getting from what you're saying is that the distinction isn't really there anymore. You can do that kind of reasoning with neural networks.

08:42
Well, I take a very simple view, which is that neural nets, since 2012 in particular, have been shown to be spectacularly capable, and there's no end in sight. The rate of progress is faster now than ever. Nobody imagines that machine learning and deep learning have suddenly ended at whatever the time is today; this is the beginning of an S-curve. So when we worry so much about the limitations of neural networks and what they can't do, I think we should just put the word "yet" at the end of it: neural nets can't do X, Y and Z yet. I don't think in any sense we've hit the buffers of what neural nets can do, and it's by far the most successful, or the most rapidly advancing, technology we have. So to me, you should look for the keys under the lamppost. We have this powerful technology that's getting better by the week. Why would we not see how far we can push it rather than worry about its limitations?

09:28
Absolutely. Now, Professor Bishop,

you are incredibly famous for your book, PRML. But of course, it wasn't your first book, as you were just saying. Could you tell us about your motivations and the thought process behind that book?

09:45
Yes. So as you said, it wasn't my first book. My first book was published in 1995, Neural Networks for Pattern Recognition, and that book had a very specific motivation, which is that I was a newcomer to the field. I mentioned earlier that I got excited about backprop and transitioned from theoretical physics into machine learning; the book was my way of learning about the field. If you're a university professor, a great way to learn about something is to teach a course on it, because it forces you to think about it very carefully. You're going to get tricky questions from smart students, and you're very motivated to really understand it. For me, the analog of that was writing a book. PRML was rather different. By the time we got to what was published in 2006, the field was much larger; in a sense it was much more mature, a much more established and respected field, and there were many courses on machine learning. The goal there was very different: I simply wanted to write, as it were, the book that everybody would use to learn about the field. It was trying to be comprehensive while explaining the concepts as clearly as possible. The goal was, in a sense, to replace the earlier Neural Networks for Pattern Recognition book, which served an important role in its day, I think, but really to produce a single coherent text where people could learn about the different topics with a shared notation, explaining things as clearly as I could.

11:01
We know in theoretical physics that you can write down an equation, but solving it may be extremely difficult; you have to resort to approximations. But it's still nice to have that North Star, that compass that guides you. And so I try to think of machine learning in similar terms. There are some foundations that really don't change much over time and that are very good guiding principles. We're dealing with data, we're dealing with uncertainty, and we want to be quantitative, so you're led very naturally, indeed uniquely, into probability theory. And if you apply probability theory consistently, that is the Bayesian framework. So for me, the Bayesian framework is a very natural bedrock on which you can build and think about machine learning. Now, just as with theoretical physics, you often can't just solve things exactly, and the Bayesian paradigm calls for integration, or marginalization, over all possible values of the parameters in your neural network. Well, you always operate with a fixed computational budget, right? It may be a huge one, but you're always constrained by a computational budget. And should you spend that budget doing a very thorough Bayesian marginalization over a small neural network, or should you take the same number of compute cycles and train a very much larger network? If you have plenty of data to train the larger network, then the latter seems to be much more effective in a practical sense. So while from a practical point of view the Bayesian approach still has certain applications in various domains, for the most part it's not the framework we'd want to use in mainstream machine learning. Today we're much more interested in scale, making point estimates, and using stochastic gradient descent and so on. So I still think that students should learn the basic ideas of Bayesian inference, because you have to learn about probability; I don't think you can be in machine learning and not understand probability. And once you understand probability and you apply it uniformly, that really is the Bayesian framework. So I think it's the foundation, but then you're led to make approximations, and in particular you make point estimates. So in practice you don't actually execute the full Bayesian paradigm.
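The tradeoff Chris describes between point estimates and full marginalization can be made concrete in a toy setting. The sketch below is my own illustration, not from the interview: a one-parameter Gaussian model with an assumed Gaussian prior, chosen precisely because the posterior is available in closed form (which is exactly what large neural networks lack). The point estimate discards the posterior variance that full Bayesian marginalization folds into the predictive distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny model: infer the mean mu of a Gaussian with known noise variance
# sigma2, under an assumed prior mu ~ N(0, tau2).
tau2, sigma2 = 1.0, 0.5
x = rng.normal(0.8, np.sqrt(sigma2), size=10)
n, xbar = len(x), x.mean()

# Conjugate posterior over mu is Gaussian, in closed form:
post_var = 1.0 / (1.0 / tau2 + n / sigma2)
post_mean = post_var * (n * xbar / sigma2)

# A point estimate (here MAP = posterior mean, since the posterior is
# Gaussian) ignores the remaining uncertainty in mu, so its predictive
# variance is just the noise level...
map_estimate = post_mean
pred_var_point = sigma2

# ...whereas full marginalization over mu inflates the predictive
# variance by the posterior variance of the parameter:
pred_var_bayes = sigma2 + post_var

print(map_estimate, pred_var_point, pred_var_bayes)
```

For a small model and little data the difference matters; as n grows, post_var shrinks and the point estimate becomes a good approximation, which is one way to read the argument for spending compute on a bigger network instead.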

12:57
Yeah, I agree that Bayesian reasoning is beautiful; it's the continuation of propositional logic into the domain of uncertainty. It's fundamental. But there is this question: the world is a very gnarly place, and folks argue that the brain is a kind of Bayesian inference machine, but it can't possibly be solving the intractable Bayesian problem. And therein lies the question. So there are many hybrids, and even deep learning approaches could be seen as some kind of continuation, somewhere on the spectrum between maximum likelihood point estimation and Bayesian models. How do you think about that spectrum?

13:39
I think that's a great question, and I think you're spot on there. If you look back to a time when there were a lot of competitions: here's a data set, we're going to hold out the test set, you've got to score as high as you can on the test set, and what approach should you use? The winner always is an ensemble. You should try ten different things, preferably diverse, and then combine them suitably, maybe taking an average or some smarter combination. And that ensemble will always outperform any one single model. So if you're not constrained by compute, and in some of those competitions you weren't, then the ensemble always wins. And you can think about that ensemble as, as you say, a sort of rough-and-ready approximation to a full marginalization over all of the uncertainty in the predictions that you might make. So I think there's a little glimmer of Bayesian approaches coming through there. But again, in the modern era, you're probably better off training one single large model than ten smaller ones and averaging. So I think knowing about the Bayesian paradigm and understanding where you can learn from it is still valuable today. But nevertheless, it's unlikely in most applications that you're going to want to apply the full Bayesian machinery, because it's just so computationally expensive.
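The ensemble recipe Chris describes can be sketched in a few lines. This is my own toy illustration, not from the interview: fit several diverse models (here, polynomial regressions on bootstrap resamples of assumed synthetic data) and average their predictions, treating the spread across members as a crude, non-Bayesian stand-in for marginalizing over model uncertainty.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy regression data: y = sin(x) + noise.
x = rng.uniform(-3, 3, size=80)
y = np.sin(x) + rng.normal(0.0, 0.2, size=x.size)

def fit_poly(xs, ys, degree=5):
    # One "model": a least-squares polynomial fit.
    return np.polyfit(xs, ys, degree)

# Build a diverse ensemble via bootstrap resampling of the training set.
members = []
for _ in range(10):
    idx = rng.integers(0, x.size, size=x.size)
    members.append(fit_poly(x[idx], y[idx]))

x_test = np.linspace(-3, 3, 200)
preds = np.stack([np.polyval(w, x_test) for w in members])

# Ensemble prediction = average over members; the member-to-member
# spread is a rough proxy for predictive uncertainty.
ensemble_mean = preds.mean(axis=0)
ensemble_std = preds.std(axis=0)

print(ensemble_mean[:3], ensemble_std.max())
```

The same compute spent on one much larger model often wins in practice, as Chris notes, but the averaged ensemble still illustrates why combining diverse predictors dominated the competition era.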

play14:47

Fascinating. I mean, just one more thing on this.

play14:50

Um, do you think of large, you know, let's say large language models,

play14:53

but large deep learning models, do you think of them as one model,

play14:57

or do you think of them as an inscrutable bundle of models?

play15:00

Because we're kind of getting into the no free lunch theorem here.

play15:02

Um, coming from the Bayesian world, we design models, you know, using

play15:06

principles and with neural networks we just train these big black boxes.

play15:11

So do you think of them as one model or lots of models?

play15:13

I certainly I always think of them as a single model.

play15:16

I've never thought that thought of them as separate models,

play15:18

unless you unless you explicitly construct a mixture of experts or

play15:20

something like that where you have an internal and internal structure.

play15:24

Um, I guess everything is sort of very distributed and somehow sort

play15:27

of holographic and overlapping. And, you know, a remarkable thing

play15:30

about GPT four is that, um, you know, you often see people

play15:34

when they first they first use it, they'll ask, ask some questions.

play15:36

How tall is the Eiffel Tower? And it probably gets the right

play15:38

answer, you know, and say, oh, that's kind of interesting.

play15:41

And you sort of a little bit disappointed in this technology,

play15:43

but it's like being given the keys to a very expensive sports car.

play15:46

And you notice the cup holders and you notice that it can can

play15:48

support a cup rather nicely. You don't realize you need to start

play15:51

the engine and drive off in it to really get the full experience.

play15:54

And so until you realize that actually you can you can have a

play15:56

conversation, it can, it can write poetry, it can explain jokes,

play15:59

it can write code, it can do so many, many different things.

play16:01

And all those capabilities are embedded in the same model.

play16:04

And and what is, I think a really interesting lesson of the last few

play16:07

years is that models like GPT four outperform the specialist models.

play16:11

So for example, in my lab, we had a project for many years which

play16:14

essentially said the following. It said, well, you know, this is

play16:17

Microsoft World's biggest software company. We have lots of source code.

play16:20

We could use source code as training data for machine learning.

play16:23

We'd be able to do all sorts of things.

play16:24

You know, spot bugs, do autocomplete, you know, all kinds of things you

play16:28

could do if you had a good model of source code and the project was

play16:31

reasonably successful, it was, you know, it worked reasonably well.

play16:34

But what we've learned is that when you build one gigantic model that

play16:38

that yes, it sees source code, it sees scientific papers,

play16:41

it sees Wikipedia, it sees many, many different things in some way.

play16:44

It becomes better at writing source code than a model specifically

play16:48

for writing source code. And there are even even been ablation

play16:50

studies where people have have a model that's trained to solve maths

play16:53

problems, and it does reasonably well, and now you give it some

play16:56

apparently irrelevant information, let's say from Wikipedia.

play16:59

But with anything to do with maths stripped out and you find it

play17:02

actually does better at the maths. So I think there are things here

play17:04

that we don't really understand. But the general lesson, I think,

play17:07

is fairly clear that when you have a larger, very general model,

play17:11

it can outperform a specific model, which I think is very interesting.

play17:15

I guess the reason I was talking about the no free lunch theorem is that it feels to me, as you say, that models behave quite differently in an input-sensitive way. You ask them about this particular thing, and it's almost like it's a different model, because different parts of the model get activated. And then there's this question of, well, is the no free lunch theorem violated? Can there be such a thing as a general foundational agent that could, in robotics, just do really well in any game or any environment? Or do you think there's still some need for specialization?

Another great question.
So I think these are really open research questions. Honestly, I'm not sure anybody really knows, but I think one of the lessons is that the general can be more powerful than the specific. So clearly, one of the research frontiers we should push on is greater and greater generality, and see. GPT-4 can't ride a bicycle. But if we have models that can do robotics, should they be separate and distinct models, or, if we somehow combined everything into a single model, would it be more powerful? And there's a decent chance that the latter would be true, that it would be more powerful. So certainly that's one research frontier we should push on.
An area I'm very interested in these days is deep learning for science, for scientific discovery. And science, amongst other things, involves very precise, detailed numerical calculations. Now, if you want to multiply some numbers together, GPT-4 would be a terrible way of doing it. It might give you the wrong answer, and even if it gets the right answer, you're burning a tremendous amount of compute cycles to do something you could do with far fewer compute cycles. So there will still, as far as I can see, in certain domains be a role for specialist models.
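One way to picture the division of labour described here is a sketch like the following (hypothetical routing code, not any product's actual tool-calling API): the language model manages the dialogue, while exact arithmetic is delegated to a cheap, deterministic calculator instead of being predicted token by token.

```python
# A minimal sketch of the "LLM orchestrator + specialist tool" pattern
# discussed above (illustrative only, not a real product API): exact
# arithmetic is handed to a deterministic tool rather than the network.
import ast
import operator

# Supported AST node types mapped to exact arithmetic operations
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def calculator(expr: str):
    """Exactly evaluate an arithmetic expression: the kind of sub-task
    a language model should hand off rather than attempt itself."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.operand))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval"))
```

Here `calculator` plays the role of the specialist model: microseconds of exact computation, versus a full forward pass through a large network for every token of the answer.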


But even then I can see them being integrated with things like large language models, partly to provide a human interface, because one of the wonderful things about language models is that they're so easy to interact with. You don't have to be a computer programmer; you just have a natural conversation with them. But also, I think there are two remarkable things about the large language models. The first is that they're so good at human language. Maybe that's not too surprising, because they're sort of designed to do that. But by virtue of being forced to effectively compress human language, they become reasoning engines. And that's a remarkable discovery.
Right? That is a big surprise. Certainly to me, and I think to many people, perhaps to everybody in the field, that they can function as reasoning engines. And so even if you're, let's say, doing some specialist scientific calculations, you might still think about a large language model as a kind of copilot for the scientist, helping the scientist reason over what increasingly consists of massively complex spaces: very high dimensionality, many different modalities of data. It's harder and harder for humans to wrap their heads around this, and this is where I think a large language model can be valuable. But I still see it calling on specialist tools in the foreseeable future.

You were talking about statistical generalization, but you could argue that language models can't, let's say, compute the nth digit of pi, because they don't have an expandable memory. They're not Turing machines. So that's a computational limitation. But they might be able to do this statistical generalization, as we were talking about, even though it might in fact be a weird form of specialization in terms of an ensemble of methods and models inside a large language model. But on the language thing and the reasoning, this is fascinating.
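The nth-digit-of-pi point can be made concrete. A conventional program with arbitrary-precision integers (the expandable memory a fixed-size network lacks) computes the digits exactly; a minimal sketch using Machin's formula, with a fixed number of guard digits for illustration:

```python
def pi_digits(n: int) -> str:
    """Return '3' followed by the first n decimal digits of pi,
    computed exactly with arbitrary-precision integer arithmetic."""
    guard = 10                    # extra digits to absorb truncation error
    one = 10 ** (n + guard)       # fixed-point scale factor

    def arctan_inv(x: int) -> int:
        # arctan(1/x) * one via the alternating Taylor series
        total = term = one // x
        x2, k, sign = x * x, 3, -1
        while term:
            term //= x2
            total += sign * (term // k)
            k += 2
            sign = -sign
        return total

    # Machin's formula: pi = 16*arctan(1/5) - 4*arctan(1/239)
    pi_scaled = 16 * arctan_inv(5) - 4 * arctan_inv(239)
    return str(pi_scaled // 10 ** guard)
```

The memory used grows with `n`, which is exactly the Turing-machine property a fixed-parameter network does not have.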


So I think that language is a bunch of memetically embedded programs. We play the language game and we establish cognitive categories. We embed them and share them socially. And it's like there's a little simulation out there and I'm using that to think. But the question always is: to what extent is that a bunch of processing that previous humans have done, which we can use? Can the language model create new programs like that?

This is, I think, part of a fascinating and broader discussion. I do hear a lot of "oh, it can't do X, Y, and Z". Often that's true, and I always put the word "yet" at the end of it, because I don't know any law of physics that says it can't. There are some things which perhaps the current architectures provably can't do, but there's lots of exploration of different architectures, and there's a lot of scope for expanding and generalizing neural nets. So I always think of it as "it can't do a certain thing yet".
But on a lot of the questions, a lot of the comments about the limitations of models, I have a hypothesis. Let me test it out on you; I may be way short of the mark on this one. A lot of the critique of what models seemingly can't do, especially when it's "they will never be able to do this", "they cannot be creative", "they cannot reason", or whatever: I wonder if a lot of it comes down to a much more fundamental point that's not actually a technical one. It's really to do with the human journey over the last few thousand years. A few thousand years ago, I guess, most humans would have perceived humanity as the center of the universe. The Earth was the center of the universe. The universe was created for the benefit of humanity. We had this very arrogant view of our own importance, and what we've learned over the centuries, especially from fields like astronomy, is that, of course, the entirety of humanity's existence is a brief blink of the eye compared to the existence of the whole universe. And in terms of our physical place in the universe, in terms of length scale, we're on a little speck of dust orbiting an insignificant star in a rather boring galaxy in this colossal universe. And so I think it's natural for us as humans to continue to cling to the things that we feel make us special. We're certainly not the fastest creatures on Earth, and we're not the strongest, but it's our brains that seem to make us unique. We are the most intelligent creatures by far on Earth, and so we think of our intelligence as being the very special thing. Yes, okay, we get it that we're just living in a boring corner of the universe, but nevertheless, it's our brains that make us special. So let me tell you a little story.
Because I work for Microsoft, I was very privileged to have early access to GPT-4 when it was still a highly tested, highly secret project. And so I was exposed to GPT-4 at a time when I could only discuss it with a very small number of very specific colleagues, and for everybody else I couldn't even talk about it. And it was quite a shocking moment. The ability to understand and generate language didn't come as so much of a surprise, because of course I'd been following GPT-2 and GPT-3 and knew this technology was getting better. But this ability to reason: there was a sort of visceral reaction I had, which took me right back to that film, 2001, that sense that I was engaging with something that my colleague Sébastien Bubeck called the sparks of artificial intelligence. Nobody's claiming GPT-4 is anywhere close to human intelligence or anything like that, but there was just the first glimpse of something. It was the first time in my life that I'd interacted with something that wasn't a human being that had a glimmer of this high level of intelligence. And I realized this may be the dawn of a new era, one that may be even more significant than the 2012 moment of the dawn of deep learning. There was something very special going on, and I wonder if part of the reaction that we have to these models is a little bit of that sense of threat to the specialness that we feel as humans.
I may be completely wrong; this is purely speculation. But it's interesting that people use phrases like "stochastic parrot": it's just regurgitating stuff that it's seen before, some people claim, or, of course, it hallucinates; sometimes it comes up with stuff that's just wrong or doesn't make sense. But think about the following. Imagine there was a very, very smart physics student who went to a top university and worked really hard for four years. What would they do? They would read books, read papers, listen to lectures, and have discussions with their professors and with other students. And then they sit their final exam, they get 95%, and they come top of the year. We don't say, "huh, well, 95% of the time they're a stochastic parrot regurgitating Einstein and Maxwell, and the other 5% of the time they're hallucinating." No, we say: congratulations, you have a first-class honors degree, you've graduated with honors, this is a wonderful achievement. So it's interesting that we do seem to view the capabilities of neural nets with almost a different ruler to that of humans. And while nobody is suggesting that current models are anywhere close to humans on many axes of intelligence, nevertheless I see the first sparks of artificial intelligence.
And just one final comment. The term AI, artificial intelligence, has been very popular for many years. I used to hate it. I used to always say: that's machine learning; none of these systems are intelligent. They're very good at recognizing cats in images, but there's nothing really intelligent about this in one sense. And yet now, for the first time, I feel comfortable talking about artificial intelligence, because I think we've taken the first baby steps towards what I think of as true artificial intelligence.
I still think that agency and creativity are the distinguishing features, not necessarily that we are biological beings. It's more to do with the fact that we are independent agents, sampling random things from our local worlds and combining them together in interesting ways. And in doing so, intelligence is about the process of building models, sharing models, and embedding models in our culture. So it feels to me that GPT was building models at the time it was trained, and that's all it's doing. I can imagine a world where there were lots of GPTs. We all had GPT in our pockets, and maybe then it would be much more like biomimetic intelligence.
I think there are lots of interesting points that you touched on there, Tim. So one thing is, in terms of creativity, are these systems creative? It's certainly true that they only exist because of humans. They are created by humans, and we should acknowledge that. But I don't think it means they're intrinsically not creative. If I asked an artist to paint me a picture of some people walking on the beach with a sunset or whatever, and they came back a few days later with some beautiful picture, I might hate it. They may have used very vivid colors, and I might like pale pastel colors. But that's a matter of opinion, and I wouldn't deny that there was creativity there. Their expertise came because, well, they had some intrinsic ability in some sense, but also they went to art school, they studied the work of other artists, they practiced, they got better. That creativity owes a lot to what went before, but I don't think it diminishes it. In the same way, a physics student who can explain the theory of relativity: you could say, well, you didn't invent the theory of relativity; no, Einstein invented that, and you only learned it from Einstein. But it doesn't diminish the fact that they have understanding, the fact that they can convey it, and the fact that they can potentially think in new ways and be creative.

So I'm less convinced by discussions about the limitations of the technology in general and where it can go. I don't particularly see any limitations. The brain is a machine. It uses, to use the term from earlier, a connectionist approach; it uses these fine-grained neural nets. And so there are similarities to the technology that we have now. There are also huge differences. Some of those differences point to the artificial neural nets being much more powerful than biological neural nets, and Hinton has made a strong point of this lately; I think it's a very interesting perspective.
So I would be the first to say: yes, the technologies we have are, on many axes, a long way short of humans; on many other axes, they're much better. GPT-4 can create text much better than any human. I mean, to produce a page of coherent text that's correctly punctuated, with good grammar and so on, in a few seconds: there aren't many people who can do that, I think. So on an increasing number of axes, systems clearly outperform humans, and on others there's still a very long way to go. But I think one of the nice things about technologies like this, generative AI technologies, whether it's Sora for creating videos or GPT-4 or whatever it might be, is that they do rely on a prompt. There is a clear role. They are copilots, as we say; they sit there and do nothing, and you use them as a sort of cognitive amplifier. You have an idea, sort of half baked, and now you can engage in a conversation, and sure enough, it can come up with a different way of thinking, and you say, hey, that's really good, I like that idea; now let's take that, work it back in, and try again. And so it becomes a companion, a copilot, something that enhances your cognitive ability. But the human is still very much in the loop, playing a key part and actually initiating the process. And then, of course, at the end of the day, you're the one who selects, from, say, the ten video clips, the one that you like. So the human is very much involved in the loop throughout, and I think that's a very nice feature of this technology.

I completely agree with that.
So at the moment AIs are embedded in the cognitive nexus of humans. We have the agency, we drive these things, and they help us think. And I also agree with you that it doesn't make sense to think of these things as limited forms of computation; we should think of the collective intelligence. We are Turing machines, we are driving these things, and we are sharing information. So when you look at the entire system, it is a new type of memetic intelligence. In fact, to a certain extent, GPT-4 isn't running on Microsoft's servers; it's in all of us, right? And that's a wonderful way to think about it. But to me, the extent to which it is constraining our agency and creativity is what I'm fascinated by. So GPT says "unraveling the mysteries" and "the intricate dance of X, Y, Z" and all of these weird motifs and constructions. And maybe that's just the way that our life has constrained the model, or maybe it speaks to the constraining forces in general of having these low-entropy models that kind of snip off a lot of the interesting pathways. So we are very creative; GPT-4 resists creativity a little bit. Is it a problem?
Well, I think there are some design choices there. You talked about reinforcement learning from human feedback as part of that alignment process. We want to create this technology in a way that does good and minimizes harm, and so naturally we do constrain it. So for sure it's true that a constrained GPT-4 behaves in ways you might say are less creative, but perhaps more helpful and beneficial, and it's appropriate that we should do that. Perhaps we lose a little bit of the creativity in the process. So there's a balance; there's a choice to be made, a design choice, in how we want to create the technology. We should be very deliberate about that, and not apologetic for it. I think it's good that we are making those design choices.
But people sometimes have an intuition that it's not creative, and contrast that to: I'm using DaVinci Resolve, and I'm using all of these nodes, and I have all of these filters and processing transforms. The difference seems to be that I'm designing the architecture. I'm using cognitive primitives, and I'm composing them together in a new way, and by tweaking the parameters on the filters, I'm going off piste a little bit. I'm creating the structure myself, whereas in neural networks the structure is implicit. I don't know what the structure is.

Well, I think you're contrasting two different kinds of tools there.
So the video editing tool is designed so that it follows your instructions very precisely. And you prefer one tool over another perhaps because the interface is easier to use or you get the results faster, but you've done the creativity: you've designed the video edit that you want to have, and now the tool is there to get you to it as fast as possible, as accurately as possible. But sometimes we need more than that. Sometimes, if you've got writer's block and you don't know where to begin, having a tool like GPT-4 could be very powerful. You're not delegating the entire process to the technology; you're working with it as a copilot, as an assistant that can certainly help you with that creative process. It will come up with crazy things, and most of them you may not like, but maybe one of them, even if you don't like it either, causes you to think about something that you would otherwise not have thought of. And so the two working together can surely be more creative. So I think that, certainly, working in unison with humans, it enhances creativity. That's certainly my experience; I think there's no doubt about that.
But also, think about a simple example that I think most people relate to, which is image generation. You're giving a talk and you want some image to illustrate the talk. You could go to stock images, but it's a fixed set and you can't easily adjust it; or you could edit the images yourself, which is a sort of slow and painful process. But now, with just a simple prompt, you can get a bunch of examples, and if one of them isn't quite what you like, you can alter the prompt and fine-tune it. And that now becomes a creative process. You can say the human is in the driving seat, but the overall creativity is certainly enhanced. And when you take a text prompt and the machine produces this beautiful photorealistic image... I mean, how many of us weren't absolutely blown away by the incredible advances in generative AI over the last decade? Why would you not call that creative? If a human being did it, you would call it creative. Why are we not allowing the machine to be described as creative? That's the piece that I don't quite understand.

So you could argue that creativity is just pure novelty of the artifact, that it's just how much entropy is in the artifact.
You could think of GPT-4 prose as being a kind of category. There's a lot of variance in there, but there are also certain motifs. And now when people see the motifs, they say: oh, I've seen that a million times before. So I did think it was novel and interesting, and now I don't. And this is the thing: now, when I'm writing blog posts and things like that, I'm deliberately trying to do something genuinely creative. It's almost like the intrinsic creativity isn't important; I don't want people to think that I used GPT-4, so that's what's driving it. Do you see what I mean?

Yeah. So clearly creativity is about novelty, and novelty is what we desire here, but whether that novelty has value or not is a subjective opinion. In your case, it's whether it's achieving the goals that you desire.
So I think there is no doubt that, even if you say we're just taking existing ideas and combining them in new ways, everything that humans do builds on their own previous experience and on the work of others. And I think that's absolutely fine. That's a wonderful thing about humanity: from generation to generation, we build upon the work of what's gone before. And the machines that we build now are heavily dependent on the creativity and the work of the humans who came before, because they learn from humans and they're designed by humans. I think that's absolutely fine; it's a wonderful thing. They add to the sum total of human creativity, and that's a wonderful thing.

Chris, you've written a really beautiful book, and you wrote it with your son Hugh. And there was a picture of Hugh, I think, in the introduction of PRML. And part of what I want to understand is this: deep learning is a huge field. What was the thought process, and how did you decide what to tackle and what not to tackle?

Great questions.
There's an interesting story behind the new deep learning book, which is that PRML was written in 2006. It predates the deep learning revolution, and what has constantly surprised me is just how popular it's remained, in spite of the fact that, in one sense, it's massively out of date, because it has no mention of the most important thing in the field of machine learning. So I've long felt it was time to update the book, produce a second edition, and add some material on deep learning. But life is busy, and anybody who's ever written a book will tell you that it takes way more effort than you can possibly imagine if you've not actually had that experience. And so I never really got around to doing it. And then along came the Covid pandemic, and we all went into lockdown. I feel I was one of the very privileged people in that lockdown: we were locked down together as a family in Cambridge. And when you're locked down at home for several months, you kind of need a project, and I thought this would be a great time to think about a second edition of the PRML book, because, you know, what else are you going to do in lockdown? And it became a project with my son, because he was with me by this time.
He'd gained a lot of experience: a master's degree in machine learning, and he'd been working in autonomous vehicle technology. In a sense, he had a lot more practical, hands-on experience with deep learning than I did at that point. And so we started this as a joint project. But we very quickly realized that what was needed was not a couple of extra chapters for PRML; the whole field had changed too much. And also, we didn't want to write a book that just accumulated more and more material; it would become a huge tome. The value of a book, I think, is in the distillation, in the way it draws your attention to a subset of specific things: this is the small set of things that you really need to understand, and then you're equipped to go off into the field. So what we omitted was almost as important as what we added. And we very quickly realized this was a new book, so we called it Deep Learning: Foundations and Concepts. We made a lot of progress, but then, of course, the lockdowns ended. I started a new team called AI for Science at Microsoft; Hugh started at Wayve Technologies, building the core machine learning technology for their autonomous vehicles; and we were all just far too busy.
And then the next thing that happened was the ChatGPT moment, where, in the space of a few weeks, 100 million people were using it, and suddenly AI and machine learning were in the consciousness of the general public. And we realized that if ever there was a time to finish this book, it had to be now. So we had a really big push to get the book finished and available for NeurIPS in 2023, and we made it just at the last minute, as you do. The book was on display at NeurIPS, and Hugh and I spent the week going around the conference together, talking to folks at posters, and just had a great time. So it was actually a huge privilege to be able to write the book with my son.

Yeah, that's fantastic.
What was your favorite chapter? And are there any things that you felt were omissions that you would have liked to include, but you just had to draw a line under it?

Yeah.
In terms of favorite chapters, of course, the more recent architectures were particularly interesting. I very much enjoyed writing the diffusion chapter, and Hugh had a lot of input into that chapter, and of course Transformers as well. And just understanding how to integrate the different generative frameworks: how to think about GANs, how to think about variational autoencoders, how to think about normalizing flows and so on, and how to present them under one umbrella in a more coherent way. So that was part of what was interesting for me, the learning experience. I always enjoy learning new things, and I learned things writing that book, and I think Hugh did as well. In a sense, that was the favorite part of the book: the things where I learned new things, or new ways of looking at things I already knew about. The real decision process was what to put in and what not to put in, while keeping the size of the book under control.
Because I think it's something like thousands of papers a month now published in machine learning; it's overwhelming for the beginner. So really the goal of the book is to distill out those few core concepts, which means there are always questions: should we have added this? Should we have added that? What we wanted to do was avoid adding the latest architecture that might be very hot at the minute but could easily disappear three months down the line. So I hope we resisted that temptation. But there are areas where, if we get around to a second edition at some point, we might think about additions. Reinforcement learning is something of growing importance, and it would be lovely to have a chapter on reinforcement learning that integrates well with the rest of the book. There are books on reinforcement learning; there are review articles; there are plenty of places to go and learn about it. But something integrated with the book, I think, could be valuable. So that is something we might visit in the future. But for the moment, we've focused on what we think are the core principles for any newcomer to the field, whether a master's student, somebody who's self-taught, or a practitioner coming into the field wanting to understand the basics. And so the goal was to try to keep the book, as it were, as short as possible, but no shorter.
Looking back on your last couple of books as well, in retrospect,

play41:10

um, which bits are you most kind of proud of,

play41:12

and which bits do you kind of feel that when you did make

play41:16

the decision at the time, perhaps you mispredicted how successful

play41:20

something might be? Very interesting. So the thing I'm most proud of,

play41:24

actually, is the very first book called Neural

play41:26

Networks for Pattern Recognition. And the reason is because I think

play41:29

the book was quite influential in steering the field towards a more

play41:33

probabilistic, more statistical perspective of machine learning.

play41:36

It's perhaps hard for people to appreciate today,

play41:38

but it wasn't always that way. When I first went into machine

play41:42

learning, a lot of it was inspired by neurobiology,

play41:44

which is fine, but it lacked sort of mathematical rigor.

play41:48

It lacked any mathematical foundation.

play41:50

And so there was a lot of trying to learn a bit more about the

play41:53

brain and then try to copy that in the algorithms and see if

play41:55

that worked better or not. And there was a lot of trial and

play41:58

error. Still a lot of empirical, uh,

play42:00

trial and error in machine learning, of course, but at least we have that,

play42:03

that sort of bedrock of probability theory.

play42:05

And so I think the book was the first one to really address

play42:08

machine learning and neural networks from a statistical from

play42:11

a probabilistic perspective. I think in that respect,

play42:13

the book was very influential. The field was much smaller then. Today

play42:16

we take that as obvious. But I think in terms of the

play42:19

thing I'm most proud of, it's probably the influence of that.

play42:22

That first book, back in 1995. In terms of things I look back

play42:27

on that I might do differently, I suppose, if I look

play42:31

at PRML, for example, and I look at the trajectory of the field,

play42:34

we've seen that neural networks were all the rage in the

play42:37

mid-1980s to mid-1990s, and then they kind of got

play42:41

overtaken by other techniques. And then we had this sort of

play42:44

Cambrian explosion of, you know, support vector machines and

play42:47

Gaussian process and Bayesian methods and graphical models and,

play42:49

and all the rest of it. And I think one thing

play42:52

that Geoff Hinton really got right is he really

play42:55

understood that neural networks were the way forward.

play42:58

And he really stuck to that perspective sort of through

play43:01

thick and thin. Um, I got kind of distracted

play43:04

particularly. We talked earlier about Bayesian

play43:06

methods and how beautiful and how elegant they are.

play43:08

And to a theoretical physicist, it's very appealing to think of everything

play43:11

from a Bayesian perspective. But really, what we've seen today

play43:14

is that the practical tool that's giving us these extraordinary

play43:19

advances is neural networks. And most of those ideas go back to

play43:23

the mid-1980s, to the idea of gradient descent and so on,

play43:26

with a few new tweaks. You know, we have GPUs,

play43:29

we have ReLUs, but essentially most of the ideas

play43:32

were already around back in the late 1980s.

play43:37

We didn't really understand the incredible scale at which you need

play43:41

to use them, but they only really work when you have this gargantuan

play43:45

scale of data and compute. And of course,

play43:47

we didn't really have GPUs or know how to use them back then.

play43:50

So there were some key developments that have unlocked

play43:53

this and made it possible. But I think perhaps if I did

play43:55

something differently with the amazing benefit of hindsight,

play43:58

other than sort of investing in certain stocks and whatever,

play44:01

and all the other things you could do if you had perfect hindsight,

play44:03

I think the other thing I would do is probably just stay really

play44:06

focused on neural networks, because eventually that's the

play44:08

technology that came good. But I always come back to probability

play44:11

theory as very much a unifying idea. So let me just give you a

play44:15

specific example from PRML. Actually, there were two different

play44:18

technologies, one called Hidden Markov Models, that were all the rage

play44:20

in speech recognition back then. Another technique called Kalman

play44:23

filters that had been used for many years to guide spacecraft,

play44:26

track aircraft on radar and all sorts of things.

play44:29

Um, it turns out they're essentially the same algorithm,

play44:32

and not only are they the same algorithm, but they can be derived from the

play44:36

most beautifully simple principle. You just take the sum and

play44:39

product rule of probabilities, and then you take the idea that

play44:42

a joint probability distribution has a factorization described by a

play44:44

directed graph. So when I was preparing PRML,

play44:50

I looked over a bunch of books called Kalman Filters An

play44:52

Introduction to Kalman Filters, and they would go on chapter

play44:55

after chapter on the forward equations, and then chapter after chapter on

play44:57

the reverse equations and so on. It's very, very complex and very,

play45:00

very heavy going. But you can derive the Kalman filter

play45:04

and get the hidden Markov model for free in a few lines

play45:07

of algebra, just starting from probability theory.

play45:10

And this idea of factorization, a sort of deep mathematical principle

play45:13

that operates there, and you discover the message passing algorithm.

play45:16

And if it's a tree-structured graph, it's exact and you have two passes.

play45:19

It's very beautiful and very elegant. So I love the fact that we're

play45:22

exploring all these many different frontiers, but I love

play45:24

the fact that we have at least some compass to guide us as

play45:28

we engage in the exploration of this combinatorially vast space.
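The derivation Bishop sketches can be made concrete in a few lines. Below is a toy illustration (not taken from the book): a two-state hidden Markov model whose filtering distribution falls out of repeated application of the sum and product rules, one forward pass of message passing along the chain. All the probability numbers are invented.

```python
# Toy HMM forward pass: filtering p(z_t | x_1..t) follows from the
# sum and product rules plus the factorization of the joint.
# All numbers here are invented purely for illustration.

transition = [[0.7, 0.3],   # p(z_t | z_{t-1}) for two hidden states
              [0.4, 0.6]]
emission   = [[0.9, 0.1],   # p(x_t | z_t) for two observation symbols
              [0.2, 0.8]]
prior = [0.5, 0.5]

def forward(observations):
    """One forward pass of sum-product message passing on the chain graph."""
    alpha = [prior[z] * emission[z][observations[0]] for z in range(2)]
    for x in observations[1:]:
        alpha = [emission[z][x] * sum(alpha[zp] * transition[zp][z]
                                      for zp in range(2))
                 for z in range(2)]
    norm = sum(alpha)  # p(x_1..t); normalizing gives the filtering posterior
    return [a / norm for a in alpha]

print(forward([0, 0, 1]))  # posterior over the two hidden states
```

Adding the corresponding backward pass would give exactly the two-pass smoothing algorithm mentioned above, and the same recursion with Gaussian densities yields the Kalman filter.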

play45:31

Yeah, it's so interesting. My co-host, Doctor Keith Dugger,

play45:35

he always says that he doesn't need to remember all of the different

play45:37

statistical quantities because he can rederive them from

play45:40

first principles, which is nice. But we should move on to AI for

play45:44

science. So you're leading this initiative

play45:46

at Microsoft Research. Can you tell us about that? Yes.

play45:48

So at a personal level, of course, this brings back my earlier

play45:52

interest in theoretical physics and chemistry and biology

play45:56

and brings it together with machine learning.

play45:59

And what many people realized, uh, a few years ago was that of the

play46:05

many areas that machine learning would impact,

play46:09

the area of scientific discovery would be, in my view,

play46:12

the most important. The reason I say that is because

play46:15

it's actually scientific discovery that really has allowed

play46:19

humans to go on that trajectory of the last few thousand years,

play46:21

not just understanding our place in the universe, but to be much

play46:24

more in control of our own destiny, to double our lifespan,

play46:27

to cure many diseases, to give us much higher standards of living, uh,

play46:31

to give us a much brighter outlook for the future than

play46:34

humans have traditionally enjoyed. And that's come through

play46:38

scientific discovery. And then the application of that

play46:41

knowledge and understanding of the world in the form of technologies to

play46:44

agriculture, industry and so on. And so I can't think of any more

play46:48

important application for AI. But what's really interesting is

play46:52

it's very clear that many areas of scientific discovery are

play46:55

being disrupted. And when I say disrupted, I'll

play46:57

just give you one simple example, the ability of neural nets,

play47:01

machine learning models to act as emulators for what previously

play47:05

were very expensive numerical simulators very often gives you

play47:09

a factor of 1000 acceleration. You know, we can forecast the

play47:12

weather a thousand times faster with the same accuracy than we

play47:14

could a few years ago, prior to the use of deep learning.

play47:17

Now, if that were the only thing that was happening,

play47:20

that alone would be a disruption. That alone would be worth setting

play47:23

up a team on AI for science. I think actually it's only scratching

play47:25

the surface, but any time something that's very core, very important,

play47:29

gets a thousand times faster, it means you can do things that would

play47:33

take years in a few tens of hours. That really is a disruption.

play47:37

It really is transformational. So, um, a couple of years ago,

play47:41

I pitched to our chief technology officer to say, look,

play47:44

this is a really important field. I'm happy to step down from my role

play47:48

as the lab director of MSR in Europe. And instead, um, I'd like to lead a

play47:53

new team focusing on AI for science. And it met with enormous enthusiasm.

play47:58

And so we've been growing and building that team.

play48:00

It's a very interesting team. It's very multinational.

play48:02

We have people on many different continents and different countries.

play48:04

We've opened new labs in Amsterdam and in Berlin.

play48:09

We have teams in Beijing and in Shanghai, and folks

play48:13

in Seattle as well. And um, so very,

play48:16

very multidisciplinary, very multinational, but

play48:20

with one thing in common, this real excitement and passion

play48:23

for what machine learning and AI is going to do to really transform

play48:27

and accelerate our ability to do scientific discovery.

play48:30

You were talking about Inductive priors just a second ago, and I

play48:33

guess I first learned about this, the art of, you know, designing

play48:36

inductive priors in machine learning from Max Welling's group.

play48:39

They were saying that, you know, the remarkable thing is that you can,

play48:43

um, using, um, principles, let's say from physics, we can design

play48:47

these inductive priors and we can reduce the size of the hypothesis

play48:51

class that we're approximating. And because we know the target

play48:54

function is inside that class, we are not introducing any

play48:57

approximation error. And we are kind of overcoming

play49:00

some of the curses in machine learning by making the problem

play49:03

tractable, which is amazing. But that's speaking to this kind

play49:07

of principled approach of imbuing domain knowledge into these systems.

play49:12

It's really interesting, actually. Max and I have a similar trajectory.

play49:14

We both did PhDs in theoretical physics and then moved into

play49:16

machine learning, and I think we both feel there's a

play49:19

very important role for inductive bias to play in the use of machine

play49:23

learning in the scientific domain. I'm sure everybody is

play49:26

familiar with the blog post called The Bitter Lesson by Rich

play49:30

Sutton, and if anybody watching this is not familiar,

play49:33

they should immediately after this video go and read that blog.

play49:36

It's a very short blog, and without giving too much of a spoiler,

play49:39

he essentially says that every attempt by people to improve the

play49:43

performance of machine learning by building in prior knowledge, building

play49:46

in what we call inductive biases into the models,

play49:50

it produces some improvement. But then very quickly,

play49:52

it's overtaken by somebody else who just has more data.

play49:55

And that indeed is a bitter lesson.

play49:57

And it's a wonderful blog. I've read it many times,

play50:00

and I think people should, you know, probably read that once a month.

play50:03

And it's very inspiring. But I think there may be exceptions.

play50:07

And I think the scientific domain is one where inductive biases for

play50:10

the foreseeable future will be extremely important, sort of almost

play50:13

contrary to the bitter lesson. And a couple of reasons for this.

play50:16

One is that the inductive biases we have are not of the kind

play50:21

you find in, let's say, linguistics, or any

play50:25

domain based on human expertise acquired through

play50:28

experience, because a person who's had a lot of experience over a number

play50:33

of years and formulated some sort of rules of thumb that guide them,

play50:37

that's exactly what machine learning is very good at: processing very

play50:40

large amounts of data and inducing the rules, as it were,

play50:44

the patterns within that data. Um, so I think that kind of inductive

play50:49

bias is typically harmful. And I think the bitter lesson

play50:52

will certainly apply there. But in the scientific domain

play50:54

it's rather different. First of all, the inductive

play50:57

biases we have are very rigorous. We have the idea of conservation

play51:00

of energy, conservation of momentum. We have symmetries.

play51:03

If I have a molecule in a vacuum, it has a certain energy.

play51:06

If I rotate the molecule, the representation of the

play51:09

coordinates of all the atoms changes wildly in the computer,

play51:12

but the energy is the same. So we have this very rigorous

play51:15

inductive bias. We also know that the world at the

play51:17

atomic level is described exquisitely well by Schrodinger's equation.
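The rotational invariance described a moment ago is easy to check numerically. Here is a minimal sketch, with an invented inverse-distance "energy" standing in for a real interatomic potential:

```python
import math

def pair_energy(atoms):
    """Toy 'energy': sum of inverse pairwise distances, a stand-in
    for a real interatomic potential (invented for illustration)."""
    e = 0.0
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            e += 1.0 / math.dist(atoms[i], atoms[j])
    return e

def rotate_z(atoms, theta):
    """Rotate every atom about the z-axis; the coordinates change wildly."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in atoms]

molecule = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (0.3, 0.9, 0.2)]
e_before = pair_energy(molecule)
e_after = pair_energy(rotate_z(molecule, 1.234))
# Pairwise distances, and hence the energy, are unchanged by the rotation.
```

An equivariant architecture builds exactly this kind of symmetry into the model itself, rather than hoping it is learned from data.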

play51:21

And sprinkle in a few relativistic effects.

play51:22

And you've got an amazingly accurate description of the world, but it's

play51:25

way too complex to just solve it directly, or it's exponentially

play51:29

costly in the number of electrons. But nevertheless, we have this

play51:32

bedrock of really understanding the laws that govern the universe.

play51:36

And so I think that's the first thing:

play51:39

we have very rigorous priors that we believe in deeply.

play51:42

It's not that we think conservation of energy doesn't work.

play51:44

We know that it's true. The second thing is that we're

play51:47

operating in a data scarce regime. So large language models are able

play51:51

to use very large, internet-scale quantities of

play51:54

human created data, whether it's in the form of, you know,

play51:57

whether it's Wikipedia or whether it's just scientific papers or, uh,

play52:01

almost any of the output of humans is potentially material on

play52:05

which large models can feed. They're in a very data-rich regime and can

play52:09

go to scale, and so the bitter lesson I think really kicks in there. But

play52:13

in the scientific domain, the data might come from simulations which

play52:17

are computationally expensive, or it might come from lab experiments

play52:21

which are expensive. And the data is limited.

play52:24

So we're operating usually in a data scarce regime.

play52:28

So we have relatively limited data and we have very rigorous

play52:30

prior knowledge. And so the balance between the

play52:33

data and the inductive bias, um, is very different because of course,

play52:37

the no free lunch theorem says you can't learn purely from data.

play52:40

You have to have some form of inductive bias.

play52:42

And in the case of a transformer, it's a very lightweight form of

play52:46

inductive bias. We believe there's a

play52:48

deep hierarchy. There's some, you know,

play52:50

data-dependent, uh, self-attention, but really, that's it.

play52:54

Uh, and the rest is determined from the data.

play52:57

In science, there's much more scope for

play52:59

bringing in these inductive biases. There's much more need to bring

play53:02

in the inductive biases. And that also, incidentally, again,

play53:05

in my personal very biased opinion, makes the application of machine

play53:10

learning and AI to the sciences the most exciting frontier of AI

play53:13

and machine learning, because it's the one that's richest

play53:16

in terms of the creativity, and also in terms of the need to

play53:19

bring in some of that beautiful mathematics that that underpins the

play53:23

universe. Yeah, so, so fascinating. I mean, could we just linger

play53:26

just for a second on that? So Rich Sutton, in his Bitter Lesson

play53:29

essay, he explicitly called out symmetries as being, you know,

play53:33

he was warning against human designed artifacts in these models.

play53:37

And I mean, Max Welling, as you say, famously, um, built these gauge

play53:41

equivariant neural networks, bringing in his physics knowledge.

play53:45

Um, so I'm just trying to understand the spectrum between

play53:48

high resolution physical priors and the kind of macro

play53:52

priors from human knowledge that we learn, which are presumably brittle.

play53:56

Is it just that we think that these physical priors are

play54:00

fundamental, and that's a perfectly acceptable way

play54:04

to constrain the search space. But these high level priors are

play54:08

brittle? Yes. I think the prior knowledge

play54:11

that comes from human experience, um, is more of that

play54:15

brittle kind because the machine can see far more examples than

play54:19

any human can in a lifetime, and can do a more systematic job

play54:24

of looking across all of that data. They're not subject to, say,

play54:28

recency bias and those sorts of things.

play54:30

So I think that kind of prior knowledge is one where

play54:34

scale and data will win, whereas the prior knowledge

play54:38

that we have from the physical laws in a sense, is much more rigorous.

play54:41

And symmetry is very powerful.

play54:43

It's sometimes said that physics more or less is symmetry.

play54:46

That's almost right. I mean, right. So conservation

play54:50

laws arise from symmetry. You know, translation invariance in

play54:54

space time gives you conservation of energy and momentum.

play54:57

And, you know, gauge symmetry of the electromagnetic field gives

play55:00

you charge conservation and so on. And so these are very, very rigorous

play55:05

laws that apply from symmetry. But, you know, even if you take

play55:07

a data driven approach, people often use data augmentation.

play55:10

If you know that an object's identity

play55:12

doesn't depend on where it is in the image, you might, you know,

play55:15

make lots of random translations of your data to augment your data.
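That augmentation idea can be sketched directly. The 3x3 "image", the shift range, and the helper names below are invented purely for illustration:

```python
import random

def translate(image, dx, dy, fill=0):
    """Shift a 2D grid by (dx, dy), padding vacated cells with `fill`."""
    h, w = len(image), len(image[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = image[y][x]
    return out

def augment(image, n, max_shift=1, seed=0):
    """Make n randomly translated copies: a data-driven way of
    encoding translation symmetry without changing the model."""
    rng = random.Random(seed)
    return [translate(image, rng.randint(-max_shift, max_shift),
                      rng.randint(-max_shift, max_shift))
            for _ in range(n)]

digit = [[0, 1, 0],
         [1, 1, 1],
         [0, 1, 0]]
batch = augment(digit, n=4)
```

Each augmented copy keeps the object's identity while moving it around, which is exactly the invariance being taught to the model through data rather than architecture.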

play55:18

So data augmentation can be a data driven way of building in

play55:21

those symmetries. But now when we have very rich

play55:24

prior knowledge, I'll come back to Schrodinger's equation.

play55:26

It describes the world with exquisite precision at the atomic level.

play55:29

But solving it is very, very expensive.

play55:31

And so what we can do is we can cache those computations.

play55:34

We call it the fifth paradigm of scientific discovery,

play55:36

which is a rather fancy term. But the idea is very simple,

play55:39

is that instead of taking a conventional numerical solver and

play55:43

using it to solve something like Schrodinger's equation or something

play55:46

called density functional theory, instead of solving that directly

play55:49

to solve your problem. Instead you use that simulator

play55:53

to generate training data, and then use that training data to

play55:56

train a machine learning emulator. And then that machine learning

play55:59

emulator can now emulate the simulator, but typically 3 or 4

play56:04

orders of magnitude faster. So provided you use it a lot and

play56:07

you amortize the one-off cost of generating the training data and

play56:10

doing the training, if you're going to use it many, many times overall,

play56:14

it becomes dramatically faster, dramatically more efficient than

play56:18

using the simulator. And that's just one of the

play56:20

breakthroughs we're seeing in this space.
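The fifth-paradigm loop (run the expensive solver once to generate training data, then answer queries with a cheap surrogate) can be sketched like this. The `simulator` function and the lookup-table emulator are invented stand-ins; a real system would train a neural network on the cached data.

```python
import bisect
import math

def simulator(x):
    """Stand-in for an expensive numerical solver; the formula is invented."""
    return math.sin(x) + 0.1 * x * x

# One-off cost: run the expensive simulator on a grid of inputs.
xs = [i / 10 for i in range(-30, 31)]
ys = [simulator(x) for x in xs]

def emulator(x):
    """Cheap surrogate: linear interpolation in the cached table.
    (A neural network trained on (xs, ys) would play this role in
    practice; the lookup table is the simplest possible sketch.)"""
    i = bisect.bisect_left(xs, x)
    i = min(max(i, 1), len(xs) - 1)
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# The emulator tracks the simulator closely at unseen query points.
err = max(abs(emulator(x) - simulator(x))
          for x in [j / 7 for j in range(-20, 21)])
```

The training set is generated once; every subsequent query avoids the simulator entirely, which is where the orders-of-magnitude speedup comes from.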

play56:22

So first of all, um, there's a spectrum, as you say,

play56:25

of we could just train on lots of data or we could augment the data,

play56:28

or we could make a simulator for the data, and then we can train

play56:33

a machine learning model. And as we were just speaking to these

play56:36

inductive priors, they are so high resolution that we are not, um,

play56:41

restricting the target function that we want to learn.

play56:44

And we can make quite a principled argument about that.

play56:46

But the one question to me is there's a kind of I don't know

play56:51

whether it's best to frame it as exploration versus exploitation,

play56:53

but there needs to be some amount of going off piste.

play56:57

So we define the structure and we essentially build a generative

play57:02

model and we can generate a whole bunch of trajectories.

play57:05

But could it ever be the case that we wouldn't have enough variance

play57:09

to find something interesting? There's a very interesting question

play57:13

about the overall scientific method of formulating hypotheses,

play57:16

running tests, evaluating those hypotheses, refining the hypotheses,

play57:20

running more experiments, and so on: that scientific loop.

play57:22

I think machine learning will have an important role to play there

play57:25

because data is becoming very high dimensional, very high throughput.

play57:28

Humans can't analyze this data anymore.

play57:30

A human can't directly look at the output of the Large Hadron Collider

play57:33

with its, you know, petabytes a second or whatever it is pouring off.

play57:36

We need machines to help us. But again, I think the human, um,

play57:40

rises to the level of the conductor of the orchestra, as it were.

play57:43

They no longer have to do things by hand.

play57:45

Machines are helping to accelerate that.

play57:48

And I think the machines can help accelerate the creative process,

play57:51

potentially by pointing to anomalies or highlighting patterns in the

play57:54

data and so on, but very much with the human scientist in the loop.

play57:58

But even coming down from those sort of lofty, more sort

play58:00

of philosophical considerations just to the practicalities,

play58:03

when we talk about discovery, we're also interested in just

play58:06

the very practical method of how do we discover a new drug,

play58:09

or how do we discover a new material. So scientific discovery also means

play58:12

that very pragmatic near-term approach.

play58:15

And there we're seeing really dramatic acceleration through the

play58:19

concept of this emulator, um, in our ability to explore the

play58:23

combinatorially vast space of new molecules and new materials,

play58:26

exploring those spaces efficiently to find potential candidates that

play58:30

might be new drugs or new materials for batteries or

play58:34

other forms of green energy. So that alone is a very

play58:38

exciting frontier. I think it's so interesting.

play58:42

Um, so searching the space, I mean, drug discovery is an interesting one.

play58:45

I think you spoke about sustainability as well as another

play58:49

application you can speak to, but how do you identify

play58:52

and find an interesting drug? So the drug discovery process starts

play58:58

first of all with the disease: deciding

play59:00

that we want to go tackle a particular disease, and then

play59:03

finding a suitable target. So the standard so-called small

play59:07

molecule paradigm, which is where most drugs are today, they're small

play59:11

synthetic organic molecules that bind with a particular protein.

play59:15

So pharma companies will will spend a lot of time identifying targets.

play59:19

So, say, a protein that has a particular region that a

play59:22

molecule can bind to, and therefore can influence

play59:26

the behavior of that protein, switching on or switching off some

play59:29

part of that disease pathway and breaking the chain of disease.

play59:32

And so the challenge then is to find a small molecule that first of all,

play59:36

has the property that it binds with the target protein.

play59:38

That's the first step. But there are many other things

play59:40

that it has to do. It has to be absorbed into the body.

play59:43

It has to be metabolized and excreted.

play59:45

And particularly, it mustn't be toxic.

play59:47

It mustn't bind to many other proteins in the body

play59:49

and cause bad things to happen. So what you have is a very large

play59:53

space of molecules, usually estimated around ten to the power

play59:55

60 potential drug-like molecules. And out of that enormous space

play60:00

of ten to the 60, you're trying to find an example that meets all

play60:03

of these many, many criteria. And so one approach is to

play60:07

generate a lot of candidates computationally and then

play60:10

screen them one by one for different properties. That screening process,

play60:14

the more that can be done in silico rather than in a wet lab,

play60:18

the faster it can be done. And the the larger the search

play60:22

space can be, and therefore the bigger the fraction of that space

play60:25

of possibilities you can explore, hopefully thereby increasing the

play60:29

chances of finding a good candidate. Because many attempts to find a drug

play60:31

for a disease simply fail; nothing eventually comes of it.

play60:35

So increasing the probability of success, increasing the speed of

play60:38

that discovery process. So in all of that,

play60:40

there are many places where machine learning could be disruptive.
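The screening funnel just described can be sketched as a sequence of filters. Everything below is hypothetical: the property names, thresholds, and candidate molecules are invented, and each stub stands in for a real docking, ADMET, or toxicity model.

```python
# Hypothetical in-silico screening funnel; each predicate is a stub
# standing in for a predictive model or simulation.

def binds_to_target(mol):   # stage 1: binding affinity filter (stub)
    return mol["affinity"] > 7.0

def is_absorbable(mol):     # stage 2: absorption/metabolism filter (stub)
    return mol["logp"] < 5.0

def is_nontoxic(mol):       # stage 3: toxicity filter (stub)
    return not mol["toxic"]

def screen(candidates):
    """Apply cheap filters in sequence; each stage shrinks the pool,
    reserving expensive wet-lab work for the survivors."""
    for stage in (binds_to_target, is_absorbable, is_nontoxic):
        candidates = [m for m in candidates if stage(m)]
    return candidates

# Invented example candidates.
pool = [
    {"name": "A", "affinity": 8.1, "logp": 3.2, "toxic": False},
    {"name": "B", "affinity": 6.0, "logp": 2.0, "toxic": False},
    {"name": "C", "affinity": 9.0, "logp": 6.5, "toxic": False},
    {"name": "D", "affinity": 7.5, "logp": 4.0, "toxic": True},
]
hits = screen(pool)
```

The cheaper each stage is, the larger the initial pool can be, which is exactly why replacing wet-lab steps with in-silico predictors expands the searchable fraction of that space.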

play60:44

So on that process of, um, I guess you're describing,

play60:48

you generate candidates and then you almost discriminate interesting ones,

play60:53

and then you rinse and repeat in a kind of iterative process.

play60:57

Let me give you a concrete example. So we've done some work looking

play61:01

at tuberculosis. So tuberculosis kills something

play61:04

like 1.3 million people, very sadly, in 2022, which is the

play61:07

last year for which we have data. And it might seem surprising because

play61:10

we have antibiotics, we have drugs for tuberculosis.

play61:13

Why are so many people dying? And one core reason is that the

play61:16

bacterium is evolving to develop drug resistance.

play61:19

And so there's a search on for new drugs.

play61:21

So maybe I'll just take a moment to explain some of the architecture and

play61:24

get into a little bit of the techie details of this.

play61:28

So we wanted a way of finding molecules. We know what the target is.

play61:31

We've been told what the target protein is.

play61:33

The target has a region called a pocket, and we're looking for

play61:35

molecules that will bind tightly with that pocket region on the protein.

play61:39

And so the way we approached this was, first of all,

play61:43

build a language model, but not a language model for human language,

play61:46

but for the language of molecules. So, first of all, there's

play61:49

a representation called SMILES.

play61:51

It's an acronym.

play61:53

It's just a way of taking a molecule and describing it as a

play61:56

one dimensional string. And so you first of all,

play61:58

take a large database of, I don't know, 10 million molecules

play62:01

represented as SMILES strings, and you treat them like the tokens

play62:04

for a transformer model. And by getting it to predict the

play62:07

next token, the next element of the SMILES string, you build a transformer

play62:11

based language model that can speak the language of molecules.

play62:14

So it can run generatively and it can create new

play62:18

molecules as output. So you can think of that as kind of

play62:20

like a foundation language model. But speaking the language of

play62:24

molecules. Now we want to generate molecules,

play62:27

but not just any molecules. We want molecules which bind with

play62:29

a particular target protein. So we have the target protein in

play62:32

particular. It's the pocket region that

play62:34

we're interested in. So we can give it the amino acid

play62:36

sequence of the protein as input. But we need more than that.

play62:39

We need the geometry of the pocket.

play62:41

And this is where some of those inductive biases come in.

play62:43

So we need to have representations of the geometry of

play62:46

the atoms that form that pocket, but in a way that represents these

play62:50

equivariances. And so they're encoded as input

play62:53

to a transformer model that learns a representation for the

play62:56

protein pocket. And the final piece we need, as you

play62:58

said, we want to do this iteratively. We want to take a good molecule and

play63:01

make it a better molecule, rather than just searching blindly across a

play63:05

space of ten to the 60 possibilities. And so the other thing we want to

play63:08

provide as input is a descriptor of a

play63:12

known small molecule that does bind with the pocket already.

play63:16

And but we want to do this in a way that creates variability.

play63:19

And we actually use a variational autoencoder to

play63:21

create that representation. And that's an encoder

play63:24

that translates the molecule into a latent space.

play63:28

And we can sample from that latent space.

play63:30

And then this language model, the SMILES language model, can

play63:35

attend to both the output of the variational auto encoder and the

play63:37

output of the protein encoder using cross-attention.

play63:40

And so what we've done there is, I think, rather tastefully combined

play63:44

some elements from, you know, state of the art modern deep learning.
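As a toy illustration of the first piece described above, here is next-token modelling over SMILES characters. Bigram counts stand in for the transformer, and the three-molecule "database" (ethanol, acetic acid, benzene) is obviously nothing like the millions of strings a real system would use.

```python
import random
from collections import defaultdict

# Toy "database" of SMILES strings (ethanol, acetic acid, benzene).
# A real system would use millions of molecules and a transformer;
# character bigram counts stand in for it here, purely as a sketch.
smiles_db = ["CCO", "CC(=O)O", "c1ccccc1"]
BOS, EOS = "^", "$"

counts = defaultdict(lambda: defaultdict(int))
for s in smiles_db:
    tokens = [BOS] + list(s) + [EOS]
    for a, b in zip(tokens, tokens[1:]):
        counts[a][b] += 1          # "next-token prediction" statistics

def sample(seed=0, max_len=20):
    """Generate a new token string from the learned next-token model."""
    rng = random.Random(seed)
    out, tok = [], BOS
    while len(out) < max_len:
        nxt = counts[tok]
        tok = rng.choices(list(nxt), weights=list(nxt.values()))[0]
        if tok == EOS:
            break
        out.append(tok)
    return "".join(out)

generated = sample()
```

In the system described in the interview, generation is additionally conditioned, via cross-attention, on the protein-pocket encoding and on the VAE latent of a known binder, rather than sampling unconditionally as here.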

play63:48

The result then can be trained end to end using a

play63:52

database of other proteins that are known to bind

play63:55

efficiently to small molecules. And once the system is trained,

play63:59

we can now provide as input the known target for tuberculosis and some

play64:04

known molecules that bind with this. And then we can iteratively refine

play64:08

those molecules at the output. We get molecules that have

play64:10

better binding efficiency. And we're able to increase the

play64:13

binding effectiveness by two orders of magnitude.

play64:15

And so we now have state of the art molecules in terms of

play64:18

binding efficiency to this, um, to this, uh, target protein.

play64:23

Of course, we can't do the wet-lab experiments ourselves. We partner with an organization called GHDDI, the Global Health Drug Discovery Institute. They've synthesized the molecules we generated and measured their binding efficacy. And so we're very, very excited about this. The next stage now is to take that as a starting point and further refine and optimize those molecules, and try to address all the other requirements a drug must meet before it can actually be tested on humans: its toxicity, its metabolism, and all the rest. But I think it's just a very nice example of almost a first step in using modern deep learning architectures to accelerate the process of drug discovery. And already we have, I think, really quite a spectacular success, given that we're kind of newcomers to this field, partnering with domain experts who have the wet-lab experience and the wet-lab capability. To me, this is the beginning of a very exciting journey.

That sounds incredible.

Is there any kind of representational transfer between the models? So, for example, you were talking about this geometric prior model generating tokens to go into the language model. Because just using language models, by the way, is a fascinating approach. I spoke with Christian Szegedy, and he was doing mathematical conjecturing just using language models: taking mathematical constructions and putting them into language. And they used to use graph neural networks for this. So I guess the question is: could you bootstrap it with an inductive, principled model and then just train using the language model afterwards?

I think the general principle there is a very powerful one: the idea of borrowing strength from other domains. We're seeing this time and time again in deep learning, that machine learning models are able to extract general patterns from one domain and translate them into a completely different domain. We talked earlier about large language models getting better at writing code if they've also had exposure to poetry, or something else seemingly quite irrelevant. There's something quite deep and subtle going on there. But perhaps in a less subtle way, it's clear there's a sort of language of molecules and a language of materials, and by building models that have a broader exposure to that language, they almost invariably become better at the specific tasks we want to apply them to. So I think there is a general principle at heart there. Yeah.

It's so interesting, because I used to think that perhaps the drawback of these inductive-prior models is that it was one inductive prior per model, but this ability potentially to bootstrap a foundation model that can do all of the things is really interesting.

I think the most powerful inductive biases, the ones we focus on, are really the very general ones: symmetries. Those are just very fundamental properties of the universe, and we want those really baked into the models. The sort of intuitions we have about more specific domains can perhaps lead us astray, because they're based on our experience of much more limited domains. And I think this is where the machines can be much better at processing and interpreting large volumes of data and drawing regularities out of that data in a more systematic way.

Okay, okay.

And just before we leave this, this is a bit of a galaxy-brain question, and that's parlance that all the kids are using these days, by the way. But how fundamental is our physical knowledge? We're designing these inductive priors as if they are fundamental, but folks like Stephen Wolfram, for example, argue that there's a deeper ontological reality; it might be a graph, a cellular automaton, or something like that. Is that something you think about, the gap between our models and what reality is?

So I think, first of all, one of the

greatest scientific discoveries of all time is the fact that the universe can be described by simple laws; that is not obvious a priori. That itself is perhaps the most profound discovery, really going back to Newton, but we've found it time and time again. What we've also found is that our understanding of the universe as it exists today is almost like an onion: we're peeling away layers. If you want to navigate a spacecraft to Jupiter, you still use Newton's laws of motion and Newton's law of gravity, and that's just fine. It doesn't mean we believe they're an exact description of nature; we've now got deeper descriptions. General relativity, for example, tells us that Newton's law of gravity is just an approximation. The inverse-square law is a pretty good approximation, but we've got a much better description now.

But it's hard to say that we've found the ultimate answer. It's rather that human knowledge always stands on that edge of what we don't understand, and scientific discovery is always about exploring the things we don't understand: working out whether the laws actually do hold, and whether an anomaly we see in the data is due to some phenomenon that we haven't yet observed. This is how Neptune was discovered, by seeing that the planets were not behaving as they should according to Newton's laws. Newton's laws were just fine; it was simply another planet perturbing them. Or is the precession of the perihelion of Mercury because there's another planet? No, it's because Newton's law of gravity isn't quite right; we need relativity to understand that.

So I think scientific exploration, as far as I can tell, has no particular end in sight. It's rather that we have things that we understand, and there are new frontiers. When I was a teenager getting excited by physics, I loved reading about relativity and quantum physics, but it was kind of depressing, because I thought I'd been born 50 years too late or whatever. All the exciting stuff happened at the beginning of the 20th century, and it had all been done. But now we have dark matter and dark energy, and we realize that most of the universe isn't sitting on the periodic table I learned about in school. So actually, I needn't have worried. I think it was Vannevar Bush who called it the Endless Frontier: science is an endless frontier. There is always more to explore and always more to learn.

So whether the particular ideas you alluded to have substance, I don't know. At the end of the day, the scientific method will tell us: if they have predictive capabilities, if they can predict new phenomena that we weren't aware of before, then they have credence as far as a scientist is concerned. But ultimately we still stick to the scientific method. It's about our ability to make predictions that are testable experimentally. If they stand up to the test of experiment, then we give more weight to those hypotheses, and eventually they're elevated to the status of theory.

I often wonder about the horizon of our cognition: what we are capable of understanding. We tend to understand things using high-level metaphors, and information is a great example of that; a lot of people talk about the universe as information. The agential view is quite interesting too: modeling everything as agents. And it might well be possible that the universe is just so strange and alien that we could never possibly understand it. So there's a bit of an interplay between our kind of intelligibility, our models, and what it is. And the universe clearly is

completely unintelligible, in the sense that nobody can really think about quantum physics; it completely defies the everyday intuitions we learn at the macroscopic level. So I think we have to accept already that the universe is described mathematically; that's our precise description. And then we have metaphors about waves and particles and so on, but none of them really work properly. They're just crutches to lean on. Ultimately it's a mathematical description. But that is also very interesting, the fact that the world is described by mathematics: by making little marks on a piece of paper, you can discover a new planet. That's quite incredible.

Shifting over to deep learning a little more broadly: we were touching on this already, but the landscape is dominated by transformer architectures. What are your broad thoughts about that?

Like any field, I think machine learning has its fads and its waves. Something works really well, and then everybody latches on to it and makes use of it, and that's all well and good. I'd be kind of surprised if the transformer is the last word in deep learning, if that's the architecture that will be used forevermore. But it clearly works very well, and we haven't reached the end of its capabilities by any means. So it makes a lot of sense to exploit the transformer architecture in applications and see how much we can gain from that. At the same time, there are clearly opportunities to think about the limitations of transformers: the computational cost, and whether we can do the same thing with better scaling if we want longer context windows, and all the rest. So there's plenty of interesting research to be done on new architectures as well. I think we need both.

So, you know, here's another galaxy-brain question. Why does deep learning work? Because on the face of it, it shouldn't work: it shouldn't train, it shouldn't generalize. And yet it's been an absolutely remarkable success. Why is that?

So I think, first of all,

at one level you could say, well, we understand why they work: we're fitting nonlinear functions, doing curve fitting in a high-dimensional space. So we need some generalization, and it comes back to the no-free-lunch theorem: some inductive biases, perhaps smoothness and continuity, perhaps something more constraining than that. So at one level it's not surprising. I can fit a polynomial to a bunch of data points by gradient methods and make good predictions at intermediate points; this just generalizes to more data and higher dimensions. So at one level I'd say no, it's not at all surprising that they work. At a different level, of course, the fact that they work so well is remarkable, and the way in which they work is very interesting.
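The curve-fitting picture can be made concrete with a toy version of exactly what Bishop describes: fit a polynomial to noisy samples by plain gradient descent, then predict at an intermediate point. All the numbers below (degree, learning rate, noise level) are arbitrary illustration choices.

```python
import numpy as np

rng = np.random.default_rng(1)
# Noisy samples of a smooth target function.
x = np.linspace(-1, 1, 20)
y = np.sin(np.pi * x) + 0.05 * rng.normal(size=x.size)

# Design matrix for a cubic polynomial; fit its coefficients by plain
# gradient descent on the mean squared error.
X = np.vander(x, 4)            # columns: x^3, x^2, x, 1
w = np.zeros(4)
for _ in range(20000):
    grad = 2 * X.T @ (X @ w - y) / len(x)
    w -= 0.1 * grad

# Prediction at an intermediate point not in the training set.
x_new = 0.05
pred = np.vander([x_new], 4) @ w
print(abs(pred[0] - np.sin(np.pi * x_new)) < 0.2)  # prints: True
```

The fitted cubic interpolates well between the samples because the target is smooth; that smoothness assumption is the inductive bias doing the work, which is the point of the no-free-lunch remark above.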

So one thing: if we go back to the earlier years of machine learning, and certainly back to the world of statistics, the idea that you would fit models with far more parameters than the number of data points would clearly be insane to any self-respecting statistician. We never would have done it, and perhaps that's why nobody really tried it very much. And yet we have these odd phenomena whereby the training error goes to zero, and yet the test error continues to come down, even though the training error is already at zero.
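A toy experiment shows one face of this puzzle: an overparameterized linear model has infinitely many zero-training-error solutions, yet gradient descent started at the origin picks out a specific one, the minimum-norm interpolant (the pseudo-inverse solution). The sizes and the sparse "ground truth" below are arbitrary illustration choices.

```python
import numpy as np

rng = np.random.default_rng(3)

# Overparameterized linear regression: more parameters (50) than points (20).
n, d = 20, 50
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:3] = [1.0, -2.0, 0.5]     # sparse ground truth
y = X @ w_true

# Gradient descent on squared error, starting from the origin.
w = np.zeros(d)
for _ in range(5000):
    w -= 0.01 * X.T @ (X @ w - y) / n

# Training error is (essentially) zero: many interpolants exist...
train_err = np.max(np.abs(X @ w - y))

# ...but gradient descent from the origin converges to the *minimum-norm*
# interpolant, which is exactly what the pseudo-inverse computes.
w_min_norm = np.linalg.pinv(X) @ y
print(train_err < 1e-3, np.allclose(w, w_min_norm, atol=1e-3))  # prints: True True
```

The gradient always lies in the row space of `X`, so the iterate never leaves it; that is an implicit bias of the optimizer, a linear cousin of the "training process matters" point made next.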

Something about stochastic gradient descent, about the actual training process, is clearly important there. It's not just that we have a cost function, we find the global minimum, and generalization is a property of the global minimum. No: there are many, many global minima that all have zero error. Some of those solutions will clearly overfit; others generalize well. So there's something about the training process that we need to understand. I think there's a lot of research to be done on why they work so well; it's an open question. We can describe the model; we can say lots of things about it: because it has such-and-such a number of layers, the structure of the space has certain properties and divides it up into such-and-such regions, and so on. Those statements are true, but I don't know whether they give us real insight into why it's working. There are some very much open questions there. It strikes me a little like neuroscience: we have the human brain, it does these amazing things, and we can get richer and richer data about which neurons are firing, when, and how the firings are correlated, and learn something about the underlying machinery. This is a bit like neuroscience, except that we can put a probe in every neuron of our artificial brain and gather very, very rich information. So again, I think there's a very interesting research frontier in understanding why they're able to generalize so well, and why we have these strange phenomena with seemingly overparameterized models that don't overfit but rather generalize very well. Lots of research to be done. And

just to linger on that observation you made: you can train a deep learning model, and after the training error has converged, the test loss continues to improve. That just doesn't seem to make sense. And there's grokking as well, which is another one. It's almost like we were saying with physics: outside the machinations of the optimization algorithm, stuff is happening that we don't understand.

Well, you can tell stories, right?

You can say there's a big space, where each point in the space is a setting of all the parameters of the model, the weight space of the model. Maybe you start off somewhere near the origin with some little random initialization, and then you follow some trajectory defined by stochastic gradient descent. And there are lots and lots of places in this space, all of which have zero training error, and they're connected, so there's some sort of manifold of zero training error. You're starting off at the origin, and stochastic gradient descent is somehow not taking you to it in a random way. Maybe it's taking you to something like the nearest point on this manifold, and maybe that's some kind of regularization, and maybe that place has certain smoothness properties that lead to good generalization. So you can tell these stories. I think the challenge is to take the stories and make them predictive. When we have a theory of what's going on, we'll know we have a theory because it can predict new things, not just tell stories about what we've already discovered empirically. I think that's still very much an open question.

So what do you think about the intelligibility of neural networks, in terms of things like bias and fairness and safety? Because you could just think of these things as inscrutable bags of neurons, but we need to have some guardrails, don't we?

Well, we absolutely need to create technology that's beneficial to humanity; there's no question about that. And there are mechanisms for doing that, for aligning the systems, whether through human feedback or through external guardrails that provide more conventional checks on how things are being used. That's clearly necessary, and I find it very encouraging that so much energy and effort is going into it. Yes, there will be bumps in the road and missteps along the way, for sure, but overall we seem to be heading in a very good direction. The fact that a lot of attention is being paid to the potential risks associated with this very powerful and very general new technology gives me hope that we will avoid most of the biggest risks.

Can you give me a specific example

of an emulator? Yes, I can. One very nice example, actually, was the final project I worked on in the fusion program. I was using fusion as a sort of springboard to get into machine learning, and we wanted to do real-time feedback control of a fusion experiment, a thing called a tokamak: a very high-temperature plasma. We wanted to use neural nets to do nonlinear feedback. The challenge was to take a plasma, which is like a donut-shaped ring of hot plasma; it was known that if you could change its cross-sectional shape, you could improve its performance. There's an experiment called COMPASS, the Compact Assembly, at Culham in Oxfordshire, designed to produce very interesting, exotic cross-sectional shapes in order to explore the performance. So we wanted to use a neural net to do that feedback control.

Now, the good news is we had a great piece of inductive bias: a thing called the Grad-Shafranov equation. It's a second-order elliptic partial differential equation, but the point is that it describes the boundary of the plasma very accurately. You make a bunch of measurements from hundreds of little pickup coils around the plasma; those are the boundary conditions. You solve the Grad-Shafranov equation, and you know the shape of the plasma. The goal was to decide ahead of time that you wanted to create, say, a circular plasma and then change its shape, making corrections along the way: if the shape wasn't quite the one you wanted, you would change the big control-coil currents and alter the shape. The problem was that solving the Grad-Shafranov equation on a state-of-the-art workstation of the day took two or three minutes, whereas we had to do feedback at something like a 20 kHz frequency. It was about six orders of magnitude too slow.

So what we did instead was solve the Grad-Shafranov equation many times on the workstation, over a period of days and weeks, until we had built up a large database of known solutions along with their magnetic measurements. Then we trained a neural network, just a simple two-layer neural network back in the day, with probably only a few thousand parameters, minuscule by modern standards. It was trained to take the magnetic measurements and predict the shape, and we could put that into a standard feedback loop. We were in a bit of a race with another organization, a different fusion lab working on the same project, and that was very motivating. I'm pleased to say we got there first: we did the world's first-ever real-time feedback control of a tokamak plasma using a neural network. It was a beautiful example of an emulator. We could get five or six orders of magnitude of speed-up, not by solving the equation directly to do feedback control, but by using the numerical solver to generate training data, using that data to train the emulator, and then deploying the emulator.
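The emulator recipe (run the slow solver offline to build a database, fit a small network to it, then put the fast network in the loop) can be sketched compactly. The "solver" below is a toy function standing in for the Grad-Shafranov solve, and the random-feature network with a least-squares read-out is a quick stand-in for the small trained MLP; every name and number here is illustrative, not the COMPASS setup.

```python
import numpy as np

rng = np.random.default_rng(2)

def expensive_solver(measurements):
    # Toy stand-in for the slow PDE solve: maps "magnetic measurements"
    # to a single "shape parameter".
    return np.tanh(measurements @ np.array([0.5, -1.0, 0.8]))

# Offline: run the slow solver many times to build a database of solutions.
B_train = rng.normal(size=(2000, 3))
y_train = expensive_solver(B_train)

# Emulator: a random hidden layer of tanh units with a linear read-out
# fitted by least squares (a cheap proxy for training a small network).
W = rng.normal(size=(3, 64))
b0 = rng.normal(size=64)

def features(x):
    return np.tanh(x @ W + b0)

w_out, *_ = np.linalg.lstsq(features(B_train), y_train, rcond=None)

# Online: the cheap emulator replaces the PDE solve in the feedback loop.
B_test = rng.normal(size=(200, 3))
emu = features(B_test) @ w_out
rmse = np.sqrt(np.mean((emu - expensive_solver(B_test)) ** 2))
print(rmse < 0.2)
```

The pattern is general: the solver's cost is paid once, offline, to generate data, and the per-query cost in the control loop drops to a couple of matrix multiplies.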

And even then, it was still quite demanding for the silicon of the day; there was no processor fast enough. So we actually built a physical implementation of the neural net, believe it or not. It was a hybrid analog-digital system: it had an analog signal pathway with analog sigmoidal units, but the weights were set using digitally set resistors. So we could take the numerical output of the trained emulator, download it into this bespoke physical neural-network hardware, and do real-time feedback control. So I'm pretty excited about that project.

That's fascinating.

What do you think about control now? Do you have any opinions on, you know, model predictive control?

Control is a super important area, both the control problem and the overall planning problem. I think, despite all the remarkable advances in GPT-4, the world of embodied AI and robotics and so on is still a very, very wide-open frontier. We don't really have robots that can yet drive a car through central London; that's still a major challenge, although we're seeing some very remarkable progress recently.

Yeah.

I mean, more broadly, I've been speaking with some neuroscientists, and they say that we have the Matrix in our heads, so we're always running simulations. And presumably, in the future, this will be a principled way of building agents: the agents will run counterfactual simulations and select trajectories which look like good ones, and then the process will iterate.

I think this is very powerful.

I mean, the idea of type one and type two, of fast learning and slow learning; the idea that we simulate the world, compare the simulation with reality, and can learn from our own simulators, and so on. We don't quite know what best to do with that, but it feels like such a powerful and compelling concept, and we think something like that is going on in the brain. Again, that feels like an area ripe for exploration. And I think that, in some form, model prediction and simulation of the world will increasingly be a part of AI systems as we go forward.

For me, the takeaway in all of this is just what an amazing time it is to be in this field. There are so many fascinating things to work on. Professor Bishop, it's been an honor to have you on MLST. Thank you so much.

Well, thank you. I've enjoyed it. Thank you. Amazing.
