Shane Legg (DeepMind Founder) - 2028 AGI, Superhuman Alignment, New Architectures

Dwarkesh Podcast
26 Oct 2023 · 44:19

Summary

TLDR: Shane Legg, founder and Chief AGI Scientist of Google DeepMind, discusses how to measure progress toward artificial general intelligence (AGI), the shortcomings of today's large language models, and possible paths forward. He stresses the breadth and complexity of human cognition and the wide-ranging testing and understanding needed to reach AGI. Legg also discusses the sample-efficiency problem in deep learning models and how architectural innovation and a deep understanding of ethics could enable safe alignment of powerful AI. He further predicts that multimodal AI will be the field's next major milestone.

Takeaways

  • 🤖 AGI (artificial general intelligence) is defined as a machine that can do the sorts of cognitive tasks that people can typically do.
  • 📈 Measuring progress toward AGI requires many different kinds of tests and measurements, spanning the breadth of human cognitive tasks.
  • 🧠 Current AI models, such as large language models, lack human-like episodic memory.
  • 🔍 Existing benchmarks cannot adequately measure certain aspects of human cognition, such as understanding streaming video.
  • 🌐 Future AI development needs to address sample efficiency, i.e. enabling AI to learn quickly from less data.
  • 🎮 The arrival of human-level AI will be marked not by a single test but by a comprehensive suite of tests, plus the inability to find tasks where the AI falls below human performance.
  • 🧭 Legg's early research emphasized that AI should be able to operate and succeed in a wide variety of open-ended environments, somewhat like video games.
  • 🔄 Deep learning models currently learn through ever-longer context windows and base-model training, but miss something in between.
  • 🌟 The next AI milestone is likely to be much fuller multimodal understanding, which will greatly expand the range of AI applications.
  • 🚀 Current AI safety work includes interpretability, process supervision, red teaming, evaluation of dangerous capabilities, and institutions and governance.

Q & A

  • How do we concretely measure progress toward artificial general intelligence (AGI)?

    - Measuring progress toward AGI requires many different kinds of tests and measurements spanning the breadth of cognitive tasks that people can do. We need a sense of human performance on these tasks and then judge whether an AI system reaches that level. If a system can match human performance across a broad range of tasks, and it is hard to find tasks where it falls short, we can consider AGI to have been achieved.

  • What are the limitations of current benchmarks?

    - Current benchmarks such as MMLU target language models, so they do not measure abilities like understanding streaming video, which people can do. They also do not test human memory systems such as episodic memory, the ability to learn specific information very rapidly.

  • How is sample efficiency related to human memory systems?

    - Sample efficiency is closely tied to human memory systems, because episodic memory is one of the things that lets humans learn new things quickly. Large language models have a certain kind of sample efficiency, since they can learn new information rapidly within their context window, but existing systems lack the human ability to rapidly learn specific information, i.e. episodic memory.

  • Is the sample efficiency of deep learning models a fundamental flaw?

    - No, sample efficiency is not a fundamental flaw of deep learning models. Although these models need orders of magnitude more data than a human sees in a lifetime, there appear to be relatively clear research paths toward fixing these shortcomings, including the kinds of memory and learning the models have.

  • How, concretely, do we judge that AI has reached human level?

    - No single test or task can establish that AI has reached human level, because the question is about general intelligence. We need to make sure the AI reaches human performance across a large number of different tasks with no obvious capability gaps. Beyond that, when people take an adversarial approach and try to find tasks where the machine fails but humans typically succeed, and they cannot, we can consider the AI to have reached human level.

  • How has Shane Legg's view of AI operating and succeeding in open environments changed?

    - Legg's early research emphasized that AI should be able to operate and succeed in a wide variety of open-ended environments. Over time his view has evolved from trying to define a completely general, mathematically clean notion of intelligence toward thinking more concretely about human intelligence and building machines that can match it, because we understand human intelligence and know it is a very powerful thing.

  • How does Shane Legg see domain-specific models like AlphaFold relating to AGI?

    - Domain-specific models like AlphaFold are not directly feeding into AGI. Things learned in the process may end up being relevant, but Legg does not see them as being on the path to AGI. DeepMind is a large organization, however, and pursues many different projects, including major efforts like AlphaFold.

  • What does Shane Legg predict about future AI models?

    - Legg predicts that future AI models will be much more multimodal, which will open up a much larger space of possibilities. He expects people will look back at today's models and see text-only chat as very narrow. Future models will understand images and video and handle many more kinds of data, connecting systems to the world in much more powerful ways.

  • How does Shane Legg view current AI safety research?

    - Legg argues that to build genuinely ethical AI systems, we need systems with a deep understanding of the world, a good understanding of ethics, and robust reasoning. We need to train systems to understand ethics, and society and ethicists need to decide which ethical principles the systems should follow. We also need to ensure the system performs an ethical analysis every time it makes a decision, with human experts continually checking those decisions and the reasoning behind them.

  • How does Shane Legg think we can ensure AI follows the ethical principles we set?

    - Legg believes we need to specify to the system the ethical principles it should follow and then check that it consistently follows them in practice. With a deep understanding of the world and of ethics, plus robust reasoning, the system should be able to analyze the options in front of it and execute the most ethical course of action.

  • How does Shane Legg view the current development of multimodal AI models?

    - Legg thinks multimodal AI is still in its early days, but he expects systems to understand images, video, and other content more and more. Once systems can handle video and other kinds of data well, they will have a much more grounded understanding of the world, which will naturally open up many new applications and possibilities.

Outlines

00:00

🤖 In conversation with Shane Legg: measuring and defining AGI

Shane Legg, Chief AGI Scientist at Google DeepMind, discusses how to concretely measure progress toward artificial general intelligence (AGI). Because AGI is by definition about generality rather than any specific task, measuring it requires many different kinds of tests spanning the breadth of cognitive tasks that humans can do. Shane stresses that while we may never cover everything people can do, if an AI system can match human performance on a broad range of cognitive tests, we are close to having AGI.

05:04

🚀 Progress and future of deep learning models

Shane Legg discusses the progress of large language models (LLMs) and the aspects of human cognition that current benchmarks miss. Because today's models are language models, the benchmarks do not measure things like understanding streaming video, which humans can do. He also discusses human memory systems, in particular the rapid episodic memory that current models lack. Shane believes these shortcomings can be addressed through research and work, and that there are relatively clear paths toward solving them adequately.

10:08

🎮 AI in open-ended, game-like environments

Shane Legg revisits his early research emphasis on AI being able to operate and succeed in a wide variety of open-ended environments and connects it to the present. His view has evolved from trying to define an extremely general, mathematically clean framework for intelligence toward focusing on building machines that can match human intelligence. Shane sees human intelligence as a meaningful reference point because we understand it, we know it is very powerful, and it carries economic, philosophical, and historical importance.

15:12

🧠 Memory and learning in AI

Shane Legg digs into why AI models need memory, especially episodic memory, and how that relates to sample efficiency. Current architectures do not really provide it: they rely mainly on a very fluid context window and slowly updated weights. Shane also argues that truly creative problem solving requires adding a search process so the model can hunt through a space of possibilities for hidden gems.

20:20

🔍 The future development and safety of AI

Shane Legg discusses the alignment work needed for human-level and even superhuman AI. He stresses that such systems need to be highly ethical, value-aligned systems from the get-go. Shane argues that an AI system needs a deep understanding of the world, a good understanding of ethics, and reliable reasoning, so that it can analyze the options in front of it and execute the most ethical course of action.

25:21

🌐 Multimodal AI: the next milestone

Shane Legg predicts that the next AI milestone will be much fuller multimodal understanding, letting AI systems understand the world in a far more comprehensive and grounded way. Future systems will not be confined to text chat; they will understand images, video, and other modalities, opening the door to new applications and possibilities. Although we are still in the early days of multimodal AI, Shane expects increasing progress in understanding and handling different kinds of data over time.

Keywords

💡 Artificial general intelligence (AGI)

Artificial general intelligence (AGI) refers to a machine that can do the sorts of cognitive tasks that people can typically do. In the video, Shane Legg explains that AGI is not about performance on a specific task but about a machine's general capability, and that measuring progress toward it requires many different kinds of tests spanning the breadth of human cognitive tasks.

💡 Loss numbers

A loss function quantifies the gap between a model's predictions and the actual outcomes. In the video, improvements in loss from one model to the next are raised as a way to track progress, but Shane points out that loss is just a number; understanding how it translates into real progress toward AGI requires more comprehensive evaluation.

💡 Cognitive tasks

Cognitive tasks are the information-processing activities the human brain performs, such as remembering, learning, and problem solving. In the video, Shane says that testing whether AGI has reached human level requires a suite of tests covering the breadth of human cognitive tasks.

💡 Sample efficiency

Sample efficiency is a model's ability to learn from limited data and generalize to new situations. In the video, Shane notes that humans are highly sample efficient on certain kinds of cognitive tasks, while existing AI models still have room to improve in this respect.

💡 Working memory

Working memory is the part of short-term memory responsible for temporarily holding and processing information. In the video, Shane compares human working memory to an AI model's context window, while noting that models lack a hippocampus-like mechanism for rapidly learning specific information.

💡 Benchmarks

Benchmarks are standard tests for evaluating the performance of machine learning models. In the video, Shane discusses current benchmarks such as MMLU and the aspects of human cognition they fail to measure adequately, for example understanding streaming video.

💡 Multimodal

Multimodal refers to the ability to process and understand multiple kinds of data, such as text, images, and audio. In the video, Shane predicts that the next important milestone will be AI models becoming much more multimodal, allowing them to understand and interact with the world more fully.

💡 Reinforcement learning

Reinforcement learning is a machine learning paradigm in which a model learns through interaction with an environment, receiving rewards or penalties. In the video, Shane points to reinforcement learning combined with search as an important direction for exploration and creative problem solving, enabling the discovery of new possibilities.

💡 Safety alignment

Safety alignment means ensuring an AI system's behavior accords with human values and ethical standards. In the video, Shane argues that achieving alignment requires systems with a deep world model, an understanding of ethics, and strong reasoning abilities.

💡 System 2 thinking

System 2 thinking is deliberate, logical reasoning, in contrast to System 1's fast, intuitive responses. In the video, Shane argues that AI models need System 2 capabilities so that they do not merely mimic their data but can carry out creative problem solving and search.

Highlights

Measuring progress toward artificial general intelligence (AGI) is challenging because it spans a very broad range of task types.

AGI is defined as a machine that can do the sorts of cognitive tasks that people can typically do.

Testing for AGI requires many different kinds of measurements and tests covering the breadth of human cognitive tasks.

Current large language models still fall short on some cognitive tasks, such as understanding streaming video and episodic memory.

Sample efficiency is tied to human cognitive efficiency; it is what lets humans learn specific things quickly.

Sample efficiency is probably not a fundamental limitation of deep learning models; there appear to be paths toward solving it.

Human-level AI requires passing a comprehensive suite of tests, with no clear weaknesses even under adversarial probing.

AI research should focus on building intelligent systems that can operate and succeed in open-ended environments.

DeepMind's research is not limited to AGI; it also spans areas such as sustainable energy and climate change.

Achieving AGI will require architectural innovation to support both rapid learning and deep generalization.

Domain-specific models such as AlphaFold are powerful but do not directly advance AGI.

Truly creative problem solving requires searching a space of possibilities, not merely mimicking data.

Future AI systems will be more multimodal, able to understand and process images, video, and many other kinds of data.

AI safety requires ensuring that systems have a deep world model, an understanding of ethics, and reliable reasoning.

Reinforcement learning and related techniques carry some risks; the AI's reasoning process and understanding of ethics need to be checked.

For ethical alignment, an AI system needs to perform an ethical analysis when making decisions and execute the most ethical course of action.

DeepMind was founded with the aim of accelerating AI safety research, but its net effect on AI capabilities is hard to assess.

Shane Legg estimates a 50% chance of AGI by 2028, extrapolating from current trends.

The next AI milestone is likely to be a breakthrough in multimodal understanding, which will greatly expand AI's applications and capabilities.

Transcripts

play00:01

Today I have the pleasure  of interviewing Shane Legg,  

play00:06

who is the founder and the Chief AGI scientist of  Google DeepMind. Shane, welcome to the podcast. 

play00:12

Thank you. It's a pleasure being here. First question. How do we measure progress  

play00:17

towards AGI concretely? We have these loss  numbers and we can see how the loss improves  

play00:22

from one model to another, but it's just a  number. How do we interpret this? How do we  

play00:25

see how much progress we're actually making? That’s a hard question. AGI by its definition  

play00:34

is about generality. It's not about doing a  specific thing. It's much easier to measure  

play00:39

performance when you have a very specific thing in  mind because you can construct a test around that. 

play00:44

Maybe I should first explain what I mean  by AGI because there are a few different  

play00:48

notions around it. When I say AGI, I mean a  machine that can do the sorts of cognitive things  

play00:55

that people can typically do, possibly more.  To be an AGI that's the bar you need to meet. 

play01:02

So if we want to test whether we're meeting the  threshold or we're getting close to the threshold,  

play01:07

what we actually need is a lot of different kinds  of measurements and tests that span the breadth  

play01:16

of all the sorts of cognitive tasks that people  can do and then to have a sense of what human  

play01:22

performance is on these sorts of tasks. That then  allows us to judge whether or not we're there. 

play01:29

It's difficult because you'll never have a  complete set of everything that people can  

play01:33

do because it's such a large set. But I think  that if you ever get to the point where you  

play01:38

have a pretty good range of tests of all sorts  of cognitive things that we can do, and you have  

play01:45

an AI system which can meet human performance  and all those things and then even with effort,  

play01:51

you can't actually come up with new examples of  cognitive tasks where the machine is below human  

play01:59

performance then at that point, you have an AGI. It may be conceptually possible that there is  

play02:03

something that the machine can't do that people  can do but if you can't find it with some effort,  

play02:09

then for all practical purposes, you have an AGI. Let's get more concrete. We measure the  

play02:16

performance of these large language models on MMLU  and other benchmarks. What is missing from the  

play02:23

benchmarks we use currently? What aspect of  human cognition do they not measure adequately? 

play02:32

Another hard question. These are quite  big areas. They don't measure things like  

play02:39

understanding streaming video, for example,  because these are language models and people  

play02:42

can do things like understanding streaming video. They don't do things like episodic memory. Humans  

play02:50

have what we call episodic memory. We have  a working memory, which are things that have  

play02:54

happened quite recently, and then we have  a cortical memory, things that are sort of  

play02:59

being in our cortex, but there's also a system in  between, which is episodic memory, which is the  

play03:04

hippocampus. It is about learning specific things  very, very rapidly. So if you remember some of the  

play03:10

things I say to you tomorrow, that'll  be your episodic memory hippocampus. 

play03:16

Our models don't really have that kind of thing  and we don't really test for that kind of thing.  

play03:19

We just sort of try to make the context  windows, which is more like working memory,  

play03:24

longer and longer to sort of compensate for this. But it is a difficult question because the  

play03:38

generality of human intelligence is very, very  broad. So you really have to start going into  

play03:42

the weeds of trying to find if there's specific  types of things that are missing from existing  

play03:48

benchmarks or different categories of benchmarks  that don't currently exist or something. 

play03:55

The thing you're referring to with episodic  memory, would it be fair to call that sample  

play03:58

efficiency or is that a different thing? It's very much related to sample efficiency.  

play04:03

It's one of the things that enables humans  to be very sample efficient. Large language  

play04:09

models have a certain kind of sample efficiency  because when something's in their context window,  

play04:15

that biases the distribution to behave in a  different way and so that's a very rapid kind  

play04:21

of learning. There are multiple kinds of learning  and the existing systems have some of them,  

play04:27

but not others. It's a little bit complicated. Is this kind of memory, what we call sample  

play04:33

efficiency, a fatal flaw of these deep learning  models that it just takes trillions of tokens,  

play04:40

a magnitude more than what any human will  see throughout their lifetime or is this  

play04:44

something that will be solved over time? The models can learn things immediately  

play04:49

when it's in the context window and then  they have this longer process when you  

play04:54

actually train the base model and that's when  they're learning over trillions of tokens.  

play04:58

But they miss something in the middle.  That's sort of what I'm getting at here. 
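
As a rough sketch of the kind of "middle" memory being described here, consider an external episodic store bolted onto a frozen model: entries are written in one shot and retrieved back through the context window. The toy example below only illustrates the shape of the idea; the embed() and generate() functions are hypothetical stand-ins, and nothing here describes how DeepMind's actual systems work.

    # Illustrative sketch only: a toy episodic memory layered on top of a frozen model.
    # embed() and generate() are hypothetical stand-ins, not a real LLM or embedding API.
    import math
    from collections import Counter

    def embed(text):
        # Toy embedding: bag-of-words counts (a real system would use a learned embedder).
        return Counter(text.lower().split())

    def cosine(a, b):
        dot = sum(a[w] * b[w] for w in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    class EpisodicMemory:
        """Rapid one-shot storage, kept separate from the slowly trained weights."""
        def __init__(self):
            self.episodes = []  # list of (embedding, text) pairs

        def write(self, text):
            self.episodes.append((embed(text), text))  # learned from a single exposure

        def recall(self, query, k=2):
            q = embed(query)
            ranked = sorted(self.episodes, key=lambda e: cosine(q, e[0]), reverse=True)
            return [text for _, text in ranked[:k]]

    def generate(prompt):
        # Hypothetical stand-in for calling a frozen language model.
        return "[model response conditioned on: " + prompt + "]"

    memory = EpisodicMemory()
    memory.write("The user's name is Alice and she prefers metric units.")
    memory.write("Yesterday we discussed AlphaGo's Move 37.")

    question = "What units should I use for Alice?"
    recalled = memory.recall(question)
    # Retrieved episodes are pushed back in through working memory (the context window).
    print(generate("\n".join(recalled) + "\n" + question))

The point of the separation mirrors what is said here: the store is updated instantly, while the weights are left to integrate deep regularities slowly.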

play05:03

I don't think it's a fundamental limitation.  I think what's happened with large language  

play05:08

models is something fundamental has changed.  We know how to build models now that have  

play05:14

some degree of understanding of what's going on.  And that did not exist in the past. And because  

play05:22

we've got a scalable way to do this now, that unlocks lots and lots of new things. 

play05:28

Now we can look at things which are missing,  such as this sort of episodic memory type thing,  

play05:34

and we can then start to imagine ways to address  that. My feeling is that there are relatively  

play05:43

clear paths forward now to address most of the  shortcomings we see in the existing models,  

play05:48

whether it's about delusions, factuality, the  type of memory and learning that they have, or  

play05:55

understanding video, or all sorts of things like  that. I don't see any big blockers. I don't see  

play06:00

big walls in front of us. I just see that there's  more research and work and all these things will  

play06:06

improve and probably be adequately solved. Going back to the original question of how do  

play06:11

you measure when human level AI has arrived  or has gone beyond it. As you mentioned,  

play06:18

there's these other sorts of benchmarks  you can use and other sorts of traits,  

play06:21

but concretely, what would it have to do for you  to be like, “Okay, we've reached human level.” 

play06:25

Would it have to beat Minecraft from  start to finish? Would it have to get  

play06:28

100% on MMLU? What would it have to do? There is no one thing that would do it,  

play06:33

because I think that's the nature of it.  It's about general intelligence. So I'd  

play06:37

have to make sure it could do lots and lots  of different things and it didn't have a gap. 

play06:42

We already have systems that can do very  impressive categories of things to human  

play06:46

level or even beyond. I would want a whole suite  of tests that I felt was very comprehensive and  

play06:54

then furthermore, when people come in and say,  “Okay, so it's passing a big suite of tests,  

play07:00

let's try to find examples. Let's take  an adversarial approach to this. Let's  

play07:06

deliberately try to find examples where  people can clearly, typically do this, but  

play07:11

the machine fails.” And when those people cannot  succeed, I'll go, “Okay, we're probably there.” 

play07:16

A lot of your earlier research, at least the  ones I could find, emphasized that AI should be  

play07:22

able to manipulate and succeed in a variety of  open-ended environments. It almost sounds like  

play07:28

a video game. Is that where your head is still  at now, or do you think about it differently? 

play07:33

It’s evolved a bit. When I did my thesis work  around universal intelligence, I was trying to  

play07:40

come up with an extremely universal, general,  mathematically clean framework for defining  

play07:49

and measuring intelligence. I think there were  aspects of that that were successful. I think  

play07:58

in my own mind, it clarified the nature of  intelligence as being able to perform well  

play08:05

in lots of different domains and different tasks  and so on. It's about that sort of capability of  

play08:10

performance and the breadth of performance. I  found that was quite helpful and enlightening. 

play08:18

There was always the issue of the reference  machine. In the framework, you have a weighting  

play08:26

of things according to the complexity. It's like  an Occam's razor type of thing, where you weight  

play08:33

tasks and environments which are simpler,  more highly. You’ve got a countable space of  

play08:44

semi-computable environments. And that Kolmogorov  complexity measure has something built into it,  

play08:51

which is called a reference machine. And  that's a free parameter. So that means that  

play08:57

the intelligence measure has a free parameter  in it and as you change that free parameter,  

play09:03

it changes the weighting and the distribution  over the space of all the different tasks and  

play09:07

environments. This is sort of an unresolved  part of the whole problem. So what reference  

play09:15

machine should we ideally use? There's no  universal reference machine. People will  

play09:24

usually put a Turing machine in there, but  there are many kinds of different machines. 
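
For readers who want the formal object behind this discussion, the universal intelligence measure from Legg and Hutter's published work is usually written as follows (standard notation; the summary wording is the editor's):

    \Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K_U(\mu)} \, V^{\pi}_{\mu}

Here E is the space of computable, reward-summable environments, V^pi_mu is the expected cumulative reward of agent pi in environment mu, and K_U(mu) is the Kolmogorov complexity of mu relative to a reference machine U. U is the free parameter being discussed: changing it changes the 2^{-K_U(mu)} weighting over environments, which is the Occam's-razor-style bias toward simpler environments.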

play09:34

Given that it's a free parameter, I think the  most natural thing to do is to think about what's  

play09:41

meaningful to us in terms of intelligence. I think  human intelligence is meaningful to us in the  

play09:47

environment that we live in. We know what human  intelligence is. We are human too. We interact  

play09:52

with other people who have human intelligence.  We know that human intelligence is possible,  

play09:57

obviously, because it exists in the world.  We know that human intelligence is very,  

play10:02

very powerful because it's affected the world  profoundly in countless ways. And we know if  

play10:08

human level intelligence was achieved, that  would be economically transformative because  

play10:13

the types of cognitive tasks people do in the  economy could be done by machines then. And it  

play10:18

would be philosophically important because this  is sort of how we often think about intelligence.  

play10:25

Historically it would be a key point. So I think that human intelligence in a human-like  

play10:30

environment is quite a natural sort of reference  point. You could imagine setting your reference  

play10:37

machine to be such that it emphasizes the kinds  of environments that we live in as opposed to some  

play10:45

abstract mathematical environment. And so that's  how I've kind of gone on this journey of — “Let's  

play10:51

try to define a completely universal, clean,  mathematical notion of intelligence” to “Well,  

play10:58

it's got a free parameter. “ One way of thinking about it is to  

play11:02

think more concretely about human intelligence and  build machines that can match human intelligence.  

play11:09

Because we understand what that is and we know  that that is a very powerful thing. It has  

play11:13

economic, philosophical and historical importance. The other aspect of course is that, in this pure  

play11:22

formulation of Kolmogorov complexity, it's  actually not computable. I also knew that  

play11:27

there was a limitation at the time but it  was an effort to just theoretically come  

play11:34

up with a clean definition. I think we can  sort of get there, but we have this issue  

play11:39

of a reference machine, which is unspecified. Before we move on, I do want to ask a question  

play11:44

on the original point you made on LLMs  needing episodic memory. You said that  

play11:52

these are problems that we can solve and  these are not fundamental impediments. 

play11:57

But when you say that, do you think they  will just be solved by scale or do each  

play12:01

of these need a fine-grained specific  solution that is architectural in nature? 

play12:05

I think it'll be architectural in nature because  the current architectures don't really have what  

play12:14

you need to do this. They basically have a  context window, which is very, very fluid,  

play12:19

of course, and they have the weights, which  things get baked into very slowly. So to my mind,  

play12:23

that feels like working memory, which is like  the activations in your brain, and then the  

play12:28

weights are like the synapses in your cortex. Now, the brain separates these things out. It  

play12:33

has a separate mechanism for rapidly learning  specific information because that's a different  

play12:40

type of optimization problem compared to slowly  learning deep generalities. There's a tension  

play12:48

between the two but you want to be able to do  both. You want to be able to hear someone's name  

play12:54

and remember it the next day. And you also want to  be able to integrate information over a lifetime  

play12:59

so you start to see deeper patterns in the world. These are quite different optimization targets,  

play13:07

different processes, but a comprehensive system  should be able to do both. And so I think it's  

play13:14

conceivable you could build one system that  does both, but you can also see that because  

play13:16

they're quite different things, it makes  sense for them to be done differently. I  

play13:19

think that's why the brain does it separately. I'm curious about how concretely you think that  

play13:24

would be achieved. DeepMind has been working  on these domain specific reinforcement learning  

play13:35

type setups: AlphaFold, AlphaCode and so on. How  does that fit into what you see as a path to AGI?  

play13:42

Have these just been orthogonal domain specific  models or do they feed into the eventual AGI? 

play13:49

Things like AlphaFold are not really feeding  into AGI. We may learn things in the process  

play13:58

that may end up being relevant, but I don't  see them as likely being on the path to AGI.  

play14:05

But we're a big group. We've got hundreds and  hundreds and hundreds of PhDs working on lots  

play14:10

of different projects. When we find what we see  as opportunities to do something significant like  

play14:17

AlphaFold, we'll go and do it. It's not like we  only do AGI type work. We work on fusion reactors  

play14:25

and various things in sustainability, energy.  We've got people looking at satellite images  

play14:35

of deforestation. We have people looking  at weather forecasting. We've got tons  

play14:41

of people working on lots of things. On the point you made earlier about  

play14:46

the reference machine as human intelligence.  It's interesting because one of the things  

play14:51

you mentioned in your 2008 thesis about how  you would measure intelligence was — You said  

play14:55

you could do a compression test and you could  see if it fills in words and a sample of text  

play15:01

and that could measure intelligence. And funnily  enough, that's basically how the LLMs are trained. 

play15:06

At the time, did it stick out to you as  an especially fruitful thing to train for? 

play15:11

Well, yeah. In the sense what's happened  is actually very aligned with what I wrote  

play15:18

about in my thesis. The ideas from Marcus Hutter  with AIXI, where you take Solomonoff induction,  

play15:26

which is this incomputable but theoretically  very elegant and extremely sample efficient  

play15:33

prediction system, and then once you have that,  you can build a general agent on top of it by  

play15:40

basically adding search and reinforcement  signal. That's what you do with AIXI. 

play15:47

But what that sort of tells you is that if you  have a fantastically good sequence predictor,  

play15:54

some approximation of Solomonoff induction, then  going from that to a very powerful, very general  

play16:02

AGI system is just sort of another step. You've  actually solved a lot of the problem already. 
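
For context, Solomonoff induction predicts sequences using the universal prior over programs on a reference machine U; roughly (standard notation, simplified by the editor):

    M(x) \;=\; \sum_{p \,:\, U(p) \text{ outputs a string beginning with } x} 2^{-\ell(p)}

where l(p) is the length of program p, so short programs, i.e. simple explanations, dominate the prediction. It is incomputable but, in theory, extremely sample efficient, and AIXI layers expectimax search over actions and a reward signal on top of such a predictor. The loose analogy being drawn is that large foundation models are scalable, approximate sequence predictors in this spirit.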

play16:09

And I think that's what we're seeing  today actually, that these incredibly  

play16:12

powerful foundation models are incredibly good  sequence predictors that are compressing the  

play16:17

world based on all this data. And then you will  be able to extend these in different ways and  

play16:23

build very, very powerful agents. Let me ask you more about that. 

play16:27

Richard Sutton's Bitter Lesson essay says  that there's two things you can scale,  

play16:32

search and learning. I guess you could say  that LLMs are about the learning aspect. The  

play16:38

search stuff, which you worked on throughout  your career, where you have an agent that is  

play16:43

interacting with this environment, is that the  direction that needs to be explored again? Or  

play16:49

is that something that needs to be added  to LLMs where they can actually interact  

play16:51

with their data or the world or in some way? Yeah, I think that's on the right track. These  

play17:00

foundation models are world models of a kind and  to do really creative problem solving, you need to  

play17:08

start searching. If I think about something like  AlphaGo and the famous Move 37, where did that  

play17:16

come from? Did that come from all its data that  it's seen of human games or something like that?  

play17:21

No, it didn't. It came from it identifying a move  as being quite unlikely, but plausible. And then  

play17:31

via a process of search, coming to understand  that it was actually a very, very good move. 

play17:37

So to get real creativity, you need to search  through spaces of possibilities and find these  

play17:43

hidden gems. That's what creativity is. Current  language models don't really do that. They really  

play17:51

are mimicking the data. They are mimicking all the  human ingenuity and everything, which they have  

play17:57

seen from all this data that's coming from the  internet that's originally derived from humans. 

play18:09

These models can blend things. They can do  Harry Potter in the style of a Kanye West  

play18:14

rap or something, even though it's never  happened, they can blend things together.  

play18:18

But if you want a system that can go truly beyond  that and not just generalize in novel ways and do  

play18:19

something that's truly creative, that is  not just a blending of existing things,  

play18:23

that requires searching through a space of  possibilities and finding these hidden gems  

play18:29

that are hidden away in there somewhere. And  that requires search. So I don't think we'll  

play18:34

see systems that truly step beyond their training  data until we have powerful search in the process. 
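
To make the "learned prior plus search" point concrete, here is a toy sketch: a policy prior on its own picks the most likely move, while a search step that evaluates even low-prior candidates is what surfaces the hidden gem. The move names, prior, and rollout values below are invented for illustration; this is not AlphaGo's actual machinery.

    # Toy sketch of prior + search: imitation picks the most likely move,
    # search discovers that a low-prior move is actually better. Illustrative only.
    import random

    MOVES = ["solid_move", "popular_move", "move_37"]

    def policy_prior(move):
        # Hypothetical learned prior: the creative move looks unlikely but plausible.
        return {"solid_move": 0.45, "popular_move": 0.50, "move_37": 0.05}[move]

    def rollout_value(move, n_rollouts=200):
        # Hypothetical evaluation by simulation: average outcome over random playouts.
        win_prob = {"solid_move": 0.52, "popular_move": 0.49, "move_37": 0.71}[move]
        return sum(random.random() < win_prob for _ in range(n_rollouts)) / n_rollouts

    def search(moves, min_prior=0.01):
        # Keep every candidate the prior considers at least plausible, then rank by value.
        plausible = [m for m in moves if policy_prior(m) >= min_prior]
        return max(plausible, key=rollout_value)

    imitation_choice = max(MOVES, key=policy_prior)  # what pure mimicry would play
    searched_choice = search(MOVES)                  # what spending compute on search finds

    print("imitation picks:", imitation_choice)  # popular_move
    print("search picks:   ", searched_choice)   # move_37, the unlikely-but-strong move

The extra ingredient is simply compute spent evaluating possibilities that the prior alone would rarely sample.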

play18:43

There are rumors that Google DeepMind is  training newer models, and you don't have  

play18:47

to comment on those specifically, but when you  do that, if it's the case that something like  

play18:53

search is required to go to the next level, are  you training in a completely different way than  

play18:58

how GPT-4 or other transformers are trained? And I can't say much about how we're training.  

play19:04

I think it's fair to say we're roughly doing the  sorts of scaling and training that you see many  

play19:11

people in the field doing but we have our own take  on it and our own different tricks and techniques. 

play19:19

Okay, maybe we'll come back to it  and get another answer on that. 

play19:22

Let's talk about alignment briefly. What will  it take to align human level and superhuman AIs? 

play19:31

It's interesting because the sorts of  reinforcement learning and self-play kinds  

play19:35

of setups that are popular now, like Constitutional AI or RLHF, DeepMind obviously has expertise in  

play19:40

it for decades longer. I'm curious what you think  of the current landscape and how DeepMind pursues  

play19:47

that problem of safety towards human level models. Do you want to know about what we're currently  

play19:52

doing or do you want me to have a  stab at what I think needs to be done? 

play19:55

Needs to be done. Currently we're doing lots  

play20:00

of things. We're doing interpretability. We're  doing our process supervision. We're doing red  

play20:04

teaming. We're doing evaluation for dangerous  capabilities. We're doing work on institutions  

play20:10

and governance and tons of stuff, right? Anyway, what do I think needs to be done?  

play20:19

I think that powerful machine learning, powerful  AGI, is coming in some time and if the system is  

play20:29

really capable, really intelligent, really  powerful, trying to somehow contain it or  

play20:33

limit it is probably not a winning strategy  because these systems ultimately will be very,  

play20:38

very capable. So what you have to do is you have  to align it. You have to get it such that it's  

play20:43

fundamentally a highly ethical value aligned  system from the get go. How do you do that? 

play20:58

Maybe this is slightly naive, but this is my  take on it — How do people do it? If you have  

play21:03

a really difficult ethical decision in front  of you, what do you do? You don't just do the  

play21:12

first thing that comes to mind, because  there could be a lot of emotions involved  

play21:15

in other things. It's a difficult problem. What you have to do is to calm yourself down.  

play21:20

You've got to sit down and you've got to think  about it. You've got to think, “Well, okay, what  

play21:25

could I do?” I could do this. I could do this.  I could do this. If I do each of these things,  

play21:31

what will happen? So that requires a model of the  world. And then you have to think about ethically,  

play21:40

how do I view each of these different actions and  the possibilities and what might happen from it?  

play21:47

What is the right thing to do? And as you think  about all the different possibilities and your  

play21:53

actions and what can follow from them and how  it aligns with your values and your ethics,  

play21:58

you can then come to some conclusion of what is  really the best choice that you should be making  

play22:03

if you want to be really ethical about this. I think AI systems need to essentially do the same  

play22:10

thing. When you sample from a foundation model  at the moment, it's blurting out the first thing.  

play22:18

It's like System 1, if you like, from psychology,  from Kahneman et al. That's not good enough. 

play22:24

And if we do RLHF without human feedback (RLAIF),  Constitutional AI tries to do that sort of thing,  

play22:38

you're trying to fix the underlying System 1  in a sense. That can shift the distribution  

play22:44

and that can be very helpful but it's a very high  dimensional distribution and you're sort of poking  

play22:49

it in a whole lot of points. So it's not likely  to be a very robust solution. It's like trying to  

play22:56

train yourself out of a bad habit. You can sort  of do it eventually. But what you need to do is  

play23:02

you need to have a System 2. You need the system  to not just sample from the model. You need the  

play23:09

system to go, “Okay, I'm going to reason this  through. I'm going to do step by step reasoning.  

play23:14

What are the options in front of me? I'm going to  use my world model now and I'm going to use a good  

play23:19

world model to understand what's likely to happen  from each of these options.” And then reason about  

play23:25

each of these from an ethical perspective. So you need a system which has a deep  

play23:30

understanding of the world, a good world  model, and has a good understanding of people,  

play23:34

and has a good understanding of ethics, and  it has robust and very reliable reasoning.  

play23:39

And then you set it up in such a way that it  applies this reasoning and this understanding  

play23:43

of ethics to analyze the different options  which are in front of it and then execute  

play23:48

on which is the most ethical way forward. But when a lot of people think about the  

play23:54

fundamental alignment problem, the worry is  not that it's not going to have a world model  

play24:03

to understand the effects of its actions, the  worry is that the effects it cares about are  

play24:11

not the ones we will care about. So even if  you improve its systems, and its thinking and  

play24:15

do better planning, the fundamental problem is  — We have these really nuanced values about what  

play24:20

we want. How do we communicate those values  and make sure they're reinforced in the AI? 

play24:26

It needs not just a good model of the world,  but it needs a really good understanding of  

play24:30

ethics. And we need to communicate to the system  what ethics and values it should be following. 

play24:35

And how do we do that in a way that we  can be confident that a super human level  

play24:41

model will preserve those values or  have learned them in the first place? 

play24:46

It should preserve them because if it's making  all its decisions based on a good understanding  

play24:51

of ethics and values, and it's consistent in  doing this, it shouldn't take actions which  

play24:56

undermine that. That would be inconsistent. Right, so then how do we get to the point  

play24:59

where it has learned them in the first place? Yeah, that's the challenge. We need to have  

play25:04

systems. The way I think about it is this: to have  a profoundly ethical AI system, it also has to be  

play25:11

very, very capable. It needs a really good world  model, a really good understanding of ethics,  

play25:16

and it needs really good reasoning. Because  if you don't have any of those things,  

play25:20

how can you possibly be consistently profoundly  ethical? You can't. So we actually need better  

play25:28

reasoning, better understanding of the world, and  better understanding of ethics in our systems. 

play25:33

It seems to me that the former  two would just come along for the  

play25:36

ride as these models get more powerful. Yeah. That's a nice property because it's  

play25:40

actually a capabilities thing to some extent. But if the third one, the ethical model, is a  

play25:43

bottleneck, or if it’s a thing that doesn't  come along with the AI itself, what is the  

play25:48

actual technique to make sure that that happens? First of all, we should train the system on ethics  

play26:01

generally so that it understands human ethics  well. There's a lot of lectures and papers  

play26:05

and books and all sorts of things. We need to make sure it understands human ethics well,  

play26:13

at least as well as a very good  ethicist because that's important. 

play26:17

And we then need to decide, of this general  understanding of ethics, what do we want the  

play26:25

system to actually value and what sort of  ethics do we want it to apply? Now, that's  

play26:32

not a technical problem. That's a problem for  society and ethicists and so on to come up with. 

play26:41

I'm not sure there's such a thing as optimal  ethics but I'm pretty sure that it's possible to  

play26:49

come up with a set of ethics, which is much better  than what the so-called doomers are worried about  

play26:58

in terms of the behavior of these AGI systems.  And then what you do is you engineer the system  

play27:03

to actually follow these things so that every  time it makes a decision, it does an analysis  

play27:11

using a deep understanding of the world and of  ethics and very robust and precise reasoning to  

play27:18

do an ethical analysis of what it's doing. And of course, we would want lots of other  

play27:22

things. We would want people checking  these processes of reasoning. We’d want  

play27:26

people verifying that it's behaving itself  in terms of how it reaches these conclusions. 
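
Read as an engineering loop, the proposal above is roughly: enumerate options, predict consequences with a world model, score each against an agreed set of ethical principles, act on the best, and keep the reasoning trace for human review. The sketch below only illustrates that shape; world_model(), ethical_score(), and the example principles are hypothetical stand-ins chosen by the editor, not a description of any DeepMind system.

    # Minimal sketch of a System-2-style deliberation loop. Illustrative only:
    # world_model() and ethical_score() are hypothetical stand-ins.
    from dataclasses import dataclass

    # The principles themselves are set outside the system (by society and ethicists).
    PRINCIPLES = ["avoid harm", "be honest", "respect autonomy"]

    @dataclass
    class Deliberation:
        option: str
        predicted_outcome: str
        scores: dict
        total: float

    def world_model(option):
        # Stand-in for a learned model predicting what happens if this option is taken.
        return "likely outcome of: " + option

    def ethical_score(outcome, principle):
        # Stand-in for a learned, human-audited judgment of fit with a principle.
        return 1.0 if "disclose" in outcome else 0.4

    def deliberate(options):
        trace = []
        for option in options:
            outcome = world_model(option)                                # imagine consequences
            scores = {p: ethical_score(outcome, p) for p in PRINCIPLES}  # ethical analysis
            trace.append(Deliberation(option, outcome, scores, sum(scores.values())))
        best = max(trace, key=lambda d: d.total)                         # most ethical option
        return best, trace                                               # trace kept for human review

    best, trace = deliberate(["disclose the error to the user", "quietly patch it"])
    print("chosen:", best.option)
    for d in trace:
        print(d.option, "->", round(d.total, 2))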

play27:32

But I still feel like I don't understand how that  fundamental problem of making sure it follows that  

play27:37

ethic works. Because presumably, it has read Mao's books so it understands Mao's ethics  

play27:41

and understands all these other ethics. How  do we make sure the ethic that ethicists say  

play27:47

is the one is what it ends up following  and not the other ones it understands? 

play27:53

Right. So you have to specify to the system, these  are ethical principles that you should follow. 

play27:58

And how do we make sure it does that? We have to check it as it's doing it. We  

play28:01

have to assure ourselves that it is consistently  following these ethical principles at least  

play28:11

as well as a group of human experts. Are you worried that if you do it the  

play28:15

default way, which is just reinforcing it  whenever it seems to be following them,  

play28:20

you could be training deception as well? Reinforcement does have some dangerous  

play28:27

aspects to it. I think it's actually more  robust to check the process of reasoning and  

play28:36

check its understanding of ethics. To reassure  ourselves that the system has a really good  

play28:41

understanding of ethics, it should be grilled  for some time to try to really pull apart its  

play28:49

understanding and make sure it is very robust. And also, if it's deployed, we should have  

play28:54

people constantly looking at the decisions it’s  making and the reasoning process that goes into  

play29:01

those decisions to try to make sure that it is  correctly reasoning about these types of things. 

play29:08

Do you have some sort of framework  for that at Google DeepMind? 

play29:12

This is not so much a Google DeepMind perspective  on this. This is my take on how I think we  

play29:18

need to do this kind of thing. There are many  different views within and there are different  

play29:23

variants on these sorts of ideas as well. So then do you personally think there needs  

play29:27

to be some sort of framework for as you arrive  at certain capabilities, these are the concrete  

play29:32

safety benchmarks that you must have instated  at this point, or you should pause or slow down? 

play29:38

I think that's a sensible thing to do but it's  actually quite hard to do. There are some people  

play29:42

thinking about that. I know Anthropic has put  out some things like that. We were thinking  

play29:47

about similar things but actually putting  concrete things down is quite a hard thing  

play29:53

to do. I think it's an important problem and  I certainly encourage people to work on it. 

play29:59

It's interesting because you have  these blog posts that you wrote when  

play30:02

you started DeepMind, back in 2008, where  the motivation was to accelerate safety. 

play30:11

On net, what do you think the impact of  DeepMind has been on safety versus capabilities? 

play30:17

Ooh, interesting. I don't know.  It's hard to judge, actually. 

play30:30

I've been worried about AGI safety for a  long time, well before DeepMind. But it  

play30:39

was always really hard to hire people to work  on AGI safety, particularly in the early days.  

play30:46

Back in 2013 or so, we had our first hire and  he only agreed to do it part-time because he  

play30:53

didn't want to drop all the capabilities work  because of the impact it could have on his  

play30:58

career. And this was someone who had already  previously been publishing in AGI safety. 

play31:05

I don't know. It's hard to know what is the  counterfactual if we weren't there doing  

play31:12

it. We have been a group that has talked  about this openly. I've talked about the  

play31:24

importance of it on many occasions. We've been  hiring people to work on these topics. I know  

play31:31

a lot of other people in the area and I've  talked to them over many, many years. I've  

play31:34

known Dario since 2005 or something and we've  talked on and off about AGI safety and so on. 

play31:42

The impact that DeepMind has had: I guess we  were the first AGI company and as the first AGI  

play31:52

company, we always had an AGI safety group. We've  been publishing papers on this for many years. I  

play32:00

think that's lent some credibility to the area of  AGI safety. AGI was a fringe term not that long  

play32:08

ago. I hope that creates some space for people. Where do you think AI progress itself  

play32:19

would have been without DeepMind? This is not just a point that people make  

play32:22

about DeepMind. I think this is a general point  people make about OpenAI and Anthropic as well,  

play32:26

that these people went into the business to  accelerate safety and the net effect might have  

play32:31

been to accelerate capabilities far more. Right, right. I think we have accelerated  

play32:36

capabilities, but again, the counterfactuals  are quite difficult. We didn't do ImageNet,  

play32:42

for example, and ImageNet was very  influential in attracting investment  

play32:47

to the field. We did do AlphaGo, and  that changed some people's minds. But,  

play32:56

the community is a lot bigger than just DeepMind. If you went back more than five years,  

play33:09

we were able to do bigger projects with bigger teams and take on more ambitious things  

play33:17

than a lot of the smaller academic groups,  right? And so the sort of nature of the  

play33:22

type of work we could do was a bit different.  And that affected the dynamics in some ways. 

play33:28

But, the community is much, much bigger  than DeepMind. There are a number  

play33:32

of other players with significant resources.  Maybe we've sped things up a bit, but I think  

play33:36

a lot of these things would have happened before  too long anyway. Often good ideas are in the air,  

play33:45

and as a researcher, when you're about to  publish something, you see somebody else  

play33:52

has got a very similar idea coming out with some  good results. Often it's kind of like the time  

play33:57

is right for things. So I find it very hard  to reason about the counterfactuals there. 

play34:03

Speaking of the early years, it's  really interesting that in 2011,  

play34:06

you had a blog post where you said — “I’ve decided  to once again leave my prediction for when human  

play34:11

level AGI will arrive unchanged. That is, I give  it a log-normal distribution with a mean of 2028  

play34:15

and a mode of 2025, under the assumption that  nothing crazy happens like a nuclear war.” 

play34:16

This is before deep learning, this  is when nobody's talking about AI,  

play34:19

and it turns out that if the trends continue,  this is not an unreasonable prediction. 

play34:27

How did you have that accurate of an estimate  before all these trends came into effect? 

play34:29

First I'd say it's not before deep learning.  Deep learning was getting started around 2008. 

play34:36

Oh, sorry. I meant to say before ImageNet. Before ImageNet? Yeah, that was 2012. 

play34:43

I first formed those beliefs around 2001 after  reading Ray Kurzweil's The Age of Spiritual  

play34:50

Machines. There were two really important points  in his book that I came to believe as true. One is  

play35:02

that computational power would grow exponentially  for at least a few decades. And that the quantity  

play35:09

of data in the world would grow exponentially for  a few decades. And when you have exponentially  

play35:15

increasing quantities of computation and data,  then the value of highly scalable algorithms gets  

play35:23

higher and higher. There's a lot of incentive  to make a more scalable algorithm to harness  

play35:28

all this computing data. So I thought it would be  very likely that we'll start to discover scalable  

play35:35

algorithms to do this. And then there's a positive  feedback between all these things, because if your  

play35:40

algorithm gets better at harnessing computing  data, then the value of the data and the compute  

play35:45

goes up because it can be more effectively used.  And that drives more investment in these areas. If  

play35:50

your compute performance goes up, then the value  of the data goes up because you can utilize more  

play35:55

data. So there are positive feedback loops between  all these things. That was the first thing. 

play36:00

And then the second thing was just looking at  the trends. If the scalable algorithms were to  

play36:07

be discovered, then during the 2020s, it should be  possible to start training models on significantly  

play36:15

more data than a human would experience in a  lifetime. And I figured that that would be a  

play36:20

time where big things would start to happen  that would eventually unlock AGI. So that was  

play36:27

my reasoning process. And I think we're now at  that first part. I think we can start training  

play36:31

models now with the scale of the data that is  beyond what a human can experience in a lifetime.  

play36:36

So I think this is the first unlocking step. And so, yeah, I think there's a 50% chance that  

play36:43

we have AGI by 2028. Now, it's just a 50% chance.  I'm sure what's going to happen is we’re going to  

play36:48

get to 2029 and someone's going to say, “Shane,  you were wrong.” Come on, I said 50% chance. 

play36:55

I think it's entirely plausible but I'm not  going to be surprised if it doesn't happen by  

play37:04

then. You often hit unexpected problems  in research and science and sometimes  

play37:11

things take longer than you expect. If we're in 2029 and it hasn't happened  

play37:17

yet, if there was a problem that caused it,  what would be the most likely reason for that? 

play37:24

I don't know. At the moment, it looks to me like  all the problems are likely solvable with a number  

play37:36

of years of research. That's my current sense. And what does the time from here to 2028 look  

play37:41

like if 2028 ends up being the year? Is it just that we have trillions of  

play37:45

dollars of economic impact in the meantime  and the world gets crazy or what happens? 

play37:51

I think you'll see the existing models  maturing. They'll be less delusional,  

play37:58

much more factual. They'll be more up to date  on what's currently going on when they answer  

play38:02

questions. They'll become multimodal,  much more than they currently are. And  

play38:11

this will just make them much more useful. So I think probably what we'll see more than  

play38:15

anything is just loads of great applications  for the coming years. There can be some misuse  

play38:23

cases as well. I'm sure somebody will come up  with something to do with these models that is  

play38:30

quite unhelpful. But my expectation for the  coming years is mostly a positive one. We'll  

play38:35

see all kinds of really impressive, really  amazing applications for the coming years. 

play38:43

And on the safety point, you mentioned these  different research directions that are out  

play38:48

there and that you are doing internally in  DeepMind as well. Interpretability, RLAIF and  

play38:52

so on. Which are you most optimistic about? Oooh. I don't know. I don't want to pick  

play39:01

favorites. It's hard picking favorites. I know  the people working on all these areas. I think  

play39:12

things of the sort of System 2 flavor. There's work we have going on that Geoffrey Irving leads  

play39:20

called Deliberative Dialogue, which has the System  2 flavor where a sort of debate takes place about  

play39:30

the actions that an agent could take or what's  the correct answer to something like this. And  

play39:36

people then can sort of review these debates and  so on. And they use these AI algorithms to help  

play39:42

them judge the correct outcomes and so on. And so  this is sort of meant to be a way in which to try  

play39:49

to scale the alignment to increasingly powerful  systems. I think things of that kind of flavor  

play40:00

have quite a lot of promise in my opinion,  but that's kind of quite a broad category.  

play40:05

There are many different topics within that. That's interesting. So you mentioned two areas  

play40:11

in which LLMs needs to improve. One is the  episodic memory and the other is the System  

play40:15

2 thinking. Are those two related  or are they two separate drawbacks? 

play40:23

I think they're fairly separate, but they can be  somewhat related. You can learn different ways  

play40:29

of thinking through problems and actually learn  about this rapidly using your episodic memory.  

play40:36

All these different systems and subsystems  interact so they're never completely separate.  

play40:41

But I think conceptually you can probably  think of them as quite separate things. 

play40:44

I think delusions and factuality is another  area that's going to be quite important and  

play40:51

particularly important in lots of applications.  If you want a model that writes creative poetry,  

play40:57

then that's fine because you want to be able to  be very free to suggest all kinds of possibilities  

play41:01

and so on. You're not really constrained by a  specific reality. Whereas if you want something  

play41:06

that's in a particular application, normally  you have to be quite concrete about what's  

play41:12

currently going on and what is true and what  is not true and so on. And models are a little  

play41:17

bit sort of freewheeling when it comes to truth  and creativity at the moment. And that I think  

play41:22

limits their applications in many ways. The final question is this. You've been  

play41:27

in this field for over a decade, much longer  than many others, and you've seen different  

play41:34

landmarks like ImageNet and Transformers. What  do you think the next landmark will look like? 

play41:41

I think the next landmark that people will  think back to and remember is going much  

play41:49

more fully multimodal. That will open out the  sort of understanding that you see in language  

play41:58

models into a much larger space of possibilities.  And when people think back, they'll think about,  

play42:05

“Oh, those old fashioned models, they just did  like chat, they just did text.” It just felt like  

play42:10

a very narrow thing whereas now they understand  when you talk to them and they understand images  

play42:16

and pictures and video and you can show them  things or things like that. And they will have  

play42:21

much more understanding of what's going on. And  it'll feel like the system's kind of opened up  

play42:24

into the world in a much more powerful way. Do you mind if I ask a follow-up on that?  

play42:31

ChatGPT just released their multimodal feature  and you, in DeepMind, you had the Gato paper,  

play42:36

where you have this one model where you can throw  images, video games and even actions in there. So  

play42:43

far it doesn't seem to have percolated as much  as ChatGPT initially from GPT3 or something. 

play42:50

What explains that? Is it just  that people haven't learned to use  

play42:52

multimodality? They're not powerful enough yet? I think it's early days. I think you will see  

play42:59

understanding images and things more and more. But I think it's in this  

play43:03

transition, when you start really digesting a lot of video and other things like that,  

play43:09

that the systems will start having a much more  grounded understanding of the world and all kinds  

play43:13

of other aspects. And then when that works well,  that will open up naturally lots and lots of new  

play43:20

applications and all sorts of new possibilities  because you're not confined to text chat anymore. 

play43:25

New avenues of training data as well, right? Yeah, new training data and all kinds of different  

play43:31

applications that aren't just purely textual  anymore. And what are those applications? Well,  

play43:37

probably a lot of them we can't even imagine  at the moment because there are just so many  

play43:41

possibilities once you can start dealing with all  sorts of different modalities in a consistent way. 

play43:46

Awesome. I think that's an actionable place to  leave it off. Thank you so much for coming on  

play43:49

the podcast Shane. Thank you.

