Shane Legg (DeepMind Founder) - 2028 AGI, Superhuman Alignment, New Architectures
Summary
TLDR: Shane Legg, founder and Chief AGI Scientist of Google DeepMind, discusses on the podcast how to measure progress toward artificial general intelligence (AGI), the shortcomings of current large language models, and the path forward. He emphasizes the diversity and complexity of human cognition, and the broad range of tests and understanding needed to achieve AGI. Legg also discusses the sample-efficiency problem in deep learning models, and how architectural innovation and a deep understanding of ethics could enable safe alignment of powerful AI. In addition, he predicts that multimodal AI will be the field's next major milestone.
Takeaways
- 🤖 AGI (artificial general intelligence) is defined as a machine that can do the sorts of cognitive tasks that people can typically do.
- 📈 Measuring progress toward AGI requires many different kinds of tests and measurements spanning the breadth of human cognitive tasks.
- 🧠 Current AI models, such as large language models, lack a human-like episodic memory.
- 🔍 Existing benchmarks fail to measure certain aspects of human cognition, such as understanding streaming video.
- 🌐 Future AI development needs to solve the sample-efficiency problem, i.e., enable AI to learn quickly from less data.
- 🎮 The arrival of human-level AI will be marked not by a single test, but by a comprehensive suite of tests plus the failure to find tasks where the AI performs below humans.
- 🧭 Legg's early research emphasized that AI should be able to operate and succeed in many open-ended environments, somewhat like video games.
- 🔄 Deep learning models currently learn via growing context windows and training of the base model, but lack a process in between.
- 🌟 The next AI milestone is likely to be much fuller multimodal understanding, which will greatly expand the range of AI applications.
- 🚀 Current AI safety research includes interpretability, process supervision, red teaming, evaluation of dangerous capabilities, and institutions and governance.
Q & A
How do we concretely measure progress toward artificial general intelligence (AGI)?
-Measuring progress toward AGI requires many different kinds of tests and measurements spanning the breadth of cognitive tasks that humans can perform. We need a sense of human performance on these tasks, and then to judge whether an AI system reaches that level. If an AI system can match human performance across a wide range of tasks, and it is hard to find tasks where it falls short, we can consider AGI achieved.
What are the limitations of current benchmarks?
-Current benchmarks such as MMLU mainly target language models, so they cannot measure abilities like understanding streaming video. Nor do they test human memory systems such as episodic memory, the human capacity to learn specific information very rapidly.
How is sample efficiency related to human memory systems?
-Sample efficiency is closely tied to human memory systems, because rapid learning of new things is one of the abilities they provide. Large language models have a certain kind of sample efficiency, in that they can learn new information quickly within their context window. However, existing systems lack the human ability to learn specific information rapidly, i.e., episodic memory.
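As a rough illustration of the distinction drawn here, below is a minimal sketch, assuming a generic embedding function and an in-memory store, of how an external episodic-memory module might bolt onto a language model: specific episodes are written after a single exposure and retrieved by similarity, while the model's weights stay untouched. All names here (embed, EpisodicMemory) are hypothetical, not any actual DeepMind design.

import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder text embedder; stands in for any real embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(128)
    return v / np.linalg.norm(v)

class EpisodicMemory:
    """One-shot writes (hippocampus-like), similarity-based reads.
    Contrast: model weights change only slowly, over many gradient steps."""
    def __init__(self):
        self.keys, self.values = [], []

    def write(self, text: str) -> None:
        # A single exposure is enough to store the episode.
        self.keys.append(embed(text))
        self.values.append(text)

    def recall(self, query: str, k: int = 3) -> list[str]:
        # Return the k stored episodes most similar to the query.
        q = embed(query)
        sims = [float(q @ key) for key in self.keys]
        top = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
        return [self.values[i] for i in top]

memory = EpisodicMemory()
memory.write("Shane's AGI forecast has a mean of 2028.")
# Later, retrieved episodes are prepended to the prompt, i.e. working memory:
context = memory.recall("What was the AGI forecast?")

The sketch makes the division of labor concrete: a fast write-once store for specifics, slow weight updates for deep generalities, and the context window in between acting as working memory.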
Is the sample efficiency of deep learning models a fundamental flaw?
-Sample efficiency is not a fundamental flaw of deep learning models. Although these models need orders of magnitude more data than a human sees in a lifetime, we can expect that with further research there will be clear paths to fixing the shortcomings of existing models, including the kinds of memory and learning they have.
How do we concretely judge that AI has reached human level?
-No single test or task can establish that AI has reached human level, because the question is about general intelligence. We need to make sure the AI reaches human level across a large number of different tasks, with no obvious capability gaps. Moreover, when people deliberately try to find example tasks where the machine fails and cannot succeed, we can consider the AI to have reached human level.
How have Shane Legg's views on AI operating and succeeding in open environments changed?
-Shane Legg's early research emphasized that AI should be able to operate and succeed in a variety of open-ended environments. Over time his view has evolved: from trying to define a completely universal, mathematically clean notion of intelligence, to thinking more concretely about human intelligence and building machines that can match it, because we understand human intelligence and know it to be a very powerful thing.
How does Shane Legg see domain-specific models like AlphaFold in relation to AGI?
-Domain-specific models like AlphaFold are not directly feeding into AGI. Things learned along the way may turn out to be relevant, but Shane Legg does not see them as being on the path to AGI. That said, DeepMind is a large group that explores many different projects, including major ones like AlphaFold.
What does Shane Legg predict for future AI models?
-Shane Legg predicts that future AI models will be far more multimodal, which will open up a much larger space of possibilities. He believes that when people look back at today's models, they will regard text-only chat as very narrow. Future models will understand images and video and handle many more kinds of data, connecting systems to the world in much more powerful ways.
How does Shane Legg view current AI safety research?
-Shane Legg believes that to build truly ethical AI systems, we need systems with a deep understanding of the world, a good grasp of ethics, and robust reasoning. He stresses that we need to train systems to understand ethics, and that society and ethicists must decide which ethical principles the systems should follow. In addition, systems should perform an ethical analysis every time they make a decision, and human experts should continually review those decisions and the reasoning behind them.
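The decision procedure described in this answer can be made concrete with a short sketch. Below is a minimal System-2-style deliberation loop in Python; world_model and ethics_model are hypothetical components assumed to exist, not actual DeepMind systems.

from dataclasses import dataclass

@dataclass
class Decision:
    option: str
    predicted_outcome: str
    ethical_score: float
    rationale: str

def deliberate(situation: str, options: list[str],
               world_model, ethics_model, principles: list[str]) -> Decision:
    """Reason through each option step by step instead of emitting the first sample."""
    candidates = []
    for option in options:
        # 1. Use the world model to predict what would happen.
        outcome = world_model.predict(situation, option)
        # 2. Ethically analyze the predicted outcome against the stated principles.
        score, rationale = ethics_model.evaluate(outcome, principles)
        candidates.append(Decision(option, outcome, score, rationale))
    # 3. Keep every candidate and its rationale so human experts can audit the reasoning.
    for d in candidates:
        print(f"{d.option}: score={d.ethical_score:.2f} because {d.rationale}")
    # 4. Execute the option judged most ethical.
    return max(candidates, key=lambda d: d.ethical_score)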
How does Shane Legg think we can ensure AI follows the ethical principles we set?
-Shane Legg believes we need to explicitly specify to the system the ethical principles it should follow, and then check that it follows them consistently in practice. With a deep understanding of the world and of ethics, plus robust reasoning, the system should be able to analyze the options before it ethically and execute the most ethical course of action.
How does Shane Legg view the current development of multimodal AI models?
-Shane Legg believes multimodal AI models are still at an early stage, but he expects systems to understand images, video, and similar content more and more. Once systems can handle video and other kinds of data well, they will have a much deeper understanding of the world, which will naturally open up many new applications and possibilities.
Outlines
🤖 A conversation with Shane Legg: measuring and defining AGI
Shane Legg, Chief AGI Scientist at Google DeepMind, discusses how to concretely measure progress toward artificial general intelligence (AGI). He points out that because AGI is by definition about generality rather than any specific task, it requires many different kinds of measurements and tests spanning the breadth of cognitive tasks humans can perform. Shane stresses that although we may never fully cover everything humans can do, if we can test a wide range of cognitive tasks and an AI system reaches human-level performance on them, then we are close to having AGI.
🚀 Progress and future of deep learning models
Shane Legg discusses the progress of large language models (LLMs) and the aspects of human cognition missing from the benchmarks we currently use. He notes that current benchmarks do not measure things like understanding streaming video, since they target language models, while humans can do this. He also discusses human memory systems, in particular the kind of memory our models lack. Shane believes that although existing systems have shortcomings, these problems can be solved with research and work, and that there are now relatively clear paths toward adequately addressing them.
🎮 AI in games, and how the framing evolved
Shane Legg revisits his early research arguing that AI should be able to operate and succeed in a variety of open-ended environments, and connects it to the present. He notes that his view has evolved, from trying to define an extremely universal, mathematically clean framework for intelligence to focusing more on building machines that can match human intelligence. Shane sees human intelligence as a meaningful reference point because we understand it, know it is very powerful, and know it carries economic, philosophical, and historical importance.
🧠 Memory and learning in AI
Shane Legg goes deeper into why AI models need memory, particularly episodic memory, and how it relates to sample efficiency. He argues that current architectures do not really provide this, since they rely mainly on a very fluid context window and slowly updated weights. Shane also proposes that for truly creative problem solving, we need search processes in our models, so they can look through spaces of possibilities for hidden gems.
🔍 The future of AI and safety
Shane Legg discusses the alignment work needed for human-level and even superhuman AI. He stresses that we need AI systems that are fundamentally highly ethical and value-aligned from the start. Shane argues that an AI system needs a deep understanding of the world, a good understanding of ethics, and reliable reasoning, so that it can analyze the options in front of it and execute the most ethical course of action.
🌐 Multimodal AI: the next milestone
Shane Legg predicts that the next AI milestone will involve multimodal understanding, allowing AI systems to understand the world in a far more comprehensive and grounded way. He believes future AI systems will no longer be confined to text chat but will understand images, video, and other modalities, opening the door to new applications and possibilities. Shane stresses that although we are still in the early days of multimodal AI, we will see much greater progress in understanding and processing different kinds of data over time.
Keywords
💡Artificial general intelligence (AGI)
💡Loss numbers
💡Cognitive tasks
💡Sample efficiency
💡Working memory
💡Benchmarks
💡Multimodal
💡Reinforcement learning
💡Safety alignment
💡System 2 thinking
Highlights
Measuring progress toward artificial general intelligence (AGI) is challenging because it spans a wide range of task types.
AGI is defined as a machine that can perform the sorts of cognitive tasks people can typically do.
Testing for AGI requires many different kinds of measurements and tests covering the breadth of human cognitive tasks.
Current large language models still fall short on some cognitive tasks, such as understanding streaming video, and lack episodic memory.
Sample efficiency relates to human cognitive efficiency; it is what lets humans learn specific things very quickly.
Sample efficiency may not be a fundamental limitation of deep learning models; there are likely paths to solving it.
Human-level AI must pass a comprehensive suite of tests and show no clear weaknesses under adversarial testing.
AI research should focus on building intelligent systems that can operate and succeed in open environments.
DeepMind's research is not limited to AGI; it also spans sustainable energy, climate change, and other areas.
Achieving AGI will require architectural innovation to support rapid learning and deep generalization.
Domain-specific models like AlphaFold, though powerful, do not directly feed into AGI.
True creative problem solving requires searching through spaces of possibilities, not merely mimicking data.
Future AI systems will be much more multimodal, able to understand and process images, video, and other kinds of data.
AI safety requires systems with a deep world model, an understanding of ethics, and reliable reasoning.
Reinforcement learning and related techniques carry risks; the AI's reasoning process and ethical understanding need to be checked.
For ethical alignment, AI systems need to perform an ethical analysis when making decisions and execute the most ethical course of action.
DeepMind was founded in part to accelerate AI safety research, but its net impact on AI capabilities is hard to assess.
Shane Legg estimates a 50% chance of AGI by 2028, an extrapolation from current trends.
The next AI milestone may be a breakthrough in multimodal understanding, greatly expanding AI's applications and capabilities.
Transcripts
Today I have the pleasure of interviewing Shane Legg,
who is the founder and the Chief AGI scientist of Google DeepMind. Shane, welcome to the podcast.
Thank you. It's a pleasure being here. First question. How do we measure progress
towards AGI concretely? We have these loss numbers and we can see how the loss improves
from one model to another, but it's just a number. How do we interpret this? How do we
see how much progress we're actually making? That’s a hard question. AGI by its definition
is about generality. It's not about doing a specific thing. It's much easier to measure
performance when you have a very specific thing in mind because you can construct a test around that.
Maybe I should first explain what I mean by AGI because there are a few different
notions around it. When I say AGI, I mean a machine that can do the sorts of cognitive things
that people can typically do, possibly more. To be an AGI that's the bar you need to meet.
So if we want to test whether we're meeting the threshold or we're getting close to the threshold,
what we actually need is a lot of different kinds of measurements and tests that span the breadth
of all the sorts of cognitive tasks that people can do and then to have a sense of what human
performance is on these sorts of tasks. That then allows us to judge whether or not we're there.
It's difficult because you'll never have a complete set of everything that people can
do because it's such a large set. But I think that if you ever get to the point where you
have a pretty good range of tests of all sorts of cognitive things that we can do, and you have
an AI system which can meet human performance and all those things and then even with effort,
you can't actually come up with new examples of cognitive tasks where the machine is below human
performance then at that point, you have an AGI. It may be conceptually possible that there is
something that the machine can't do that people can do but if you can't find it with some effort,
then for all practical purposes, you have an AGI. Let's get more concrete. We measure the
performance of these large language models on MMLU and other benchmarks. What is missing from the
benchmarks we use currently? What aspect of human cognition do they not measure adequately?
Another hard question. These are quite big areas. They don't measure things like
understanding streaming video, for example, because these are language models and people
can do things like understanding streaming video. They don't do things like episodic memory. Humans
have what we call episodic memory. We have a working memory, which are things that have
happened quite recently, and then we have a cortical memory, things that are sort of
being in our cortex, but there's also a system in between, which is episodic memory, which is the
hippocampus. It is about learning specific things very, very rapidly. So if you remember some of the
things I say to you tomorrow, that'll be your episodic memory, your hippocampus.
Our models don't really have that kind of thing and we don't really test for that kind of thing.
We just sort of try to make the context windows, which are more like working memory,
longer and longer to sort of compensate for this. But it is a difficult question because the
generality of human intelligence is very, very broad. So you really have to start going into
the weeds of trying to find if there's specific types of things that are missing from existing
benchmarks or different categories of benchmarks that don't currently exist or something.
The thing you're referring to with episodic memory, would it be fair to call that sample
efficiency or is that a different thing? It's very much related to sample efficiency.
It's one of the things that enables humans to be very sample efficient. Large language
models have a certain kind of sample efficiency because when something's in their context window,
that biases the distribution to behave in a different way and so that's a very rapid kind
of learning. There are multiple kinds of learning and the existing systems have some of them,
but not others. It's a little bit complicated. Is this kind of memory, what we call sample
efficiency, a fatal flaw of these deep learning models that it just takes trillions of tokens,
orders of magnitude more than what any human will see throughout their lifetime, or is this
something that will be solved over time? The models can learn things immediately
when it's in the context window and then they have this longer process when you
actually train the base model and that's when they're learning over trillions of tokens.
But they miss something in the middle. That's sort of what I'm getting at here.
I don't think it's a fundamental limitation. I think what's happened with large language
models is something fundamental has changed. We know how to build models now that have
some degree of understanding of what's going on. And that did not exist in the past. And because
we've got a scalable way to do this now, that unlocks lots and lots of new things.
Now we can look at things which are missing, such as this sort of episodic memory type thing,
and we can then start to imagine ways to address that. My feeling is that there are relatively
clear paths forward now to address most of the shortcomings we see in the existing models,
whether it's about delusions, factuality, the type of memory and learning that they have, or
understanding video, or all sorts of things like that. I don't see any big blockers. I don't see
big walls in front of us. I just see that there's more research and work and all these things will
improve and probably be adequately solved. Going back to the original question of how do
you measure when human level AI has arrived or has gone beyond it. As you mentioned,
there's these other sorts of benchmarks you can use and other sorts of traits,
but concretely, what would it have to do for you to be like, “Okay, we've reached human level.”
Would it have to beat Minecraft from start to finish? Would it have to get
100% on MMLU? What would it have to do? There is no one thing that would do it,
because I think that's the nature of it. It's about general intelligence. So I'd
have to make sure it could do lots and lots of different things and it didn't have a gap.
We already have systems that can do very impressive categories of things to human
level or even beyond. I would want a whole suite of tests that I felt was very comprehensive and
then furthermore, when people come in and say, “Okay, so it's passing a big suite of tests,
let's try to find examples. Let's take an adversarial approach to this. Let's
deliberately try to find examples where people can clearly, typically do this, but
the machine fails.” And when those people cannot succeed, I'll go, “Okay, we're probably there.”
A lot of your earlier research, at least the ones I could find, emphasized that AI should be
able to manipulate and succeed in a variety of open-ended environments. It almost sounds like
a video game. Is that where your head is still at now, or do you think about it differently?
It’s evolved a bit. When I did my thesis work around universal intelligence, I was trying to
come up with an extremely universal, general, mathematically clean framework for defining
and measuring intelligence. I think there were aspects of that that were successful. I think
in my own mind, it clarified the nature of intelligence as being able to perform well
in lots of different domains and different tasks and so on. It's about that sort of capability of
performance and the breadth of performance. I found that was quite helpful and enlightening.
There was always the issue of the reference machine. In the framework, you have a weighting
of things according to the complexity. It's like an Occam's razor type of thing, where you weight
tasks and environments which are simpler, more highly. You’ve got a countable space of
semi-computable environments. And that Kolmogorov complexity measure has something built into it,
which is called a reference machine. And that's a free parameter. So that means that
the intelligence measure has a free parameter in it and as you change that free parameter,
it changes the weighting and the distribution over the space of all the different tasks and
environments. This is sort of an unresolved part of the whole problem. So what reference
machine should we ideally use? There's no universal reference machine. People will
usually put a Turing machine in there, but there are many kinds of different machines.
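For reference, the universal intelligence measure from Legg and Hutter's work that is being described here is usually written as

\[
\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K_U(\mu)} \, V_\mu^\pi ,
\]

where \(E\) is the set of computable environments, \(V_\mu^\pi\) is the expected total reward that agent \(\pi\) achieves in environment \(\mu\), and \(K_U\) is Kolmogorov complexity measured relative to a reference machine \(U\). The free parameter is exactly that \(U\): changing it reweights the whole distribution over environments.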
Given that it's a free parameter, I think the most natural thing to do is to think about what's
meaningful to us in terms of intelligence. I think human intelligence is meaningful to us in the
environment that we live in. We know what human intelligence is. We are human too. We interact
with other people who have human intelligence. We know that human intelligence is possible,
obviously, because it exists in the world. We know that human intelligence is very,
very powerful because it's affected the world profoundly in countless ways. And we know if
human level intelligence was achieved, that would be economically transformative because
the types of cognitive tasks people do in the economy could be done by machines then. And it
would be philosophically important because this is sort of how we often think about intelligence.
Historically it would be a key point. So I think that human intelligence in a human-like
environment is quite a natural sort of reference point. You could imagine setting your reference
machine to be such that it emphasizes the kinds of environments that we live in as opposed to some
abstract mathematical environment. And so that's how I've kind of gone on this journey of — “Let's
try to define a completely universal, clean, mathematical notion of intelligence” to “Well,
it's got a free parameter.” One way of thinking about it is to
think more concretely about human intelligence and build machines that can match human intelligence.
Because we understand what that is and we know that that is a very powerful thing. It has
economic, philosophical and historical importance. The other aspect of course is that, in this pure
formulation of Kolmogorov complexity, it's actually not computable. I also knew that
there was a limitation at the time but it was an effort to just theoretically come
up with a clean definition. I think we can sort of get there, but we have this issue
of a reference machine, which is unspecified. Before we move on, I do want to ask a question
on the original point you made on LLMs needing episodic memory. You said that
these are problems that we can solve and these are not fundamental impediments.
But when you say that, do you think they will just be solved by scale or do each
of these need a fine-grained specific solution that is architectural in nature?
I think it'll be architectural in nature because the current architectures don't really have what
you need to do this. They basically have a context window, which is very, very fluid,
of course, and they have the weights, which things get baked into very slowly. So to my mind,
that feels like working memory, which is like the activations in your brain, and then the
weights are like the synapses in your cortex. Now, the brain separates these things out. It
has a separate mechanism for rapidly learning specific information because that's a different
type of optimization problem compared to slowly learning deep generalities. There's a tension
between the two but you want to be able to do both. You want to be able to hear someone's name
and remember it the next day. And you also want to be able to integrate information over a lifetime
so you start to see deeper patterns in the world. These are quite different optimization targets,
different processes, but a comprehensive system should be able to do both. And so I think it's
conceivable you could build one system that does both, but you can also see that because
they're quite different things, it makes sense for them to be done differently. I
think that's why the brain does it separately. I'm curious about how concretely you think that
would be achieved. DeepMind has been working on these domain specific reinforcement learning
type setups: AlphaFold, AlphaCode and so on. How does that fit into what you see as a path to AGI?
Have these just been orthogonal domain specific models or do they feed into the eventual AGI?
Things like AlphaFold are not really feeding into AGI. We may learn things in the process
that may end up being relevant, but I don't see them as likely being on the path to AGI.
But we're a big group. We've got hundreds and hundreds and hundreds of PhDs working on lots
of different projects. When we find what we see as opportunities to do something significant like
AlphaFold, we'll go and do it. It's not like we only do AGI type work. We work on fusion reactors
and various things in sustainability, energy. We've got people looking at satellite images
of deforestation. We have people looking at weather forecasting. We've got tons
of people working on lots of things. On the point you made earlier about
the reference machine as human intelligence. It's interesting because one of the things
you mentioned in your 2008 thesis about how you would measure intelligence was — You said
you could do a compression test and you could see if it fills in words and a sample of text
and that could measure intelligence. And funnily enough, that's basically how the LLMs are trained.
At the time, did it stick out to you as an especially fruitful thing to train for?
Well, yeah. In the sense what's happened is actually very aligned with what I wrote
about in my thesis. The ideas from Marcus Hutter with AIXI, where you take Solomonoff induction,
which is this incomputable but theoretically very elegant and extremely sample efficient
prediction system, and then once you have that, you can build a general agent on top of it by
basically adding search and reinforcement signal. That's what you do with AIXI.
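For readers unfamiliar with the construction: Solomonoff induction predicts sequences using the universal prior

\[
M(x) \;=\; \sum_{p \,:\, U(p) = x\ast} 2^{-\ell(p)} ,
\]

where the sum runs over programs \(p\) whose output on a universal machine \(U\) begins with the string \(x\), and \(\ell(p)\) is the program's length. AIXI then wraps an expectimax search for reward-maximizing actions around this predictor, which is the "adding search and reinforcement signal" step; and since maximizing a predictor's log-likelihood is the same as minimizing code length, training a model on next-token prediction is compression in exactly this sense.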
But what that sort of tells you is that if you have a fantastically good sequence predictor,
some approximation of Solomonoff induction, then going from that to a very powerful, very general
AGI system is just sort of another step. You've actually solved a lot of the problem already.
And I think that's what we're seeing today actually, that these incredibly
powerful foundation models are incredibly good sequence predictors that are compressing the
world based on all this data. And then you will be able to extend these in different ways and
build very, very powerful agents. Let me ask you more about that.
Richard Sutton's Bitter Lesson essay says that there's two things you can scale,
search and learning. I guess you could say that LLMs are about the learning aspect. The
search stuff, which you worked on throughout your career, where you have an agent that is
interacting with this environment, is that the direction that needs to be explored again? Or
is that something that needs to be added to LLMs where they can actually interact
with their data or the world or in some way? Yeah, I think that's on the right track. These
foundation models are world models of a kind and to do really creative problem solving, you need to
start searching. If I think about something like AlphaGo and the famous Move 37, where did that
come from? Did that come from all its data that it's seen of human games or something like that?
No, it didn't. It came from it identifying a move as being quite unlikely, but plausible. And then
via a process of search, coming to understand that it was actually a very, very good move.
So to get real creativity, you need to search through spaces of possibilities and find these
hidden gems. That's what creativity is. Current language models don't really do that. They really
are mimicking the data. They are mimicking all the human ingenuity and everything, which they have
seen from all this data that's coming from the internet that's originally derived from humans.
These models can blend things. They can do Harry Potter in the style of a Kanye West
rap or something, even though it's never happened, they can blend things together.
But if you want a system that can go truly beyond that and not just generalize in novel ways and do
something that's truly creative, that is not just a blending of existing things,
that requires searching through a space of possibilities and finding these hidden gems
that are hidden away in there somewhere. And that requires search. So I don't think we'll
see systems that truly step beyond their training data until we have powerful search in the process.
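As a toy illustration of search layered on top of a sequence predictor (not how AlphaGo or any Google DeepMind system is actually built), here is a minimal best-of-N loop in Python; generate and value are hypothetical stand-ins for a sequence model and an evaluator.

def search(prompt: str, generate, value, n_candidates: int = 64,
           temperature: float = 1.2) -> str:
    """Sample many candidate continuations, keep the one a value model rates best.

    generate(prompt, temperature) -> str   # hypothetical sequence predictor
    value(prompt, candidate) -> float      # hypothetical evaluator / value model
    """
    best, best_score = None, float("-inf")
    for _ in range(n_candidates):
        # Higher temperature deliberately surfaces unlikely-but-plausible
        # continuations, the way Move 37 was a low-prior move vindicated by search.
        candidate = generate(prompt, temperature)
        score = value(prompt, candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best

Mimicry alone corresponds to n_candidates = 1; everything beyond that is search for the "hidden gems" that the training distribution assigns low probability.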
There are rumors that Google DeepMind is training newer models, and you don't have
to comment on those specifically, but when you do that, if it's the case that something like
search is required to go to the next level, are you training in a completely different way than
how GPT-4 or other transformers are trained? And I can't say much about how we're training.
I think it's fair to say we're roughly doing the sorts of scaling and training that you see many
people in the field doing but we have our own take on it and our own different tricks and techniques.
Okay, maybe we'll come back to it and get another answer on that.
Let's talk about alignment briefly. What will it take to align human level and superhuman AIs?
It's interesting because the sorts of reinforcement learning and self-play kinds
of setups that are popular now, like Constitutional AI or RLHF, DeepMind obviously has expertise in
it for decades longer. I'm curious what you think of the current landscape and how DeepMind pursues
that problem of safety towards human level models. Do you want to know about what we're currently
doing or do you want me to have a stab at what I think needs to be done?
Needs to be done. Currently we're doing lots
of things. We're doing interpretability. We're doing process supervision. We're doing red
teaming. We're doing evaluation for dangerous capabilities. We're doing work on institutions
and governance and tons of stuff, right? Anyway, what do I think needs to be done?
I think that powerful machine learning, powerful AGI, is coming in some time and if the system is
really capable, really intelligent, really powerful, trying to somehow contain it or
limit it is probably not a winning strategy because these systems ultimately will be very,
very capable. So what you have to do is you have to align it. You have to get it such that it's
fundamentally a highly ethical value aligned system from the get go. How do you do that?
Maybe this is slightly naive, but this is my take on it — How do people do it? If you have
a really difficult ethical decision in front of you, what do you do? You don't just do the
first thing that comes to mind, because there could be a lot of emotions involved
in other things. It's a difficult problem. What you have to do is to calm yourself down.
You've got to sit down and you've got to think about it. You've got to think, “Well, okay, what
could I do?” I could do this. I could do this. I could do this. If I do each of these things,
what will happen? So that requires a model of the world. And then you have to think about ethically,
how do I view each of these different actions and the possibilities and what might happen from it?
What is the right thing to do? And as you think about all the different possibilities and your
actions and what can follow from them and how it aligns with your values and your ethics,
you can then come to some conclusion of what is really the best choice that you should be making
if you want to be really ethical about this. I think AI systems need to essentially do the same
thing. When you sample from a foundation model at the moment, it's blurting out the first thing.
It's like System 1, if you like, from psychology, from Kahneman et al. That's not good enough.
And if we do RLHF with AI feedback instead of human feedback (RLAIF), as Constitutional AI tries to do,
you're trying to fix the underlying System 1 in a sense. That can shift the distribution
and that can be very helpful but it's a very high dimensional distribution and you're sort of poking
it in a whole lot of points. So it's not likely to be a very robust solution. It's like trying to
train yourself out of a bad habit. You can sort of do it eventually. But what you need to do is
you need to have a System 2. You need the system to not just sample from the model. You need the
system to go, “Okay, I'm going to reason this through. I'm going to do step by step reasoning.
What are the options in front of me? I'm going to use my world model now and I'm going to use a good
world model to understand what's likely to happen from each of these options.” And then reason about
each of these from an ethical perspective. So you need a system which has a deep
understanding of the world, a good world model, and has a good understanding of people,
and has a good understanding of ethics, and it has robust and very reliable reasoning.
And then you set it up in such a way that it applies this reasoning and this understanding
of ethics to analyze the different options which are in front of it and then execute
on which is the most ethical way forward. But when a lot of people think about the
fundamental alignment problem, the worry is not that it's not going to have a world model
to understand the effects of its actions, the worry is that the effects it cares about are
not the ones we will care about. So even if you improve its systems, and its thinking and
do better planning, the fundamental problem is — We have these really nuanced values about what
we want. How do we communicate those values and make sure they're reinforced in the AI?
It needs not just a good model of the world, but it needs a really good understanding of
ethics. And we need to communicate to the system what ethics and values it should be following.
And how do we do that in a way that we can be confident that a super human level
model will preserve those values or have learned them in the first place?
It should preserve them because if it's making all its decisions based on a good understanding
of ethics and values, and it's consistent in doing this, it shouldn't take actions which
undermine that. That would be inconsistent. Right, so then how do we get to the point
where it has learned them in the first place? Yeah, that's the challenge. We need to have
systems. The way I think about it is this: to have a profoundly ethical AI system, it also has to be
very, very capable. It needs a really good world model, a really good understanding of ethics,
and it needs really good reasoning. Because if you don't have any of those things,
how can you possibly be consistently profoundly ethical? You can't. So we actually need better
reasoning, better understanding of the world, and better understanding of ethics in our systems.
It seems to me that the former two would just come along for the
ride as these models get more powerful. Yeah. That's a nice property because it's
actually a capabilities thing to some extent. But if the third one, the ethical model, is a
bottleneck, or if it’s a thing that doesn't come along with the AI itself, what is the
actual technique to make sure that that happens? First of all, we should train the system on ethics
generally so that it understands human ethics well. There's a lot of lectures and papers
and books and all sorts of things. We need to make sure it understands humans ethics well,
at least as well as a very good ethicist because that's important.
And we then need to decide, of this general understanding of ethics, what do we want the
system to actually value and what sort of ethics do we want it to apply? Now, that's
not a technical problem. That's a problem for society and ethicists and so on to come up with.
I'm not sure there's such a thing as optimal ethics but I'm pretty sure that it's possible to
come up with a set of ethics, which is much better than what the so-called doomers are worried about
in terms of the behavior of these AGI systems. And then what you do is you engineer the system
to actually follow these things so that every time it makes a decision, it does an analysis
using a deep understanding of the world and of ethics and very robust and precise reasoning to
do an ethical analysis of what it's doing. And of course, we would want lots of other
things. We would want people checking these processes of reasoning. We’d want
people verifying that it's behaving itself in terms of how it reaches these conclusions.
But I still feel like I don't understand how that fundamental problem of making sure it follows that
ethic works. Because presumably, it has read Mao’s books so it understands Mao’s ethics
and understands all these other ethics. How do we make sure the ethic that the ethicists settle on
is the one it ends up following and not the other ones it understands?
Right. So you have to specify to the system, these are ethical principles that you should follow.
And how do we make sure it does that? We have to check it as it's doing it. We
have to assure ourselves that it is consistently following these ethical principles at least
as well as a group of human experts. Are you worried that if you do it the
default way, which is just reinforcing it whenever it seems to be following them,
you could be training deception as well? Reinforcement does have some dangerous
aspects to it. I think it's actually more robust to check the process of reasoning and
check its understanding of ethics. To reassure ourselves that the system has a really good
understanding of ethics, it should be grilled for some time to try to really pull apart its
understanding and make sure it is very robust. And also, if it's deployed, we should have
people constantly looking at the decisions it’s making and the reasoning process that goes into
those decisions to try to make sure that it is correctly reasoning about these types of things.
Do you have some sort of framework for that at Google DeepMind?
This is not so much a Google DeepMind perspective on this. This is my take on how I think we
need to do this kind of thing. There are many different views within and there are different
variants on these sorts of ideas as well. So then do you personally think there needs
to be some sort of framework for as you arrive at certain capabilities, these are the concrete
safety benchmarks that you must have instated at this point, or you should pause or slow down?
I think that's a sensible thing to do but it's actually quite hard to do. There are some people
thinking about that. I know Anthropic has put out some things like that. We were thinking
about similar things but actually putting concrete things down is quite a hard thing
to do. I think it's an important problem and I certainly encourage people to work on it.
It's interesting because you have these blog posts that you wrote when
you started DeepMind, back in 2008, where the motivation was to accelerate safety.
On net, what do you think the impact of DeepMind has been on safety versus capabilities?
Ooh, interesting. I don't know. It's hard to judge, actually.
I've been worried about AGI safety for a long time, well before DeepMind. But it
was always really hard to hire people to work on AGI safety, particularly in the early days.
Back in 2013 or so, we had our first hire and he only agreed to do it part-time because he
didn't want to drop all the capabilities work because of the impact it could have on his
career. And this was someone who had already previously been publishing in AGI safety.
I don't know. It's hard to know what is the counterfactual if we weren't there doing
it. We have been a group that has talked about this openly. I've talked about the
importance of it on many occasions. We've been hiring people to work on these topics. I know
a lot of other people in the area and I've talked to them over many, many years. I've
known Dario since 2005 or something and we've talked on and off about AGI safety and so on.
The impact that DeepMind has had: I guess we were the first AGI company and as the first AGI
company, we always had an AGI safety group. We've been publishing papers on this for many years. I
think that's lent some credibility to the area of AGI safety. AGI was a fringe term not that long
ago. I hope that creates some space for people. Where do you think AI progress itself
would have been without DeepMind? This is not just a point that people make
about DeepMind. I think this is a general point people make about OpenAI and Anthropic as well,
that these people went into the business to accelerate safety and the net effect might have
been to accelerate capabilities far more. Right, right. I think we have accelerated
capabilities, but again, the counterfactuals are quite difficult. We didn't do ImageNet,
for example, and ImageNet was very influential in attracting investment
to the field. We did do AlphaGo, and that changed some people's minds. But,
the community is a lot bigger than just DeepMind. If you went back more than five years,
we were able to do bigger projects with bigger teams and take on more ambitious things
than a lot of the smaller academic groups, right? And so the sort of nature of the
type of work we could do was a bit different. And that affected the dynamics in some ways.
But, the community is much, much bigger than DeepMind. There are a number
of other players with significant resources. Maybe we've sped things up a bit, but I think
a lot of these things would have happened before too long anyway. Often good ideas are in the air,
and as a researcher, when you're about to publish something, you see somebody else
has got a very similar idea coming out with some good results. Often it's kind of like the time
is right for things. So I find it very hard to reason about the counterfactuals there.
Speaking of the early years, it's really interesting that in 2011,
you had a blog post where you said — “I’ve decided to once again leave my prediction for when human
level AGI will arrive unchanged. That is, I give it a log-normal distribution with a mean of 2028
and a mode of 2025, under the assumption that nothing crazy happens like a nuclear war.”
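(A brief note on the quoted forecast: for a log-normal distribution the mode sits below the mean, which is why the single most likely year, 2025, precedes the average year, 2028. If the arrival-time offset is \(X \sim \mathrm{LogNormal}(\mu, \sigma^2)\), then

\[
\operatorname{mode}(X) = e^{\mu - \sigma^2} \;<\; \operatorname{median}(X) = e^{\mu} \;<\; \mathbb{E}[X] = e^{\mu + \sigma^2/2} ,
\]

the gap reflecting a long right tail, i.e., some probability of AGI arriving much later.)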
This is before deep learning, this is when nobody's talking about AI,
and it turns out that if the trends continue, this is not an unreasonable prediction.
How did you have that accurate of an estimate before all these trends came into effect?
First I'd say it's not before deep learning. Deep learning was getting started around 2008.
Oh, sorry. I meant to say before ImageNet. Before ImageNet? Yeah, that was 2012.
I first formed those beliefs around 2001 after reading Ray Kurzweil's The Age of Spiritual
Machines. There were two really important points in his book that I came to believe as true. One is
that computational power would grow exponentially for at least a few decades. And that the quantity
of data in the world would grow exponentially for a few decades. And when you have exponentially
increasing quantities of computation and data, then the value of highly scalable algorithms gets
higher and higher. There's a lot of incentive to make a more scalable algorithm to harness
all this compute and data. So I thought it would be very likely that we'll start to discover scalable
algorithms to do this. And then there's a positive feedback between all these things, because if your
algorithm gets better at harnessing computing data, then the value of the data and the compute
goes up because it can be more effectively used. And that drives more investment in these areas. If
your compute performance goes up, then the value of the data goes up because you can utilize more
data. So there are positive feedback loops between all these things. That was the first thing.
And then the second thing was just looking at the trends. If the scalable algorithms were to
be discovered, then during the 2020s, it should be possible to start training models on significantly
more data than a human would experience in a lifetime. And I figured that that would be a
time where big things would start to happen that would eventually unlock AGI. So that was
my reasoning process. And I think we're now at that first part. I think we can start training
models now with the scale of the data that is beyond what a human can experience in a lifetime.
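A rough back-of-envelope, using assumed rates rather than figures from the interview, shows the scale gap being described:

\[
\underbrace{2\ \tfrac{\text{words}}{\text{s}} \times 3600 \times 16\ \tfrac{\text{h}}{\text{day}} \times 365 \times 30\ \text{yr} \;\approx\; 1.3 \times 10^{9}\ \text{words}}_{\text{a lifetime of listening and reading}}
\quad\text{vs.}\quad
\sim 10^{13}\ \text{tokens per frontier training run},
\]

roughly four orders of magnitude more text than a person could plausibly encounter.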
So I think this is the first unlocking step. And so, yeah, I think there's a 50% chance that
we have AGI by 2028. Now, it's just a 50% chance. I'm sure what's going to happen is we’re going to
get to 2029 and someone's going to say, “Shane, you were wrong.” Come on, I said 50% chance.
I think it's entirely plausible but I'm not going to be surprised if it doesn't happen by
then. You often hit unexpected problems in research and science and sometimes
things take longer than you expect. If we're in 2029 and it hasn't happened
yet, if there was a problem that caused it, what would be the most likely reason for that?
I don't know. At the moment, it looks to me like all the problems are likely solvable with a number
of years of research. That's my current sense. And what does the time from here to 2028 look
like if 2028 ends up being the year? Is it just that we have trillions of
dollars of economic impact in the meantime and the world gets crazy or what happens?
I think you'll see the existing models maturing. They'll be less delusional,
much more factual. They'll be more up to date on what's currently going on when they answer
questions. They'll become multimodal, much more than they currently are. And
this will just make them much more useful. So I think probably what we'll see more than
anything is just loads of great applications for the coming years. There can be some misuse
cases as well. I'm sure somebody will come up with something to do with these models that is
quite unhelpful. But my expectation for the coming years is mostly a positive one. We'll
see all kinds of really impressive, really amazing applications for the coming years.
And on the safety point, you mentioned these different research directions that are out
there and that you are doing internally in DeepMind as well. Interpretability, RLAIF and
so on. Which are you most optimistic about? Oooh. I don't know. I don't want to pick
favorites. It's hard picking favorites. I know the people working on all these areas. I think
things of the sort of System 2 flavor. There's work we have going on that Geoffrey Irving leads
called Deliberative Dialogue, which has the System 2 flavor where a sort of debate takes place about
the actions that an agent could take or what's the correct answer to something like this. And
people then can sort of review these debates and so on. And they use these AI algorithms to help
them judge the correct outcomes and so on. And so this is sort of meant to be a way in which to try
to scale the alignment to increasingly powerful systems. I think things of that kind of flavor
have quite a lot of promise in my opinion, but that's kind of quite a broad category.
There are many different topics within that. That's interesting. So you mentioned two areas
in which LLMs need to improve. One is the episodic memory and the other is the System
2 thinking. Are those two related or are they two separate drawbacks?
I think they're fairly separate, but they can be somewhat related. You can learn different ways
of thinking through problems and actually learn about this rapidly using your episodic memory.
All these different systems and subsystems interact so they're never completely separate.
But I think conceptually you can probably think of them as quite separate things.
I think delusions and factuality is another area that's going to be quite important and
particularly important in lots of applications. If you want a model that writes creative poetry,
then that's fine because you want to be able to be very free to suggest all kinds of possibilities
and so on. You're not really constrained by a specific reality. Whereas if you want something
that's in a particular application, normally you have to be quite concrete about what's
currently going on and what is true and what is not true and so on. And models are a little
bit sort of freewheeling when it comes to truth and creativity at the moment. And that I think
limits their applications in many ways. The final question is this. You've been
in this field for over a decade, much longer than many others, and you've seen different
landmarks like ImageNet and Transformers. What do you think the next landmark will look like?
I think the next landmark that people will think back to and remember is going much
more fully multimodal. That will open out the sort of understanding that you see in language
models into a much larger space of possibilities. And when people think back, they'll think about,
“Oh, those old fashioned models, they just did like chat, they just did text.” It just felt like
a very narrow thing whereas now they understand when you talk to them and they understand images
and pictures and video and you can show them things or things like that. And they will have
much more understanding of what's going on. And it'll feel like the system's kind of opened up
into the world in a much more powerful way. Do you mind if I ask a follow-up on that?
ChatGPT just released their multimodal feature and you, in DeepMind, you had the Gato paper,
where you have this one model where you can throw images, video games and even actions in there. So
far it doesn't seem to have percolated as much as ChatGPT initially from GPT3 or something.
What explains that? Is it just that people haven't learned to use
multimodality? They're not powerful enough yet? I think it's early days. I think you will see
understanding images and things more and more. But it's early days in this transition. It's
when you start really digesting a lot of video and other things like that
that the systems will start having a much more grounded understanding of the world and all kinds
of other aspects. And then when that works well, that will open up naturally lots and lots of new
applications and all sorts of new possibilities because you're not confined to text chat anymore.
New avenues of training data as well, right? Yeah, new training data and all kinds of different
applications that aren't just purely textual anymore. And what are those applications? Well,
probably a lot of them we can't even imagine at the moment because there are just so many
possibilities once you can start dealing with all sorts of different modalities in a consistent way.
Awesome. I think that's an excellent place to leave it off. Thank you so much for coming on
the podcast Shane. Thank you.