Ilya Sutskever | AI will be omnipotent in the future | Everything is impossible becomes possible
Summary
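TLDR: This video discusses the development of deep learning and the future of artificial general intelligence (AGI). The speaker explains his conviction that large neural networks can exhibit remarkable behavior, which rests on two beliefs: that the human brain is large, and that artificial neurons are similar enough to biological neurons. He also covers the potential of the Transformer, how much the choice of algorithm matters, and how performance changes as models grow. The talk closes with AI safety, including the challenges posed by superintelligence and how to keep the technology developing in a healthy direction.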
Takeaways
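- 😀 Conviction in deep learning: it rests on the size of the human brain and on the similarity between artificial and biological neurons.
- 😀 Definition of AGI: a computer system that can automate the great majority of intellectual labor, which can be thought of as a computer coworker.
- 😀 Limits of current models: existing architectures can still be improved, although they can already go very far.
- 😀 View on Transformers: Transformers are strong, but there is room for improvement, and more efficient architectures can be expected.
- 😀 Understanding of scaling laws: our understanding is still incomplete, but predictions work well on some specific tasks.
- 😀 Unexpected capabilities: the rapid improvement in neural networks' coding ability was striking.
- 😀 Surprise that neural networks work: early neural networks barely worked, and their progress has exceeded expectations.
- 😀 Three AI safety concerns: the alignment problem, the control problem, and the natural selection problem.
- 😀 Potential and risk of superintelligence: it could bring almost unimaginable power and benefits, but also enormous risks that must be handled carefully.
- 😀 The future of AI: AI may help solve the challenges it creates and make possible lives that are hard to imagine today.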
Q & A
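Why was he convinced so early that scaling up deep learning models would produce unexpected behavior?
-Two beliefs are needed: first, the human brain is large, and across species larger brains go with greater capability; second, artificial neurons may be similar enough to biological neurons in how they process information, even though biological neurons are more complex.
What is the definition of AGI?
-AGI is a computer system that can automate the great majority of intellectual labor; intuitively, a computer coworker as smart as a person.
Do we already have all the ingredients for AGI?
-The current stack is already complex and Transformers are already quite strong, but there is room to improve, possibly through more efficient models or training methods.
Would different algorithms, such as LSTMs and Transformers, end up in the same place once scaled up?
-Almost entirely yes: with a few modifications, a larger hidden state, and the work of learning how to train it, an LSTM would probably be worse than a Transformer but could still go extremely far.
How well do we understand what capabilities appear as models scale?
-We understand scaling laws to some extent, but predicting specific capabilities, especially emergent behavior, remains a challenge.
Which capabilities that appeared with scale were the most surprising?
-The biggest surprise was that neural networks work at all, followed by how quickly their ability at tasks such as coding improved.
How does he view AI safety?
-As AI becomes more capable, safety becomes especially important. The challenges are alignment, human interests, and natural selection.
How does superintelligence differ from AGI?
-Superintelligence means something far more capable than AGI, possibly beyond what we can imagine, and its vast power calls for special safety measures.
How can the alignment of superintelligence be ensured?
-One suggestion is an international organization that sets standards at the very highest levels of capability, so that superintelligence develops in line with human interests.
How can humans use superintelligence to solve the challenges it creates?
-The hope is that a superintelligence, understanding reality more deeply than we do, could help us solve the problems its own power creates.
What does the natural selection challenge mean in the age of superintelligence?
-Even if alignment and human interests are solved, natural selection still drives change; new solutions such as people becoming part AI may be needed to adapt.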
Outlines
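🤖 Conviction in deep learning and the definition of AGI
In the first part, the speaker explains his conviction about deep learning, in particular that large neural networks can exhibit unexpected behavior. He argues that this requires two beliefs: that capability tracks brain size, and that artificial neurons may be similar enough to biological neurons in how they process information. He also gives his definition of artificial general intelligence (AGI), a computer system that can automate the great majority of intellectual labor, along with the intuition behind it: a computer coworker as smart as a person.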
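The second belief, that a simple artificial neuron captures the essential "signals in, signal out" behavior, can be made concrete with a minimal sketch. The function name, the choice of a sigmoid nonlinearity, and the example numbers are illustrative assumptions, not anything stated in the talk.

```python
import math


def artificial_neuron(inputs, weights, bias):
    """Weighted sum of the input signals followed by a sigmoid nonlinearity."""
    pre_activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-pre_activation))


# Hypothetical input signals and weights: several signals in, one signal out.
output = artificial_neuron([0.5, -1.0, 2.0], [0.8, 0.2, -0.5], bias=0.1)
```

A network is many such units wired together; the argument in the talk is only that this simplification may be close enough to the biological case, not that the two are equivalent.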
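🔧 Architectures and scalability
In the second part, the speaker discusses whether Transformers are all that is needed. He considers them already very good, while expecting that more efficient or faster alternatives are possible. He also compares LSTMs with Transformers: with suitable modifications and a larger hidden state, LSTMs could achieve similar things, though probably not as well as Transformers. He then turns to how well model scaling is understood, including how confident we can be in predicting the performance of existing models.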
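To give a rough sense of what enlarging an LSTM's hidden state involves, the sketch below counts the trainable parameters of one standard LSTM layer as a function of hidden size. It assumes the textbook formulation with four gates and a single bias vector per gate; the function name and the sizes are hypothetical, and the talk does not specify any particular modification.

```python
def lstm_param_count(input_size: int, hidden_size: int) -> int:
    """Parameter count of one standard LSTM layer (assumes one bias per gate)."""
    gates = 4  # input gate, forget gate, cell candidate, output gate
    weights_per_gate = hidden_size * (input_size + hidden_size)
    biases_per_gate = hidden_size
    return gates * (weights_per_gate + biases_per_gate)


# Hypothetical sizes: same input width, hidden state made 8x larger.
small = lstm_param_count(input_size=512, hidden_size=512)
large = lstm_param_count(input_size=512, hidden_size=4096)
```

The recurrent weights grow with the square of the hidden size, so a larger hidden state is a substantial change in model size and training cost, not a free adjustment.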
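📈 Scaling laws and their limits
In the third part, the speaker looks more closely at scaling laws. They describe the relationship between what goes into a neural network and a simple performance measure, but they are hard to use for predicting a model's higher-level capabilities. He mentions that, in the run-up to GPT-4, OpenAI fit a scaling law for accuracy at solving coding problems, which is more relevant and more useful than next-word prediction accuracy.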
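The kind of relationship a scaling law expresses can be sketched as a power-law fit. The example below fits y = a * x^(-b) by least squares in log-log space and extrapolates to a larger scale. The data is synthetic and every name is hypothetical; this is not OpenAI's method or data, only an illustration of fitting on small runs and predicting a larger one.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**(-b) in log-log space; returns (a, b)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    covariance = sum((lx - mean_x) * (ly - mean_y) for lx, ly in zip(log_x, log_y))
    variance = sum((lx - mean_x) ** 2 for lx in log_x)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def predict(a, b, x):
    """Evaluate the fitted power law at scale x."""
    return a * x ** (-b)


# Synthetic error rates generated from a known power law (made-up numbers).
compute = [1e3, 1e4, 1e5, 1e6]
error = [2.0 * c ** -0.1 for c in compute]

a, b = fit_power_law(compute, error)
extrapolated = predict(a, b, 1e8)  # predicted error at a scale not in the data
```

A fit like this only covers the metric it was fitted on. As the talk notes, it says nothing about emergent behavior that has not been observed yet.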
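😲 Surprising capabilities and AI safety
In the fourth part, the speaker describes his surprise that neural networks work, and especially how quickly their coding ability improved. He recalls the history of program synthesis and how deep learning quickly achieved what that field had long been unable to. He then turns to AI safety, stressing that safe use matters more as AI becomes more capable, and refers to the document on superintelligence that OpenAI recently published and to Sam Altman's testimony before Congress.
🌐 The challenges of superintelligence and the outlook
In the final part, the speaker discusses the challenges that superintelligence brings: the alignment problem, conflicting human interests, and the effects of natural selection. He emphasizes the potential of superintelligence and argues that overcoming these challenges would let us create lives better than we can imagine. He also mentions possible solutions, such as an international organization setting standards and the superintelligence itself helping to solve the challenges it creates.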
Keywords
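💡Deep learning
💡Artificial neural networks
💡AGI (artificial general intelligence)
💡Transformers
💡LSTM (long short-term memory network)
💡Scaling laws
💡AI safety
💡Superintelligence
💡Coding ability
💡How neural networks work
💡Natural selection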
Highlights
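As a deep learning maximalist, he pushed for larger models from early on, expecting unexpected and interesting behavior to appear.
The conviction about large neural networks rests on two ideas: the human brain is large, and artificial neurons may process information similarly to biological neurons.
AGI is defined as a computer system that can automate the great majority of intellectual labor, that is, a computer as smart as a person.
Transformers work well, but that does not make them the only path to AGI; better models may come.
In principle an enlarged, well-trained LSTM could also go very far, but Transformers have scaled better in practice.
We understand model scaling to some extent, but predicting specific behaviors remains a challenge.
That neural networks are practical and effective came as a surprise, because early on few people believed in them.
The rapid improvement in coding ability, in particular relative to the field of program synthesis, is a notable advance brought by deep learning.
AI safety grows in importance as AI becomes more capable, especially once AI becomes extremely powerful.
Superintelligence refers to AI capability far beyond human intelligence, which could bring enormous change and challenges.
The alignment problem is the key challenge in superintelligence safety: ensuring that the AI's goals match human goals.
An international organization could play a key role in setting global standards and rules for superintelligence.
The risks of superintelligence being controlled by people with their own interests may need the superintelligence itself to help resolve.
The natural selection challenge: over time, AI and human society may need to adapt to constant change.
Overcoming these challenges would let us create lives better than we can imagine.
Developing and applying AI requires a balance between innovation and safety.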
Transcripts
all right welcome um there were lots of
people who had lots of interesting
questions so I gave myself some note
cards so I'll I'll I'll be prepared but
um uh maybe we start with this um you
have always been a deep learning
maximalist um even very very early on
what gave you the conviction to say look
if you just push this to larger and
larger models we're going to see
really unexpected interesting Behavior
what what gave you the conviction that
early
on
so I
claim that to get this conviction that
to believe that large neural networks
can do amazing
things you need to have two beliefs
one of the beliefs is
a little bit harder to get to the other
one is
easier so the easy belief is that the
human brain is big the human brain is
big and the brain of a cat is smaller
and the brain of an insect is smaller
still and we correspondingly see that
humans can do things which cats cannot
do and so on that's
easy the hard part is to kind of say
well maybe an artificial neuron the
kinds the kind of neurons that we have
in artificial neural
networks is not that different from a
biological neuron as far as the
essential information processing is
concerned so in other words of course
the biological neuron is very
complicated and it does so many
different
things but when it comes down to it you
have signals in signal out maybe it's a
pretty not maybe you can explain a lot
with a pretty simple artificial neuron
and if you just allow yourself to say
yeah yeah they're different yeah yeah
biological neurons are more complex but
let's just say suppose they are similar
enough then you say yeah okay we now have
an existence proof that large neural
nets all of us can do all these amazing
things so the existence is
there can we then somehow make it
so for that we need to be able to train
but if you that's the kind of chain of
reasoning
which you know in the environment of my
you know when I was in graduate school
with
Geoff I think it
was we were thinking about neural Nets
it was
perhaps more possible more feasible to
make this realization than it would have
been elsewhere
yeah certainly we tried neural nets
before and we didn't quite get to the
same results because we were doing it at a
much smaller scale and so on
um interesting where um let's start with
so what's your definition of AGI how
what's your mental
picture yeah
so
AGI so at OpenAI we have
a document which we call the OpenAI
Charter which outlines the goal of
OpenAI and there we offer a definition of
AGI and we say that an AGI is a computer
system which can automate the great
majority of intellectual
labor that's one useful definition mhm in
some sense an AGI would
be the intuition there is it's a
computer that's as smart as a person so
you might for example have a coworker mhm
that's a
computer so that would be a def a
definition of AGI which I think is
intuitively satisfying the term is a bit
ambiguous because AGI the g means
general so is it generality that we want
that we care about in the AGI but it's
actually a bit more than generality we
care about generality and competence
it needs to be general in a sense that it
can respond sensibly when you throw
things at it but it needs to be
competent so that you when it does
something you ask it a question or ask
it to do something it will do it yeah I
like the sort of very practical
definition at the end of the day because
it gives you some measurement where you
can can figure out how close are you do
do you think we have all the ingredients
to to get to AGI um if not what's
missing kind of in the stack it's a
complicated stack
already um are Transformers really all
we need kind of paying homage to the
famous uh attention paper
yeah you know I won't be overly specific
in my answer to this question but I will
say that I think
that no I'll comment on the second part
of the question is is is Transformers is
all we
need and I think that the question is a
bit wrong because it implies something
binary it implies Transformers are are
either good enough or not good
enough
but I think it's better to think about
it in terms of tax where we have
Transformers and they're pretty good mhm
maybe we could have something better
that would be maybe more efficient or
maybe it'll be
faster but we as we know when you make
the Transformers large they still become
better they might just become big might
be becoming better more
slowly so while I am totally
sure that it will be possible to improve
very significantly on the current
architectures that we have even if we
didn't we would be able to go extremely
far do you think it
matters what the algorithm is so so for
example an LSTM versus a
Transformer just scaled up sufficiently
maybe that's an efficiency Delta or
something like that but don't we end up
in the same same place at the end
so I would say almost entirely yes with
a caveat so there are two
caveats let's see so I'm just thinking of how
what level of detail to go here you know
maybe I will I will I will skip the
detail how many people in the audience
know what an LSTM is oh see it's a
around here so I think we're mostly okay
let's dig let's let's dig in then
so I would argue that with a few if we
made a few simple modifications to the
LSTM their hidden states are quite small
if you somehow made it larger and then
we were to go through the trouble of
figuring out how to train them
cuz LSTMs are recurrent neural
networks and we kind of forgot about them
we haven't put in the effort to cuz you
know how neural training works you have
the hyperparameters well how do you set
them it's like you don't
know how do you set your learning rates
if it doesn't learn can you explain why
and so this kind of work has not been
done for LSTMs so that's why our ability
to train them is more reduced but had we
done that work so that we were able to
train the LSTMs and we just did some
simple things to increase their hidden
State size I think they would be worse
than
Transformers but we would still be able
to go extremely far with them also okay
um how good is our understanding of
scaling laws like if we if we scale
these models up how confident are you in
being able to predict capabilities of
these particular models how good is that
science so that's a very good question
the answer is so-so
I was hoping for a more definitive
answer well so-so is a very
definitive answer it means we are not
great but we are not absolutely terrible
either but we are not great definitely
not great so what the scaling law tells
you it uh relates it's a relationship
between the inputs that you put into the
neural network and some kind of a simple
to
measure performance simple to evaluate
performance measure like your next
word prediction accuracy
mhm and that relationship is very strong
but what is challenging is that we don't
really care about next word prediction
we care about it indirectly we care
about the other incidental benefits that
we get out of
it and so for example you
all know that if you predict the next
word accurately enough you get all kinds
of interesting emergent properties
those have been quite hard to predict or
at least I'll say I'm not aware of such
work and if anyone is looking for
interesting research problems
to work on that would be one I will say
I will mention one example something
that we've done at OpenAI in our
run-up to GPT-4 where we tried to do
a scaling law for a more interesting
task which is predicting accuracy at
solving coding problems
we were able to do that accurately very
accurately and that's a pretty good
thing because this is a more tangible
metric it's not it's still it's it's an
improvement over next step next word
prediction accuracy as far as things
that are relevant to us so in other
words it's more relevant to us to know
what the coding accuracy is going to be
ability to solve coding problems
compared to just ability to predict the
next word it still doesn't answer the
really important question of can you
predict some emergent behavior that you
haven't seen
before okay
um speaking of these capabilities that
are kind of emerging capabilities which
one surprised you the most as these
models scaled what what was the thing
where you said like well I'm kind of
astonished these models can do
this it's a very difficult question to
answer
because it's too easy to get used to
where things
are so there definitely have been times
when I was surprised but you adapt so
fast it's kind of
crazy I think maybe the big surprise for
me
is you know it may it may sound a little
odd probably to most people in this
audience but the big surprise for me is
that neural networks work at
all because when I was starting my work
in this area they didn't work or it was
like let's define what it means to work
at all it means they could do they could
work a little bit but not really not in
any serious way not in a way that anyone
except for the most intense enthusiasts
would care
about and so now we see yeah like those
neural Nets work so I guess the
artificial neuron really is
at least somewhat related to the
biological neuron or at least that basic
assumption has been validated to some
degree what about like an emergent
property was the one that sticks out to
to you like for example I don't know
code generation or did you may maybe it
was different in your mind maybe you you
just once you saw like hey neural Nets
can work and they can scale yeah of
course all these sort of properties will
emerge because you know at at the limit
point we're building a human brain and
humans know how to code and humans know
how to reason about tasks and so on um
was that did you just expect all of that
or did uh I've definitely been surprised
and I'll mention why because the human
brain can do those things it's true but
does it follow that our training process
will produce something similar so so it
was definitely very amazing I
think yeah seeing seeing
the coding ability improve quickly that
was
quite quite a sight to be seen and for
coding in particular because you know it
went from no one has ever seen a
computer code anything at all ever there
was a little area of computer science
called program synthesis mhm which
maybe it was very niche and it was very
niche because they couldn't have any
accomplishments it was a very they had a
very difficult experience and then these
neural nets came in and said oh yeah code
synthesis like we're going to do we're
going to accomplish what you were
hoping to achieve one day like
tomorrow so that was
yeah deep
learning just just out of curiosity when
you write code how much of your code is
yours how much of your code is I mean
like collaboration but I
I do enjoy it when
the neural net writes most of
it all right let's let's switch TCT here
a little bit um as these models get more
and more
powerful um it's worthwhile to to also
talk about AI safety and uh and OpenAI
has released a document just uh
just recently where you're one of
the undersigners um uh Sam has
testified in front of
Congress what what worries you most
about AI
safety yeah I can talk about
that
so let's take a step back and talk about
the state of the world so you know
you've had the AI research happening and
it was exciting and now you have the GPT
models and now you all get to play with
all the different chatbots and
assistants and you know B and ChatGPT
and you say okay that's pretty cool it
can do
things and indeed there already
are you can start perhaps worrying about
the implications of the tools that we
have today and I think that it is a very
valid thing to do but that's not where
I allocate my concern
mhm the place where things get really
tricky is when you imagine fast forward
in some number of years a decade let's
say how powerful will AI be of course
with this incredible future power of AI
which I think will be difficult to
imagine frankly with an AI this powerful
you could do incredible amazing
things that are perhaps even outside of
our
dreams like if you can really have a
dramatically powerful
AI but the place where things get
challenging are directly connected to
the power of the AI it is powerful it is
going to be extremely
unbelievably powerful and it is because
of this
power that's where the safety issues
come up and I'll mention
three I I personally see
three you know when when you get so you
you alluded to the letter mhm that uh we
posted at OpenAI a few days ago
actually
yesterday about what we about some ideas
that we
think would be good to implement to
navigate the challenges of super
intelligence now what is super
intelligence why did we choose to use
the term super
intelligence the reason is that super
intelligence is meant to convey
something that's not just like an AGI
with AGI we said well you have something
kind of like a person kind of like a
coworker super intelligence is meant to
convey something far more capable than
that when you have such a capability
it's like can we even imagine how it
will be but without question it's going
to be unbelievably
powerful it could be used to solve
incomprehensibly hard problems if it is
used well if we navigate the challenges
that super intelligence poses we
could we
could radically improve the quality of
life but the power of super intelligence
is so vast so the concerns the concern
number one has been expressed a lot and
this is the scientific problem of
alignment you might want to think of it
from the as as an analog to nuclear
safety you know build a nuclear reactor
you want to get the energy you need to
make sure that it won't melt down even
if there's an earthquake and even if
someone tries to I don't know smash a
truck into it yeah so this is the super
intelligence safety and it must be
addressed in order to contain the vast
power of the super intelligence this is
called the alignment problem one of the
suggestions that we had in the
post was an approach that an
international organization could do
to create various standards at this very
high level of capability and I want to
make this other point you know about the
post and also about um our CEO Sam Altman's
congressional testimony where he
advocated for regulation of AI the
intention is primarily to put rules and
standards of various kinds on the very
high level of
capability you know you could maybe
start looking at GPT-4 but that's not
really what is interesting what is
relevant here but something which is
vastly more powerful than that when you
have a technology so powerful it becomes
obvious that you need to do something
about this
power that's the first concern the first
challenge to overcome the second
challenge to overcome is that of course
we are people we are humans humans have
interests and if you have super
intelligences controlled by people
well who knows what's going to happen I
do hope that at this point we will have
the super intelligence itself try to
help us solve the challenge in the world
that it creates this is no longer an
unreasonable thing to say like if you
imagine a super intelligence that indeed
sees things more deeply than we do much
more
deeply to understand reality better than
us we could use it to help us solve the
challenges that it creates then there is
the third challenge which
is the challenge maybe of natural
selection you know what the Buddhists
say that change is the only constant so
even if you do have your super
intelligences in the world and they are
all we managed to solve alignment we
managed to solve no one wants to use
them in very destructive ways we managed
to create a life of unbelievable
abundance which really like not just not
just material abundance but health
longevity like all the things we don't
even try dreaming about because they're
so obviously impossible if you've got to
this point then there is the third
challenge of natural selection
things
change you know you know that natural
selection applies to ideas to
organizations and that's a challenge as
well maybe the neuralink solution of
people becoming part AI will be one way
we will choose to address this I don't
know but I would say that this kind of
describes my concern and specifically
just as the concerns are big if you
manage man it is so worthwhile to
overcome them because then we could
create truly unbelievable
lives for ourselves that are completely
even
unimaginable so it is it is like a
challenge that's really really worth
overcoming