No Priors Ep. 39 | With OpenAI Co-Founder & Chief Scientist Ilya Sutskever
Summary
TLDR: In this in-depth conversation, OpenAI co-founder and chief scientist Ilya Sutskever shares his view of the AI research field, from the early struggles of deep learning to the vision of reaching artificial general intelligence (AGI). He recalls the years when neural networks were marginalized and the research environment before AlexNet, and how OpenAI pushed the boundaries of AI through large-scale models and heavy investment in compute. Ilya discusses OpenAI's founding purpose, its transition from a nonprofit to a unique capped-profit company structure, and how that strategy let it pursue much larger AI projects. He also covers model reliability, the complications of open-source AI, and the importance of superalignment, hinting at the future relationship between humans and advanced AI. The conversation conveys both the field's rapid progress and a deep sense of where it is headed.
Takeaways
- 🚀 OpenAI's founding grew out of deep conviction about deep learning and an early recognition of the potential of neural networks.
- 🌌 The early AI field held little hope, but OpenAI's founders believed neural networks were like small brains with enormous potential.
- 💡 Using GPUs for machine learning was a key enabling factor, even though at the time it was not yet clear exactly what GPUs would be used for.
- 🧠 Scale is central to neural network performance; larger networks can do unprecedented things.
- 📈 OpenAI's goal has always been to ensure that artificial general intelligence (AGI) benefits all of humanity.
- 🌐 OpenAI initially planned to open-source its technology and operated as a nonprofit, before ultimately adopting its unique "capped-profit" company structure.
- 🔄 From the Dota 2 project to Transformer models, OpenAI's research direction has kept evolving with the technology and with real-world demands.
- 📚 Early conventional machine learning work gave way to large projects such as the GPT series, which showed dramatic progress.
- 🤖 As models become more reliable, they become useful for more tasks and offer deeper insight.
- 🔢 Increasing model scale not only improves performance but may unlock new, unprecedentedly valuable applications.
- 🔄 Small models may be sufficient for specific applications, but larger models will deliver better performance across a broader range of uses.
- 🌐 In the near term, open-source models help companies build useful products; in the long run, as AI capabilities grow, the role of open source becomes more complicated.
Q & A
What was OpenAI's founding goal?
-OpenAI was founded to make sure that artificial intelligence benefits all of humanity. From the very beginning, the goal has been to build autonomous systems, AI that can do most of the jobs, activities, and tasks that people do, and to ensure that this benefits humanity as a whole.
How has OpenAI's technical strategy evolved?
-The tactics shifted from open-sourcing technology as a nonprofit to becoming a capped-profit company, driven by the need for compute. The team realized that making real progress in AI requires an enormous amount of compute, which a nonprofit could not raise.
Why did OpenAI convert to a capped-profit company?
-Because real progress in AI requires more compute than a nonprofit can fund. They also reasoned that if AI ends up displacing a large share of human work, the company building that technology should not be incentivized to earn unlimited profits.
What was Ilya Sutskever's early intuition about neural networks?
-His early intuition was that neural networks are like small brains: even though you could not prove mathematical theorems about them, those little "brains" might one day do something remarkable. He believed that training much larger neural networks could produce unprecedented results.
How were the challenges of early AI research overcome?
-By recognizing that much larger neural networks were needed, that a dataset large enough to constrain such a network was required, and by having the technical know-how to actually train it. They also realized that GPUs were a particularly good fit for neural network training.
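In the conversation, Sutskever frames this "constraining" as solving a neural equation: every data point is an equation, every parameter a variable, and gradient descent is the solver. The toy sketch below is illustrative only, with made-up data and a single tanh layer, nothing OpenAI actually uses:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "training set": 100 equations that the parameters must satisfy.
X = rng.normal(size=(100, 8))              # inputs
true_w = rng.normal(size=(8, 1))
Y = np.tanh(X @ true_w)                    # targets the network must reproduce

w = np.zeros((8, 1))                       # the "variables" we solve for
lr = 0.3
for step in range(2000):
    pred = np.tanh(X @ w)                  # left-hand side of each "equation"
    residual = pred - Y                    # how far each equation is from holding
    # Gradient of the mean squared residual w.r.t. w (up to a constant factor).
    grad = X.T @ (residual * (1.0 - pred ** 2)) / len(X)
    w -= lr * grad                         # one gradient-descent step

print(float(np.mean((np.tanh(X @ w) - Y) ** 2)))   # shrinks toward zero
```

Scale matters in this picture because a bigger network has more variables available to satisfy more equations, provided the dataset is large enough to pin them down.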
What is OpenAI's main research direction?
-The main direction has been to keep scaling and improving Transformer models, training larger and larger neural networks to predict and generate text. By steadily increasing the compute and data put into them, model performance has improved dramatically.
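The recipe in the answer above, train an ever larger Transformer on ever more text to predict the next token, amounts to minimizing a next-token cross-entropy loss. Here is a minimal illustrative sketch of that objective; the model_logits function is a random-scoring stand-in for a real Transformer, not OpenAI's code:

```python
import numpy as np

vocab_size = 50
tokens = np.array([3, 17, 42, 8, 8, 17, 3])     # a toy "document"

def model_logits(context):
    """Stand-in for a Transformer: one score per vocabulary entry.
    A real model would compute these from the context with attention layers."""
    rng = np.random.default_rng(len(context))    # deterministic dummy scores
    return rng.normal(size=vocab_size)

def next_token_loss(tokens):
    """Average cross-entropy of predicting tokens[t+1] from tokens[:t+1]."""
    total = 0.0
    for t in range(len(tokens) - 1):
        logits = model_logits(tokens[: t + 1])
        probs = np.exp(logits - logits.max())    # softmax, numerically stable
        probs /= probs.sum()
        total += -np.log(probs[tokens[t + 1]])
    return total / (len(tokens) - 1)

print(next_token_loss(tokens))   # training adjusts the model to push this down
```

Scaling, in this framing, only changes how model_logits is computed and how much text the loss is averaged over; the objective itself stays the same.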
What does Ilya Sutskever see as the biggest bottleneck of current AI models?
-Reliability. When a model is asked a question that is not much harder than the ones it already answers correctly, users need a high degree of confidence that it will keep getting the answer right; otherwise they cannot act on its output for consequential tasks.
How does OpenAI view the role of open-source models?
-In the near term, open-source models are useful because they let a company decide for itself exactly how and where a model is used. In the long run, as models become far more capable, the role of open source becomes more complicated and requires much more careful consideration.
How does Ilya Sutskever view the future of the Transformer architecture?
-He considers the Transformer clearly capable of getting there; the real question is compute efficiency, and continued refinement and scaling will keep delivering progress. He also notes that although the human brain appears to have specialized regions, experiments show it is highly plastic, which supports the idea of a single, unified neural network architecture.
How does Ilya Sutskever think about superintelligence and superalignment?
-He describes superintelligence as data centers that are smarter than people: able to do everything humans can do, with deeper insight, and able to learn faster. We need such systems to hold warm, positive feelings toward humanity, and that is the goal of the superalignment project: to start building, before superintelligence exists, the science of controlling it and keeping it friendly to people.
How does OpenAI view the future development of AI?
-AI is currently in an acceleration phase, but several forces will shape its pace, including cost, the finiteness of data, engineering complexity, the scale of investment, and the interest of engineers and scientists. Even if some of these slow things down, progress is expected to continue.
Outlines
🚀 The early days and AI's "Dark Ages"
This segment recounts the era before AlexNet, when deep learning had little to show for itself and the AI field was in its so-called "Dark Ages." In that environment, Ilya Sutskever, later a co-founder of OpenAI, chose to work on neural networks even though the approach was marginalized. He discusses the arrival of GPUs in machine learning and how scaling up neural networks made unprecedented results possible.
🌟 From nonprofit to capped-profit company
Ilya Sutskever explains why OpenAI converted from a nonprofit into a capped-profit company. Making real progress in AI requires an amount of compute that a nonprofit cannot raise, so OpenAI adopted a unique structure: investors put in money, but even if the company does extremely well, they receive no more than a fixed multiple of their original investment rather than unlimited profits. The structure is meant to ensure that AI development benefits all of humanity.
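As a concrete illustration of the mechanism described above (the figures are hypothetical; the actual cap multiples are not stated in the conversation), an investor's payout under a capped-profit structure might look like this:

```python
def capped_return(investment: float, cap_multiple: float, uncapped_payout: float) -> float:
    """Payout to an investor in a capped-profit structure: whatever the stake
    would otherwise return, but never more than cap_multiple times the
    original investment."""
    return min(uncapped_payout, cap_multiple * investment)

# Hypothetical numbers: a $10M investment with a 100x cap.
# Even if the stake would otherwise be worth $5B, the investor's payout
# stops at $1B; anything above the cap does not go to the investor.
print(capped_return(10e6, 100, 5e9))   # 1000000000.0
```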
🧠 The evolving research agenda and Transformer models
Ilya Sutskever describes how OpenAI's research agenda evolved over time. At first the team did more conventional machine learning work, but soon realized that reaching AGI (artificial general intelligence) would require much larger projects. They explored generative models and eventually recognized the enormous potential of the Transformer. From GPT-1 through GPT-3, model capabilities improved dramatically, marking major progress in large-scale neural network research.
🤖 AI reliability and model scale
Ilya Sutskever stresses that reliability determines how useful AI models are in practice. As models are scaled up, reliability improves, but cost rises too. He discusses the trade-off between model size and specific use cases, and how fine-tuning can improve performance on particular tasks. As models keep improving, they will become more reliable and able to handle a broader range of tasks.
🌐 The role of open source in the AI ecosystem
Ilya Sutskever explores the role of open source in the AI ecosystem. In the near term, open-source models help companies build useful products. As models become far more capable, however, open-sourcing them could have consequences that are hard to predict. He frames it as a research problem for the future: determining at what capability level open-sourcing a model starts to demand that kind of scrutiny.
🧬 Parallels between artificial and biological intelligence
Ilya Sutskever discusses the parallels among artificial, biological, and human intelligence. Just as biological systems tend to reuse a single architecture, AI appears to be converging on a unified neural network architecture. He cites experiments showing that regions of the brain can be reconfigured to handle different tasks, which further supports the suitability of a single, unified architecture for AI.
🌟 Superintelligence and superalignment
Ilya Sutskever discusses the idea of superintelligence and why we should invest in superalignment research now. If AI keeps advancing, within five to ten years we may have data centers that are smarter than humans. He stresses that we want such systems to hold positive feelings toward people, so the goal of the superalignment project is to ensure that future superintelligence is friendly to society and to humanity.
🚀 AI's future acceleration and its challenges
Ilya Sutskever discusses likely trends in AI's development, both accelerating and decelerating. Despite decelerating factors such as finite data and growing engineering complexity, accelerating factors such as rising investment, the interest of engineers and scientists, and the relative ease of entering the field may keep AI advancing rapidly. Although the exact path is unclear, progress may well continue to accelerate, at least in the near term.
🎙️ Closing and thanks
The hosts thank Ilya Sutskever for joining and invite listeners to follow the show on Twitter, subscribe to the YouTube channel, and listen on Apple Podcasts, Spotify, or wherever they get podcasts. Listeners can also sign up for emails or find transcripts of every episode at no-priors.com.
Keywords
💡Artificial Intelligence
💡Deep Learning
💡Neural Networks
💡Artificial General Intelligence
💡Transformer Models
💡Compute Resources
💡Datasets
💡Autonomy
💡Open Source
💡Superalignment
Highlights
In the AI 'Dark Ages' before deep learning's success, various approaches competed without hope until the neural network approach, considered marginal for its lack of provable theorems, began to show promise due to its brain-like processing.
The breakthrough with AlexNet was attributed to three factors: the advent of GPUs in machine learning, the realization that larger neural networks could perform unprecedented tasks, and the technical insight to effectively train these larger networks.
OpenAI's shift from a nonprofit to a 'capped profit' model was driven by the realization that significant compute resources were essential for AI progress, which a nonprofit model could not support.
The goal of OpenAI, to ensure AI benefits all of humanity, has remained constant, although the tactics to achieve this goal have evolved over time.
The transition from GPT-2 to GPT-3 was a significant leap that demonstrated the potential of scaling up Transformer-based models, leading to emergent capabilities.
Improvements in AI models have made them more reliable and insightful, with the capability to understand and generate human-like text improving significantly over time.
Reliability remains the biggest challenge for AI models, highlighting the need for models to consistently perform well across a wide range of tasks without making significant errors.
The current acceleration in AI progress is driven by various factors, including the scalability of Transformer architectures and the growing investment and interest in AI research.
The future of AI, potentially leading to AGI, involves significant challenges and uncertainties, including the need for super alignment to ensure AI's pro-social behavior towards humanity.
OpenAI's exploration into super alignment is motivated by the anticipation of AI systems becoming highly autonomous and capable, raising the need to ensure they are aligned with human values.
The potential for AI models to autonomously execute complex tasks and projects raises ethical and societal questions about their role and impact on the future.
OpenAI's research agenda has evolved from focusing on conventional machine learning projects to pioneering large-scale, transformative projects like GPT, highlighting the organization's strategic pivot towards scaling up AI models.
The debate around open-source AI models and their role in the AI ecosystem reflects the evolving nature of AI development, with considerations around autonomy, capability, and societal impact becoming increasingly important.
The interview emphasizes the importance of envisioning the future of AI, advocating for a proactive approach in research and policy to navigate the challenges and opportunities of increasingly capable AI systems.
The conversation highlights a cautious optimism towards the progress of AI, recognizing both the technological potential and the complex ethical considerations that accompany the development of advanced AI systems.
Transcripts
[Music]
OpenAI a company that we all know now
but only a year ago was 100 people is
changing the world their research is
leading the charge to AGI since ChatGPT
captured consumer attention last
November they show no signs of slowing
down this week Elad and I sit down with
Ilya Sutskever co-founder and chief scientist
at OpenAI to discuss the state of AI
research where we'll hit limits the future
of AGI and what it's going to take to
reach super alignment Ilya welcome to no
priors thank you it's good to be here
let's start with the beginning pre Alex
net nothing in deep learning was really
working and then given that environment
you guys took a um a very unique bet
what motivated you to go in this
direction indeed in those Dark Ages AI
was not an area where people
had hope and people were not accustomed
to any kind of success at all and
because there wasn't there hasn't been
any success there was a lot of debate
and there were different schools of
thoughts that had different arguments
about how machine learning in AI should
be and you had people who were into
knowledge representation from the good
old fashioned you had people who were
beian and they liked beian
non-parametric methods you had people
who like graphical models and you had
the people who like neural networks
those people were
marginalized because neural networks had
the property that you can't
prove math theorems about
them if you can't prove theorems about
something it means that your research
isn't good that's how it has been but
the reason why I gravitated to neural
networks from the beginning is because
it felt like those are small little
brains and who cares if you can't prove any
theorems about them because we are
training small little brains and maybe
they'll become maybe they'll do
something one day and the reason that we
were able to do AlexNet is because a
combination of two factors three factors
the first factor is that this was
shortly after gpus started to be used in
machine learning people kind of had an
intuition that that's a good thing to do
but it wasn't like today where people
exactly knew what the GPUs are for it was
like oh let's like play with those cool
fast computers and see what we can do
with them it was an especially good fit
for neural networks so that was a very
that definitely helped us I was very
fortunate in that I was able to realize
that the reason neural networks of the
time weren't good is because they were
too small so like if you try to solve a
vision task with a neural network which
has like a thousand neurons what can it
do it can't do anything it doesn't
matter how good your learning is and
everything else but if you have a much
larger neural network you'll do
something unprecedented what what gave
you the intuition to think that that was
the case because I think at the time it
was reasonably um contrarian to think
that despite to your point you know a
lot of the the human brain in some sense
works that way or different you know
biological neural circuits but I'm just
curious like what gave you that
intuition early on to think that this
was a good direction I think yeah
looking at the brain and specifically
the if you like all those things follow
very easily if you allow yourself if you
allow yourself to accept the idea right
now this idea is reasonably well
accepted back then people still talked
about it they haven't really accepted it
or internalize the idea that maybe an
artificial neuron in some sense is not
that different from a biological neuron
so now whatever you imagine animals do
with their brains you could perhaps
assemble some artificial neural network
of similar size maybe if you train it it
will do something similar so there so
that leads to the so that leads you to
start to imagine okay like almost
imagine the computation being done by
the neural network you can almost think
like if you have a high resolution image
and you have
like one neuron for like a large group
of pixels what can the neuron do it's
just just not much it can do if you but
if you have a lot of neurons then they
can actually do something and compute
something so I think it was like our
like it was this was it was
considerations like this plus a
technical realization the technical
realization is that if you have a large
training set that specifies the behavior
of the neural network and the
training set is large enough such that
it can constrain the large neural
network sufficiently and furthermore if
you have the algorithm to find that
neural network because what we do is
that we turn the training
set into a neural network which
satisfies a training set neural network
training can almost be seen as solving a
neural
equation solving a neural equation where
every data point is is an equation and
every parameter is a
variable and so it was multiple things
the realization that a bigger neural
network could do something unprecedented
the realization that if you have a large
data
set together with the
compute to solve the neural equation
that's where gradient descent comes in
but it's not gradient descent gradient
descent was around for a long time it
was certain technical insights about how
to make it work because back then the
prevailing belief was well you can't
train those neural nets anything it's
all hopeless so it wasn't just about the
size it was about even if someone did
think gosh it would be cool to try a big
neural net they didn't have the
technical ability to turn this idea into
reality you needed not only to code the
neural net you need to do a bunch of
things right and only then it will work
and then another fortunate thing is that
the person with whom I work with Alex
Krizhevsky he just discovered that he really
loves gpus and he was perhaps one of the
the first person who really
mastered writing really like really
performant code for for the gpus and
that's why we were able to squeeze a lot
of performance out of two gpus and do
something and produce something
unprecedented so to sum up it was
multiple
things the idea that a big neural
network in this case a a vision neural
network a convolutional neural network
with many layers one that's much much
bigger than anything that's ever been
done before could do something very
unprecedented because the brain can see
and the brain is a large neural network
and we can see quickly so our neurons
don't have a lot of time then the
compute needed the technical knowhow
that in fact we can train such neural
networks and it was not at all widely
distributed most people in machine
learning would not have been able to
train such a neural network even if they
wanted to did you guys have any um like
particular goal from a size perspective
or was it just as as uh you know and if
that's biologically inspired or where
that number comes from or just as large
as we can go definitely as large as we
can go because keep in mind I mean we
had a certain amount of compute which we
could usefully consume and then what can
it do maybe if we think about just like
the origin of open Ai and uh the goals
of the organization like what was the
original goal and how's that evolved
over time the goal did not evolve over
time the tactic evolved over time
so the goal of open AI from the very
beginning has been to make sure that
artificial general intelligence by which
we mean autonomous
systems AI that can actually do most of
the jobs and activities and tasks that
people do benefits all of humanity that
was the goal from the beginning the
initial thinking has been that maybe the
best way to do it is by just open
sourcing a lot of
Technology we later and we also
attempted to do it as a nonprofit seemed
very sensible this is the goal nonprofit
is the way to do it what changed some
point at open AI we realized and we were
perhaps among among the earlier the
earliest to realize that to make
progress in AI for real you need a lot
of compute now what does a lot mean the
appetite for compute is truly endless as
as now as as now clearly seen but we
realize that we will need a
lot
and a nonprofit was wouldn't wouldn't be
the way to to to get there wouldn't be
able to build a large cluster with a
nonprofit that's why we became we
converted into this unusual structure
called capped
profit and to my knowledge we are the
only capped profit company in the world the
idea is that investors put in some money
but even if the company does incredibly
well they don't get more than some
multiplier on top of their original
investment and the reason to do this the
reason why that makes
sense you know there are arguments one
could make arguments against it as well
but the argument for it is that if you
believe that the technology that we are
building
AGI could
potentially be so capable as to do every
single task that people do does it mean
that it might unemploy
everyone well I don't know but it's not
impossible and if that's the case it
makes sense it will make a lot of sense
if the company that built such a
technology would not be able to make U
infinite would not be incentivized
rather to make infinite profits I don't
know if it will literally play out this
way because of competition in AI so
there will be M multiple companies and I
think that
will have some unforeseen implications
on the argument which I'm making but
that was the thing
I remember visiting the offices back
when you were I think housed at YC or
something or you know cohabited some
space there and at the time there was uh
a suite of different efforts there was
robotic arms uh that were being
manipulated and then there was um you
know some video game related work which
was really cutting edge um how did you
think about how the research agenda
evolved and what really drove it down
this path of Transformer based models
and other forms of of learning so our
thinking has been evolving over the
years from when we started
OpenAI and the first year we indeed did
some of the more conventional machine
learning work the conventional machine
learning work I mean because the world
has changed so much a lot of things
which
were known to everyone in 2016 or 2017
are completely and utterly forgotten
it's like the Stone Age almost so in
that in that Stone Age the world the
world of machine learning looked very
different it
was dramatically more
academic the goals values and objectives
were much more academic they were about
discovering small bits of knowledge and
sharing them with the other researchers
and getting scientific recognition as a
result and it's a very valid goal and
it's very understandable I've been doing
AI for 20 years now more than half of my
time that I spent in AI was in that
framework and so what do you do you
write papers you share your small
discoveries two realizations the first
realization is just at a high level it
doesn't seem like it's the way to go to
for a dramatic impact and why is that
because if you
imagine how an AGI should look like it
has to be some kind of a big engineering
project that's using a lot of
compute right even if you don't know how
to build it what that should look like
you know that this is the ideal you want
to strive towards so you want to somehow
move towards larger projects as opposed
to small projects so while we
attempted a first large project where we
trained the neural network to play a
real real time strategy game as well as
as well as the best humans it's the Dota
2 project and it was it was driven by
two people um Jakub Pachocki and Greg
Brockman they they really drove this
project and made it a success
and this was our first attempt at a
large project but it wasn't quite the
right formula for us because that the
neural networks were a little bit too
small it was just a narrow domain just a
game I mean it's cool to play a game
they kept looking and at some point we
realized that hey if you train a large
neural network a very very large
Transformer to predict text better and
better something very surprising will
happen this realization also arrived a
little bit gradually we were exploring
generative models we were exploring
ideas around next word
prediction those are ideas also related
to compression we were exploring them
Transformer came out we got really
excited we like this is this is the
greatest thing we're going to do
Transformers now it's clearly Superior
than anything else before it we started
doing Transformers we did gpt1 gpt1
started to show very interesting signs
of life and that led us to doing gpt2
and then ultimately gpt3 gpt3 really
opened everyone else's eyes as well to
hey this thing has a lot of traction
there is one specific formula right now
that everyone is doing and this formula
is train a larger and larger Transformer
on more and more data I mean for me the
big wake up moment to your point was
gpt2 to gpt3 transition where you saw
such a big step function and
capabilities and then obviously with
four um open AI published some really
interesting uh uh research around some
of the different domains of knowledge or
domains of expertise or Chain of Thought
or other things that the models can
suddenly do in an emergent form what was
the most surprising thing for you in
terms of emergent behavior in these
models over time you know it's very hard
to answer that question it's very hard
to answer because I'm too close and I've
seen it progress every step of the
way so as much as I'd like I find it
very hard to answer that question I
think if I had to pick one I think maybe
the the most surprising thing for me is
the whole thing works at all you know
it's hard it's and I'm not sure I I know
how to convey this what what I have in
mind here because if you see a lot of
neural networks do amazing things well
obviously neural networks is the thing
that works but I have witnessed
personally what it's like to be in a
world for many years where the neural
networks not work at all and then to
contrast that to where we are today just
the fact that they work and they do
these amazing things I think maybe the
most surprising the most surprising if I
had to pick one it would be the fact
that when I speak to it I feel
understood yeah there's a there's a
really good um saying from I'm trying to
remember maybe it's Arthur Clark or one
of the Sci-Fi authors which is
effectively it says advanced technology
is sometimes indistinguishable from
Magic yeah I'm I'm fully in this Camp
yeah yeah it definitely feels like
there's some magical moments with with
uh some of these models now is there a
way that you guys decide internally uh
given all of the different capabilities
you could pursue how to continually
choose the set of big projects you've
sort of described that centralization
and committing to certain research
directions at scale is really important
to OpenAI's success given the breadth of
opportunity now like what's the process
for deciding what's worth working on I
mean I think there is some combination
of bottom up and top down where we have
some top down ideas that we believe
should work but we not 100% sure so we
still we need to have good top- down
ideas and there is a lot of bottomup
exploration Guided by those top down
ideas as well and their combination is
what informs us as to what to do
next and uh if you think about those
bottom I mean either Direction top down
or bottom up ideas like clearly we have
this dominant continue to scale
Transformers Direction um do you explore
additional like architectural directions
or is that just not relevant it's
certainly possible that various
improvements can be
found I think I think improvements can
be found in all kinds of places both
small improvements and large
improvements I think the way to think
about it is that while the current thing
that's being
done keeps getting better as you keep on
increasing the amount of compute and
data that you put into it so we have
that property the bigger you make it the
better it
gets it is also the property that
different things get better by different
amount as you keep on improving as you
keep on scaling them up so not only you
want to of course scale up what we doing
we also want to SC keep scaling up the
best thing
possible what is uh a I mean you you
probably don't need to predict because
you can see internally what do you think
is um improving most from a capability
perspective in the current generation of
scale the best way for me to answer this
question would be to point out the to
point to the models that are publicly
available and you can see how they
compare from this year to last year and
the difference is quite significant I'm
not talking about the difference between
not only the difference between let's
you can look at the difference between
gpt3 and GPT 3.5 and then chat GPT chat
GPT 4 chat GPT 4 with vision and you can
just see for yourself it's easy to
forget where things used to be but
certainly the big way in which things
are changing is that these models become
more and more
reliable before they were very they were
only very partly there right now they
are mostly there but there are still
gaps and in the future perhaps these
models will be there even more you could
trust their answers they'll be more
reliable they'll be able to do more
tasks in general across the board and
then another thing that they will do is
that they'll have deeper insight as we
train them they gain more and more
insight into the true nature of the
human world and their Insight will
continue to deepen I I was just going to
ask about how that relates to sort of um
model scale over time because a lot of
people are really stricken by the
capabilities of the very large scale
models and emergent behavior in terms of
understanding of the world and then in
parallel as people incorporate some of
these things into products which is a
very different type of path they often
start worrying about inference costs
going up with the scale of the model and
therefore they're looking for smaller
models that are fine-tuned but then of
course you may lose some of the
capabilities around some of the insights
and ability to to reason and so I was
curious in your thinking in terms of how
all this evolves over the coming years I
would actually point out that the main
thing that's lost when you switch to the
smaller models is reliability I would
argue that at this point it is
reliability that's the biggest
bottleneck to these models being truly
useful how you defining reliability so
it's like when you ask a question that's
not much harder than other questions
that the model succeeds at then you'll
have very high degree of confidence that
it will continue to succeed so I'll give
you an example let's suppose that I want
to learn about some historical thing and
I can ask what tell me what is the
prevailing opinion about this and about
that and I can keep asking questions and
let's suppose it answered 20 of my
questions correctly I really don't want
the 21st question to have a gross gross
mistake that's what I mean by by
reliability or like let's suppose I
upload some documents some financial
documents suppose they say something I
want you to do some analysis and to make
some conclusion and I want to take
action on this basis on this conclusion
and it's like it's not a super hard task
and the model these models clearly
succeed on this task most of the time
but because they don't succeed all the
time and if it's a consequential
decision I actually can't trust the
model any of those times and I have to
verify the answer somehow so that's how
I Define reliability it's very similar
to the self-driving situation right if
you have a self-driving car and it's
like does things mostly well that's not
good enough situation is not as Extreme
as with a self-driving car but that's
what I mean by reliability my perception
reliability is that a um to your point
it goes up with model scale but also it
goes up in if you tune for specific in
uh use cases or instances or data sets
and so there is that trade-off in terms
of size
versus uh you know specialized
fine-tuning versus reliability so
certainly people who care about some
specific application have every
incentive to get the smallest model
working well
enough I think that's true it's
undeniable I think anyone who cares
about a specific application will want
the smallest model for it that's
self-evident I do think though that as
models continue to get larger and better
then they will unlock new and
unprecedentedly valuable applications so
yeah the small models will have their
Niche for the less interesting
applications which are still very useful
and then the bigger models will be
delivering on applications okay let's
let's pick an example consider the task
of producing good legal advice it's
really valuable if you can really trust
the answer maybe you need a much bigger
model for it but it justifies the cost
there's been a lot of investment this
year uh at the 7B in particular but 7B
13B 34b sizes do you do you think
continued research at those scales is
wasted no of course not I mean I think
that in the kind of Med like medium term
medium term by AI time scale anyway there
will be an ecosystem there will be
different uses for different model sizes
there will be plenty of people who are
very excited for whom it's the best 7B
model is good enough they'll be very
happy with it and then there'll be very
plenty of very very exciting and amazing
applications for which it won't be
enough I think that's all I mean I think
the big models will will be better than
the small models but not all
applications will justify the cost of a
of a large model what do you think the
role of Open Source is in this ecosystem
well open source is complicated I'll
describe to you my mental picture I
think that in the near term open source
is just helping companies produce
useful like let's see why would one want
to have an open source to use an open
source model instead of a closed source
model that's hosted by some other
company I mean I think it's very valid
to want
to be the final decider
on the exact way in which you want your
model to be used and for you to make the
decision of exactly how you want the
model to be used in which use case you
wish to support and I think there's
going to be a lot of demand for open
source models and I think there will be
quite a few companies that will use them
and I'd imagine that will be the case in
the near term I would say in the long
run I think the situation with open
source models will become more
complicated and I'm not sure what the
right answer is there right now it's a
little bit difficult to imagine so we
need to put our future
hat maybe futurist hat it's not too hard
to get into sci-fi into a Sci-Fi mode
when you remember that we are talking to
computers and they understand us but so
far these computers these models
actually not very competent they can't
do tasks at
all I do think that there will come a
day
where the level of capability of models
will be very high like in the end of the
day intelligence is power right right
now these models their main impact I
would say at least least popular impact
is primarily around entertainment and
like simple question answer so you talk
to a model about this is so cool you
produce some images you had a
conversation maybe you had some question
you could answer it but it's very
different from completing some large and
complicated task
like what about if you had a model which
could autonomously start and build a
large tech
company I think if these models were
open source they would have a difficult
to predict consequence like we are quite
far from these models right now and by
quite far I mean by AI time scale but
still like this is not what you're
talking about but the day will come when
you have models which can do science
autonomously like be deliver on big
science
projects and it becomes more complicated
as to
whether it is desirable that models of
such power should be open
sourced I think the argument there is a
lot less clearcut a lot less
straightforward compared to the current
level models which are very useful
and I think it's fantastic that the
current level models have been built so
like that is maybe maybe I answered a
slightly bigger question rather than
what is the role of Open Source models
like what's the deal with open source
and the deal is up to a certain
capability it's great but not difficult
to imagine model sufficiently powerful
which will be built where it becomes a
lot less obvious to the benefits of
their open source is there signal for
you that we've reached that level or
that we're approaching it like what's
the what's the boundary so I think
figuring out this boundary very well is
an urgent research Pro research project
I think one of the things that help is
that the closed Source models are more
capable than open source models so the
closed source models could be studied and
so
on and so you'd have some experience
with the generation of closed source
model and then then you know like oh
these models capabilities it's fine
there's no big deal there then in a in
like couple years the open source models
catch up maybe a day will come when we
going to say wow like these closed source
models they're getting a little too a
little too drastic and then some other
approaches needed if we have our you
know future hat on maybe let's like
think about like a several year timeline
um what are the limits you see if any in
the in the near- term in scaling is it
like data token scarcity cost of compute
architectural issues so the most
near-term limit to scaling is obviously
data this is well known and some
research is required to address it
without going into the details I'll just
say that the data limit can be
overcome and progress will continue one
question I've heard people debate a
little bit is the degree to which the
Transformer based models can be applied
to sort of the full set of
areas that you'd need for AGI and if you
look at the human brain for example you
do have reasonably specialized systems
or allal networks be specialized systems
for the visual cortex versus you know um
areas of higher thought areas for
empathy or other sort of aspects of
everything from personality to
processing do you think that the
Transformer architectures are the main
thing that will just keep going and get
us there or do you think we'll need
other architectures over time so I have
to I understand precisely what you're
saying and have two answers to this
question the first is that in my opinion
the best way to think about the question
of Architecture is not in terms of a
binary is it enough but how much effort
how what will be the cost of using this
particular architecture like at this
point I don't think anyone doubts that
the Transformer architecture can do
amazing things but maybe something else
maybe some modification could have have
some compute efficiency benefits so so
better to think about it in terms of
compute efficiency rather than in terms
of can it get there at all I think at
this point the answer is obviously yes
to the question about well what about
the human brain then with its brain
regions I actually
think that the situation there is subtle
and deceptive for the following reasons
so what I believe you alluded to is the
fact that the human brain has known
regions it has like it has a speech
perception region it has a speech
production region it has an image region
it has a face region has like all these
regions and it looks like it's
specialized but you know what's
interesting sometimes there are cases
where very young children have severe
cases of epilepsy at a young age and the
only way they figure out how to treat
such children is by removing half of
their
brain because it happened at such a
young age these children grow grow up to
be pretty functional adults and they
have all the same brain regions but they
are somehow compressed onto one
hemisphere so maybe some you know
information processing efficiency is
lost it's a very traumatic thing to
experience but somehow all these brain
regions rearrange themselves there is
another experiment where that which was
done maybe 30 or 40 years ago on ferrets
so the ferret is a small animal it's a
pretty mean experiment they took the
optic nerve of the ferret which comes
from its eye
and attached it to its auditory cortex
so now the inputs from the eye starts to
map to the speech processing area of the
brain and then they recorded different
neurons after it had a few days of
learning to see and they found neurons in
the auditory cortex which were very
similar to the visual cortex or vice
versa it was either they mapped the eye
to the ear to the auditory cortex or the
ear to the visual cortex but something
like this has happened these are fairly
well-known ideas in AI that the cortex
of humans and animals are extremely
uniform and so that further supports the
idea that you just need one big uniform
architecture so yeah in general it seems
like every biological system is
reasonably lazy in terms of taking one
system and then reproducing it and then
reusing it in different ways and that's
true of everything from DNA encoding
you know there's 20 amino acids and
protein sequences and so everything is
made out of the same 20 amino acids on
through to uh to your point sort of how
you think about tissue architectures so
it's remarkable that that carries over
into the digital world as well depending
on the you use I mean the way I see it
is that this is an indication that from
a technological point of view we are
very much on the right track because you
have all these interesting analogies
between human intelligence and
biological intelligence and artificial
intelligence we've got artificial
neurons biological neurons
unified brain architecture for
biological intelligence unified neural
network architecture for artificial
intelligence at what point do you think
we should start thinking about these
systems in digital life I can answer
that question I think that will happen
when those systems become reliable in
such a way as to be very autonomous
right now those systems are clearly not
autonomous they're inching there but
they're not and that makes them a lot
less useful too because you can't ask it
hey like do my homework or do my taxes
or you see what I mean so the usefulness
is greatly limited as the usefulness
increases they will indeed become more
like artificial life which is also makes
it more I would argue um
trepidacious right like if you imagine
actual artificial life with brains that
are smarter than humans go gosh that's
like that seems pretty Monumental why is
your uh definition based on autonomy
because you know if you often look at
the definition of biological life it has
to do with reproductive
capability plus I guess some form of
autonomy right like a virus isn't really
necessarily considered alive much of the
time right but a bacteria is and you
could imagine situations where you have
um a symbiotic relationship or other
things where something can't really
quite function autonomously but it's
still considered a life form so I'm a
little bit curious about autonomy being
the definition versus some of these
other aspects well I mean definitions
are chosen for our convenience and it's
a matter of debate in my opinion
technology already has the reproduction
the reproductive function right and if
you look at for example I don't know if
you seen those images of the evolution
of cell phones and then smartphones over
the past 25 years you got this like what
almost looks like an evolutionary tree
or the evolution of cars over the past
Century so technology is already
reproducing using the minds of people
who copy ideas from previous generation
of technology so I claim that the
reproduction is already there the
autonomy piece I claim is not and indeed
I also agree that there is no autonomous
reproduction but that would be like can
you imagine if you have like
autonomously reproducing AIS I actually
think that that is pretty dramatic and I
would say quite a scary thing if you
have an autonomous reproducing AI if
it's is also very capable should we talk
about uh super alignment yeah very much
so can you um just sort of Define it and
then you know we were talking about what
the boundary is for we when we when you
feel we need to begin to worry about uh
these capabilities being in in open
source like what is super alignment and
like why invest in it now the answer to
your question
really depends to where you think AI is
headed you just try to imagine look into
the future which is of course a very
difficult thing to do but let's make
let's let's try to do it anyway where do
we think things will be in five years or
in 10 years mean progress has been
really stunning over the past few years
maybe it will be a little bit slower but
still if you if you extrapolate this
kind of progress you'll be in a very
very different place in 5 years let alone
10
years it doesn't seem implausible it
doesn't doesn't seem at all implausible
that we will have
computers data centers that are much
smarter than people and by smarter I
don't mean just have more memory or have
more knowledge but I also have mean have
deeper insight into the same subjects
that we people are studying and looking
into it means learn even faster than
people like what could such AIS do I
don't know certainly if such an AI were
the basis of some artificial life it
would be well how do you even think
about it if you have some very powerful
data center that's also alive in a sense
that's what you're talking about and
when I imagine this world I my reaction
is gosh this is very unpredictable
what's going to happen very
unpredictable but the bare minimum but
there is a bare minimum which we can
articulate
that if such super if such very very
intelligent super intelligent data
centers are being built at all we want
those data centers to have warm and
positive feelings towards people towards
Humanity because those this is going to
be nonhuman life in a sense potentially
it could be potentially be that and so I
would want that any instance of
such super intelligence to have warm
feelings towards humanity and so this is
what we doing with the super alignment
project we saying hey if if you just
allow yourself if you just accept that
the progress that we've seen maybe it
will be slower but it will
continue if you allow yourself
that then can you can start doing
productive work today to build the
science so that we
will be able to handle the problem of
controlling such Future Super
intelligence of imprinting onto them a
strong desire desire to be nice and kind
to people because those data centers
right they'll be they'll be really quite
powerful you know there'll probably be
many of them they will be the world will
be very complicated but
somehow to the extent that they are
autonomous to the extent that they are
agents to the extent they are beings I
want them to
be to be pro-social Pro Human Social
that's the goal what do you think is the
likelihood of that goal I mean some of
it it feels like a a outcome you can
hopefully affect right but uh are we are
we likely to have pro-social AIS that we
are friends with individually or you
know as a species well I mean friends be
I think that that part is not
necessary the the the Friendship piece I
think is optional but I do think that we
want to have very Pro social AI I think
it's I think it's possible I don't think
it's guaranteed but I think it's
possible possible I think it's going to
be possible and the possibility of that
will increase in so far as more and more
people allow themselves to look into the
future into the five to 10 year future
and just ask yourself
what what do you expect AI to be able to
do then how capable do you expect it to
be
then and I think that with each passing
year if indeed AI continues to improve
and as people get to experience because
right now you're talking making
arguments but if you actually get to
experience oh gosh the AI from last year
which was really
helpful this year puts the previous one
to shame and you go okay and then one
year later and one starting to do
science the AI software engineer is
starting to get really quite good let's
say I think that you create a lot more
desire in people
for what you just described for the
future super intelligence to indeed be
very pro-social you know I think there
going to be a lot of disagreement it's
going to be a lot of political questions
but I think that as people see AI
actually getting better as people
experience it the desire for the
pro-social super intelligence the
humanity loving super intelligence you
know as much as this is as as much as it
can be done will increase and on the
scientific problem you know I think
right now it's still an area where
not that many people were working
on but AIs are getting powerful enough
you can really start studying it
productively we'll have some very
exciting research to to share
soon but I would say that's the big
picture situation here just really it
really boils down to look at what you've
experienced with AI up until now ask
yourself like is it slowing down will it
slow down next year like we will see and
we'll experience it again and again
and I think it will keep going and what
needs to be done will keep becoming
clear do you think we're just on an
accelerative path because I think
fundamentally if you look at certain
technology waves they tend to inflect
and then accelerate versus
decelerate and so it really feels like
we're in an acceleration phase right now
versus the deceleration phase yeah I
mean we are right now it is indeed the
case that we are in an acceleration
phase you know it's hard to
say you know multiple forces will come
in to play some forces are accelerating
forces and some forces are decelerating
so for example the cost and scale are a
decelerating force the fact that our
data is finite is a decelerating force
to some to some degree at least I don't
want to overstate yeah it's kind of a
within an asymptote right like at some
point you hit it but one it's the
standard S curve right or sigmoid well
with the data in particular I just think
it won't be it just won't be an issue
because we'll figure out some something
else but then you might might argue like
the size of the engineering project is a
decelerating force just the complexity
of management on the other hand the
amount of investment is an accelerating
Force the amount of interest from people
from Engineers scientists is an
accelerating force and I think there is
one other accelerating force and that is
the fact that biological evolution has
been able to figure it out and the fact
that up until now progress in AI has had
up until this point this weird property
that it's kind of been
you know it's been very hard to execute
on but in some sense it's also been more
straightforward than one would have
expected
perhaps like in some
sense I don't know much physics but my
understanding is that if you want to
make progress in quantum physics or
something you need to be really
intelligent and spend many years in grad
school studying how these things work
whereas with AI you have people come in
get up to speed quickly start making
contributions quickly it has the flavor
is somehow different somehow it's very
there is some kind of there's a lot of
give to this particular area of research
and I think this is also an accelerating
Force how will it all play out remains
to be seen like it may be that somehow
the scale required the engineering
complexity will start to make it so that
the rate of progress will start to slow
down it will still continue but maybe
not as quick as we had before or maybe
the forces which are coming together to
push it will be such that it will be as
fast for maybe a few more years before
it will start to slow down if at all
that's that would be my articulation
here Ilia this has been a great
conversation thanks for joining us thank
you so much for the conversation I
really enjoyed it find us on Twitter at
no priors pod subscribe to our YouTube
channel if you want to see our faces
follow the show on Apple podcast Spotify
or wherever you listen that way you get
a new episode every week and sign up for
emails or find transcripts for every
episode at no-priors.com