Ilya Sutskever | Humanity Will Eventually Move Toward AGI | Intelligent Agents Will Soon Arrive
Summary
TLDR: In this interview, Ilya Sutskever looks back at the breakthrough that deep learning and neural networks brought to computer vision. He shares his insights on how deep networks can be trained and how these networks transformed the field of machine learning. Sutskever discusses his work at OpenAI, including the development of the GPT series and the CLIP model, and how these technologies pushed the boundaries of AI. He also explores AI's future, including hopes for more reliable and efficient models, and how AI may change the way we work and live.
Takeaways
- 📈 Neural networks achieved a breakthrough in computer vision, outperforming all previous methods by a large margin.
- 🤖 Deep learning's success owes partly to James Martens's work, which showed that deep networks can be trained end to end from supervised data.
- 🧠 Human vision reacts within a few hundred milliseconds, which implies that respectable vision does not require very many layers.
- 🚀 The rise of large datasets and GPUs made it possible to train large neural networks, driving deep learning forward.
- 🌐 Deep learning's success also changed how people view machine learning models, shifting from a preference for simple, elegant mathematical proofs to acceptance of more powerful models.
- 🔑 Neural networks can be seen as a programming medium: the backpropagation algorithm "programs" these little computers.
- 🔑 The parallel nature of neural network computation lets them handle complex pattern-recognition tasks such as Go and machine translation.
- 🌟 Deep learning's success in image recognition and language processing shows its great potential on problems humans can solve quickly.
- 💡 Progress in deep learning has driven deeper understanding of neural networks, including their applications in reinforcement learning, natural language processing, and computer vision.
- 🔮 Future AI research will likely continue exploring how to make neural networks more reliable and efficient, and useful across a wider range of tasks.
Q & A
How did neural networks achieve their breakthrough in computer vision?
-Through end-to-end training on large amounts of labeled data. The feasibility of training deep networks end to end was first demonstrated in James Martens's Hessian-free optimization work, and the approach was validated in the ImageNet competition, where neural networks outperformed all previous methods by a large margin.
Why wasn't deep learning widely accepted early on?
-Because of widespread misconceptions, such as the belief that deep networks could not be trained due to vanishing or exploding gradients. In addition, the field expected machine learning models to come with rigorous mathematical proofs of effectiveness, which neural networks lacked.
What does the speed of human visual recognition suggest for neural network design?
-The human visual system recognizes objects within a few hundred milliseconds even though individual neurons are relatively slow. This implies that effective visual recognition does not require very many layers, which suggests building networks that are large but not especially deep.
Why use GPUs to train neural networks?
-GPUs process large amounts of data in parallel, which is essential for training large neural networks. The approach became practical once Alex Krizhevsky wrote convolutional neural network (CNN) code that trained efficiently on GPUs; it cut training times dramatically and made training on large datasets such as ImageNet feasible.
How were neural networks applied to language processing?
-By treating language as a continuous signal that pattern recognition can handle. Training a neural network to predict the next word in a text sequence lets it learn the structure and semantics of language, producing strong results on tasks such as machine translation.
Why choose Dota as a reinforcement learning project?
-Dota is a complex, highly challenging real-time strategy game that demands fast reactions, strategic thinking, and intuition. It was chosen to test and push the limits of reinforcement learning, especially on decisions involving many variables, many steps, and long-term planning.
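To make the reinforcement-learning setting concrete, here is a minimal sketch of the simplest policy-gradient method (REINFORCE), shrunk to a two-armed bandit. Everything in it is a toy stand-in; the actual Dota agent used a large-scale policy-gradient variant over a far richer game state.

```python
import math
import random

def reinforce(steps=2000, lr=0.1, seed=0):
    """Train a one-parameter policy: arm 1 pays +1, arm 0 pays 0."""
    rng = random.Random(seed)
    theta = 0.0  # logit for choosing arm 1
    for _ in range(steps):
        p1 = 1 / (1 + math.exp(-theta))      # policy: P(choose arm 1)
        action = 1 if rng.random() < p1 else 0
        reward = 1.0 if action == 1 else 0.0
        grad_logp = action - p1              # d log pi(action) / d theta
        theta += lr * reward * grad_logp     # REINFORCE update
    return 1 / (1 + math.exp(-theta))        # final P(choose arm 1)

print(reinforce())  # climbs toward 1.0: the policy learns the good arm
```

The same update rule, scaled up enormously, is the family of methods the interview credits with "never stopping improving" as the Dota project grew.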
How does GPT achieve language understanding through prediction?
-GPT models learn by predicting the next word in a text sequence. If the model can accurately predict the upcoming word, it has effectively understood the preceding text. As prediction accuracy improves, the model captures ever more complex features of language: vocabulary, grammar, and semantics.
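The next-word objective can be illustrated with a toy stand-in. A real GPT is a Transformer trained on vast text; here a simple bigram count model (corpus and words invented for illustration) makes "predict the next word" concrete.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count word -> next-word transitions in a token list."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent continuation of `word`, if any."""
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]

tokens = "the cat sat on the mat and the cat slept".split()
model = train_bigram(tokens)
print(predict_next(model, "the"))  # "cat" follows "the" most often here
```

Pushing prediction accuracy beyond what such word-pair counts allow is precisely what forces larger models to pick up grammar and meaning.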
What is the CLIP model, and how does it combine language and vision?
-CLIP is a multimodal neural network that learns visual concepts from large-scale natural language supervision. It associates text descriptions with image content, enabling image understanding. Combining language and vision this way lets the model learn rich visual concepts without large amounts of hand-labeled data.
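A hedged sketch of the matching step: once image and captions are embedded into a shared vector space, picking the best caption reduces to maximum cosine similarity. The vectors below are invented toy embeddings, not real CLIP outputs.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def best_caption(image_vec, captions):
    """Pick the caption whose embedding is most similar to the image."""
    return max(captions, key=lambda c: cosine(image_vec, captions[c]))

captions = {
    "a photo of a dog": [0.9, 0.1, 0.0],  # toy text embeddings
    "a photo of a cat": [0.1, 0.9, 0.0],
}
image = [0.8, 0.2, 0.1]  # hypothetical image embedding
print(best_caption(image, captions))  # → "a photo of a dog"
```

The contrastive training that produces such aligned embeddings is the part this sketch omits.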
Why are neural networks considered general-purpose computing devices?
-Because they can accept input in any form and learn internal representations to handle complex tasks. Whether the task is visual recognition, language understanding, or strategy games, a neural network can be adapted to it through appropriate training, demonstrating generality and flexibility.
How can the reliability and controllability of neural networks be improved?
-In several ways: training on larger datasets, designing better network architectures, and using reinforcement learning from human feedback. Carefully designed prompting can also steer a model toward the desired outputs.
Outlines
🤖 Neural networks' breakthrough in computer vision
This part of the conversation covers how neural networks achieved breakthrough results in computer vision, surpassing all earlier methods. A key turning point was James Martens's paper on Hessian-free optimization, the first demonstration that deep networks can be trained end to end from supervised data, overturning the prevailing view that deep networks could not be trained. The discussion also covers the potential of neural networks as small parallel computers that can be programmed with the backpropagation algorithm, a realization that spurred their application to computer vision and ultimately led to the ImageNet breakthrough.
🧠 Rapid progress and the use of hardware
This segment recounts how neural networks advanced quickly by exploiting the hardware of the day. The speed of human vision and the slowness of neurons imply that effective visual recognition does not require many layers. With the arrival of large datasets and the spread of GPUs, neural networks entered a golden age. In particular, Alex Krizhevsky's GPU code, which trained a small convolutional network to respectable results, opened the door to training on ImageNet and propelled neural networks further into computer vision.
🌐 Neural networks in language processing and games
This part discusses applying neural networks to language and games, including their potential in translation and in games such as Go. It highlights neural networks as formalized intuition, able to make fast decisions the way a human expert does, and explores how deep learning can tackle problems that require extensive thinking, along with applications in language models and game strategy.
🚀 Innovation and future development
This segment explores the process of innovation in neural network research and where it may lead. Continual experimentation drives progress, and even simple methods can produce unexpected results, as in the Dota project. It also stresses the importance of keeping an open mind and being willing to try things.
🎓 From academia to practice: commercializing neural networks
This part covers the move from academic research to commercial application: the experience of working at Google, and DeepMind's success with AlphaGo, which sharpened the sense of neural networks' commercial potential. It discusses turning the technology into real products and the challenges and opportunities along the way.
🤖 Self-learning and self-optimization
This segment looks at neural networks' capacity for self-learning and self-optimization: improving performance on specific tasks through training, applying that capacity to harder problems, and tuning network architecture and parameters to optimize performance.
🧠 Combining neural networks with human intelligence
This part discusses pairing neural networks with human intelligence for more effective problem solving: training networks to model human cognitive processes, using them to augment human decision-making, and the importance of understanding and modeling human intelligence in this research.
🌐 Neural networks in multimodal learning
This segment covers neural networks that handle multimodal data such as vision and language: training them to understand and generate images and text, and achieving cross-modal learning and understanding. Models such as CLIP and DALL-E combine vision and language for richer understanding and generation.
🚀 The future of neural networks and their impact on society
This part looks at where the technology is heading and the deep impact it may have on society. As their capabilities grow, neural networks will play an ever larger role across industries. The discussion covers ensuring reliability and safety, and guiding the technology's development through sensible policy and regulation.
🎨 Creativity and neural networks
This segment explores the role of creativity in neural network research: combining art and science to advance the field, sustaining personal creativity and a spirit of innovation, and exploring new research directions through interdisciplinary collaboration.
Keywords
💡Deep learning
💡Neural networks
💡Computer vision
💡Convolutional neural network (CNN)
💡End-to-end learning
💡GPU
💡ImageNet
💡Transformer
💡Unsupervised learning
💡Reinforcement learning
💡Multimodal learning
Highlights
Neural networks achieved a huge breakthrough in computer vision, far surpassing previous methods.
Deep learning showed that deep networks can be trained end to end from supervised data.
Neural networks are essentially small parallel computers that can be programmed with backpropagation.
Human vision is fast, implying that respectable visual recognition does not require many layers.
The ImageNet dataset and the spread of GPUs made training large neural networks possible.
Alex Krizhevsky's GPU code could train a small convolutional network to strong results on CIFAR in 60 seconds.
Deep learning's success in computer vision prompted thinking about applications in other fields.
Deep learning made notable progress in language processing, especially machine translation.
The parallel computing power of neural networks makes them excel at continuous signals such as speech and vision.
AlphaGo's success showed deep learning's enormous potential on complex strategic problems.
Progress in deep learning shifted AI from individual researchers toward large-scale engineering projects.
OpenAI was founded to explore deep learning's engineering applications and potential.
The challenge and success of the Dota project showed the effectiveness of simple reinforcement learning strategies.
The GPT series developed and succeeded, especially in language generation and understanding.
CLIP's innovation lies in combining natural language with visual understanding.
Reinforcement learning from human feedback improved the controllability and reliability of AI systems.
Future AI development will focus more on efficiency and cost, and on specialization in particular domains.
The ultimate goal of AI is automated productivity, letting people enjoy what AI creates.
Transcripts
a neural network beat all past
approaches to computer vision by a very
large margin and of course you were one
of the people making that happen and so
I'm really curious from your
perspective H how did that come about um
everybody else is working on different
approaches to computer vision and there
you are working on nets for computer
vision and then you drastically
outperform everyone um how do you even
decide to do this yeah I'd say
that what led me to this
result
was a set of
realizations over over the time period
of a number of years which I'll describe
to you so I think the first really
pivotal pivotal moment was
when James Martens wrote a paper
called Deep Learning via Hessian-Free
Optimization and that was the first time
anyone has shown that you can train deep
networks end to end from supervised data
MH but for some context back in those
days everybody knew that you cannot
train deep
networks it cannot be done back
propagation is too weak you need to do
some kind of pre-training of some sort
and then maybe you'll get some kind of
an NPH but if it is the case that you
can train them end to
end then what can they do and the thing
is you know
why there is one more piece of context
that's really important so today we take
deep learning for granted of course a
large neural network is what you need
and you shove data
into it and you'll get amazing results
everyone knows that every child knows
that how can it be how can it be that we
did not know that how could such an
obvious thing not be known
well people were really focused on
machine learning models where they can
prove that there is an algorithm which
can perfectly train them but whenever
you put this condition on yourself and
you require to find a simple elegant
mathematical proof you really end up
restricting the power of your model in
contrast neural networks like the
fundamental thing about neural networks
is that they are basically little
computers little parallel computers that
are no longer so little anymore that
definitely are they can be as little or
as large as you want but basically it is
a computer it is a parallel computer and
when you train a neural network you
program this computer with a back
propagation algorithm
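That idea, sketched at the smallest possible scale with made-up data: gradient descent on the squared error "programs" a one-weight network to compute y = 2x. A real network repeats this same update across millions of parameters via the chain rule.

```python
def train(data, lr=0.1, steps=100):
    """Fit y = w * x by gradient descent on squared error."""
    w = 0.0
    for _ in range(steps):
        for x, y in data:
            pred = w * x
            grad = 2 * (pred - y) * x  # d/dw of (w*x - y)^2
            w -= lr * grad             # the "programming" step
    return w

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
print(round(train(data), 3))  # converges to 2.0
```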
and so the the thing that really clicked
for me is when I saw these results
with the Hessian-free optimizer I realized
wait a second so we can actually program
those things now it's no longer the case
that you know maybe you could so the
prevailing view was aspirationally maybe
someone could train those things but
it's obviously impossible local minima
will get you but no you can train a
neural net then the second realization
is human vision is fast it takes several
hundred milliseconds at most to
recognize something and yet our neurons
are slow so that means that you don't
even need that many layers to get
respectable Vision so you put this so
what does that mean it means that if you
have a neural network which is pretty
large then there exist some
parameters which achieve good results on
Vision now if only there was a data set
which we could train from and then ImageNet
came up and then the GPUs came up and
then I was this has to happen and then
at some point I had a conversation with Alex Krizhevsky
where he said that he has GPU code which
can train a small convnet to get
respectable results on CIFAR in 60
seconds and I was like oh my God so
let's do this on ImageNet it's gonna
crush everything and that's
how it happened that's how it came to
be I I love the backstory here Ilya and
how it reminds me a lot of our days at
OpenAI where many things to you just
look unavoidable and and just so clearly
that they have to be that way I remember
the first time you you articulated to me
um that a neural net is just a computer
program um and this is like several
years before even Karpathy started
talking about software 2.0 being you
know programming with neural Nets and
it's just parallel and serial compute
it's it's really it's really amazing
that you saw this even before there was
real success in neural nets um when did
you realize it was actually working on
ImageNet what was that like I mean
I I had very I had very little doubt
that it would work but it was kind of
you know at this point you know Alex was
training the neural net and the results
were getting better week after
week and that's about it but I felt but
I felt like the big risk for my
perspective was can we have can we have
that do we have the ability to utilize
the GPUs well enough MH train a big
enough you know big enough there's no
such thing it's more like an
interestingly large neural network it
has to be a neural network that is large
enough to be interesting whereas all
previous neural networks are small if
you're just going to have something
which is going to be way larger than
anything before then it should do much
better than anything anyone's ever seen
of course we are far beyond that our
computers are faster and your networks
are larger but the goal was not the goal
was just to go as far as possible with
the hardware we had back then that was
the risk and fortunately Alex had the
kernels that eliminated that risk right
that's a very good point I mean at the
time it wasn't I mean today you put
something in PyTorch or TensorFlow
whatever your favorite framework is and
you can train your network back then
you actually had to build some pretty
specialized tools yourself to to make
this all
run
now as that breakthrough happens I'm
curious what are you thinking next what
do you think like okay we do this you
probably knew this this breakthrough
happened before everybody else in the
world because I mean you you had the
results before the public workshop and
so before everybody else in the world
even knew that neural nets are going to
be the new state of the art and a new
way of doing computer vision you already
knew that and so where was your mind
going at that point so I think there
were two things which I was thinking
about so the the thing the the belief so
my belief has been that we'd proven that
neural nets can solve problems that
human beings can solve in a short amount
of
time because with this result we've proven
that we can train neural nets with
modest numbers of layers and I thought
we can make the neural networks wider
and that will be pretty
straightforward making them deeper is
going to be harder and so I thought okay
well depth is how you solve problems
that require a lot of thinking so can we
find some other interesting problems
that don't require a lot of thinking and
I actually was thinking a little bit
about reinforcement learning but the
other problem was problems in language
that people can understand
quickly as well so with language you
also have the property that you don't
need to spend a lot of time thinking
know what did what did they say exactly
you know sometimes you do but often you
don't so problems in language and
translation was
the the preeminent problem in language
at the time and so that's why I was
wondering if we could do something there
another thing which I was thinking about
was was actually go as well I was
thinking that using a
convnet could
potentially provide very good intuition
for the non neural network Go playing
system that existed back then can you
say a bit more about the the go system
um how how a neural network could and
actually has changed then from there how
that's
done I
mean basic the thing about neural
networks is that okay
so before deep
learning anything you had to do with AI
involved some kind of maybe search
procedure with some kind of hardcoded
heuristics where you have really
experienced Engineers spend a lot of
time thinking really hard
about how exactly under what conditions
they should continue something or
discontinue something or expand
resources and they just spent all their
time
trying to figure out those heuristics but a
neural network is formalized
intuition it is actually intuition it
gives you the kind of expert gut feel
because I read I read this thing that an
expert player in any game you can just
look at the situation and they instantly
get a really strong gut feel it's either
this or that and then they spend all their
time thinking which one of
those two it is and I said great
the neural network should have
absolutely no trouble if you buy the
theory that we can replicate functions
that humans can do in a short amount of
time like less than a second and it felt
like
okay in case of something like go which
was a big unsolved problem back
then a neural net should be able to do that
back at the time Ilya the first
time I heard that you know maybe use a
convnet for Go my my naive reaction
obviously because clearly it it
succeeded my naive reaction was
convnets are famous for translation
invariance and there's no way that we
want to be translation invariant on on
on the board of go because you know it
really matters whether a pattern is you
know in one place or another place um
but obviously you know that that didn't
stop the convnets from succeeding and
and just capturing the patterns
nevertheless yeah I mean you know that's
that's again the power of the parallel
computer can you imagine programming a
convnet to do the right thing well it's a
little bit hard to imagine that but
it's it's true that that part may have
been a small a small leap of faith and
maybe to close the loop on Go so
my interest in
Go ended up in me participating on the
AlphaGo paper as well in in a modest
way you know like I I I got I had an
intern Chris Madison and we wanted to
apply supervised convnets to Go and at the
same time Google acquired DeepMind and
all the DeepMind folks have visited
Google and so we spoke with David Silver
and Aja Huang and we thought it would be a
cool project to try out but then
DeepMind really they put a lot of effort
behind it and they really had a
fantastic execution in this
project yeah I think while the ImageNet
moment is the moment most AI researchers
saw the coming of age of deep learning
and a whole new era starting AlphaGo is
probably the moment most of the world
saw that AI is now capable of something
very different from what was possible
before
um it's interesting though because While
most of the world's focused on that
around the same time um actually a New
York Times article comes out saying that
actually something very fundamental has
been happening in natural language
processing which you alluded to and that
actually the whole Google translate
system was had been revamped with neural
networks um even though a lot of people
think of neural nets at the time as
pattern recognition and patterns should
be signals like speech or or visual
signals and language is discret and so
I'm really curious about that um how how
do you make the leap from these
continuous signals where neural nets to many
people seemed a natural fit to language
which most people would look at as
discrete symbols and very different yeah
so I think that leap is very natural if
you believe relatively strongly that
biological neurons and artificial
neurons are not that different because
then you can say Okay human beings let's
let's find let's think of the single
best professional translator in the
world someone who is extremely fluent in
both languages that person could
probably translate language almost
instantly so there exists some neural
network with a relatively small number
of layers in that person's
mind that can do this task okay so if
you have a neural network inside our
computer which might be a little bit
smaller and it's trained on a lot of
input output examples we already know
that we will succeed in finding the
neural net that will solve the problem
so therefore the existence of that that
single really good instantaneous
translator
or the existence of such one such person
is proof that the neural network can do
it now it's a large neural network our
brains are quite large but maybe you can
take a leap of faith and say well maybe
our digital neurons we can train them a
little bit more and maybe they're a
little bit less noisy and maybe it will
still work out now of course the neural
networks are still not at the level of a
really amazing human translator so
there's a gap but that was the chain of
reasoning that humans can do it quickly
biological neurons are not unlike
artificial neurons so why can't the
neural network do it let's find out with
your collaborators at Google you
invented the modern way of of doing a
machine translation with neural networks
which is uh really amazing can you say a
little bit more about how that works all
you need is a large neural network with
some way of ingesting some
representations of words so what does it
mean a representation it's a word that
we use in AI often a representation is
basically okay so you have the letter A
how do you show it or the word cat how
do you present it to the computer to the
neural network and you basically just
need to agree with yourself that hey
we're going to create some kind of a
mapping
between the words or the letters into
some kind of signals that happen to be
in the format that the neuronet can
accept so you have this one you you just
say I'll just design this dictionary
once and feed those signals to the
neural net and now you need to have some
way for the neural network to ingest
those signals one at a time and then it
emits the words one at a time of the
translation and that's literally it it's
called the autoregressive modeling
approach and it's quite popular right
now but it's not because it's necessarily
special it's just convenient the neural
networks do all the work the neural
networks figure out how to build up
their inner Machinery how to build up
their neurons so that they will
correctly interpret the words as they
come in one at a time and then
somehow you know break them into little
pieces and transform them and then do
exactly the right orchestrated dance to
Output the correct words one at a time
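The ingest-one-emit-one loop described here can be sketched with a toy stand-in. The vocabulary, the tiny lookup-table "model", and the example sentence are all invented; in a real system the next-token rule is a trained neural network that conditions on the source.

```python
# Fixed dictionary: the agreed-upon mapping from words to signals (ids).
vocab = {"<s>": 0, "the": 1, "cat": 2, "le": 3, "chat": 4, "</s>": 5}
ids = {i: w for w, i in vocab.items()}

def toy_model(src_ids, out_ids):
    """Toy next-token rule; a real model would also use src_ids."""
    table = {(): 3, (3,): 4, (3, 4): 5}  # emits: le, chat, </s>
    return table[tuple(out_ids)]

def translate(src_words):
    src = [vocab[w] for w in src_words]  # ingest the source signals
    out = []
    while True:
        nxt = toy_model(src, out)        # predict one token at a time
        if ids[nxt] == "</s>":           # stop token ends the output
            break
        out.append(nxt)                  # feed the output back in
    return [ids[i] for i in out]

print(translate(["the", "cat"]))  # → ['le', 'chat']
```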
it's probably possible to design other
neural networks with other ways of
ingesting the words and people are
exploring this right now you know you
may have seen some you know if you
follow ml Twitter you may have seen some
words like phrases like diffusion
models so maybe they will be able to
ingest the words in parallel and then do
some sequential work and then output
them in parallel it doesn't actually
matter what matters is that you just
present the words to the neural net
somehow and you have some way that the
neural net can output the words of the
Target and that's what matters yeah to
me it was a very big surprise at the
time that that it worked so well for
language I I was 100% certain that it
will work great for anything continuous
and then all of a sudden the sequence to
sequence models that you pioneered was
like okay well I guess now it's going to
work for everything was my conclusion
because if it can work for for language
what's what's left in terms of signals
we we work with right um now you of
course didn't start um working on neural
nets from the day you were born and
I'm really curious you know where did
you grow up and how did that lead you to
ending up you know being an AI
researcher yeah so I was born in Russia
I grew up in Israel and then I moved to
Canada when I was
16 according to my parents I've been
talking about AI at a relatively early
age and I definitely remember at some
point thinking about AI and reading
about this whole business with playing
chess using brute force and it was
totally clear it was it seemed that yeah
you could do the chess stuff no problem
but the learning stuff that's where the
real meat of AI is that's why AI is so
terrible because it doesn't learn and
humans learn all the time so can we do
any learning at all so when my
family moved to Canada to Toronto and I
entered the University of Toronto I
sought out the learning professors and
that's how I found Jeff
Hinton and then the other thing is that
he had he had this he was into training
neural networks and neural networks
seemed
like a much more promising Direction
than the other approaches
because they didn't have obvious
computational limitations like things
like decision trees which were those
words were those that phrase was popular
back in the day
mhm now Jeff of course has a very
long history working in Ai and
especially neural networks deep
learning um you know coming out of
England coming to the US then moving to
Canada and and his move to Canada in
some sense helped spark the the AI the
beginning of the new AI era in Canada of
of all places right you're there at the
same time which is really interesting
kind of curious you know
do you think there's any reason your
parents decided to go to Toronto and
that it is like the place where both you
and Jeff ended up and Alex I mean the
three of you were there together to make
that happen I think it's a bit of a
happy coincidence I think it has to do
with the way immigration works it it is
it is a fact that it
is qu quite a bit easier to immigrate
into Canada and if you immigrate into
Canada Toronto is perhaps the most um
appealing City to settle in
now that coincidence brings you to
University of
Toronto and you find Jeff Hinton working
on neural networks but I gotta imagine
when you you looked into his history you
must have noticed he'd been working on
it for 30 40 years and was there any
moment you thought well maybe if it
doesn't work after 30 40 years it's not
going to work now either I see what
you're saying but my motivation was
different I had I had a very explicit
motivation
to
make even a very very small but a
meaningful contribution to AI to
learning because I thought learning
doesn't work at all completely and if it
works just a little bit better because I
was there I would declare it a success
and so that was my goal and do you
remember anything from your first
meetings with Jeff how was that I mean so
I was I was a third year undergrad when
I met him for the first time I mean I
thought it was great so my major in
undergrad was math but the thing about
math is that math is very hard and lots
of and all the really talented people
would go into math and so one of the
things which I thought was great about
machine learning is that not only is it
the thing but also all the really clever
people were going into math and
physics so I was very pleased about that
what I remember from actually reading
Cade Metz's book um is actually my my
possibly my favorite anecdote from the
book has Jeff telling the story
about him meeting you Ilya and so here
here's how the book tells the story
maybe you've read it maybe not but
essentially the book says yeah there's
Jeff you know and this this young
student comes in Ilya who's an
undergrad still and Jeff gives you a
paper and
um
you you go read
it and you you you come back and um you
tell him I don't understand it and
Jeff's like oh that's okay you know
you're still an undergrad what don't you
understand I can explain it to you and
essentially you say actually I don't
understand why it's why they don't
automate the whole process of learning
it's it's still too too much handholding
um I understand the paper I just don't
understand why they're doing it that way
and Jeff's like okay wow this is
interesting and he gives you another
paper and um again you go read you come
back so goes the story and you say oh I
don't understand this one either and
Jeff's like well what don't
you understand about this one I'm happy
to explain and you go I don't understand
why they train a separate neural network
for every application why can't we train
one gigantic Network for everything it
should you know it should help to be
trained jointly and to me that that
that's really I mean that reminds me a
lot of our times at OpenAI where
it always felt like you are you know
already thinking you know several steps
into the future of how things are going
to shape up just from the evidence we
have today you know how it really should
be several years down the line uh that
at least according to the book that's
how Jeff remembers the first two
meetings with you yeah I mean some some
something like this did happen it's true
so the field of AI back then when I was
starting out was not a hopeful field
it was a field of desolation and despair
no one was making any progress at all
and it was not clear if progress was
even
possible and so that's why well H how
what do you do when you are in this
situation so you say you walking down
this path this is the path the most
important path but you have no idea how
long it is you have no idea how hard
it's going to be what would be a
reasonable goal in this case well the
goal which I chose was can I make a
useful step one useful step so that
was
my explicit motivation at least for
quite a while before it became clear
that actually the path is going to
become a lot you know a lot
steeper and a lot more rapid where
ambitions grew very rapidly but
at first when there was no no gradient
the goal was just make any Step at all
anything useful that would be meaningful
progress towards Ai and I think that's
really intriguing actually because I
think that's what drives a lot of
researchers is is
to just find a way to make to make some
progress knowing not knowing actually
ahead of time how far you can get but
just being so excited about the topic
that you you just want to find a way to
at least make some progress and and then
keep going um and it's it of course very
interesting in your case that you know
then that whole thing switched from you
know slow progress to ever faster
progress all of a sudden thanks to the
thing that you're like look you you're
trying to make that bit of progress and
it turns out to open up the floodgates
for for massive
progress now you start you start in
Canada you your PhD research of course
you know completely changes the field
you start a company that gets acquired
um by Google and you're at Google then
the big thing and also the the moment
actually our paths start start Crossing
or about to cross is that you you know
you're on this role at Google you're
doing some of the most amazing
pioneering work in AI you're clearly in
an amazing situation where you are you
know doing some some of the best work
that's happening in the world and you
you decide to change your
situation how did that come about I
remember being at Google and feeling
really
comfortable and also really restless
I think two two factors contributed to
that one is that I I somehow I could
look 10 years into the future and I had
a little bit too much clarity about how
things will look like and I didn't enjoy
that very much but there was another
thing as well and that's
the the experience of seeing DeepMind
work on AlphaGo and it was it
was very inspiring and I thought
that it's a sign of things to come that
the field is starting to mature up until
that point all progress in AI has been
driven by individual researchers working
on small projects maybe small groups of
researchers with some advice by their
professors and maybe some other
collaborators but usually it would be
small groups most
of the work would be idea heavy and then
it would be some kind of an effort
on the engineering
execution to prove that the idea is valid but
I felt that alphago was a little
different
it showed that in
fact it showed to me that the
engineering is critical and in fact the
field will change and you become the
engineering field that it is
today because the tools were getting
very solid and the question then becomes
okay how do you really train those
networks how do you debug them how do
you set up the distributed training and
it's a lot of work and the stack is
quite deep
and I felt that the culture at Google
was very similar to the Academia culture
which is really good for generating
radical novel ideas and in fact Google
has generated a lot of radical and
revolutionary ideas in AI over the years
and most most notably the Transformer
from from the past few
years but I felt that that's not going
to be the whole of progress in AI I felt
that it's now only a part of
progress in AI so if you think of it as
of of the body you can say you need both
the the muscles and the skeleton and the
nervous system and if you only have one
it's amazing but the whole thing won't
won't really move you need all things
together and so I felt that I had a
vague feeling that it would be really
nice if there
was some kind of a company which
would have these elements together but I
didn't know how to do it I didn't have
any any path to it I was kind of I just
daydreaming about it and then at some
point I got an email from Sam Altman
saying hey let's get dinner with some
cool people and I said sure and and I
showed up and and Greg Brockman was
there and Elon Musk was there and a few
others and we just chatted about
wouldn't it be nice to start a new AI
lab and I found that really the time was
right because I was thinking about the
same thoughts
independently and I really wanted it to
be engineering heavy and you know
seeing that Elon was going to be
involved I thought well who
would be
better I can't imagine a better person
from whom to learn the you know big
engineering project side of
things so I think this was the Genesis
there is kind of there there is more to
it but I think that was the real the
real Genesis of open AI from my
perspective that yeah like I was
thinking about something and then it
just one day I woke up with this email
hey the thing from my perspective it was
like I was daydreaming about something
and then my daydream came true
almost like this the daydream
becomes true what you're really saying
there is that you know there is a group
of
people very highly accomplished and
ambitious people who are in some sense
aligned with your dream and want to want
to make this happen together but all
that gets you is essentially you know
some some paperwork that a new company
now exists and um maybe some money to to
get going but you actually still need to
decide what to do with those resources
and with your time I'm kind of curious
at the beginning of OpenAI what
what was going on in your mind in terms
of how to shape this up um I mean
obviously it's been been a massive
success but I'm really curious about you
know that the beginning part and how how
that played out for you so the beginning
part I would describe it as a whole lot
of
stress and it wasn't exactly clear how
to get going right
away there was only clarity about a few
things which is there need to be some
kind of a large
project
and I also was excited about the idea
that maybe if you can predict really
well you make progress on unsupervised
learning but beyond that it wasn't clear
what to do so we tried a whole lot of
different things and then we decided
that maybe it would be good to solve a
difficult computer game Dota and this
is where Greg just
showed his strength and he just took on
this project even though it seemed
really impossible genuinely impossible
and just went for it and
somehow it worked in the
most stereotypical deep learning way
where the simplest method that he tried
just ended up working the simplest
policy gradient method as we kept
scaling it up just never never stopped
improving with more scale and more
training just to double click on that
for a moment: I don't think everybody knows what Dota is, can you say a bit about it? And I mean, I fully agree, it's surprising that the simplest approach ultimately worked; it is a very hard problem. So for some context, the state of the field back then was: if you look at reinforcement learning in particular, DeepMind had made some very exciting progress, first by training a neural net with reinforcement learning to play simple computer games. And then the reaction was
okay, that's exciting and interesting and kind of cool, but what else can it do? And then AlphaGo happened, and then opinions shifted: okay, reinforcement learning maybe can do some things, but
you know, Go. It's funny, by the way: Go used to be this impossible game, and now everyone says, oh, it's such a simple game, the board is so small. Our perceptions change quickly. But then DeepMind were talking about how StarCraft is the next logical step after Go, and it made a lot of sense to me
as well. It seemed like a much harder game, not necessarily for a person to play, but for our tools it seemed harder, because it had a lot more moving parts, it's much more chaotic, it's a real-time strategy game. And we thought that it would be nice to have our own twist on it, and to try to make a bot which can play Dota. Dota is another real-time strategy game that's really popular. It had, I believe (I don't know if it still has), the largest annual prize pool of any professional esport game. So it has a very vibrant, very strong professional scene; people dedicate their lives to playing this game
It's a game of reflex and strategy and instinct, and a lot of things happen; you don't get to see the whole game. The point is that it definitely felt like a grand challenge for reinforcement learning at that time. And as for our opinion about the tools of reinforcement learning, let's put it this way: the grand challenge felt like it's up here, and the field's opinion about the tools and their ability to solve a problem like this was down here. There was a huge mismatch, and so when we
started working on it, we thought, oh yeah, we're going to need to develop all kinds of crazy planning methods and hierarchical reinforcement learning methods and whatnot; but let's just get a baseline, let's just see where the baseline breaks.
And that's when the baseline just didn't break; it just kind of kept improving all the time. And it's interesting what would happen over the course of this project: we would have these public demonstrations of our progress. As we reached different milestones of performance, we would have some kind of a public exhibition game against professionals of different levels of accomplishment. So at first we had a public exhibition game against retired professionals, then we had them against active professionals, and then finally we had a game against the strongest professionals, and we defeated them. But the interesting thing is that at each step you'd have very knowledgeable experts in AI who would come out on Twitter and say, well, this was really cool, a great success of reinforcement learning, but obviously the next step would require the explicit planning thing or the hierarchy thing. And somehow it did not. So that was a very important
result for us. I felt like it really proved to us that we can do large
projects I remember I was not part of
this project uh just to be clear um but
but I was there at open ey when it was
all happening working on on other
projects and I remember being very very
surprised and that no explicit structure
was
needed though OB though well in my mind
obviously but maybe you know it's not
even true but in my mind there is this
large lstm model neural network
that maybe somehow through back
propagation actually internalize the
structure that we all at least not all
of us but maybe me I thought we would
have to put in explicitly and maybe the
ne network was able to just absorb that
intuition through back propagation
without the need to to hardcode it which
was really intriguing to me because it
just seemed like wow um you know a lot
of intuitions might be better uh
provided through through data than
through hard coding which seems a very
common Trend in all of deep learning but
maybe in reinfor wining at the time
wasn't that strongly believed yet till
till that result came out. Yeah, I mean, I agree with your assessment. I feel like, yeah,
I like to think that this result changed the field's view, at least a little bit, about the capability of simple reinforcement learning. Now, to be fair, you still need quite a hefty amount of experience to get a very strong result on such a game
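The "simplest policy gradient method" that kept improving with scale can be sketched in a few lines. This is a toy illustration of REINFORCE on a two-armed bandit, not OpenAI Five's actual setup; the learning rate, step count, and reward values below are made up:

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_bandit(arm_rewards, steps=2000, lr=0.1, seed=0):
    """Plain REINFORCE on a bandit: sample an action from a softmax
    policy, then nudge the logits in the direction that makes rewarded
    actions more likely (reward times the gradient of the log-prob)."""
    rng = random.Random(seed)
    logits = [0.0] * len(arm_rewards)
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(arm_rewards)), weights=probs)[0]
        reward = arm_rewards[action]
        for i in range(len(logits)):
            # d/d_logit_i of log pi(action) = 1[i == action] - probs[i]
            grad_log_prob = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * reward * grad_log_prob
    return softmax(logits)
```

Scaling this same estimator to a huge recurrent policy and a game as complex as Dota is, in essence, what "more scale and more training" meant.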
and then we also used a similar approach elsewhere. So I would say, if you have the ability to generate a very large amount of experience against some kind of a simulator, then this style of reinforcement learning can be extremely successful. And in fact another important result in OpenAI's history was to use the same exact approach to train a robot to solve the Rubik's Cube. So a physical robot, a physical robot hand, actually solved the physical Rubik's Cube, and it was quite a challenging project. The training was done entirely in simulation, and the simulation was designed in such a way that it's extra hard and requires the neural net to be very adaptive, so that when you give it the real robot, the real physical robot, it will still succeed. But at core it was the same exact approach as the one we used with the Dota project, which was very large scale reinforcement learning; in fact it was even the same code
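The "extra hard, adaptive" simulation design described here is commonly implemented with domain randomization: each training episode draws fresh physics parameters, so the policy cannot memorize a single simulator and must adapt online. A minimal sketch; the parameter names and ranges below are invented for illustration, not the project's actual values:

```python
import random

def sample_sim_params(rng):
    """Draw one episode's randomized physics. A policy trained across
    many such draws has to adapt on the fly, which is what lets it
    survive the unmodeled quirks of the real robot."""
    return {
        "friction": rng.uniform(0.5, 1.5),        # hypothetical range
        "cube_mass_kg": rng.uniform(0.05, 0.2),   # hypothetical range
        "actuation_delay_ms": rng.uniform(0.0, 40.0),
    }

def make_training_episodes(n_episodes, seed=0):
    """Generate one randomized parameter set per training episode."""
    rng = random.Random(seed)
    return [sample_sim_params(rng) for _ in range(n_episodes)]
```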
so that was a case where we had this general technique, these general powerful results, which we were able to use in more than one place. And that was what you've done on reinforcement learning. Now I know that right now there's other reinforcement learning happening at OpenAI, in the context of language actually, and I'm really curious about that, but before we get to that: language modeling. GPT is probably the most visible thing in recent years, in the public eye, of what AI is capable of, and OpenAI generated these GPT generations of models that can complete articles in very credible ways, and it's been very surprising how capable it is. And so what I'm really curious about, again, in some sense is: you decided, I mean not alone but together with collaborators at OpenAI, you decided that the time was right to go down this path of building language models, and I'm really curious, what was it for you that made you believe that this was the thing to start doing? Yeah, so
from my
side, a really important thing that happened to me is that I was really interested in unsupervised learning. And for context, the results that we spoke about earlier, about vision, and even about Go and Dota, and translation, are all cases where you train a neural network by presenting it with inputs and desired outputs. You have a typical input, a sentence, an image, something, and you have the desired output; you run the neural network and you compare the predicted output with the desired output, and then you change the neural network to reduce this error. And you just do it a lot, and that's how
learning works. And it's completely intuitive that if you do this, the neural network will succeed. I should say maybe not completely intuitive, but definitely pretty intuitive today, because you say, hey, here is my input, here's my desired output, don't make the mistakes; eventually the mistakes will go away. It is something where you can at least have a reasonably strong intuition about why it should work, why supervised learning works and why reinforcement learning works. In contrast, at least in my mind, unsupervised learning is much more mysterious. Now, what is unsupervised learning exactly? It's the idea that you can understand the world, whatever that means, simply by observing it, without there being a teacher that will tell you what the desired behavior should be. So there is a pretty obvious question, which is: how could that possibly work? So what would you do then, what was the typical prevailing thinking? The prevailing thinking has
been that maybe you have some kind of task: you take your input, your observation, an image let's say, and then you ask the neural network to somehow transform it in some way and then to reproduce the same image back. But why would that be a good thing for the task you care about? Is there some mathematical reason for it? I found it very unsatisfying; in my mind it felt like there is no good mathematical basis for unsupervised learning at all whatsoever, and I was really bothered by it. And after a lot of thinking, I developed the belief that actually, if you predict the next bit really well, you should have a really good unsupervised learner. The idea is that if you can
predict the next bit really well, then you have extracted all the meaningful information that somehow exists in the signal, and therefore the model should have a representation of all the concepts. In the context of language modeling it's very intuitive: if you can predict the next word moderately accurately, maybe the model will know that words are just clusters of characters separated by spaces. If you predict better, you might know that there is a vocabulary, but you won't be good at syntax. If you improve your prediction even further, you'll get better at the syntax as well, and suddenly you'll be producing syntactical mumbo jumbo. But if you improve prediction even further, necessarily the semantics has to start kicking in. I felt that the same argument could be made about predicting pixels as well. So at some point I started to believe that maybe doing a really good job on prediction would give us unsupervised learning, which back then felt like a grand challenge. And the other interesting thing is that now everyone knows that unsupervised learning just works, but not that long ago it seemed like this completely intractable thing. So anyway, to come back to the
story of how the GPTs were created. So then, I'd say the first project that really was a step in this direction was led by Alec Radford, who is an important hero of the GPT saga, where we trained an LSTM to predict the next character in Amazon reviews of products. And we discovered that this LSTM has a neuron which corresponds to sentiment: in other words, if you are reading a review which is positive, the sentiment neuron will fire, and if you're reading a review which is negative, the sentiment neuron will not fire. So that's interesting, and that felt to
us like it validated the conjecture: of course, eventually, if you want to predict what comes next really well, you need to discover the truth about the
data. And so then what happened is that the Transformer came out, and when we saw the Transformer it got us really excited, because we were really struggling: we believed that long-term dependencies were really important, and the Transformer had a very clean, elegant, and compute-efficient answer to long-term dependencies. For context, the Transformer is this neural network architecture, and in some sense it's just really good. But a little bit more technically: we discussed that these neural networks are deep in some way, and it's been the case until relatively recently that it was pretty hard to train deep neural networks. With previous neural networks for training models on sequences of language, the longer the sequence was, the deeper the network would get, and the harder it would be to train. But the Transformer decoupled the depth of the network from the length of the sequence, so you could have a Transformer of manageable depth with very long sequences, and that was
exciting. And this investigation led to GPT-1, and then I would say we continued to believe in scale, and that led to GPT-2 and GPT-3. And here I really want to call out Dario Amodei, who really believed that if we were to scale up the GPTs it would be the most amazing thing ever, and that's how we got GPT-3. With GPT-3, I mean, when it came out, it
wasn't just, I think, so exciting to the entire community; it wasn't just something that could complete text when you start with a prompt, saying, oh, this is likely your next sentence. It could complete all kinds of things: people would write web pages, even write some very basic code that gets completed with GPT-3. And they would be able to prompt it, and that really intrigued me, this notion of prompting, where you have this gigantic model that's trained on I don't know how much text out there, but somehow when you then briefly feed it a little bit of extra text, in the moment, you can actually prime it to start doing something that you wanted it to do. Can you say a bit more about that, where did that come from, and how does that work, do you think? So what is
a language model
exactly? You just have a neural network that takes some text and tries to output an educated guess of what the next word might be, and it outputs that educated guess as probabilities: it might say, you know, it's 30% the word 'the', some kind of a guess of the probabilities of what the words might be. Then you can pick a word according to this probability distribution that the neural net outputs, and then commit to it, and then ask the neural net to predict the next word, again and again and again. Now, we know that real text in some sense is very responsive to its beginning; we know
that text has a lot of very complicated structure. And if you read a document which says, 'the document below will describe a list of questions that were given in the MIT entrance exam in the 1900s' (I just made that up), then I strongly expect that in fact there will be ten or so questions in math, of the kind of math that was usually in math exams in the 1900s. If the model is good enough, it should actually do that. Now, how good is good enough? Well, this is a little bit of a qualitative statement, but if it is definitely good enough, it should be able to do it. So then you train a GPT-3 and you see, can it actually do it? And sometimes it cannot, but very often it is indeed responsive, very responsive, to whatever text you give it. Because to predict what comes next well enough, you need to really understand the text you're given. And I think this is, in
some way, the centrality of prediction: good enough prediction gives you everything you could ever dream about. Now, one of the things that also stood out to me with GPT is that it's a major research breakthrough, but it also feels very practical. Like, whenever I'm typing something, I know what I want to type next, it's already in my head, but I still
have to type it. But with a GPT, you know, GPT-2 onwards probably, it could complete it fairly accurately. And so it seemed very different in that sense from, for example, the Rubik's
Cube breakthrough or the Dota breakthroughs, which were fundamental research breakthroughs but where it was hard to dream of the direct applications. And here with GPT, it was so easy to dream of so
many applications. And I'm curious, in your own evolution on things, when GPT started working, did you start thinking about applications, or more generally did people around you at OpenAI start thinking about applications? What was going on? Yeah, we were
definitely excited about the potential applications. I mean, we were so excited about them that we built a whole API product around GPT-3, so that people could go and build their new and convenient and sometimes unprecedented applications in
language. I mean, maybe another way of looking at what's happening is that AI just continues to get more and more capable, and it can sometimes be tricky to tell if a particular research advance is real or not. Suppose you have some cool demo of something: what do you make of it? It can be hard to understand the magnitude of the advance, especially if you don't know how similar the demo is to the training data, for example. But if you have a product that's useful, then the advance is real. And I feel that maybe, in a sense, the field has matured so much that we no longer need to rely on demos and benchmarks as the only indicators of progress, but usefulness as the truest indicator of progress. And so I think this is a good sign for GPT-3 for sure, and
yeah, the applications, we were excited about them, and people are using GPT-3 all the time right now. Are there any uses that you've seen that you're able to share, of the applications being built?
There are plenty of applications. I remember seeing something that helps you write a resume and cify it, something that helps improve your emails, I think I've seen something like this; I don't remember, but they all have this kind of flavor. I know that there are a lot of users; unfortunately I don't remember specific examples off the top of my head.
This is jumping ahead a little bit in the progression of the research trajectory you've gone through with OpenAI, but maybe the biggest application, of course, and maybe it's not called GPT anymore, it's called Codex, but it's very similar: it's a system that can help you write programs. Can you say a bit about that? And I'm curious, is it just like GPT but trained on GitHub code instead of text, or are there some differences? So the system that we described in the paper is essentially a GPT trained on code; it's that simple. The thing that's interesting about it is that it works as well as it does
does because you can say like what what
have you even done you've done nothing
you just took a large neural net and you
train it and code from GitHub
but the result is not bad at
all the abil its ability to it can solve
real coding
problems much better than I think most
people would have
expected and again this comes back to
the the power of deep learning the power
of these neural Nets they don't care
what problem to
solve and you can all kind of say well
you know people can code so why can't
neuron if you believe that in a
biological neuron is not very different
from an artificial one then it's not an
unreasonable belief at all so then the
question becomes what's the training
data you know predicting giab is not
exactly the same as coding so maybe it
won't quite do the right
thing but it turned out to be good
enough and it turned out to be very
useful especially in situations you have
a library which you don't know because
it's right all of GitHub it has such
familiarity with all the major libraries
and if you don't know it but you kind of
just write a comment use this library to
do X you come up with code which is
going to often be correct or pretty
close and then you have something to
work with and you edit it a little bit
and you have something
working but yeah it's just it's just the
GPT training to predict code pretty
well I think in many ways it's really
mind-blowing in terms of potential societal impact, because if I think about the way we create impact in the world as people, we're often sitting behind a computer, and we're typing things, whether it's typing emails or writing up documents on work we've been doing, or writing code. This could really accelerate anybody's work and the kind of things we could do in one day. I don't know if we're already seeing metrics for this, but I would imagine that if it's not now, then in the next generation. And I'm curious about your thinking: what kind of productivity can we expect from people thanks to these tools? So I'd say that in the near term, productivity will continue to increase gradually. I think that as time goes by and the capability of AI systems increases, productivity will increase absolutely dramatically; I feel very confident in that. We will witness dramatic increases in productivity
eventually. In the long term, a day will come when the world will be kind of like: the AI is doing all the work, and the fruits of that work are given to people to enjoy. That is what I think the long-term future will hopefully be like. So in the medium term it's going to be amazing productivity increases, and then in the long-term future it's going to be like infinite productivity, or
fully automated productivity. Now, one of the things that of course people think about a lot in that context: when you give AI a lot of productivity, it better be productive doing the right thing, and better not be productive, I don't know, blowing something up by mistake and so forth, or just misunderstanding what it's supposed to be doing. And in that sense I've been really curious about this project at OpenAI where reinforcement learning is combined with GPT. Can you say a bit more about that? To step back: we have these AI systems
that are becoming more and more powerful, and a great deal of their power is coming from us training them on very large data sets which we don't fully understand, for which we have only an intuitive understanding of what they contain. So the models learn all kinds of things, and then they act in ways which we can inspect but perhaps not fully understand. Now, for these large language models, for example, we do have some ability to control them through the prompts, and in fact the better the language model gets, the more controllable it becomes through the prompt. But we want more: we want our models to do exactly what we want, or act as close to what we want as possible. So we had this project, indeed,
that you alluded to, of training these language models with reinforcement learning from feedback, where now you do reinforcement learning not against a simulator but against human judges that tell you whether the output was desirable or undesirable. And if you think about it, this reinforcement learning environment is really exciting. You could even argue that reinforcement learning has maybe slowed down a little bit because there weren't really cool environments in which you could do it; but doing reinforcement learning with language models and with people, that opens such a vast vista, you can do so many things there. And what we've shown is that these large neural networks, these large GPT models, when you do reinforcement learning from these teachers, essentially,
and I should also say, there is a small technicality, a technical thing for the ML-focused subset of the audience: in reinforcement learning you're usually providing a reward, good or bad, but the way we do it with reinforcement learning from human feedback is that the teacher looks at two outputs from the model and says which one is better, because it's an easier task to compare two things than to say whether one thing is good or bad in absolute terms. And then we do a little bit of machine learning in order to create a reward out of it, a reward model, and then use this reward model to train the neural net. This is a pretty sample-efficient thing to do, and you obtain a very fine-grained way of controlling the behavior of these neural networks, of these language models. And we've been using it quite a bit: recently we've been training these instruction-following models, which people can actually use through the OpenAI API. Whereas in GPT-3 the model is just trained on the internet, so you need to be quite clever about specifying your prompt and getting the model to do what you want, providing some examples, the instruction-following model has been trained in this way to literally do what we tell it to
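The comparison-based step described above, turning "A was better than B" judgments into a reward, can be sketched with a Bradley-Terry-style model. Here each candidate output gets one learned scalar, standing in for the neural reward model that would actually score (prompt, response) pairs; the data and hyperparameters are invented for illustration:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fit_rewards(comparisons, n_outputs, steps=500, lr=0.1):
    """Learn a scalar reward per output from pairwise preferences.
    For each (winner, loser) pair, take a gradient step that maximizes
    log sigmoid(r_winner - r_loser), pushing the preferred output's
    reward above the other's."""
    r = [0.0] * n_outputs
    for _ in range(steps):
        for winner, loser in comparisons:
            p_win = sigmoid(r[winner] - r[loser])
            g = 1.0 - p_win   # gradient of log p_win w.r.t. (r_w - r_l)
            r[winner] += lr * g
            r[loser] -= lr * g
    return r
```

The learned rewards would then serve as the training signal for reinforcement-learning fine-tuning of the language model.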
so there's a word which I think is known in some subsets of the machine learning community but not in all of it, and it's called alignment: this is an attempt to align the model, so that the model, with its great power and unclear capabilities, will in fact be trained and incentivized to literally do what you want. And with the instruction-following model you just tell it what you want: do X, write Y, modify Z, and it will do it. So it's really convenient to use, and this is an example of the technique of reinforcement learning from human feedback in practice. But moving forward,
you of course want to learn from teachers in all kinds of ways, and you want to use machine learning to not just have people provide supervised examples or provide rewards; you would really want to have a conversation, where you ask exactly the right question to learn the information that you need to understand the concept. So that's how things will be in the future, but right now this approach has been used fairly successfully to make our GPT models more aligned than they are
naturally. When you say aligned, as I understand it, you can also align them in a personalized way, so aligned to a specific person's preferences: I could teach it to follow my preferences and you could have a different one? I mean, the answer is definitely yes. So the specific model that I mentioned to you, the instruction-following model: it's a single model, and it's been aligned; we say it's aligned, which is a way to say that it's been trained and incentivized to follow the instruction you give it. So it's an interface, and it's a very convenient interface. Of course it is possible with these neural nets, they can do whatever you want, you can train them in literally any way you want, you can personalize them in arbitrary ways. You could say, okay, for this user you do this, for that user you do that, and the user can be specified with a paragraph or maybe with some of their past actions. So almost anything is
possible. Now, when you say almost anything is possible, that also reminds me of a lot of our past conversations; it always seems like there are no limits to your imagination of what might be possible, and angles to try to get there. And maybe one of the other most surprising recent results: traditionally a lot of work in computer vision, in language processing, in reinforcement learning sat in kind of separate research arenas, almost, but then recently you, together with collaborators at OpenAI, released the CLIP and DALL-E models that bring language and vision in some sense together, into the same network, to really somehow have a single network that can handle both at the same time. Again, I'm curious how you came to conclude, okay, this is the direction that maybe we should push now, maybe it becomes possible now to have this combined model that can handle both vision and language in the same model
as desired. Well, I think the underlying motivation here is that it seems implausible that the neural networks of the future will not have both vision and language, and that was the motivation to begin thinking in this
direction. And as to whether this should be possible, I think, at least in my view, there was plenty of evidence that neural networks could just succeed at this task if you make them large and you have an appropriate data set: if they can generate language like they do, why can't they generate the language of images, or go in the other direction as well? So maybe it's good to think of it as an exploration of training neural networks on both images and text. And with DALL-E, for context: DALL-E is literally a GPT-3 that is trained on text followed by almost like a textual representation of an image; we use tokens to represent an image, so that from the perspective of the model it's just some kind of a fun language. It's kind of like, you know, you can train GPT-2 on English text or on French text, it doesn't care; so what if you just had a different language, which had some human language and then the language of images? And that's DALL-E, and it worked exactly as you'd expect. And it was still a lot of fun to see a neural network generate images like it did. And with CLIP, it was an
exploration in the opposite direction, which is: can a neural network learn to see using a lot of loose natural language supervision? Can it learn a huge variety of visual concepts, and can it do so in a way that's very robust? I think the robustness point is especially important in my eyes, and let me explain what I mean by robustness. There is one thing which I think is especially notable and unsatisfying in neural networks for vision, which is that they make mistakes that a human being would never make. We spoke earlier about the ImageNet data set and about training neural networks to recognize the images in this data set, and you'd have neural nets which achieve superhuman performance on this data set; then you put one on your phone and start taking photos, and it would make all these disappointing mistakes. What's going on? It turns out that what's really going on is that there are all kinds of peculiarities in the data set which are hard to notice if you don't pay close attention. And so people have built all kinds of test sets with the same objects but from unusual angles, or in a different presentation, for which the ImageNet neural nets just fail. But the CLIP neural network, which was trained on this vast and loosely labeled data from the internet with its text, was able to do well on all these variants of ImageNet; it was much more robust to the presentation of the visual concept. And I think this kind of robustness is very important, because when it comes to our vision, you know, a third of our brain is dedicated to vision; our vision is unbelievably good
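The way CLIP puts those loosely paired captions to work at test time can be sketched as zero-shot classification: embed the image and each candidate caption, then pick the caption whose embedding is most similar to the image's. The toy vectors below stand in for the outputs of the two trained encoders; they are illustrative values, not real CLIP embeddings:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def zero_shot_classify(image_embedding, caption_embeddings):
    """Return the index of the caption most similar to the image,
    CLIP-style: free-form text plays the role of the label set."""
    scores = [cosine_similarity(image_embedding, c)
              for c in caption_embeddings]
    return max(range(len(scores)), key=lambda i: scores[i])
```

Because the "labels" are just text, adding a new category is as easy as writing a new caption, which is part of why the approach generalizes so robustly across presentations of a concept.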
and I feel like this is a step towards making neural nets a little bit more robust, a neural net whose capability is a little bit more in line with the capability of our own vision. Now, you say ImageNet versus the CLIP data set; the CLIP data set is a lot larger. How much larger is it, what's the difference in size between those? Like, hundreds of times larger. It has open-ended categories, because the categories are just free-form text. But it's not really just the size; it's also the coverage and the variety. The data set needs to be diverse, it needs to have a lot of stuff in it; if the data set is narrow, it will hurt the neural network. When I look back
at the last ten, well, nine-ish years, since the ImageNet breakthrough, it seems like year after year there are new breakthroughs, new capabilities that didn't exist before, many of them thanks to you, Ilya, and your collaborators. And I'm kind of curious, looking back at the last nine years and as you project forward, are there some things that you are particularly excited about, that we can't get to today but you're hopeful might become feasible in the next few years?
Yeah, so I'd say that there is a sense in which the deep learning saga is actually a lot older than the past nine years. It's funny if you read some of the statements made by Rosenblatt, I think in the 60s. Rosenblatt invented the perceptron, which was one of the first neural networks that could learn something interesting on a real computer; it could learn some image classification. And then Rosenblatt went to the New York Times and said, you know, one day a neural network will see and hear and translate and be conscious of itself and be your friend, something like this. And he was trying to raise money to build increasingly larger computers, and he had academic detractors who didn't like the way funding was, in their mind, misallocated, and that led to the first major neural network winter. And then I think now, these ideas
were kind of always there in the background; it's just that the environment wasn't ready, because you needed both the data and the compute. Then as soon as the data and the compute became ready, you were able to jump on this opportunity and materialize the progress. And I fully expect that progress will continue. I think we will have far more capable neural networks. I don't want to be too specific about what exactly may happen, because it's hard to predict those things, but one thing which would be nice to see is our neural networks being even more reliable than they are, so reliable that you can really trust their output, and when they don't know something, they'll just tell you, and maybe ask for clarification. I think that would be quite impactful.
I think they will be taking a lot more action than they are right now. I think our neural networks are still quite inert and passive, and they'll be much more useful; the usefulness will continue to grow.
And I mean, for sure, I'm totally certain that we will need some kind of new ideas, even if those new ideas may have the form of looking at things differently from the way we look at them right now. And I would argue that a lot of the major progress in deep learning has this form. For example, the most recent progress with unsupervised learning: what was done, what's different? We just trained larger language models, but they existed in the past; we just realized that language models were the right thing all along. So I think there will be more realizations like this, where things that are right in front of our noses are actually far more powerful and far more capable than we expected. And yeah, I do expect that the
capability of these systems will continue to increase. They will become increasingly more impactful in the world; it will become a much greater topic of conversation. I think we will see unbelievable, truly unbelievable applications, incredible applications, positive, even transformative applications. I think, you know, we could imagine lots of them with very powerful AI, and eventually I really do think that you'll be in a world where the AI does the work, and we the people enjoy this work, and we use that work to our benefit and enjoyment. You know,
part of the reason OpenAI is a capped-profit company, where after we return our obligations to our investors we turn back into a nonprofit, is so that we could help materialize this future vision, where you have this useful AI that's doing all the work and all the people get to enjoy it.
And that's really beautiful. I like the model you have there, because it essentially reflects, in some sense, the vision that the benefits of really capable AI could be unlimited, and it's not great to concentrate an unlimited benefit in a very small group of people, because that's just not great for the rest of the world. So I love the model you have there. One of the things that ties into this, Ilya, is that maybe AI is also becoming more expensive. A lot of people talk about it: in training models, a bigger model is going to be more capable, but then you need the resources to train those bigger models. And I'm really curious about your thinking on that. Is it just going to be the more money, the bigger the model, the more capable? Or is it possible that the future is different? So there
is a huge amount of incentive to increase the efficiency of our models and to find ways to do more with less, and this incentive is very strong and affects everyone in the field. And I fully expect that in the future we'll be able to do much more at a fraction of the cost we pay right now. I think that's just going to happen for sure. I think the cost of hardware will drop; I think methods will become more efficient in all sorts of ways. There are multiple dimensions of efficiency that models could utilize that they aren't yet. At the same time, I also think
that it is true that bigger models will always be better, and I think that's just a fact of life. And I expect there should be almost a kind of power law of different models doing different things. I think you'll have very powerful models in small numbers that are used for certain tasks; then you'd have many more smaller models that are still hugely useful; and then you have even more models which are smaller and more specialized. So you have this kind of continuum of size and specialization, and it's going to be an ecosystem, not unlike how in nature there are animals that occupy every niche. And so I expect the same thing will happen with compute: for every level of compute there will be some optimal way of using it, and people will find that way and create very interesting applications.
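The "power law" ecosystem described here can be given a purely illustrative numerical shape. Assuming, as a made-up example, that the number of deployed models at a given parameter scale falls off as a power of size (both `alpha` and `norm` below are arbitrary choices, not measured quantities):

```python
# Purely illustrative: suppose the count of deployed models at a given
# parameter scale followed a power law n(s) = norm * s**(-alpha).
# alpha and norm are arbitrary assumptions, not measured quantities.

def models_at_scale(size_params, alpha=0.5, norm=1e6):
    """Hypothetical count of models at a given parameter count."""
    return norm * size_params ** -alpha

# A handful of very large models, many more small specialized ones:
for size in (1e6, 1e9, 1e12):  # 1M-, 1B-, 1T-parameter scales
    print(f"{size:.0e} params -> ~{models_at_scale(size):,.0f} models")
```

Whatever the true exponent, the qualitative picture is the one in the interview: few frontier-scale models coexisting with many smaller, specialized ones.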
Love your vision, Ilya. I think we actually covered a tremendous amount already, and I'm really intrigued by everything we covered, but there's one question that's still on my mind that I'm hoping we can get through, which is: Ilya, you've been behind a lot of the breakthroughs in AI in the last 10 years, actually even a bit before that, and I'm just kind of curious, what does your day look like? What are some habits, things in your schedule, or things you do that help you be creative and productive? It's hard to give useful
blanket advice like this, but maybe my two answers consist of protecting my time and just trying really hard. You know, I don't think there is an easy way; you've just got to embrace the suffering and push through it, push through those walls, and that's where the good stuff is found. Now, when you say protecting your
time, which really resonates, of course, you then get to choose how you fill it in. And I'm kind of curious: if you just look at, let's say, the last week or the week before, in that protected time, what are you doing? Are you going on walks, are you reading papers, are you brainstorming with people? What's going on? Yeah, I'd say mostly in
my case it would be, not necessarily going on walks, but lots of solitary work. And there are people with whom I have very intense research conversations, which are very important, and I think those are the main things I do.
I do know that you're also an artist, or aspiring artist, whatever we want to call it, at the same time. Do you think that plays a role at all in boosting your creativity? I mean, I'm sure it doesn't hurt. It's hard to know with these things, obviously, but yeah, I think it can only help.