Geoffrey Hinton | Ilya's AI tutor: Talent and intuition have already led to today's large AI models
Summary
TLDR: This video transcript records a researcher's time at a research unit in England and his recollections of AI research at Carnegie Mellon and the University of Edinburgh. He recounts his explorations of brain science, philosophy, and artificial intelligence, and his collaborations with Terry Sejnowski, Peter Brown, and others. The discussion covers the development of neural networks, deep learning, language models, and multimodal learning, as well as the importance of GPUs for training neural networks. He also shares his views on the future of AI, including models' reasoning ability and creativity.
Takeaways
- 🍺 At the research unit in England, everyone would head to the pub for a drink at six in the evening.
- 🔬 At Carnegie Mellon, the lab was full of students on a Saturday night; they believed their work would change the future of computer science.
- 🧠 The brain course at Cambridge was disappointing: it only covered how neurons conduct action potentials.
- 📚 Reading books by Donald Hebb and John von Neumann sparked his interest in AI.
- 🤔 The brain does not learn through logical inference; it learns by modifying connection weights in a neural network.
- 👨🏫 The collaboration with Terry Sejnowski on Boltzmann machines was the most exciting research of all.
- 🎤 Peter Brown taught him a great deal about speech recognition and hidden Markov models.
- 🤝 Ilya showed outstanding intuition: he believed that simply making models bigger would make them work better.
- 💡 Data and computation at scale are what made the new AI algorithms work.
- 📈 Multimodal models will be more efficient and will understand spatial relationships better.
Q & A
Why were so many students in the lab at 9 p.m. on a Saturday at Carnegie Mellon?
-The students were in the lab at 9 p.m. on a Saturday because they believed the work they were doing was the future, and that what they did next would change the direction of computer science.
Why was the author initially disappointed studying physiology at Cambridge?
-He was disappointed because all they were taught was how neurons conduct action potentials, which does not explain how the brain works.
Why did the author eventually choose to study AI at Edinburgh?
-Because in AI, at least, he could test theories through simulation, which was far more interesting than his experience studying physiology and philosophy at Cambridge.
What writing by Donald Hebb influenced the author?
-Hebb's book on how learning changes the connection strengths in networks of neurons had a great influence on him.
Who was the author's main collaborator during his time at Carnegie Mellon, and what did they work on?
-His main collaboration was with Terry Sejnowski, who was at Johns Hopkins in Baltimore; together they studied Boltzmann machines, a model they believed could explain how the brain works.
What influence did Peter Brown have on the author?
-Peter Brown gave the author a great deal of guidance on statistics and introduced him to hidden Markov models, both of which had a lasting influence on his later research.
What insightful question did Ilya Sutskever ask in the author's office?
-Ilya asked why they didn't just give the gradient to a sensible function optimizer, a question that took them years to think through.
Why does the author think large neural network models can surpass their training data?
-Because they can learn from badly labeled data and still get much better results, which shows they can recognize errors in the training data and correct for them.
How does the author view the potential of current neural network models for multimodal learning?
-He believes that once models can handle multimodal data such as images, video, and sound, they will understand spatial relationships much better, becoming far more capable at understanding objects and at creative reasoning.
Why does the author think using GPUs for neural network training is a good idea?
-Because GPUs excel at matrix multiplication, the fundamental operation in neural network training, so using them dramatically speeds up training.
How does the author see the future of digital versus analog computation?
-He believes that although analog computation may be closer to how the brain works, digital computation has the advantage in knowledge sharing and efficiency, so digital systems are likely to dominate.
Outlines
😀 Nights in the lab and an awakening to AI
The author recalls the scene in the lab on a Saturday night at Carnegie Mellon: even on a day off, the students were all there working, because they believed their work would change the direction of computer science. The atmosphere was completely different from what he had known at a research unit in England, and it was refreshing. He also recounts studying physiology and philosophy at Cambridge before finally moving to the University of Edinburgh to work on artificial intelligence. His interest in AI was sparked in part by books by Donald Hebb and John von Neumann exploring how the brain learns and computes.
🤖 Early explorations and collaborations in AI research
The author describes his main collaboration while at Carnegie Mellon, with Terry Sejnowski of Johns Hopkins in Baltimore, with whom he studied Boltzmann machines. The collaboration was deeply inspiring, although they later concluded that the Boltzmann machine is probably not how the brain really works. He also describes working with the statistician Peter Brown, who introduced him to hidden Markov models; the two learned a great deal from each other.
👨💻 Working with Ilya and intuition about AI
The author recalls his collaboration with Ilya, a young student with deep intuitions about AI. Ilya questioned the standard view of backpropagation, arguing that the gradient should be handed to a more sensible function optimizer, a point that prompted years of thought. The author believes Ilya's intuition and inventiveness had an important influence on the field. They worked on many interesting projects together, including producing maps of data and building a programming interface for MATLAB.
🧠 How large language models learn and innovate
The author discusses how large language models learn by finding common structure, and predicts how they will become more creative. He cites AlphaGo as an example of AI surpassing existing knowledge within a particular domain. As models scale up, he argues, they will be able to do more reasoning and may develop innovative strategies through approaches such as self-play.
🔢 Neural networks meet GPUs
The author recounts how he was among the first to propose using GPUs to train neural networks. A former student, Rick Szeliski, suggested using graphics processing cards for matrix multiplication, which inspired him. They started with gaming GPUs, then moved to Tesla systems, greatly accelerating training. At the NIPS conference the author recommended Nvidia GPUs and, as thanks, asked Nvidia for a free one; the request went unanswered, but Jensen later gave him one. The author also praises the immortality of digital computation, which lets knowledge be shared efficiently across different hardware.
🌐 Multimodal learning and human thought
The author explores how multimodal learning can give models a better grasp of spatial concepts and improve their reasoning. Although in principle a very good model could be learned from language alone, a multimodal system makes learning much easier. He also discusses the relationship between language and cognition, laying out three views of language: the symbolic view, the big-vector ("thought vector") view, and the embedding view, and argues that the embedding view may be the more plausible model of human thought.
Keywords
💡Neural network
💡Deep learning
💡Backpropagation
💡Hidden layer
💡Embedding
💡Gradient descent
💡Logical rules
💡Multimodality
💡Reinforcement learning
💡GPU
Highlights
At the research unit in England, everyone went to the pub after six in the evening, whereas at Carnegie Mellon students were still in the lab at 9 p.m. on a Saturday, because they believed their work would change the future of computer science.
Studying physiology at Cambridge, the author was disappointed: what was taught was limited to how neurons conduct action potentials and did not explain how the brain works.
He turned to philosophy for an understanding of the mind, was disappointed again, and finally went to Edinburgh to do artificial intelligence (AI), where at least you could simulate things and test theories.
Donald Hebb's book influenced the author greatly, especially the theory of how the connection strengths between neurons change through learning.
His early intuition was that the brain cannot possibly learn by applying pre-programmed logical rules of inference; the problem was to figure out how the brain modifies the connections in a neural net.
His main collaboration while at Carnegie Mellon was with Terry Sejnowski of Johns Hopkins in Baltimore, with whom he studied Boltzmann machines.
From his collaboration with the statistician Peter Brown he learned about hidden Markov models, and he borrowed the word "hidden" from them for the hidden layers of neural networks.
Ilya Sutskever, a student with strong intuition and independent thinking, had an important influence on the author's research.
One of Ilya's key intuitions was that making models bigger would make them work better, a view the author at first considered a cop-out but which proved correct.
Around 2011, the author, Ilya, and James Martens published remarkable results using character-level prediction on Wikipedia text.
The author argues that predicting the next word or symbol is not mere prediction: it requires understanding what came before, which involves reasoning.
Large language models encode information by finding common structure, an ability that lets them make creative analogies and reason.
The AlphaGo example shows that, through reinforcement learning, AI can go beyond current knowledge within a particular domain.
Even without reinforcement learning, the author believes, large neural networks can surpass their training data by correcting its errors.
The introduction of multimodal models, combining vision, touch, and so on, will make models far stronger at spatial understanding.
The author suggests that language may have evolved to fit the brain and the brain to fit language; the two shaped each other.
Early attempts to train neural networks on GPUs greatly accelerated machine learning; the author was prescient here.
The author's appreciation of digital computation, together with his exploration of analog computation, reflects how computing technology and the field of AI have advanced together.
Transcripts
In England, at a research unit, it would get to be six o'clock and you'd all go for a drink in the pub. At Carnegie Mellon, I remember, after I'd been there a few weeks, it was Saturday night. I didn't have any friends yet and I didn't know what to do, so I decided I'd go into the lab and do some programming, because I had a Lisp machine and you couldn't program it from home. So I went into the lab at about 9:00 on a Saturday night, and it was swarming: all the students were there, and they were all there because what they were working on was the future. They all believed that what they did next was going to change the course of computer science. It was just so different from England, and that was very refreshing.

Take me back to the very beginning, Geoff: at Cambridge, trying to understand the brain. What was that like?

It was very disappointing. I did physiology, and in the summer term they were going to teach us how the brain worked. All they taught us was how neurons conduct action potentials, which is very interesting but doesn't tell you how the brain works. That was extremely disappointing. I switched to philosophy; I thought maybe they'd tell us how the mind worked. That was very disappointing too. I eventually ended up going to Edinburgh to do AI, and that was more interesting: at least you could simulate things, so you could test out theories.

Do you remember what intrigued you about AI? Was it a paper, or a particular person who exposed you to those ideas?

I guess it was a book I read by Donald Hebb that influenced me a lot. He was very interested in how you learn the connection strengths in neural nets. I also read a book by John von Neumann early on, who was very interested in how the brain computes and how it's different from normal computers.

Did you have the conviction that these ideas would work out at that point? What was your intuition back in the Edinburgh days?

It seemed to me there had to be a way that the brain learns, and it's clearly not by having all sorts of things programmed into it and then using logical rules of inference; that just seemed to me crazy from the outset. So we had to figure out how the brain learns to modify the connections in a neural net so that it can do complicated things. Von Neumann believed that; Turing believed that. Von Neumann and Turing were both pretty good at logic, but they didn't believe in this logical approach.
What was your split between studying neuroscience and just doing what seemed to be good algorithms for AI? How much inspiration did you take early on?

I never did that much study of neuroscience. I was always inspired by what I'd learned about how the brain works: there's a bunch of neurons, they perform relatively simple operations, they're nonlinear, but they collect inputs, they weight them, and then they give an output that depends on that weighted input. And the question is: how do you change those weights to make the whole thing do something good? It seems like a fairly simple question.

What collaborations do you remember from that time?

The main collaboration I had at Carnegie Mellon was with someone who wasn't at Carnegie Mellon. I was interacting a lot with Terry Sejnowski, who was in Baltimore at Johns Hopkins, and about once a month either he would drive to Pittsburgh or I would drive to Baltimore; it's 250 miles away. We would spend a weekend together working on Boltzmann machines. That was a wonderful collaboration. We were both convinced it was how the brain worked. It was the most exciting research I've ever done, and a lot of very interesting technical results came out of it, but I think it's not how the brain works.

I also had a very good collaboration with Peter Brown, who was a very good statistician. He worked on speech recognition at IBM and then came as a more mature student to Carnegie Mellon, just to get a PhD, but he already knew a lot. He taught me a lot about speech, and in fact he taught me about hidden Markov models. I think I learned more from him than he learned from me; that's the kind of student you want. When he taught me about hidden Markov models, I was doing backprop with hidden layers, only they weren't called hidden layers then, and I decided that the name they use in hidden Markov models is a great name for variables that you don't know what they're up to. So that's where the name "hidden" in neural nets came from: Peter and I decided it was a great name for the hidden layers of neural nets. But I learned a lot from Peter about speech.
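The picture of a neuron sketched above, a unit that collects inputs, weights them, and passes the weighted sum through a nonlinearity, can be illustrated in a few lines. This is a minimal sketch; the particular inputs, weights, and choice of a logistic sigmoid are illustrative assumptions, not anything from the interview:

```python
import math

def neuron(inputs, weights, bias):
    # Collect the inputs, weight them, and sum.
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Pass the weighted sum through a nonlinearity (here a logistic sigmoid).
    return 1.0 / (1.0 + math.exp(-total))

# Hypothetical example: three inputs, three weights, one bias.
output = neuron([1.0, 0.5, -1.0], [0.8, -0.2, 0.3], bias=0.1)
print(round(output, 3))
```

Learning, in this picture, is exactly the question Hinton poses: how to change `weights` and `bias` across a whole network of such units so the network does something useful.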
Take us back to when Ilya showed up at your office.

I was in my office, probably on a Sunday, and I was programming, I think, and there was a knock on the door. Not just any knock: a sort of urgent knock. So I went and answered the door, and there was this young student there. He said he'd been cooking fries over the summer, but he'd rather be working in my lab. So I said, well, why don't you make an appointment and we'll talk? And he said, how about now? That was sort of Ilya's character. So we talked for a bit, and I gave him a paper to read, which was the Nature paper on backpropagation, and we made another meeting for a week later. He came back and said, "I didn't understand it." I was very disappointed; I thought he seemed like a bright guy, but it's only the chain rule, it's not that hard to understand. And he said, "Oh no, no, I understood that. I just don't understand why you don't give the gradient to a sensible function optimizer." Which took us quite a few years to think about. And it kept on like that with Ilya: his raw intuitions about things were always very good.

What do you think enabled those intuitions for Ilya?

I don't know. I think he always thought for himself. He was always interested in AI from a young age, and he's obviously good at math. But it's very hard to know.

What was the collaboration between the two of you like? What part would you play, and what part would he play?

It was a lot of fun.
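Ilya's question, why hand-roll plain gradient steps instead of giving the gradient to a sensible function optimizer, can be illustrated on a toy problem. The sketch below compares plain gradient descent with classical momentum (one simple stand-in for a "more sensible" optimizer) on an ill-conditioned quadratic; the function and hyperparameters are made-up illustrations, not anything from the interview:

```python
def grad(p):
    # Gradient of the ill-conditioned quadratic f(x, y) = x^2 + 100*y^2.
    x, y = p
    return (2.0 * x, 200.0 * y)

def run(use_momentum, steps=200, lr=0.009, beta=0.9):
    p = (1.0, 1.0)          # start away from the minimum at (0, 0)
    v = (0.0, 0.0)
    for _ in range(steps):
        g = grad(p)
        if use_momentum:
            # Hand the same gradient to a slightly smarter update rule.
            v = tuple(beta * vi - lr * gi for vi, gi in zip(v, g))
            p = tuple(pi + vi for pi, vi in zip(p, v))
        else:
            # Plain gradient descent on the raw gradient.
            p = tuple(pi - lr * gi for pi, gi in zip(p, g))
    return max(abs(c) for c in p)   # distance from the optimum

print(run(False), run(True))        # momentum ends up much closer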
I remember one occasion when we were trying to do a complicated thing producing maps of data. I had a kind of mixture model, so you could take the same bunch of similarities and make two maps, so that in one map "bank" could be close to "greed" and in another map "bank" could be close to "river"; in one map you can't have it close to both, because "river" and "greed" are a long way apart. So we'd have a mixture of maps. We were doing it in MATLAB, and this involved a lot of reorganization of the code to do the right matrix multiplies. Ilya got fed up with that, so he came in one day and said, "I'm going to write an interface for MATLAB, so I program in this different language and then I have something that just converts it into MATLAB." And I said, "No, Ilya, that'll take you a month to do. We've got to get on with this project; don't get diverted by that." And he said, "It's okay, I did it this morning." That's quite incredible.

Throughout those years, the biggest shift wasn't necessarily just the algorithms, but also the scale. How did you view that scale over the years?

Ilya got that intuition very early. Ilya was always preaching that you just make it bigger and it'll work better. And I always thought that was a bit of a cop-out, that you're going to have to have new ideas too. It turns out I was basically wrong: new ideas help, things like Transformers helped a lot, but it was really the scale of the data and the scale of the computation. Back then we had no idea computers would get a billion times faster; we thought maybe they'd get a hundred times faster. We were trying to do things by coming up with clever ideas that would have just solved themselves if we had had a bigger scale of data and computation.

In about 2011, Ilya and another graduate student called James Martens and I had a paper using character-level prediction. We took Wikipedia and we tried to predict the next HTML character, and that worked remarkably well. We were always amazed at how well it worked; that was using a fancy optimizer on GPUs. We could never quite believe that it understood anything, but it looked as though it understood, and that just seemed incredible.

Can you take us through how models are trained to predict the next word, and why that's the wrong way of thinking about them?
I don't actually believe it is the wrong way. In fact, I think I made the first neural net language model that used embeddings and backpropagation. It was very simple data, just triples, and it was turning each symbol into an embedding, then having the embeddings interact to predict the embedding of the next symbol, and then from that predicting the next symbol, and then backpropagating through that whole process to learn these triples. And I showed it could generalize. About ten years later, Yoshua Bengio used a very similar network and showed it worked with real text, and about ten years after that, linguists started believing in embeddings. It was a slow process.

The reason I think it's not just predicting the next symbol is this: if you ask what it takes to predict the next symbol, particularly if you ask me a question and the first word of the answer is the next symbol, you have to understand the question. So predicting the next symbol is very unlike old-fashioned autocomplete. In old-fashioned autocomplete you'd store triples of words; then, given a pair of words, you'd see how often different words came third, and that way you can predict the next symbol. That's what most people think autocomplete is like. It's no longer at all like that. To predict the next symbol you have to understand what's been said. So I think you're forcing it to understand by making it predict the next symbol, and I think it's understanding in much the same way we are. A lot of people will tell you these things aren't like us, that they're just predicting the next symbol, that they're not reasoning like us. But actually, in order to predict the next symbol, it's going to have to do some reasoning. And we've seen now that if you make big ones, without putting in any special stuff to do reasoning, they can already do some reasoning, and I think as you make them bigger they're going to be able to do more and more reasoning.

Do you think I'm doing anything other than predicting the next symbol right now?

I think that's how you're learning. I think you're predicting the next video frame, you're predicting the next sound. I think that's a pretty plausible theory of how the brain is learning.
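The "old-fashioned autocomplete" Hinton contrasts with, storing triples of words and then, given a pair, counting which word came third, can be sketched in a few lines. This is a toy illustration on a made-up corpus:

```python
from collections import Counter, defaultdict

def train_trigrams(words):
    # For every pair of adjacent words, count which word came third.
    counts = defaultdict(Counter)
    for a, b, c in zip(words, words[1:], words[2:]):
        counts[(a, b)][c] += 1
    return counts

def predict_next(counts, a, b):
    # Predict the most frequent third word seen after the pair (a, b).
    following = counts.get((a, b))
    return following.most_common(1)[0][0] if following else None

corpus = "the cat sat on the mat and the cat sat on the chair".split()
model = train_trigrams(corpus)
print(predict_next(model, "cat", "sat"))  # "on" always followed this pair
```

A large language model, by contrast, has no lookup table of contexts to consult: it has to build a representation of everything said so far, which is why Hinton argues the prediction task forces understanding.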
What enables these models to learn such a wide variety of fields?

What these big language models are doing is looking for common structure, and by finding common structure they can encode things using that common structure, and that's more efficient. Let me give you an example. If you ask GPT-4 why a compost heap is like an atom bomb, most people can't answer that; most people haven't thought about it, and they think atom bombs and compost heaps are very different things. But GPT-4 will tell you: well, the energy scales are very different, and the time scales are very different, but the thing that's the same is that when the compost heap gets hotter, it generates heat faster, and when the atom bomb produces more neutrons, it produces more neutrons faster. So it gets the idea of a chain reaction. I believe it's understood that they're both forms of chain reaction, and it's using that understanding to compress all that information into its weights. And if it's doing that, then it's going to be doing that for hundreds of things where we haven't seen the analogies yet, but it has. That's where you get creativity from: seeing analogies between apparently very different things. So I think GPT-4 is going to end up, when it gets bigger, being very creative. This idea that it's just regurgitating what it's learned, just pasting together text it's learned already, is completely wrong. It's going to be even more creative than people.

You'd argue that it won't just repeat the human knowledge we've developed so far, but could also progress beyond it. That's something we haven't quite seen yet: we've started seeing some examples, but to a large extent we're still at the current level of science. What do you think will enable it to go beyond that?
Well, we've seen that in more limited contexts. If you take AlphaGo, in that famous match with Lee Sedol, there was move 37, where AlphaGo made a move that all the experts said must have been a mistake, but actually later they realized it was a brilliant move. So that was creativity within that limited domain. I think we'll see a lot more of that as these things get bigger.

The difference with AlphaGo, though, was that it was using reinforcement learning, which subsequently enabled it to go beyond the current state: it started with imitation learning, watching how humans play the game, and then through self-play it developed way beyond that. Do you think that's the missing component of current systems?

I think that may well be a missing component, yes. The self-play in AlphaGo and AlphaZero is a large part of why they could make these creative moves. But I don't think it's entirely necessary. There's a little experiment I did a long time ago where you train a neural net to recognize handwritten digits (I love that example, the MNIST example), and you give it training data where half the answers are wrong. The question is how well it will learn. And you make half the answers wrong once and keep them like that, so it can't average away the wrongness by seeing the same example sometimes with the right answer and sometimes with the wrong answer: for half of the examples, whenever it sees that example, the answer is always wrong. So the training data has 50% error. But if you train up backpropagation, it gets down to 5% error or less. In other words, from badly labeled data it can get much better results. It can see that the training data is wrong. That's how smart students can be smarter than their advisor: their advisor tells them all this stuff, and for half of what their advisor tells them they think "no, rubbish," and they listen to the other half, and they end up smarter than the advisor. So these big neural nets can actually do much better than their training data, and most people don't realize that.
How do you expect these models to add reasoning? One approach is to add heuristics on top of them, which a lot of the research is doing now, where you have chain-of-thought and you feed its reasoning back into itself. Another way would be in the model itself, as you scale it up. What's your intuition around that?

My intuition is that as we scale up these models, they get better at reasoning. If you ask how people work, roughly speaking, we have these intuitions, and we can do reasoning, and we use the reasoning to correct our intuitions. Of course, we use the intuitions during the reasoning to do the reasoning, but if the conclusion of the reasoning conflicts with our intuitions, we realize the intuitions need to be changed. That's much like in AlphaGo or AlphaZero, where you have an evaluation function that just looks at a board and says how good that is for you, but then you do the Monte Carlo rollout and get a more accurate idea, and you can revise your evaluation function. You can train it by getting it to agree with the results of reasoning. I think these large language models have to start doing that: training their raw intuitions about what should come next by doing reasoning and realizing that's not right. That way they can get more training data than just mimicking what people did. That's exactly why AlphaGo could make this creative move 37: it had much more training data, because it was using reasoning to check out what the right next move should have been.
What do you think about multimodality? We spoke about these analogies, and often the analogies are way beyond what we can see: it's discovering analogies far beyond what humans do, maybe at abstraction levels we'll never be able to understand. When we introduce images, video, and sound, how do you think that will change the models, and how will it change the analogies they can make?

I think it'll change them a lot. I think it'll make them much better at understanding spatial things, for example. From language alone it's quite hard to understand some spatial things, although remarkably GPT-4 can do that even before it was multimodal. But when you make it multimodal, if you have it both doing vision and reaching out and grabbing things, it'll understand objects much better, because it can pick them up and turn them over and so on. So although you can learn an awful lot from language, it's easier to learn if you're multimodal, and in fact you then need less language. And there's an awful lot of YouTube video for predicting the next frame, or something like that. So I think these multimodal models are clearly going to take over: you can get more data that way, and they need less language. There's really a philosophical point that you could learn a very good model from language alone, but it's much easier to learn it from a multimodal system.

How do you think it will impact the models' reasoning?

I think it'll make them much better at reasoning about space, for example reasoning about what happens if you pick objects up. If you actually try picking objects up, you're going to get all sorts of training data that's going to help.
Do you think the human brain evolved to work well with language, or did language evolve to work well with the human brain?

The question of whether language evolved to work with the brain or the brain evolved to work with language: I think that's a very good question. I think both happened. I used to think we would do a lot of cognition without needing language at all; now I've changed my mind a bit. Let me give you three different views of language and how it relates to cognition. There's the old-fashioned symbolic view: cognition consists of having strings of symbols in some kind of cleaned-up logical language where there's no ambiguity, and applying rules of inference, and that's what cognition is, just symbolic manipulations on things that are like strings of language symbols. That's one extreme view. An opposite extreme view is: no, no, once you get inside the head it's all vectors. Symbols come in, you convert those symbols into big vectors, all the stuff inside is done with big vectors, and then if you want to produce output you produce symbols again. There was a point in machine translation, in about 2014, when people were using recurrent neural nets: words would keep coming in, and there's a hidden state, and they keep accumulating information in this hidden state, so when they get to the end of a sentence they have a big hidden vector that captures the meaning of that sentence, which can then be used for producing the sentence in another language. That was called a thought vector, and it's a second view of language: you convert the language into a big vector that's nothing like language, and that's what cognition is all about.

But then there's a third view, which is what I believe now: you take these symbols and you convert the symbols into embeddings, and you use multiple layers of that, so you get these very rich embeddings, but the embeddings are still tied to the symbols, in the sense that you've got a big vector for this symbol and a big vector for that symbol, and these vectors interact to produce the vector for the symbol for the next word. And that's what understanding is. Understanding is knowing how to convert the symbols into these vectors, and knowing how the elements of the vectors should interact to predict the vector for the next symbol. That's what understanding is, both in these big language models and in our brains. It's an example that's sort of in between: you're staying with the symbols, but you're interpreting them as these big vectors. That's where all the work is, and all the knowledge is in what vectors you use and how the elements of those vectors interact, not in symbolic rules. But it's not saying you get away from the symbols altogether; it's saying you turn the symbols into big vectors but stay with the surface structure of the symbols. That's how these models are working, and it now seems to me a more plausible model of human thought too.
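The third view, symbols mapped to vectors whose elements interact to produce the vector for the next symbol, can be sketched to show the shape of the computation. Everything here is a hypothetical toy: a three-word vocabulary, tiny dimensions, untrained random weights, and a single linear interaction standing in for the many layers Hinton describes:

```python
import random

random.seed(0)
VOCAB = ["the", "cat", "sat"]
DIM = 4

# One embedding vector per symbol.
embed = {w: [random.gauss(0, 1) for _ in range(DIM)] for w in VOCAB}
# A weight for every pair of vector elements: how the elements of the
# context vector interact to produce the predicted next-symbol vector.
interact = [[random.gauss(0, 0.5) for _ in range(DIM)] for _ in range(DIM)]

def predict_next_vector(context):
    # Sum the context embeddings, then let the elements interact linearly.
    summed = [sum(embed[w][i] for w in context) for i in range(DIM)]
    return [sum(interact[i][j] * summed[j] for j in range(DIM))
            for i in range(DIM)]

def nearest_symbol(vec):
    # Read out the symbol whose embedding best matches the predicted vector.
    return max(VOCAB, key=lambda w: sum(a * b for a, b in zip(vec, embed[w])))

v = predict_next_vector(["the", "cat"])
print(len(v), nearest_symbol(v))
```

Training would adjust both the embeddings and the interaction weights so the predicted vector lands near the embedding of the actual next word; stacking many such layers is what gives the "very rich embeddings" while staying tied to the surface symbols.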
You were one of the first folks to get the idea of using GPUs, and I know Jensen loves you for that. Back in 2009 you mentioned that you told Jensen this could be quite a good idea for training neural nets. Take us back to that early intuition of using GPUs for training neural nets.

Actually, I think in about 2006 I had a former graduate student called Rick Szeliski, who's a very good computer vision guy, and I talked to him at a meeting and he said: you know, you ought to think about using graphics processing cards, because they're very good at matrix multiplies, and what you're doing is basically all matrix multiplies. So I thought about that for a bit, and then we learned about these Tesla systems that had four GPUs in them. Initially we just got gaming GPUs and discovered they made things go 30 times faster, and then we bought one of these Tesla systems with four GPUs, and we did speech on that, and it worked very well. Then in 2009 I gave a talk at NIPS, and I told a thousand machine learning researchers: you should all go and buy Nvidia GPUs, they're the future, you need them for doing machine learning. And I actually then sent mail to Nvidia saying, "I told a thousand machine learning researchers to buy your boards; could you give me a free one?" And they said no. Actually, they didn't say no; they just didn't reply. But when I told Jensen this story later on, he gave me a free one.
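Szeliski's point, that what a neural net does is "basically all matrix multiplies", is easy to see: pushing a batch of inputs through a layer of neurons is exactly one matrix multiplication, the operation GPUs are built to do fast. A small sketch with toy, made-up sizes and weights:

```python
def matmul(A, B):
    # Plain triple-loop matrix multiply: C = A @ B.
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

# A "layer" of 3 neurons over 2 inputs: weights is a 2x3 matrix,
# one column of weights per neuron.
weights = [[1.0, 0.0, -1.0],
           [0.5, 2.0, 1.0]]
# A batch of 2 input vectors (one per row).
batch = [[1.0, 2.0],
         [0.0, 1.0]]

# One matrix multiply computes every neuron's weighted sum for every input.
print(matmul(batch, weights))  # [[2.0, 4.0, 1.0], [0.5, 2.0, 1.0]]
```

A GPU does nothing conceptually different; it just performs the multiply-and-sum steps massively in parallel, which is where the 30x speedup Hinton mentions came from.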
That's very good. What's interesting as well is how GPUs have evolved alongside the field. Where do you think we should go next in hardware?

In my last couple of years at Google I was thinking about ways of trying to make analog computation, so that instead of using a megawatt we could use 30 watts, like the brain, and we could run these big language models in analog hardware. I never made it work, but I started really appreciating digital computation. If you're going to use that low-power analog computation, every piece of hardware is going to be a bit different, and the idea is that the learning is going to make use of the specific properties of that hardware. That's what happens with people: all our brains are different, so we can't take the weights in your brain and put them in my brain. The hardware is different, the precise properties of the individual neurons are different, and the learning has learned to make use of all that. So we're mortal, in the sense that the weights in my brain are no good for any other brain. When I die, those weights are useless. We can get information from one to another rather inefficiently: I produce sentences, and you figure out how to change your weights so you would have said the same thing. That's called distillation, but it's a very inefficient way of communicating knowledge. With digital systems, they're immortal: once you've got some weights, you can throw away the computer, just store the weights on a tape somewhere, build another computer, put those same weights in, and if it's digital, it can compute exactly the same thing as the other system did. So digital systems can share weights, and that's incredibly much more efficient. If you've got a whole bunch of digital systems, and they each go and do a tiny bit of learning, and they start with the same weights, do a tiny bit of learning, and then share their weights again, they all know what all the others learned. We can't do that. So they're far superior to us in being able to share knowledge.
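The weight-sharing advantage Hinton describes, many digital copies each learning a little and then pooling what they learned, can be sketched with elementwise weight averaging. The averaging rule and the numbers are illustrative assumptions, not a claim about any specific system:

```python
def average_weights(replicas):
    # Each replica is a list of weights; pooling is elementwise averaging.
    n = len(replicas)
    return [sum(ws) / n for ws in zip(*replicas)]

# Three identical digital copies start from the same weights...
start = [0.0, 0.0]
# ...each does a tiny bit of learning on its own slice of data,
# ending up with slightly different weights:
replicas = [[0.2, -0.1], [0.4, 0.1], [0.3, 0.0]]

shared = average_weights(replicas)
print(shared)   # every copy now benefits from what all the others learned
```

Mortal, analog learners can only do the equivalent of distillation: one produces sentences and the other slowly adjusts its own weights. The averaging step above has no analog counterpart, because it requires the weights to mean exactly the same thing on every copy of the hardware.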