Geoffrey Hinton: The Foundations of Deep Learning
Summary
TLDR: This video transcript introduces the fundamentals of neural networks and the backpropagation algorithm, along with their applications in several fields. It first explains how backpropagation works, contrasting it with a naive evolution-style procedure that adjusts network weights to improve performance. It then shows the algorithm's successes in image recognition, speech recognition, and machine translation, in particular how graduate students at the University of Toronto used neural networks to improve speech recognition and ultimately surpass the state of the art. It also covers the dramatic success of convolutional neural networks in image recognition and the advantages of recurrent neural networks for sequence data. Finally, it explores the future potential of neural networks, from outperforming radiologists in medical image analysis to predicting molecular activity in drug design. Throughout, the talk emphasizes the central role of neural networks in modern artificial intelligence and their ability to solve complex problems.
Takeaways
- 🤖 Neural networks mimic the brain's networks with a learning algorithm, solving problems by adjusting weights rather than by explicit programming.
- 📈 Backpropagation is the core of neural network training: it computes the gradient of a loss function and uses it to update the network's weights.
- 🔍 Neural networks achieved breakthrough results in image and speech recognition, surpassing traditional algorithms.
- 🎓 Graduate students at the University of Toronto applied neural networks to speech recognition and significantly improved its performance.
- 📚 Neural networks excel when given large amounts of labeled data, especially with enough computing power.
- 📈 Neural networks won the image recognition competition by an overwhelming margin, with error rates far below traditional computer vision systems.
- 🔁 Recurrent neural networks (RNNs) suit sequence data such as language and video, using recurrent connections to remember information.
- 🌐 In machine translation, neural networks use an encoder-decoder model to translate one language into another.
- 📖 Neural networks show the potential to surpass radiologists in medical image analysis, diagnosing disease more accurately.
- 🧠 Training neural networks requires little linguistic knowledge; it relies instead on large amounts of labeled data.
- 🏆 A neural network won a competition on predicting drug-molecule binding, demonstrating its potential in chemistry.
Q & A
What is the backpropagation algorithm?
-Backpropagation is the algorithm used to train and optimize the weights in a neural network. It computes the gradient of the loss function with respect to every weight in the network, then uses those gradients to update the weights so as to reduce the network's prediction error.
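As a concrete illustration (a minimal NumPy sketch, not code from the talk), here is gradient descent on a single linear neuron with a squared-error loss; the analytic gradient plays the same role here that backpropagation plays for every weight of a deep network:

```python
import numpy as np

def train_step(w, x, y_true, lr=0.1):
    """One gradient-descent step for a single linear neuron.

    Loss is squared error; the gradient d(loss)/d(w_i) = 2*(y - y_true)*x_i
    is computed analytically, which is what backpropagation does for
    every weight in a deep network.
    """
    y = float(np.dot(w, x))           # forward pass: the neuron's output
    grad = 2.0 * (y - y_true) * x     # gradient of (y - y_true)^2 w.r.t. w
    return w - lr * grad              # move the weights against the gradient

# Repeated steps drive the prediction toward the target.
w = np.zeros(3)
x = np.array([1.0, 2.0, -1.0])
for _ in range(200):
    w = train_step(w, x, y_true=4.0)
```

After a few hundred steps the prediction `np.dot(w, x)` is essentially equal to the target, which is exactly the "reduce the prediction error" behavior described above, just for one neuron instead of millions.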
Why is traditional programming less effective than neural networks for some problems?
-Traditional programming requires a human programmer to tell the computer exactly how to perform a task, which is very difficult for problems that are complex or that humans themselves do not fully understand. A neural network, given a learning algorithm and plenty of example data, can discover how to solve the problem on its own, with no detailed programming by humans.
How does an artificial neuron work?
-An artificial neuron receives input signals from sensors or from other neurons, each arriving on a line with a weight that can be positive or negative. The neuron multiplies each input value by its weight and sums the results to get a total input. It then outputs a nonlinear function of that total input: if the total input is not big enough, the neuron produces no output; once it exceeds a threshold, the neuron starts to respond, and its output grows as the total input grows.
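A minimal NumPy sketch of such a neuron (an illustrative assumption on my part, using the rectified-linear nonlinearity the talk describes: silent below the threshold, then growing with the total input):

```python
import numpy as np

def neuron(inputs, weights, bias=0.0):
    """Artificial neuron as described: multiply each input by its weight,
    sum everything up to get the total input, then apply a nonlinearity
    (here a ReLU: zero output below threshold, linear growth above it)."""
    total = float(np.dot(inputs, weights)) + bias
    return max(0.0, total)  # ReLU nonlinearity
```

For example, with inputs `[1, 1]` and weights `[0.5, -2.0]` the total input is negative, so the neuron stays silent and outputs 0; flip the second weight to `+2.0` and it responds with 2.5.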
Why was backpropagation abandoned in the 1990s?
-In the 1990s, datasets were relatively small, other algorithms performed better, and those algorithms came with mathematical proofs. Backpropagation had no proof that it would work, and different people running the same algorithm got different results, so machine learning researchers lost interest in it.
Why has backpropagation become popular again in recent years?
-With the rise of big data and greater computing power, backpropagation performs remarkably well given lots of labeled data and strong compute. In addition, technical advances, particularly by researchers in Toronto and Montreal, let backpropagation achieve striking results on tasks such as image and speech recognition.
How do neural networks perform image recognition?
-A neural network processes an image's pixel data through multiple layers of hidden units. Each layer extracts different features of the image and passes them on to the next layer. In this way the network can recognize the objects in an image, turning millions of pixel values into words that describe the image's content.
What is a recurrent neural network (RNN)?
-A recurrent neural network is a neural network specialized for sequence data. It introduces loops into the network to maintain a memory of earlier information, which lets an RNN handle tasks with sequential dependencies, such as time-series data and natural language.
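A minimal NumPy sketch of the recurrence (illustrative only; practical RNNs such as LSTMs are more elaborate). The new hidden state at every step mixes the current input with the previous hidden state, which is what gives the network its memory:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One time step of a simple recurrent network: the new hidden state
    combines the current input with the previous hidden state, which is
    how the network 'remembers' earlier items in the sequence."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

def run_rnn(xs, W_x, W_h, b):
    """Feed a whole sequence through, accumulating information in h."""
    h = np.zeros(W_h.shape[0])
    for x_t in xs:
        h = rnn_step(x_t, h, W_x, W_h, b)
    return h  # final state summarizes the whole sequence
```

The final hidden state returned by `run_rnn` is the sequence summary that gets compared against the desired answer during training.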
How is machine translation done with neural networks?
-Neural machine translation first encodes a sentence in one language into a high-dimensional "thought" vector, then uses a decoder network to turn that vector into a sentence in another language. The approach needs no linguistic knowledge; it is entirely data-driven.
What is the attention mechanism?
-Attention is an extra module added to a neural network that lets it focus on specific parts of the input while generating a translation or performing another task. It improves translation accuracy and lets the network learn more effectively during training, from less data.
What are the prospects for neural networks in medical image analysis?
-Neural networks have enormous potential in medical image analysis. By learning from large collections of medical images they can recognize and diagnose disease, and they are expected soon to surpass radiologists' diagnostic ability.
How are neural networks used in drug discovery?
-Neural networks can predict whether a molecule will bind to a particular drug target, which matters greatly for drug discovery. By analyzing molecular structures and properties, a neural network can help pharmaceutical companies screen promising drug candidates without actually synthesizing the molecules.
Outlines
🤖 Introduction to Neural Networks and Backpropagation
This section introduces the basic concepts of neural networks and the backpropagation algorithm. It first contrasts traditional programming with modern learning algorithms, emphasizing the new approach of training computers by example rather than by detailed instructions. It then demonstrates the power of neural networks with an image recognition example and explains how artificial neurons work, including input lines, weights, and how adaptation happens by changing the weights. Finally, it discusses adjusting the network's connections with an evolution-style algorithm, and then introduces backpropagation, which uses calculus to adjust the weights efficiently.
📈 The Revival and Applications of Backpropagation
This section recounts backpropagation's low point in the 1990s and its later revival. The algorithm was abandoned because datasets were small and other algorithms worked better, but as data and computing power grew, it began to show remarkable results. It covers University of Toronto students applying the algorithm to speech recognition and achieving a breakthrough, its subsequent use in Android, the achievements of two students in image recognition, and backpropagation's widespread adoption in computer vision and speech recognition.
🔄 Recurrent Neural Networks and Sequence Processing
This section explains how recurrent neural networks (RNNs) work and what they are used for. RNNs are especially suited to sequence data such as speech or video. Inputs (such as words or video frames) are fed to the network, and connections among the hidden units form a memory that lets the network accumulate information. Backpropagation then trains the network by comparing its output with the desired correct answer and adjusting the weights. The section also covers progress in machine translation with RNNs: an encoder network turns a sentence in one language into a "thought," and a decoder network turns that thought into a sentence in another language.
🌐 Thoughts, Language, and Machine Translation
This section digs into how machine translation works, in particular how the encoder-decoder model translates one language into another. It argues that the traditional view of translation as converting one string of symbols into another is mistaken; the real process is more subtle. It explains how a neural network turns words into vector representations and combines them into a "thought," from which the decoder network generates a sentence in the target language. It also discusses the technology behind Google Translate and how training the network improves translation quality.
📚 Attention and Multimodal Learning
This section explains how the attention mechanism improves neural network performance, especially in machine translation: letting the network look back at the source language while generating the target language raises accuracy. It also covers translating with language fragments of different sizes, and how, given bitmaps of letters or Chinese characters as input, a network can learn the internal structure of the characters. It then explores combining image recognition with language generation: deriving a "percept" from an image and generating text that describes it. Finally, it considers how many parameters future networks might need, and their potential in areas such as medical image analysis.
🧠 The Future of Neural Networks and Its Challenges
This section discusses the potential of neural networks in medical image analysis and the prospect that they will soon exceed radiologists' diagnostic ability. Even when doctors disagree among themselves, a neural network can still learn from their labels and deliver more accurate diagnoses. It closes with a story about predicting drug-molecule binding, underlining that neural networks can achieve striking results without any domain knowledge, and reflecting on the challenges and opportunities ahead.
Keywords
💡Backpropagation
💡Neural network
💡Convolutional neural network
💡Recurrent neural network
💡Machine translation
💡Image recognition
💡Speech recognition
💡Deep learning
💡Gradient descent
💡Weight
💡Activation function
Highlights
Introduces the basic principles of the backpropagation algorithm and explains how learning happens by changing the weights in a neural network.
Shows the power of neural networks by demonstrating how one turns image pixels into descriptive text.
Discusses how artificial neurons work, including input lines, weights, and a nonlinear output function.
Shows how network performance improves step by step by adjusting weights using small batches of training examples.
Uses calculus to make weight adjustment efficient, so that all weights can be adjusted in parallel.
Although backpropagation was abandoned in the 1990s, larger datasets and greater computing power made it effective again.
Graduate students at the University of Toronto applied backpropagation to speech recognition and achieved a breakthrough.
Demonstrates neural networks' potential in pattern recognition through applications to image and speech recognition.
Introduces the basic concept of recurrent neural networks (RNNs) and their advantages for sequence data.
Describes machine translation with RNNs: turning a sentence in one language into a sentence in another.
Google Translate uses an RNN-based system that improves translation quality by learning from large amounts of translation data.
Introduces the "attention" mechanism, which lets a translation system look back at the structure of the source language while generating the target language, improving accuracy.
Shows how an image recognition network can be integrated with a translation system to generate descriptive captions for images.
Discusses the potential of neural networks in medical image analysis and the possibility of their surpassing radiologists.
A story about predicting drug-molecule binding shows neural networks' potential in chemistry.
Even when label data is imperfect, a neural network can learn to do better than the labels it was trained on.
Suggests neural networks may need a brain-scale number of parameters to reach comparable processing ability.
Discusses recent progress in natural language and image processing, and its implications for future technology.
Transcripts
[Music]
[Applause]
I'm going to talk about some basics here, and I imagine there are some people in the audience who don't really have a good grip on what the backpropagation algorithm is, so I'm actually going to explain that very briefly so you know what we're talking about, and then I'll show you a few examples of what it can do, and these are all things that are now a little bit out of date. If you want a computer to do something, the old way to do it is to write a program: you figure out how you'd do it yourself, and in excruciating detail you tell the computer exactly what to do, and the computer is like you but faster. The new way is you tell the computer to pretend to be a neural network with a learning algorithm in it. That's programming, but then after that, if you want to solve a particular problem, you just show it examples. So suppose you want to solve this problem: I give you all the pixels in an image, that's three numbers per pixel for the color, and there are, let's say, a million of them, and you have to turn those three million numbers into a string of words that says what's in the image. That's a tricky program to write. People tried in AI for 50 years and didn't even come close, but now a neural net can just do it, and I'll show you how it does that. That is, we have no idea how to write that program, but a neural net can do it.
So we're going to make our neural net out of artificial neurons. An artificial neuron is going to have some input lines that come from the sensors or from other neurons; on each input line there's going to be a weight that could be positive or negative, and the neuron is going to adapt by changing the strengths of those weights. The way it behaves is that it takes the values on the input lines, multiplies each value by its weight, adds it all up, and that's its total input. Then it gives an output that's a nonlinear function of that total input, and the function is shown on the right: if the total input isn't big enough, it stays silent; as soon as the total input gets bigger than that, it starts giving a response, and the response gets bigger as the total input gets bigger. For 30 years we used a different kind of neuron that didn't work as well, and then we tried this one, and this works better. That gives you some idea of the state of the art in the field: there are very simple changes that just make things work much better that people haven't
explored. OK, we're going to hook those neurons up into networks with multiple layers, and we're going to learn the connections on the inputs to the neurons in all the layers. And with that, the problem is solved: all we need to do is figure out how to adapt those connections and we're done, because these networks can do anything, so it's just a question of changing the connections. And there's a very interesting and simple algorithm that occurs to anybody who believes in evolution, which hopefully is most of you. What you do is you take one of the connections, and you take a small batch of training examples, a typical batch; you run it through the network and you see how well the network does, that is, how similar the outputs of the network are to the outputs you think are the correct answers on this training data. Then you change that one weight, run the batch through again, and see if that improves things. If it improves things, you keep the change; if it doesn't, you don't keep the change, you leave it like it was. That's it, and that algorithm works. It's a very naive view of evolution, but it works: you're making many mutations, and if you just do that for long enough, you'll get a network that does good things. The problem is that I have to run a whole batch of examples through the network, actually twice, for each weight, and I might have a billion weights, so we need to do this efficiently.
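The naive mutation procedure just described can be sketched in a few lines (a minimal NumPy illustration with a linear model standing in for the network; the names and the tiny problem are mine, not the talk's):

```python
import numpy as np

def mse(w, X, y):
    """Mean squared error of a linear model, standing in for 'how well
    the network does' on a batch of training examples."""
    return float(np.mean((X @ w - y) ** 2))

def perturb_step(w, X, y, rng, step=0.1):
    """The naive 'mutation' algorithm: nudge one randomly chosen weight,
    re-run the batch, and keep the change only if the error improved.
    Note it needs a full batch evaluation per weight tried, which is
    exactly why it is so much slower than backpropagation."""
    before = mse(w, X, y)
    i = rng.integers(len(w))            # pick one connection at random
    trial = w.copy()
    trial[i] += rng.choice([-step, step])  # mutate it up or down
    return trial if mse(trial, X, y) < before else w  # keep only improvements
```

Running `perturb_step` many times steadily lowers the error, but each accepted or rejected mutation costs a full pass over the batch for just one weight.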
And so we're now going to use some calculus and do the same thing efficiently. What we do, because we know the weights in the network (your brain knows the weights in your brain), is that we don't actually have to tinker with a weight and see the effect: we can imagine tinkering with the weight and figure out what the effect would be. If you know all the weights in the network, you can say: if I were to change this one, the output would change this way, and that would be good, so therefore I want to change it. So what we can do is look at the discrepancy between what we want and what we got at the output, and send information backwards through the net, computing for every weight how a small increase in that weight would improve or worsen the output. And we can do that for all the weights in parallel. So in the same amount of time that the mutation algorithm takes to figure out what to do with one weight, we can figure out what to do with all the weights, and if there are a million weights, that's a million times more efficient, and a million times more efficient is enough to make a difference.
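The contrast between actually tinkering and imagining the tinkering can be made concrete (a minimal NumPy sketch, not from the talk): the analytic gradient predicts, for every weight at once, exactly what a nudge-and-re-run experiment would measure one weight at a time.

```python
import numpy as np

def sq_loss(w, x, y_true):
    """Squared error of a single linear neuron on one example."""
    return (float(np.dot(w, x)) - y_true) ** 2

def analytic_grad(w, x, y_true):
    """What backpropagation computes: the effect a small change in each
    weight WOULD have, from calculus, without re-running anything."""
    return 2.0 * (float(np.dot(w, x)) - y_true) * x

def tinker_grad(w, x, y_true, eps=1e-6):
    """What the mutation approach measures: actually nudge each weight
    and re-run the example, one extra evaluation per weight."""
    g = np.zeros_like(w)
    base = sq_loss(w, x, y_true)
    for i in range(len(w)):
        w2 = w.copy()
        w2[i] += eps
        g[i] = (sq_loss(w2, x, y_true) - base) / eps
    return g
```

The two agree to high precision, but `tinker_grad` needs one re-run per weight while `analytic_grad` gets all the weights in a single backward computation: that is the million-fold speedup.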
Backpropagation had great promise, but by the 1990s people in machine learning had given up on it, because they had relatively small datasets and other algorithms worked better. What's more, you could prove things about those other algorithms; with backpropagation you couldn't prove it would work, and when different people ran it they got different answers. So if you're obsessed with there being only one correct answer, and with being able to prove you get it, backpropagation is not for you (nor is life, actually). One of the reasons people lost interest is that it didn't work so well: the naive algorithm didn't work well in deep networks, and it didn't work in recurrent networks, which I'll explain in a minute. Then a few technical advances were made in Canada (by Canada I mean Toronto and Montreal) and in New York. We're not very concerned here with the details of those advances; the main message is that if you give it a lot of labeled data and a lot of compute power, backpropagation now works amazingly well, and
the rest of the talk will be trying to convince you of that. So here was the first practical problem it made a big impact on. That's not quite fair, but since this was done in Toronto I'll pretend it was: actually, for handwritten digit recognition it made a big impact, but people said that's an easy problem, whereas speech recognition is a tough problem. So a couple of graduate students at the University of Toronto took an algorithm that I'd been working on and applied it to speech recognition. You take some coefficients that describe the sound wave and put them through multiple layers of hidden units, and these were lots of hidden units: there were only a few million training examples, and between each pair of layers there were four million parameters, because it's fully connected. So a statistician, or anyone who has done Statistics 101, will know that this cannot possibly work, because there are many more parameters than training examples. As a critique of Statistics 101, you might say: in your lifetime you make about 10 billion fixations, and you have about 10,000 times more synapses than that, so you have about 10,000 synapses for each fixation you make, so you don't satisfy Statistics 101 either. OK, so they trained this up on sound waves, trying to predict which piece of which phoneme was being said. So imagine a little bit of a spectrogram, which is essentially what the bottom is, and you're looking at this piece of the vector and asking: in the middle of the spectrogram, which piece of which phoneme is this person trying to say? You get a probabilistic answer, and then you take all those probabilistic answers and string them together with something else to find a plausible utterance. Nowadays you don't do that: nowadays you just put sound waves in and you get the transcription out, and the only thing in between is a neural network, a recurrent neural network. But back then this was just the front end of the system: we replaced the front end of speech recognizers with this, and it worked better. It worked just a little bit better, but good speech people, particularly down at Microsoft, realized right away that if this works a little bit better, and two graduate students did it in a few months, it's going to completely wipe out the existing state of the art, and indeed over the next couple of years it did. So Navdeep Jaitly, a graduate student at Toronto, wanted to take this technology to RIM; he really wanted to do that, and I talked to them, and they said they weren't interested in speech recognition. I don't know what became of them. So by 2012 Google was using it in Android, and there was a big increase in speech recognition performance: it suddenly got better than Siri. Now everybody's using this algorithm, in more updated versions, and all the best speech recognition systems were trained with backpropagation in a neural net. Some are just end to end: in some of those systems there's nothing else, you just train it on data. All that stuff about how you pronounce things and what the words are: forget it, the net will learn
all that. Then in 2012, two more graduate students of mine (so the trick to all this is you have to always get graduate students who are smarter than you; there's no point having a graduate student dumber than you, because you could have done that yourself), Ilya Sutskever, who recently got given a billion dollars by OpenAI to run a lab (which is slightly depressing, because that's a lot more than I ever got), and Alex Krizhevsky, took on the ImageNet competition, where there were about a million images in a thousand classes, and you had to recognize the objects in those classes. It was a public competition with a secret test set, so you couldn't cheat, and the person who ran our system on the test set, whom I met at a conference, told me he didn't believe the results, so he went back and ran it again; he still didn't believe the results; he had to run it three times before he believed the performance numbers, because they were so much better than anybody else's. So here are the results: in 2012, all of the conventional computer vision systems that didn't use neural nets had plateaued at about a twenty-five percent error rate, and our system got almost half that. As with speech recognition, as soon as people switched, you got thousands of smart graduate students and thousands of experienced developers and so on making this work really well, and by 2015 people had reached human levels on that dataset. One hero called Andrej Karpathy actually did the task himself, which took a lot of time, and got 5% error, and now the systems are down below 3% error, so it's a tenth of the error rate of the previous computer vision systems. So this made a big impact, partly because speech had already worked but people thought that was a niche (speech worked first because those were the people who had big labeled datasets); when this worked, people got all excited, and it was very good for IP lawyers.
So these are examples of the kinds of images, and notice they're not images that have one nicely centered object in a canonical viewpoint: most of the cheetah is missing, for example. The red bar is what the system thought was its best bet; it gets told it's right if the answer is in its top five bets (it gets five bets because it's not always clear what the right answer is). You'll notice it gets the bullet train right even though it's only a small fraction of the image. The magnifying glass it gets wrong: it thinks it's scissors, and if you look at the other things it thinks it might be, a stethoscope or a frying pan, you can see why it thinks that, and you can see that it needs glasses. But the point is it's got the visual appearance of something: if you look at its wrong answers, they tell you more than looking at the right answers.
Now I'm going to go on to recurrent nets. These feedforward nets are very good at recognizing a phoneme in speech or an object in an image, but for dealing with sequences you want a recurrent net. The kinds of recurrent nets people use now are based on work by Hochreiter and Schmidhuber in 1997 that I'm not going to explain; I'm going to simplify them. I'm going to pretend these recurrent nets are simpler than they are, because you really don't want the details; it would be nice if they were this simple, but they're not. OK, so here's how a recurrent net works. It has a bunch of input neurons, not just one like it shows here, but a bunch, and they represent the data at a regular time step: it might be a word in a sentence, it might be an image in a video. That's the input. It has a bunch of hidden neurons, and these hidden neurons connect to themselves. So if you look at the second time slice here, the second column, and look at that middle unit, it's getting some input from the input to the system, which might be the video frame or the word, and it'll also get input from the previous state of all the hidden neurons. So it's remembering and accumulating information. And you can train this thing with backpropagation: what you do is you feed it the inputs, the hidden units accumulate information, and when you get to the end you see if it can produce the right answer, and if it can't, you backpropagate information: you just go backwards through all those arrows. One thing you'll notice about those arrows is that they form a directed acyclic graph, that is, you cannot go around in a circle following the arrows, and that means you can do backpropagation: you can go backwards without ever getting into a loop, basically.
So Ilya Sutskever, Oriol Vinyals, and Quoc Le, and pretty much in parallel Yoshua Bengio with Bahdanau and Cho in Montreal, developed a way of using these algorithms for doing machine translation, and initially it seemed crazy. What we're going to do is have an encoder network that reads the sentence in one language and turns it into a thought, and then we're going to take the thought and turn it into a sentence in another language. Of course, to do that you need to know what a thought is. Now most people in AI, in fact still most people in AI, made a very naive mistake, which is this: strings of symbols come in as words, and when you say something, strings of symbols come out, so they think what's in between must be something like a string of symbols. That's the same stupid thinking as: pixels come in, and when you print something, pixels come out, so everything in between must be pixels. That was the view of someone particularly naive called Stephen Kosslyn, and the symbolic AI people laughed at that view that it's all pixels in between; but they made exactly the same mistake: they thought the stuff that comes in and the stuff that comes out, which is the only stuff we know about from outside, must be the same kind of stuff in the middle, even though you know that what's in the brain is just big vectors of neural activity. There are no symbols in there, there are certainly no symbolic expressions in there, and there certainly aren't rules for manipulating symbolic expressions; at least, after many years of high school there might be a few of those rules that you can't really follow very well, but that's not the basic way of doing business. So you're going to put
of doing business so you're going to put
words into this encoding network one at
a time it's going to first turn those
words into a vector representation which
is a whole bunch of features it's going
to learn to do that so all of these
connections are learn by back
propagating and it's gonna basically
make say the vector for Tuesday be very
similar to the vector for Wednesday and
very different to the vector for
although the words come in it
accumulates information in it's hidden
units and at the end of the English
sentence of the top there there'll be a
state of the hidden units that I will
call a thought and that's not meant to
be a joke that's what I believe a
thought is a thought is an activity
pattern in a big bunch of neurons and
ass activity pattern that doesn't need
to be inspected to thinks out and it
causes things to happen so I can say to
you John thought dan or John thought is
snowing outside anything you can put in
quotes John can think and what's more
John can say it so if John thought it's
snowing outside you might say to you is
snowing yes
so it's obvious that the way you get at
thoughts the way I tell you what I'm
thinking is either by the words that
would have caused the thought or by the
words that the thought would have caused
it's hooked up at both ends and but the
thought itself doesn't look anything
like words it's something completely
different inside and in fact it looks
like that it's not necessarily red um
you take that thought vector and you
give it to a decoder network and decoder
network says okay that was the thought
let's suppose it's doing English to
French what's the first word in French
so it takes a thought and it says okay I
think the house was probably loved but
it might be law and it might be
something else it gives you
probabilities for all the various words. One way of decoding the thought, not the best way, but one way to do it, is to take the words it thought were reasonably possible, pick one of them according to how probable it thought they were, and then tell the network: OK, that was actually the right word, you got that right; what do you think comes next? Then it gives you a prediction for the next word, and you say: OK, you got that right, what do you think comes next? And that way it will give you a string of words until it eventually gives you a full stop, and then that's the translation. Now what's amazing is that this actually works, if you train the whole thing with backpropagation. Google Translate used to have huge tables of phrases (this phrase in English tends to go to that phrase in French), and you'd try to put all these tables together to get a plausible French sentence; it turns out it works much better to have a system that has no built-in linguistic knowledge whatsoever. That is, it actually acquires lots of linguistic knowledge, but it wasn't put in by people. So now the way Google Translate works, on many pairs of languages and soon all of them I think, is this: you take a language, and you automatically break the words of that language into 32,000 fragments. Fragments for English would be whole words like "the"; they'd also be the individual letters; there'd be things like "ing" and "ed" and "s". You represent the input string as a string of symbols from this alphabet of 32,000 symbols. You feed it the English sentence; you have a big database of translations; it then produces the French sentence, and it has these probabilities for producing words. At each point in time, as it's producing the French sentence, you look at the probability it assigns to the correct word, and then you backpropagate through all those connections you see in the net: you send information backwards, computing how a small change in each connection strength would increase the probability of the right word. That's what you do: you start with random weights, and then you send all this information back to change the strengths very slightly, so as to increase the probability of the correct word, and you do that for a lot of examples, and hey presto, it's the best machine translation system there is, pretty much.
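The pick-a-word-and-feed-it-back decoding loop just described can be sketched as follows (a minimal illustration; `step_fn`, the vocabulary, and the greedy most-probable choice are my simplifications, not the actual Google Translate decoder):

```python
import numpy as np

def decode(thought, step_fn, vocab, max_len=20):
    """Decoding sketch: starting from the encoder's 'thought' vector,
    repeatedly ask the decoder for a distribution over the next word,
    commit to one (here, greedily, the most probable), feed it back in
    as if it were correct, and stop at the end-of-sentence token.
    `step_fn(state, word) -> (state, probs)` stands in for one step of
    a trained decoder network (hypothetical)."""
    state, word, out = thought, "<s>", []
    for _ in range(max_len):
        state, probs = step_fn(state, word)
        word = vocab[int(np.argmax(probs))]  # commit to the most probable word
        if word == "</s>":                   # full stop: translation finished
            break
        out.append(word)
    return out
```

Training adjusts the weights inside `step_fn` so that the probability of the correct next word goes up, which is the backpropagation step described above.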
One big improvement that was made by researchers in Montreal is attention. The system I described to you turns English into a thought and then turns the thought into French, but that's not what a real translator does. I mean, he could do that, but he'll do better if, as he's producing the French, he looks back at the English. So they made their networks look back, not at the input words in the English, but at the hidden states the network had as it was receiving the English words, and they made it learn where to look. So that's pretty fancy: it's an extra module in the network that's trying to learn where to attend in the English sentence as it's producing the French sentence. It successfully does that, it makes the whole thing work better, and it can be trained on much less data.
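A generic dot-product attention step can be sketched as follows (a common simplification of the idea; the Montreal model's actual scoring function differed, so treat this as illustrative):

```python
import numpy as np

def attend(query, encoder_states):
    """Dot-product attention sketch: the decoder's query is scored
    against every encoder hidden state, the scores are softmaxed into
    weights, and the result is a weighted average of the states, so the
    decoder 'looks back' mostly at the source positions whose hidden
    states match its current query."""
    scores = encoder_states @ query           # one score per source position
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()                  # attention weights sum to 1
    context = weights @ encoder_states        # weighted average of states
    return context, weights
```

The `weights` are what the network "learns where to look" with: a query that matches one source position concentrates almost all of its weight there.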
The word fragments I already described are another trick: don't use words, use pieces of words. It'll also work if you use individual letters. In fact, here's an amazing thing: if you're translating Chinese to English and I give you the following choice, you could have a big list of Chinese characters, since they're symbols for whole words, or I could give you bitmaps of the Chinese characters; which would you rather have as input? Well, it turns out it works better if I give you the bitmaps, because backpropagation learns that Chinese characters actually have compositional structure: in a Chinese character there's, you know, a man running to a house, or something; I know nothing about Chinese, but those little bits in there actually have morphemic structure, and it'll learn that from bitmaps. So this is rather bad news for linguists: the number of linguists you need to make a really good speech recognition system is zero. Actually, that's entirely unfair: you need a well-curated dataset, and linguists will know a lot about how to get a well-curated dataset, but you don't need them telling the neural network what to do. Now let's combine
that with the vision work we did before. We're going to take our net that recognizes objects in images, trained on ImageNet, and we're going to say: when we translate, we get a thought and then we say that thought; what if we got a percept and then said that percept? So instead of using English to get the thought, we're going to use the ImageNet network to look at an image and get a thought, and then from that thought we're going to produce the output. The thought, or percept, that the net has is simply the activity of all the units just before the answer, the last hidden layer, because what the net has really done is turn pixels into the activity of a bunch of units that's to do with objects, lots of objects in the image; then it makes a choice and says the name of one object, but before it says the name of an object, it has stuff to do with lots of objects. So we use that percept as the encoding, and then we decode it, and we train it to decode: obviously, to go from percepts to sentences you need some training. So you take that last layer of ImageNet, and you take a big database that Microsoft kindly supplied, with a few hundred thousand images, each with several possible captions, and you train the decoder to turn that percept into a sentence. And then it does things like this: you show it that image, and it says "a group of people shopping at an outdoor market". The correct answer according to the database is "people are crouched around in an open market", which is better because it's got the crouching. And then there's the one you saw at the beginning, so we reach closure: you now know how this worked. We trained up something on ImageNet, we then trained the output part of it to produce sentences in English, and it says "a close-up of a child holding a stuffed animal". The real caption, "a young girl asleep on the sofa cuddling a stuffed bear", is somewhat better, but I was just completely blown away by this. When Oriol Vinyals and Samy Bengio and other researchers at Google showed me that this worked, I thought: well, you know, that's the dream of AI, to be able to look at a picture and say what's in it. That's sort of basic AI: if you can do that, you're really onto something. And it worked, and then within about a week lots of other people had done similar things; I think ours was just
slightly better. There are all sorts of implications for document processing: if you can convert a sentence into a thought and then model the structure of those thoughts, you can get natural reasoning. You might not want natural reasoning, because most people's natural reasoning isn't much good, but at least you can model it. I think to do this properly we'll need a number of parameters comparable with the brain, which is a hundred trillion, and our neural networks currently have a few billion. There's a puzzle here, which is that we can translate between multiple pairs of languages using a few billion weights; that's less than one voxel of a brain scan. So either the brain is amazingly much better than what we can do, or it's using a different algorithm, or it's using backpropagation but inefficiently, and I don't know which. In medical images, we'll very soon be better than radiologists.
Already, for skin cancers, there's a system that's comparable with dermatologists, and as soon as it's trained on more images it'll be significantly better: that system was trained on the order of 100,000 images; train it on 10 million and it will be better. One thing to bear in mind, which doctors often worry about, is: where do you get the correct answers? Well, here's something interesting: you train a neural network on labels produced by doctors, and the neural network can end up much better than the doctors. But the doctors all disagree; they only have 70 percent agreement. The neural network actually gets at what's going on, and it can be much better than the labels you used to train it. That seems paradoxical, but it's not: we don't actually need the ground truth, we just need something related enough to the ground truth that the neural network can figure out what's going on, which the doctors couldn't, and then we can do better. I want to finish with one
story about George Dahl, the same student who was involved in the speech recognition work in 2009. In 2012, or 2011, I think 2012, he entered a competition on Kaggle, which he entered quite late. The competition was: I give you a few thousand properties of molecules, and you have to predict whether each molecule will bind to something. The drug companies would like to know this, and they'd like to know it without synthesizing the molecule, so they'd like to predict which ones are good candidates for binding. George basically threw our standard neural network at it: multiple layers of rectified linear units, with far more parameters than there were training cases; he was using probably a million parameters with 15,000 training cases. And it worked. He combined it with other methods, but he didn't need to: it actually would have won the competition without being combined with other methods. And they were surprised. There was a $20,000 prize, and so George said, OK, give me the prize, and Merck said, well, it's part of the competition that you have to tell us what QSAR methods you used, and George said: what's QSAR? This is slightly embarrassing, because there's a field called QSAR, quantitative structure-activity relationships, that is, how the structure of a molecule gives rise to its activity. This field has been going for like 20 years: it has a journal, it has an annual conference, it has a whole bunch of people for whom that's what they do. And George won without even knowing the name of the field. OK, that's it.
[Applause]
[Music]
[Applause]
[Music]