Backpropagation and the brain
Summary
TLDR This paper, by Timothy Lillicrap, Adam Santoro, Luke Marris, Colin Akerman and Geoffrey Hinton, proposes a hypothesis for how a backpropagation-like algorithm could work in the brain. Although substantial evidence has previously argued against a backprop-like process in the brain, the paper examines how neural networks can learn both without external feedback (Hebbian learning) and with feedback. The authors propose a backprop-like learning mechanism, implemented through a feedback network, in which every neuron receives detailed instructions on how to update itself. The paper also reviews why backprop was previously thought impossible in the brain, including the requirement for synaptic symmetry and the type of the error signals. Finally, the authors present the NGRAD hypothesis: the brain could implement neural-network learning with an autoencoder-based approximation of backprop. Built from stacked, locally trained autoencoders, this algorithm avoids problems of classical backprop such as synaptic symmetry and the transport of error signals.
Takeaways
- 🧠 The paper proposes a hypothesis for how a backprop-like algorithm could work in the brain, despite long-standing evidence against any such mechanism.
- 📈 It surveys ways neural networks can learn, including Hebbian learning — which needs no external feedback and is a self-reinforcing pattern-activation mechanism — and learning from feedback.
- 🔄 Backpropagation is standard in the machine-learning community, but it was long considered biologically implausible because it requires synaptic symmetry between the forward and backward paths.
- 🤔 The paper reviews why backprop was previously thought impossible in the brain, including the symmetry requirement and the type of the error signals.
- 🧬 It proposes an autoencoder-based approximation of backprop, the NGRAD hypothesis, which uses local update rules instead of backpropagated errors.
- 🤓 Comparing the hidden representations of artificial and biological networks shows that backprop-trained networks match biological ones more closely.
- 🔄 The proposed algorithm uses approximate inverse functions to achieve a backprop-like effect while avoiding backprop's biological implausibilities.
- ⚙️ It updates both forward and backward weights so that hidden-layer representations move toward desired states, computing the needed error signals from local information only.
- 📊 The paper also discusses possible biological implementations, drawing on the modern, more differentiated view of neurons.
- 📚 Geoffrey Hinton, a leading figure in machine learning, is among the authors, which is part of why the paper has drawn attention.
- 🎓 For readers interested in the intersection of machine learning and neuroscience, the paper offers a fresh perspective on how the brain might learn.
Q & A
What hypothesis does the paper propose?
-The paper proposes that a backpropagation-like algorithm may be at work in the brain. Although much prior evidence argued against a backprop-like mechanism, the authors argue that the brain may indeed implement backprop-like learning.
What is the difference between Hebbian learning and backpropagation?
-Hebbian learning is a self-reinforcing process that needs no external feedback: connection weights are increased or decreased to match or amplify previous activity. Backpropagation requires external feedback and propagates a signed error signal (the gradient) backwards to precisely adjust every neuron or weight so that the overall error decreases.
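To make the contrast concrete, here is a minimal NumPy sketch of the two update rules (my own illustration, not code from the paper; the shapes, learning rate, and target vector are invented):

```python
import numpy as np

rng = np.random.default_rng(0)
pre = rng.random(4)        # presynaptic activations (nonnegative rates)
W = rng.random((3, 4))     # connection weights
post = W @ pre             # postsynaptic activations

lr = 0.01

# Hebbian update: no external feedback at all -- weights grow in
# proportion to correlated pre/post activity (self-reinforcing)
W_hebb = W + lr * np.outer(post, pre)

# Backprop-style update: needs an external, *signed*, per-unit error
# telling each output unit which direction to move
target = np.array([1.0, 0.0, 0.5])
delta = post - target      # signed instruction, one entry per unit
W_bp = W - lr * np.outer(delta, pre)
```

The Hebbian rule only ever amplifies what the network already does; the backprop-style rule moves the output toward an externally supplied target.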
Why did earlier research consider backprop implausible in the brain?
-Mainly for two reasons: first, backprop demands synaptic symmetry between the forward and backward paths, which the structure of biological neurons does not support; second, backprop's error signals are signed and potentially extreme-valued, while biological neurons communicate with spiking rates, which struggle to represent signed errors.
What is the backprop-like learning proposed in the paper?
-It is a backprop-like learning algorithm implemented with approximately inverting autoencoders. It computes the needed error signals from local information carried as activations, without ever transporting errors backwards, which sidesteps the reasons backprop is hard for biological neurons.
What role do autoencoders play in this backprop-like learning?
-A central one. Stacking autoencoders lets each layer use reconstruction to compute the hidden representation it should have had: an approximate inverse maps the desired output back into a desired hidden representation. Each autoencoder layer can then adjust its weights independently, without global error backpropagation.
How should the "approximate inverse" be understood?
-Since no perfect inverse function exists, an approximate inverse is used: a learned function that estimates which hidden representation would have produced a given forward-pass result. Even without a perfect inverse, each layer can compute an error signal this way and update its weights accordingly.
How do the cited experiments show the similarity between backprop and biological networks?
-Artificial networks are trained on the same tasks as human or animal brains, and their hidden representations are compared with those of the biological networks. Networks trained with backprop match the biological representations more closely than networks trained with scalar-update algorithms, which is evidence that a backprop-like mechanism may exist in the brain.
Why is the data-type check necessary for the backprop-like algorithm?
-Because biological neurons convey information with spiking rates, which are nonnegative. The algorithm must ensure that the forward pass and the approximate inverse pass carry the same data type, namely activations, for the scheme to be biologically feasible.
What possible biological evidence does the paper mention?
-The modern, more complex view of neurons: distinct regions of a neuron can operate somewhat independently, and neurons interact with one another. These views provide a possible biological basis for implementing a backprop-like algorithm.
What does "local update" mean in this algorithm?
-Each neuron or layer updates its weights using only information available within that layer, such as its own activations and the approximate inverse, rather than a global error signal transported from other layers.
What advantages does the backprop-like algorithm have over standard backpropagation?
-It does not rely on transporting a global error signal backwards; it uses local information and approximate inverses instead. That better matches how biological neural systems operate, avoids the parts of backprop that neurons cannot implement, and retains much of backprop's optimization efficiency.
Outlines
🤖 Paper introduction and background
This section introduces a paper by Timothy Lillicrap, Adam Santoro, Luke Marris, Colin Akerman and Geoffrey Hinton on how a backprop-like algorithm could work in the brain. Although there has long been substantial evidence against a backprop-like process in the brain, the paper puts forward a hypothesis for how brain networks learn. The section also covers the difference between Hebbian learning and feedback-driven learning and their roles in neural-network learning.
🧠 Comparing ways neural networks learn
This section contrasts Hebbian learning with backpropagation in detail. Hebbian learning needs no external feedback; it is a self-reinforcing pattern-activation mechanism. Backpropagation requires external feedback and a reverse computation that precisely adjusts each neuron or weight to reduce the error. It also covers why backprop is hard for biological neurons, and the paper's proposal that the brain may use a backprop-like mechanism.
🔍 Is backprop feasible in the brain?
This section examines why backprop was previously thought impossible in the brain, chiefly the need for synaptic symmetry and the type of the error signals. The paper nonetheless hypothesizes that the brain may use an autoencoder-based approximation of backprop, and discusses how autoencoders work and how they can be trained with local update rules instead of backpropagation.
🔧 Autoencoders and their relation to backprop
This section digs into how autoencoders reconstruct their input through an input layer, a hidden layer and an output layer. The authors describe an algorithm that uses approximate inverse functions to mimic backpropagation while avoiding its biological implausibilities. The network computes the needed error signals from local information only, and the signals stay nonnegative so that spiking neurons can encode them.
🧬 Approximate inverses in neural networks
This section details how the approximate inverse function G stands in for backprop's error propagation. Applying G twice lets the network assess how inaccurate the inverse is and correct the desired hidden representations accordingly, so the network can learn through local updates without classical backpropagation.
🌐 Possible biological implementation
The final section discusses how such a learning algorithm could be implemented biologically, drawing on the modern, more differentiated understanding of neurons — for example, distinct regions of the cell and interactions between neurons. The presenter notes he is not an expert on this part and encourages viewers to read it themselves.
Keywords
💡Backpropagation
💡Hebbian learning
💡Error signal
💡Autoencoder
💡Synaptic symmetry
💡Error landscape
💡Activation function
💡Weight update
💡Gradient
💡Neural plasticity
💡Approximate inverse
Highlights
The paper proposes a hypothesis for how backpropagation could work in the brain.
It reviews prior evidence against a backprop-like process in the brain.
Hebbian learning is introduced as a self-reinforcing mode that needs no external feedback.
Unlike Hebbian learning, backpropagation adjusts neuron weights through a reverse computation.
The paper proposes a backprop-like learning mechanism implemented through a feedback network.
Comparing hidden representations of artificial and biological networks shows that backprop-trained networks are more similar to biological ones.
The paper discusses why backprop was previously considered impossible in the brain.
It proposes the NGRAD hypothesis: the brain may use an autoencoder-based approximation of backprop.
It introduces autoencoders and how they can be trained with local update rules rather than backprop.
The proposed algorithm avoids backpropagation, using local information to compute the needed error signals.
It uses approximate inverse functions to achieve backprop-like updates, solving the biological-plausibility problems.
The paper discusses how biological neural systems could implement the algorithm and offers some evidence.
It also draws on the modern, more differentiated view of neurons, unlike the traditional simple input-feedback model.
The algorithm offers a new way to make backprop-like learning possible in biological neural networks.
The theory and algorithm matter both for understanding brain learning and for designing artificial networks.
Geoffrey Hinton, a renowned machine-learning expert, is among the authors, adding weight to the work.
Transcripts
hi there today we're looking at back propagation in the brain by Timothy Lillicrap Adam Santoro Luke Marris Colin Akerman and Geoffrey Hinton so
this is a bit of an unusual paper for
the machine learning community but
nevertheless it's interesting and let's
be honest at least half of our interest
comes from the fact that Geoffrey Hinton
is one of the authors of this paper so
this is a paper that basically proposes
a hypothesis on how the algorithm of
back propagation works in the brain
because previously there has been a lot
of evidence against there being
something like back propagation in the
brain so the question is how do neural
networks in the brain learn and they
they say there there can be many
different ways that neural networks
learn and they list them up in in this
kind of diagram where you have a network
and it maps from input to output by
having these weighted connections
between neurons so the input is
two-dimensional and then it maps using
these weights
to a three-dimensional hidden layer and
usually there is a nonlinear function
somewhere at the output here of these so
they they do a weighted sum of the
inputs and then they do a nonlinear
nonlinear function and then they
propagate that signal to the next layer
and till then to finally to the output
all right so how do these networks learn
the one way of learning is called
hebbian learning the interesting thing
here is that it requires no feedback
from the outside world basically what
you want to do in hebbian learning is
you want to update the connections such
that they kind of match their own
previous outputs or even increase their
own previous outputs so you propagate a
signal and then maybe this neuron spikes really hard and this one spikes really low then if you propagate the signal again right then you want to match those activations or if you propagate similar signals no feedback required so basically it's a
self amplifying or self dampening
process the ultimately though you want
to learn something about the world and
that means you have to have some some
feedback from outside right so with
feedback what we mean is usually that
the output here let's look this way the
output here is goes into the world let's
say this is a motor neuron right you do
something with your arm like you hammer
on a nail and then you either hit the
nail or you don't let's say you don't
hit the nail so after it looks like
crooked there you have feedback right so
feedback usually in the form of some
sort of error signal right so feedback
it can be like this was good or this was
bad or it can be this was a bit too much
to the left or so on the important part
is you get kind of one number of
feedback right how bad you were and now
your goal is to adjust all of the
individual neurons or weights between
neurons such that the error will be
lower so in hebbian learning there is no
feedback it's just simply a self
reinforcing pattern activation machine
in the first in these kind of first
instances of perturbation learning what
you'll have is you'll have one single
feedback and that you can see this is a
diffuse cloud here what you're basically
saying is that every single neuron is
kind of punished let's say the the
feedback here was negative one that
means every single neuron is is punished
for that so how you can imagine
something if you have your input X and
you map it through through your function
f then the function f has a way to w1
and so on right
so you map X through it right and then
you get feedback of negative 1 and then
you map X with a little bit of noise added right and you get a
feedback of negative 2 right then you
you that means that the direction of
this noise was probably a bad direction
so ultimately you want to update X into
the direction of negative that noise by
modulated of course by some factor here that kind of tells you how bad it was so this could be the difference between the two feedbacks negative 2 minus negative 1
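What is described here is essentially weight perturbation: probe with noise, read off one scalar, and nudge every weight with that same scalar. A toy NumPy sketch (the regression task, constants, and iteration count are all invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical toy task: linear regression
x = rng.normal(size=(8, 4))
y = rng.normal(size=(8, 2))
w = rng.normal(size=(4, 2))

def loss(w):
    # the single scalar "how bad was I" feedback
    return float(np.sum((x @ w - y) ** 2))

lr, sigma = 0.005, 0.01
start = loss(w)
for _ in range(300):
    noise = rng.normal(scale=sigma, size=w.shape)
    # one scalar difference modulates *every* weight identically --
    # there is no per-neuron accountability
    scalar_feedback = loss(w + noise) - loss(w)
    w -= lr * scalar_feedback / sigma**2 * noise
end = loss(w)
```

Because every weight shares one number, the implied gradient estimate is extremely noisy, which is why this kind of learning is slow compared with backprop.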
basically with a scalar feedback you
simply tell each neuron what it did
right or sorry if if the entire network
right the entire network did right or
wrong so the entire network will lead to
this feedback you don't have
accountability of the individual neurons
all you can say is that whatever I'm
doing here is wrong and whatever I'm
doing here is right so I'm gonna do more
of the right things now in back
propagation it is very different right
in back propagation what you'll do is
you'll have your feedback here let's say
that's negative 1 and then you do a
reverse computation so the forward
computation in this case was this
weighted sum of this layer now usually
layer wise reverse computation which
means that you know how
this function here this output came to
be out of the out of the inputs and that
means you can inverse and you can do an
inverse propagation of the error signal
which is of course the gradient so you would differentiate your error with respect to the inputs to the layer right so this basically means in the back propagation algorithm you can
exactly determine if you are this node
how do I have to adjust my input weights
how do I have to adjust them in order to
make this number here go down right and
then because you always propagate the
error according to that what you'll have
in each in each layer is basically a
vector target so it's no longer just one
number but each layer now has a target
of vectors and it says okay these are
the outputs that would be beneficial
please this layer please change your
outputs in the direction of negative two
negative three plus four so you see this
is so the negative two would be this
unit the negative three would be this
unit and the plus four would be this
unit so each unit is instructed
individually to say please this is the
direction that each unit should change
in in order to make this number go lower
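That per-unit, signed vector feedback is exactly what a manual backward pass computes. A tiny sketch matching the 2-d input / 3-d hidden diagram (my illustration; the weights and target are invented):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(1, 2))    # 2-d input, as in the diagram
W1 = rng.normal(size=(2, 3))   # weights into the 3-d hidden layer
W2 = rng.normal(size=(3, 1))   # weights to the single output
y = np.array([[1.0]])          # hypothetical desired output

# forward pass: weighted sums plus a nonlinearity
h = np.tanh(x @ W1)
out = h @ W2
err = out - y                  # the error at the output

# backward pass: each layer receives a *vector* of deltas --
# a signed, per-unit instruction, not one shared scalar
delta2 = err                              # shape (1, 1)
delta1 = (delta2 @ W2.T) * (1 - h ** 2)   # shape (1, 3): one value per hidden unit

lr = 0.01
W2 = W2 - lr * h.T @ delta2
W1 = W1 - lr * x.T @ delta1
```

Each hidden unit gets its own signed entry of `delta1`, which is the "detailed instruction" the video contrasts with a single scalar reward.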
you see how this is much more
information than the perturbation
learning in the perturbation learning
all the units simply know well the four
was bad and now is better
so let's you know change a bit and here
you have detailed instructions for each
unit because of the back propagation
algorithm so ultimately people have kind
of thought that since back propagation
wasn't really possible with biological
neurons that the brain might be doing
something like perturbation learning but
this paper argues that something like
back propagation is not only possible
but likely
in the brain and they proposed this kind of backprop-like learning with a feedback network so they basically
concern all the they differentiate hard
between these two regimes here in this
hand you have the scalar feedback which
means that the entire network gets one
number as a feedback and the each neuron
just gets that number and here you have
vector feedback where each neuron gets
an individual instruction of how to
update and they achieve this not by back
propagation because still the original
formulation of back prop as we use it in
neural networks is not biologically
plausible but they achieve this with
this backprop-like learning with the
feedback network and we'll see how this
does but in in essence this feedback
network is constructed such that it can
give each neuron in the forward pass
here detailed instructions on how to
update itself right so yeah they have a
little bit of a diagram here if this is an error landscape if you do Hebbian learning
you basically you don't care about the
error you're just reinforcing yourself
if you do perturbation learning then you
it's very slow because you don't have a
detailed signal you just you just rely
on this one number it's kind of if you
were to update every single neuron in
your neural network with reinforcement
learning considering the output of the neural network or the error as the reward not using backprop and then with backprop you have a much smoother much faster
optimization trajectory so they looked
at this and they they come to some some
conclusions first of all so here's
here's back prop basically saying back
prop as we said you have the forward
pass
and there you simply compute these
weighted averages and you you also pass
them usually through some sort of
nonlinear activation right and the cool
thing about this is in artificial neural
networks is that once the error comes in
you can exactly reverse that so you can
do a backward pass of errors where you
can propagate these errors through
because you know it's kind of invertible
the function doesn't have to be
invertible but that the gradients will
flow backwards if you know how the
forward pass was computed so first of
all they go into a discussion of back
prop in the brain how can we even expect
that and one cool piece of evidence is
where I find is that they cite several
examples where they use artificial
neural networks to learn the same tasks
as humans right and or as as animal
brains and then I have no clue how how
they measure any of this but then they
compare the hidden representations of
the living neural networks and the
artificial neural networks and it turns
out that the networks that were trained with backpropagation match the biological networks more closely than networks that were not trained with backprop so basically that means if you
train a network with backprop it matches
the biological networks much closer in
how they form their hidden
representations and they they do a
number they cite the number of
experiments here that show this so this
gives you very good evidence that if the
hidden representations they look as if
they had been computed by backprop and not by any of these scalar update
algorithms so it is conceivable that we
find backprop in the brain that's why
they go here next they go into problems with backprop so basically why so far have we believed
that back prop isn't happening in the
brain
so now I want to highlight two factors here that I find most salient they state more but first
of all back prop demands synaptic
symmetry in the forward and backward
paths right so basically if you have a
neuron and it has output to another
neuron what you need to be able to do is
to pass back information along that
neuron so it kind of has to be a
symmetric connection between the forward
and the backward pass and these need to
be exact right and this is just not if
you know how neurons are structured they
have kind of input dendrites and then
there's this axon with the action potential
and along the axon the signal travels
and the back traveling of the signal
just I think is very is very very very
slow if even possible and so it's
generally not invertible or inverse
compute capable so this is one reason
why backprop seems unlikely and then
the second reason here is error signals
are signed and potentially extreme
valued and i want to add to that they
also just talk about this somewhere that
error signals are of a different type
right that's a different type
so first let's see what signed error
signals are signed yes we need to be
able to adjust neurons in a specific
directions right if you look at again
what we've drawn before here we said
here this is how these neurons must must
update
so the first neuron must must decrease
by two this must decrease by three and
this must increase by four now in backprop we need this but if we assume that there is something like a
reverse computation or signaling here
happening then we still have the problem
that usually these output signals are in
the form of spiking rates which means
that over time right so if a neuron
wants to if a neuron has zero activation
there's just no signal but if a neuron
has a high activation it spikes a lot if
has a low activation it kind of spikes
sometimes but what it cannot do is spike negatively right like zero is as low
as it goes so the the thought that there
are signed information in in the
backward pass is inconceivable even if
you have something like a second so you
can imagine here instead of this
backward connection because of the
symmetry problem we have some kind of
second neural network that goes in this
direction still you'd have the problem
that here you can only have positive
signal or a zero and they might be
extreme valued which okay it can't be
really encoded with the spiking because
they are they're limited in the range
they can assume but they are also of a
different type and I'm what I mean by
that is basically if you think of this
as a programming problem then the
forward passes here are our activations
right and the backward passes here they
are deltas so in the backward pass you either propagate deltas or you propagate kind of directions so the
activations are sort of impulses whereas
the backward signals are this isn't how
you need to change their their gradients
ultimately
so it's fundamentally a different type
of data that is propagated along would
be propagated along these directions and
that makes it very unlikely because we
are not aware as this paper says that neurons can kind of switch the data type that
they're they're transmitting all right
so then the paper goes into their NGRAD hypothesis and what this is the
hypothesis basically states that the
brain could implement something like
neural networks by using an approximate backprop-like algorithm
based on autoencoders and I want to jump
straight into the algorithm no actually
first they do talk about autoencoders
which which I find very interesting so
if you think of autoencoders what is an
autoencoder an autoencoder is a network
that basically starts out with an input
layer and then has a bunch of hidden
layers and at the end it tries to
reconstruct its own input right so you
feed a data in here you get data out
here and then your error the error
signal it will be your difference to
your original input now the usually when
we train autoencoders in deep learning
we also train this by back prop right we
see then this error here and this goes
back but if you just think of single
layer autoencoders so let's go over here a single layer autoencoder with let's say the same number of units in this layer what you'll have is so this
is input this is output and this is the
hidden layer right you'll have a weight
matrix here
and you'll probably have some sort of
nonlinear function and then you have
another weight matrix here and they call
them W and B another way to draw this is
I have weight matrix going up then I
have a nonlinear function going
transforming this into this signal and
then I have the be going back right so
I'm drawing I'm drawing it in two
different ways up here or over here and
with the second way you can see that it
is kind of a forward backward algorithm
where now the error if you look at what
is the error here the error is the
difference between this and this and the
difference between this and this and the
difference between this and this right
and you can train an autoencoder
simply by saying W please make sure that the input here gets mapped closer to the output and B the same thing this will become clear in a second but basically with the hidden representations you'll see basically the idea is that
you can train an autoencoder only by
using local update rules you don't have
to do back prop and that's what this
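As a toy version of that idea, here is a single-layer autoencoder whose decoder is trained with a purely local delta rule — only the activations and the reconstruction error at that layer are used, with no error transported across layers (my own sketch, not the paper's exact scheme; the data and constants are invented):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((64, 5))                  # hypothetical nonnegative inputs

W = rng.normal(scale=0.5, size=(5, 5))      # "up" weights (input -> hidden)
B = rng.normal(scale=0.5, size=(5, 5))      # "down" weights (hidden -> output)

def recon_error(B):
    H = np.tanh(data @ W)
    return float(np.mean((H @ B - data) ** 2))

start = recon_error(B)
lr = 0.1
for _ in range(500):
    x = data[rng.integers(len(data))]
    h = np.tanh(x @ W)                      # encode
    x_hat = h @ B                           # decode / reconstruct
    # local delta rule: uses only h and the reconstruction error,
    # both available at this layer -- no cross-layer error transport
    B += lr * np.outer(h, x - x_hat)
end = recon_error(B)
```

The backward weights B end up playing the role of the approximate inverse that the video goes on to describe.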
algorithm is proposing namely if you
think of a stack of autoencoders this
this this transforming one hidden
representation into the next right this
is the feed-forward function what you
can do is you first of all you can
assume that for each of these functions
here you have a perfect inverse right
you can you can perfectly compute the
inverse function that's this this G here
of course this doesn't exist but assume
you have it
what you then could do is if you knew in one layer and on the top layer of course you know if you knew that okay I got this from my forward pass but I would like to have this as my desired output right so in the output layer you get this this is your error signal you could compute an error right here this is what you do in the output right now
in back prop we would back propagate
this error along the layers but now we
don't do this instead of what we do is
we use this G function to invert the F
function right and by that what we'll
say is what hidden representation in
layer two what should the hidden
representation have been that in order
for us to obtain this thing right so the
the claim here is if in layer two we had
had H two as a hidden representation
then we would have landed exactly where
we want it right that's what this G
function does because here we use F so
had we had F h2 and used F on it we
would be exactly where we want instead
we had h2 here and used F on it and then
we landed here where we don't want so
this is where we want we would want to
be in layer two and this is where we
were so again we can compute an error
here again instead of back propagating
that error what we'll do is we'll use
the inverse of the forward function in
order to back propagate our desired
hidden representation and you can see
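A minimal sketch of this inversion step, using square linear layers so that an exact inverse really exists (an idealization for illustration only; shapes and values are invented, and real networks have no perfect inverse):

```python
import numpy as np

rng = np.random.default_rng(0)

# square linear layers are exactly invertible -- an idealization
W1 = rng.normal(size=(3, 3)) + 3 * np.eye(3)   # kept well-conditioned
W2 = rng.normal(size=(3, 3)) + 3 * np.eye(3)

F2 = lambda h: h @ W2                  # forward function of the top layer
G2 = lambda h: h @ np.linalg.inv(W2)   # its exact inverse

x = rng.normal(size=(1, 3))
y_desired = rng.normal(size=(1, 3))    # stand-in for the desired output

h1 = x @ W1                            # hidden representation, forward pass
out = F2(h1)

# instead of backpropagating (out - y_desired), invert the desired
# output: which hidden representation *should* layer 1 have produced?
h1_target = G2(y_desired)

# the purely local error signal for the hidden layer
local_err = h1 - h1_target
```

Had the hidden layer produced `h1_target`, the forward function would have landed exactly on the desired output — that is the sense in which desired states, not errors, are propagated backwards.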
there is of course a relationship to the
true back prop here but the the
important distinction is we are not
trying to back propagate the error
signal we're trying to
invert the desired hidden states of the
network and then in each layer we can
compute from the forward pass we can
compute the difference to the desired
hidden state and thereby compute an
error signal and now we have achieved
what we wanted we want an algorithm that
doesn't do back prop that only uses
local information in order to compute
the error signal that it needs to adjust
and by local I mean information in the
same layer and also the data type that
is propagated by F is activations right
of hidden representations and by G is
also activations of hidden
representations both of them are always
positive can be encoded by spiking
neurons and so on so this algorithm
achieves what we want they go bit into
detail how the actual error update here
can be achieved and apparently neurons
can achieve you know in the same layer
to to adjust themselves to a given
desired activation so this algorithm
achieves it of course we don't have this
G we don't have it and therefore we need
to go a bit more complicated what they
introduces the this following algorithm
the goals are the same but now we assume
we do not have a perfect inverse but we
have something that is a bit like an
inverse so we have an approximate
inverse and they basically suggest if we
have an approximate inverse we can do the following so G is now an approximate
inverse to F what we can do is this is
our input signal right we use F to map
it forward to this and so on all the way
up until we get our true or error right
here this is our error from the
environment right this is the nail being
wrong and then we do two applications of
G right so this is an application of F and then we do two applications of G applied to what we got in the forward pass right and this now
gives us a measure of how bad our
inverse is right so if G is now an
approximate inverse and this now we see
here oh okay we had h2 in the
forward pass and we basically forward
passed and then went through our inverse
and we didn't land quite exactly where
we started but we know that okay this
this is basically the difference between our forward-inverse H and our true H and then we also back
project using G again the desired
outcome so we invert the desired outcome
here now before we have adjusted
directly these two right because we said
this is what we got this is what we want
but now we include for the fact that G
isn't a perfect inverse and our
assumption is that G here probably makes
about the same mistakes as G here so
what we'll do is we'll take this vector
right here and apply it here in order to
achieve this thing and this thing is now
the corrected thing our corrected desired hidden representation corrected for the fact that we don't have a
perfect inverse and now again we have
our error here that we can locally
adjust again all the signals propagated
here here and here are just neural
activations and all the information
required to update a layer of neurons is
now contained within that layer of
neurons right and and this goes back
through the network so this is how they
achieve how they achieve this this is a
bit of a close-up look and here are the
computations to do this so basically
for the forward updates you want to
adjust W into the direction of the H
minus the H tilde and the H tilde in
this case would be this the the hidden
representation that you would like to
have so you will update your forward
forward weights into the direction such
that your hidden representations are
closer sorry that your forward hidden
representation is closer to your
backward hidden representation and the
backward updates now W here are the weights of F and B are the weights of G so in the backward updates
your goal is to make G a better inverse
right so what you'll do is again you'll
take the difference between now you see
the difference here here here right not
the same error so in the W update you use what we labeled error
here in the G update you use this error
here so this is the error of G so when
you update the function G you want to
make these two closer together such that
G becomes a better inverse right because
you're dealing with an approximate
inverse you still need to obtain that
approximate inverse and this here is
how you learn it this algorithm now
achieves what we wanted right
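Putting those pieces together for one layer, here is a sketch of the difference-corrected target and the two local weight updates — forward weights pulled toward the target, backward weights trained to be a better inverse (my own simplification; the shapes, the stand-in target, and the constants are invented):

```python
import numpy as np

rng = np.random.default_rng(0)

W = rng.normal(scale=0.5, size=(4, 4))   # forward weights, F(h) = tanh(h @ W)
B = rng.normal(scale=0.5, size=(4, 4))   # backward weights, G(h) = tanh(h @ B)

F = lambda h: np.tanh(h @ W)
G = lambda h: np.tanh(h @ B)

h1 = rng.normal(size=(1, 4))             # activity entering this layer
h2 = F(h1)                               # forward pass out of this layer
g_h2 = G(h2)                             # reconstruction via the approximate inverse
h2_target = h2 - 0.1 * np.ones((1, 4))   # stand-in target from the layer above

# difference correction: G makes roughly the same mistake on h2_target
# as on h2, so cancel that mistake using the reconstruction g_h2
h1_target = h1 + G(h2_target) - g_h2

lr = 0.1
# forward update: pull F(h1) toward its target (local delta rule;
# 1 - h2**2 is the tanh derivative at the forward pre-activation)
W += lr * h1.T @ ((h2_target - h2) * (1 - h2 ** 2))
# backward update: make G a better inverse by pulling G(h2) toward h1
B += lr * h2.T @ ((h1 - g_h2) * (1 - g_h2 ** 2))
```

Everything on the right-hand sides — h1, h2, g_h2, and the targets — lives inside this one layer, which is exactly the locality property the video emphasizes.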
local updates data types check signed
check and so on I hope this was clear enough in essence it's pretty simple but it's pretty cool how they work around this they call this difference target propagation and I don't think they invented this maybe I'm not sure maybe they did maybe they didn't and this paper just kind of frames it in this hypothesis it is unclear to me I am not familiar with this kind of papers so sorry if I misattribute something here all right
then they go into into how could these
things be implemented biologically and
they go for some evidence and they also
state that we used to look at neurons
basically in this way where you had
input and feedback here very simple
simplistic view of neurons whereas
nowadays even the computational
community views neurons in a more
differentiated way where you have for
example different regions here on the
soma that can be separated from each
other and you have inter neuron
interference and so on I'm not qualified
too much to comment on this stuff but I
invite you to read it for yourself if
you want alright so this was my take on
this paper I find the algorithm they
propose pretty cool if you I hope you
liked it and check it out bye bye