Backpropagation and the brain

Yannic Kilcher
20 Apr 2020 · 32:25

Summary

TLDR: This paper, co-authored by Timothy Lillicrap, Adam Santoro, Luke Marris, Colin Akerman, and Geoffrey Hinton, proposes a hypothesis about how a backpropagation-like algorithm could work in the brain. Although there has long been substantial evidence against a backprop-like process in the brain, the paper examines how neural networks can learn both without external feedback, via Hebbian learning, and from feedback. The authors propose a backprop-like learning mechanism, implemented through a feedback network, in which every neuron receives detailed instructions on how to update itself. The paper also discusses why backpropagation was previously considered impossible in the brain, including the requirement for synaptic symmetry and the problem of the error signals' type. Finally, the authors put forward the NGRAD hypothesis: the brain could implement neural-network learning with an autoencoder-based approximation of backpropagation. This algorithm avoids some of the problems of standard backprop, such as synaptic symmetry and the transport of error signals, and is realized by stacking and training autoencoders.

Takeaways

  • 🧠 The paper proposes a hypothesis for how a backpropagation-like algorithm could work in the brain, despite substantial prior evidence against such a mechanism existing there.
  • 📈 It covers several ways neural networks can learn, including Hebbian learning, a self-reinforcing pattern-activation mechanism that needs no external feedback, and learning from feedback (the two feedback styles are compared in a sketch after this list).
  • 🔄 Backpropagation is ubiquitous in the machine-learning community, but its biological plausibility was previously considered low because it requires synaptic symmetry between the forward and backward paths.
  • 🤔 The paper discusses why backprop was thought impossible in the brain, including the synaptic-symmetry requirement and the type of the error signals.
  • 🧬 It proposes an autoencoder-based approximation of backpropagation, under the NGRAD hypothesis, which uses local update rules rather than backpropagated errors.
  • 🤓 Comparing the hidden representations of artificial and biological neural networks shows that networks trained with backprop form hidden representations closer to the biological ones.
  • 🔄 The proposed algorithm achieves a backprop-like effect by using approximate inverse functions, avoiding the parts of backprop that are biologically implausible.
  • ⚙️ The algorithm updates both forward and backward weights so that each hidden layer's representation moves toward its desired state, computing the required error signals from local information.
  • 📊 The paper also explores how the algorithm could be implemented biologically, drawing on the current, more complex and compartmentalized view of neurons.
  • 📚 One of the authors is Geoffrey Hinton, a renowned figure in machine learning, which is part of why the paper has attracted attention.
  • 🎓 For readers interested in the intersection of machine learning and neuroscience, the paper offers a fresh perspective on the brain's learning mechanisms.
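
To see the difference between the two feedback styles concretely, here is a minimal sketch (our own illustration, not code from the paper): both rules adjust the same linear layer, but perturbation learning only ever sees one scalar number, while backprop hands every weight its own signed instruction.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))
x = np.array([1.0, -0.5])
t = np.array([0.0, 1.0, 2.0])
loss = lambda W: float(np.sum((W @ x - t) ** 2))

# Scalar feedback (perturbation learning): the whole network gets one number.
noise = 0.01 * rng.normal(size=W.shape)
delta = loss(W + noise) - loss(W)        # did the random nudge help or hurt?
W_pert = W - 0.5 * delta * noise         # move against nudges that hurt

# Vector feedback (backprop): each unit and weight gets a signed instruction.
grad = 2.0 * np.outer(W @ x - t, x)      # exact dloss/dW via the chain rule
W_bp = W - 0.05 * grad
```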

Q & A

  • What hypothesis does the paper under discussion propose?

    -The paper proposes a hypothesis about how a backpropagation-like algorithm could work in the brain. Although there is considerable prior evidence against a backprop-like mechanism in the brain, the authors argue that something like backpropagation may well be happening there.

  • What is the difference between Hebbian learning and backpropagation learning?

    -Hebbian learning is a self-reinforcing mechanism that needs no external feedback: connection weights are adjusted to match or amplify their own previous outputs. Backpropagation, by contrast, requires external feedback: an error signal (the gradient) is propagated backwards to adjust each neuron or connection weight precisely, so as to reduce the overall error.

  • Why did earlier work consider backpropagation implausible in the brain?

    -Mainly for two reasons. First, backprop requires synaptic symmetry between the forward and backward paths, which the structure of biological neurons does not support. Second, backprop's error signals are signed and can take extreme values, whereas biological neurons communicate via spiking rates, which are non-negative and make signed error signals hard to represent.

  • What is the "backprop-like learning" proposed in the paper?

    -"Backprop-like learning" is the paper's proposed learning algorithm that behaves like backpropagation but is implemented with approximate inverses in an autoencoder-style arrangement. It uses only local information, carried as ordinary activations, to compute the required error signals, without truly propagating errors backwards, thereby sidestepping the problems that make real backprop implausible for biological neurons.

  • What role do autoencoders play in backprop-like learning?

    -A central one. By stacking autoencoders, each layer can use reconstruction to work out the hidden representation it should have had: an approximate inverse function maps desired outputs back into desired hidden states. Each layer can then adjust its weights independently, without a global backward pass of errors.

  • What does "approximate inverse" mean in this scheme?

    -Since a perfect inverse function does not exist, the scheme uses an "approximate inverse": a learned function that estimates which hidden representation would have produced a given forward result. Even without a perfect inverse, an error signal can then be computed at every layer and used to update the weights (see the training-loop sketch after this Q&A list).

  • How do the experiments mentioned in the paper demonstrate the similarity between backpropagation and biological neural networks?

    -Artificial neural networks are trained on the same tasks as human or animal brains, and the hidden representations of the artificial networks are compared with those of the biological ones. Networks trained with backpropagation match the biological representations more closely than networks trained with scalar-update algorithms, which is evidence that a backprop-like learning mechanism may exist in the brain.

  • Why is the data-type check necessary for the backprop-like algorithm?

    -Because biological neurons transmit information as spiking rates, which are non-negative. The algorithm must ensure that the forward pass and the approximate inverse pass carry the same data type, namely activations, for the scheme to be biologically plausible.

  • What possible biological evidence does the paper mention for implementing this in neurons?

    -The paper points to the modern, more complex view of neurons: for example, different regions of a neuron can operate somewhat independently, and neurons interfere with one another. Such observations provide a biological basis on which neurons might implement a backprop-like algorithm.

  • What does "local update" mean in the backprop-like algorithm?

    -"Local update" means that each neuron, or each layer, can update its weights using only information available within that layer, without depending on a global error signal passed in from other layers. The method uses locally available quantities, such as activations and the approximate inverse, to compute the necessary error signals and adjust the weights.

  • What advantages does the backprop-like algorithm have over standard backpropagation?

    -It does not rely on backward propagation of a global error signal; it uses local information and approximate inverse functions instead. This matches how biological nervous systems operate, avoids the global error backpropagation that biological neurons cannot implement, and still retains much of backprop's optimization efficiency.
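
To make the mechanics concrete, here is a minimal training-loop sketch in the style of difference target propagation, which the discussed algorithm closely resembles (our own simplified code with linear layers and made-up names, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [4, 5, 5, 3]                       # widths of a small stack of layers
Ws = [rng.normal(scale=0.2, size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
Bs = [rng.normal(scale=0.2, size=(n, m)) for n, m in zip(sizes[:-1], sizes[1:])]
eta = 0.05

def f(l, h): return Ws[l] @ h              # forward function of layer l
def g(l, h): return Bs[l] @ h              # learned approximate inverse of layer l

def train_step(x, y):
    hs = [x]                               # forward pass, keeping every hidden state
    for l in range(len(Ws)):
        hs.append(f(l, hs[-1]))
    target = hs[-1] - eta * (hs[-1] - y)   # top-level target from the task error
    for l in reversed(range(len(Ws))):
        h_below, h_here = hs[l], hs[l + 1]
        # corrected target: invert the target from above, then add the
        # inverse's own error on the forward activations (the "difference" part)
        t_below = g(l, target) + (h_below - g(l, h_here))
        # local forward update: pull f's output toward this layer's target
        Ws[l] += eta * np.outer(target - h_here, h_below)
        # local backward update: make g a better inverse of f
        Bs[l] += eta * np.outer(h_below - g(l, h_here), h_here)
        target = t_below

train_step(rng.normal(size=4), rng.normal(size=3))
```

Every quantity used to update a layer is available inside that layer, which is exactly the locality property the answers above describe.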

Outlines

00:00

🤖 Paper introduction and background

This segment introduces a paper by Timothy Lillicrap, Adam Santoro, Luke Marris, Colin Akerman, and Geoffrey Hinton on how a backpropagation-like algorithm might work in the brain. Although there has been substantial evidence against a backprop-like process in the brain, the paper proposes a hypothesis for how the brain's neural networks learn. The segment also contrasts Hebbian learning with learning from feedback and describes their roles in neural-network learning.

05:03

🧠 Comparing ways neural networks learn

This segment contrasts Hebbian learning with backpropagation learning in detail. Hebbian learning needs no external feedback; it is a self-reinforcing pattern-activation mechanism. Backpropagation learning requires external feedback and uses a reverse computation to adjust every neuron or weight precisely so as to reduce the error. The segment also covers why backprop is hard to implement with biological neurons, and the paper's proposal that the brain may nonetheless use a backprop-like learning mechanism.

10:06

🔍 Is backpropagation feasible in the brain?

This segment examines why backpropagation was previously thought impossible in the brain, chiefly the requirement for synaptic symmetry and the type of the error signals. The paper nevertheless hypothesizes that the brain may use an autoencoder-like approximation of backpropagation, and the segment discusses how autoencoders work and how they can be trained with local update rules rather than backprop.

15:07

🔧 Autoencoders and their relation to backpropagation

This segment looks more closely at how autoencoders work: how they reconstruct their input through an input layer, a hidden layer, and an output layer. The authors describe an algorithm that uses approximate inverse functions to mimic backpropagation, avoiding some of its biological implausibilities. The approach lets the network compute the required error signals from local information alone, and keeps the signals non-negative so that spiking neurons can encode them.

20:10

🧬 Approximate inverse functions in neural networks

This segment details how an approximate inverse function G is used to mimic the propagation of error signals in backpropagation. Applying G twice makes it possible to assess how accurate the inverse is and to adjust the network's hidden representations toward the desired output accordingly. The network can thus learn through local updates without performing conventional backpropagation. (A short sketch of this check follows.)
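
A short illustration of the "apply G twice" idea (our own sketch with a made-up linear layer, not the paper's notation): one application of G measures the inverse's error on the forward activation; the other back-projects the desired outcome and is corrected by that error.

```python
import numpy as np

rng = np.random.default_rng(5)
W = rng.normal(size=(3, 3))                  # forward weights (F)
B = rng.normal(size=(3, 3))                  # approximate inverse weights (G)

h = rng.normal(size=3)                       # hidden state from the forward pass
h_up = W @ h                                 # result of applying F
desired = h_up + np.array([0.1, -0.2, 0.0])  # what we wanted instead

inverse_error = h - B @ h_up                 # first use of G: how far off is the inverse?
h_target = B @ desired + inverse_error       # second use of G, corrected by that error
```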

25:10

🌐 Biological feasibility

The final segment discusses how this learning algorithm could be implemented biologically, drawing on the current, more complex and compartmentalized understanding of neurons, such as separable regions of the soma and interference between neurons. The presenter notes that he is not qualified to judge this part in depth and encourages viewers to read it for themselves.


Keywords

💡Backpropagation

Backpropagation is an algorithm for training neural networks: it computes the gradient of a loss function with respect to the network's parameters and uses those gradients to update the weights. The video discusses how backpropagation might be implemented in the brain and how it could match the learning mechanisms of biological neural networks.
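
As a reminder of the mechanics (a minimal sketch of our own, not tied to the paper), backpropagation through a single linear layer can be written out by hand:

```python
import numpy as np

# One linear layer with squared-error loss: L = 0.5 * ||W @ x - t||^2
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))          # weights
x = np.array([1.0, -2.0])            # input
t = np.array([0.5, 0.0, -1.0])       # target output

y = W @ x                            # forward pass
err = y - t                          # dL/dy: a signed, per-unit error vector
grad_W = np.outer(err, x)            # chain rule: dL/dW = (dL/dy) x^T
W -= 0.1 * grad_W                    # gradient step on the weights
```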

💡Hebbian learning

Hebbian learning is an unsupervised learning rule whose core idea is that "neurons that fire together wire together": if two neurons are repeatedly active at the same time, the connection between them strengthens. The video notes that Hebbian learning requires no external feedback; it is a self-reinforcing pattern-activation mechanism.
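
A minimal sketch of the plain Hebbian rule (illustrative only; in practice a decay or normalization term, e.g. Oja's rule, is needed to keep the weights bounded):

```python
import numpy as np

# Plain Hebbian rule: dW = eta * (post activity)(pre activity)^T.
# No feedback from the outside world is involved.
rng = np.random.default_rng(1)
W = rng.normal(scale=0.1, size=(3, 2))
eta = 0.01

x = np.array([1.0, 0.5])             # presynaptic activity
y = W @ x                            # postsynaptic activity
W += eta * np.outer(y, x)            # strengthen connections that were co-active
```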

💡Error signal

An error signal measures the difference between a model's prediction and the true value. In backpropagation, error signals are used to compute the gradient of the loss and thus to update the weights. The video discusses what form error signals could take in the brain and how they would shape learning in neural networks.

💡Autoencoder

An autoencoder is a neural network for unsupervised learning: an encoder compresses the input into a lower-dimensional representation, and a decoder reconstructs the input from it. The video discusses the role autoencoders play in implementing a backprop-like algorithm and how they can be trained with local update rules.
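
A minimal single-layer autoencoder sketch (our own illustration, assuming a squared reconstruction loss; since there is only one hidden layer, the update stays local to the autoencoder itself):

```python
import numpy as np

rng = np.random.default_rng(2)
n_in, n_hid = 4, 3
W = rng.normal(scale=0.1, size=(n_hid, n_in))   # encoder weights
B = rng.normal(scale=0.1, size=(n_in, n_hid))   # decoder weights
eta = 0.05

def relu(z):
    return np.maximum(z, 0.0)

for _ in range(1000):
    x = rng.normal(size=n_in)
    h = relu(W @ x)                      # encode
    x_rec = B @ h                        # decode: reconstruct the input
    e = x - x_rec                        # reconstruction error, local to this layer
    B += eta * np.outer(e, h)            # decoder update from the local error
    W += eta * np.outer((B.T @ e) * (h > 0), x)  # encoder update from the same error
```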

💡Synaptic symmetry

Synaptic symmetry means that the synaptic connections on the forward and backward paths of a network mirror each other. The video notes that backpropagation requires this symmetry, but it is not found in biological nervous systems, which is a major obstacle to implementing backprop in the brain.

💡Error landscape

The error landscape is the mathematical structure formed by the loss function as the parameters vary. The video discusses how the different learning rules traverse this landscape, and how an approximate backpropagation algorithm built on autoencoders can improve a network's hidden representations.

💡Activation function

An activation function introduces nonlinearity between a layer's inputs and outputs, e.g. ReLU or Sigmoid. The video mentions the role of activation functions in the forward pass and how they affect what the network learns.

💡Weight update

A weight update is the adjustment of a network's weights during training, normally in the direction given by the gradient of the loss. The video discusses how weights are updated in backpropagation, and how the proposed approximate algorithm performs local weight updates from information within each layer.

💡Gradient

The gradient describes a function's rate of change at a point; in machine learning, gradients indicate how to adjust a network's parameters to minimize the loss. The video discusses the gradient's central role in backpropagation and possible analogous mechanisms in biological nervous systems.

💡Neuroplasticity

Neuroplasticity is the nervous system's structural and functional adaptation to experience, learning, and environmental change over a lifetime. The video touches on neuroplasticity when considering how the brain learns and what a biological counterpart of backpropagation might look like.

💡Approximate inverse

An approximate inverse is a function that does not recover the original input exactly but comes close. The video discusses how the proposed algorithm uses approximate inverses to achieve a backprop-like effect while respecting the limitations of biological nervous systems.
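
In symbols, the corrected layer target built from an approximate inverse takes the difference-target-propagation form (notation ours, not the paper's), with h_l the forward activation of layer l, f the forward function from layer l to layer l+1, and g its approximate inverse:

```latex
\tilde h_l \;=\; g\!\left(\tilde h_{l+1}\right) \;+\; \bigl[\, h_l - g\!\left(f(h_l)\right) \bigr]
```

The bracketed term measures how imperfect the inverse is on the actual forward pass and corrects the back-projected target by exactly that amount.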

Highlights

The paper proposes a hypothesis for how a backpropagation-like algorithm could work in the brain.

It reviews the earlier evidence against a backprop-like process in the brain.

Hebbian learning is introduced, with its self-reinforcing dynamics that require no external feedback.

Backpropagation differs from Hebbian learning in that it adjusts neuron weights through a reverse computation.

The paper proposes a backprop-like learning mechanism implemented through a feedback network.

Comparing hidden representations of artificial and biological neural networks shows that backprop-trained networks resemble the biological ones more closely.

The paper discusses why backpropagation was previously thought impossible in the brain.

It puts forward the NGRAD hypothesis: the brain may use an autoencoder-based approximation of backpropagation.

Autoencoders are introduced, along with how they can be trained by local update rules rather than backprop.

The proposed algorithm does not use backpropagation; it computes the needed error signals from local information.

The algorithm uses approximate inverse functions to achieve backprop-like updates while remaining biologically realizable.

The paper discusses how the algorithm could be implemented in biological nervous systems and offers some evidence.

It also draws on the modern, more complex and compartmentalized view of neurons, in contrast to the traditional simple input-feedback model.

The algorithm offers a new route by which backpropagation could be possible in biological neural networks.

The theory and algorithm matter both for understanding learning in the brain and for designing artificial neural networks.

The authors include the renowned machine-learning researcher Geoffrey Hinton, which adds to the work's visibility.

Transcripts

play00:00

hi there today we're looking at back

play00:02

propagation in the brain by Timothy

play00:05

Lillicrap Adam Santoro Luke Marris

play00:08

Colin Akerman and Geoffrey Hinton so

play00:12

this is a bit of an unusual paper for

play00:15

the machine learning community but

play00:17

nevertheless it's interesting and let's

play00:19

be honest at least half of our interest

play00:22

comes from the fact that Geoffrey Hinton

play00:24

is one of the authors of this paper so

play00:28

this is a paper that basically proposes

play00:33

a hypothesis on how the algorithm of

play00:37

back propagation works in the brain

play00:39

because previously there has been a lot

play00:42

of evidence against there being

play00:46

something like back propagation in the

play00:48

brain so the question is how do neural

play00:52

networks in the brain learn and they

play00:56

they say there there can be many

play00:59

different ways that neural networks

play01:00

learn and they list them up in in this

play01:06

kind of diagram where you have a network

play01:09

and it maps from input to output by

play01:13

having these weighted connections

play01:15

between neurons so the input is

play01:16

two-dimensional and then it maps using

play01:19

these weights

play01:20

to a three-dimensional hidden layer and

play01:22

usually there is a nonlinear function

play01:25

somewhere at the output here of these so

play01:30

they they do a weighted sum of the

play01:32

inputs and then they do a nonlinear

play01:35

nonlinear function and then they

play01:37

propagate that signal to the next layer

play01:39

and till then to finally to the output

play01:42

all right so how do these networks learn

play01:46

the one way of learning is called

play01:50

hebbian learning the interesting thing

play01:52

here is that it requires no feedback

play01:54

from the outside world basically what

play01:57

you want to do in hebbian learning is

play01:59

you want to update the connections such

play02:02

that they kind of match their own

play02:04

previous outputs or even increase their

play02:07

own previous outputs so you propagate a

play02:09

signal and then maybe this neuron spikes

play02:12

really hard and this

play02:13

Spike's really low then if you propagate

play02:16

the signal again right then you want to

play02:21

match that those those activations or if

play02:23

you if you properly similar signals no

play02:28

feedback required so basically it's a

play02:31

self amplifying or self dampening

play02:33

process the ultimately though you want

play02:37

to learn something about the world and

play02:39

that means you have to have some some

play02:41

feedback from outside right so with

play02:44

feedback what we mean is usually that

play02:47

the output here let's look this way the

play02:53

output here is goes into the world let's

play02:57

say this is a motor neuron right you do

play03:00

something with your arm like you hammer

play03:04

on a nail and then you either hit the

play03:09

nail or you don't let's say you don't

play03:12

hit the nail so after it looks like

play03:14

crooked there you have feedback right so

play03:18

feedback usually in the form of some

play03:22

sort of error signal right so feedback

play03:25

it can be like this was good or this was

play03:27

bad or it can be this was a bit too much

play03:30

to the left or so on the important part

play03:33

is you get kind of one number of

play03:35

feedback right how bad you were and now

play03:40

your goal is to adjust all of the

play03:42

individual neurons or weights between

play03:46

neurons such that the error will be

play03:49

lower so in hebbian learning there is no

play03:52

feedback it's just simply a self

play03:54

reinforcing pattern activation machine

play03:58

in the first in these kind of first

play04:02

instances of perturbation learning what

play04:05

you'll have is you'll have one single

play04:09

feedback and that you can see this is a

play04:11

diffuse cloud here what you're basically

play04:15

saying is that every single neuron is

play04:17

kind of punished let's say the the

play04:20

feedback here was negative one that

play04:22

means every single neuron is is punished

play04:26

for that so how you can imagine

play04:30

something if you have your input X and

play04:33

you map it through through your function

play04:37

f then the function f has a way to w1

play04:41

and so on right

play04:43

so you map X through it right and then

play04:47

you get feedback of negative 1 and then

play04:52

you map X with a little bit of noise

play04:55

plus M right da-da-da-dah and you get a

play04:59

feedback of negative 2 right then you

play05:03

you that means that the direction of

play05:05

this noise was probably a bad direction

play05:07

so ultimately you want to update X into

play05:13

the direction of negative that noise by

play05:16

modulated of course by by some some

play05:22

factor here that's that it kind of tells

play05:24

you how bad it was so this could be the

play05:29

negative 2 minus negative 1 now that

play05:36

makes big sense

play05:37

No yes that would be no it would be

play05:43

negative 1 minus negative nevermind so

play05:45

basically with a scalar feedback you

play05:48

simply tell each neuron what it did

play05:50

right or sorry if if the entire network

play05:53

right the entire network did right or

play05:57

wrong so the entire network will lead to

play05:59

this feedback you don't have

play06:01

accountability of the individual neurons

play06:03

all you can say is that whatever I'm

play06:05

doing here is wrong and whatever I'm

play06:07

doing here is right so I'm gonna do more

play06:09

of the right things
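
(A minimal sketch of the weight-perturbation idea just described, with made-up names and a toy quadratic error; nothing here is from the paper itself:)

```python
import numpy as np

rng = np.random.default_rng(3)
W = rng.normal(size=(3, 2))              # weights of f(x) = W @ x
x = np.array([0.3, -0.7])
t = np.array([1.0, 0.0, -1.0])           # desired output
eta = 0.5

def scalar_error(W):
    # one single number of feedback for the whole network
    return float(np.sum((W @ x - t) ** 2))

for _ in range(500):
    noise = 0.01 * rng.normal(size=W.shape)
    delta = scalar_error(W + noise) - scalar_error(W)
    W -= eta * delta * noise             # move against noise directions that hurt
```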

play06:13

now in back propagation it is very different right

play06:16

in back propagation what you'll do is

play06:18

you'll have your feedback here let's say

play06:20

that's negative 1 and then you do a

play06:24

reverse computation so the forward

play06:27

computation in this case was this

play06:29

weighted sum of this layer now usually

play06:32

layer wise reverse computation which

play06:36

means that you know how

play06:39

this function here this output came to

play06:42

be out of the out of the inputs and that

play06:45

means you can inverse and you can do an

play06:48

inverse propagation of the error signal

play06:51

which is of course the gradient so this

play06:54

would be your your you you would derive

play06:59

your error by the inputs to the layer

play07:04

right so this basically tells in the

play07:07

back propagation algorithm you can

play07:09

exactly determine if you are this node

play07:12

how do I have to adjust my input weights

play07:16

how do I have to adjust them in order to

play07:20

make this number here go down right and

play07:24

then because you always propagate the

play07:27

error according to that what you'll have

play07:31

in each in each layer is basically a

play07:33

vector target so it's no longer just one

play07:35

number but each layer now has a target

play07:37

of vectors and it says okay these are

play07:40

the outputs that would be beneficial

play07:44

please this layer please change your

play07:47

outputs in the direction of negative two

play07:49

negative three plus four so you see this

play07:53

is so the negative two would be this

play07:55

unit the negative three would be this

play07:56

unit and the plus four would be this

play07:58

unit so each unit is instructed

play08:01

individually to say please this is the

play08:05

direction that each unit should change

play08:07

in in order to make this number go lower

play08:11

you see how this is much more

play08:13

information than the perturbation

play08:14

learning in the perturbation learning

play08:16

all the units simply know well the four

play08:18

was bad and now is better

play08:20

so let's you know change a bit and here

play08:25

you have detailed instructions for each

play08:27

unit because of the back propagation

play08:29

algorithm so ultimately people have kind

play08:33

of thought that since back propagation

play08:36

wasn't really possible with biological

play08:39

neurons that the brain might be doing

play08:42

something like perturbation learning but

play08:46

this paper argues that something like

play08:49

back propagation is not only possible

play08:51

but likely

play08:53

in the brain and they proposed this kind

play08:57

of backprop-like learning with the

play08:59

feedback network so they basically

play09:02

concern all the they differentiate hard

play09:06

between these two regimes here in this

play09:09

hand you have the scalar feedback which

play09:12

means that the entire network gets one

play09:15

number as a feedback and the each neuron

play09:18

just gets that number and here you have

play09:21

vector feedback where each neuron gets

play09:23

an individual instruction of how to

play09:26

update and they achieve this not by back

play09:30

propagation because still the original

play09:32

formulation of back prop as we use it in

play09:35

neural networks is not biologically

play09:39

plausible but they achieve this with

play09:41

this backprop-like learning with the

play09:42

feedback network and we'll see how this

play09:46

does but in in essence this feedback

play09:48

network is constructed such that it can

play09:51

give each neuron in the forward pass

play09:54

here detailed instructions on how to

play09:57

update itself right so yeah they have a

play10:06

little bit of a diagram here of if you

play10:08

do hebbian if this if this is an error

play10:11

landscape if you do have you in learning

play10:12

you basically you don't care about the

play10:15

error you're just reinforcing yourself

play10:17

if you do perturbation learning then you

play10:20

it's very slow because you don't have a

play10:24

detailed signal you just you just rely

play10:26

on this one number it's kind of if you

play10:29

were to update every single neuron in

play10:31

your neural network with reinforcement

play10:33

learning considering the output the of

play10:37

the neural networks or the error

play10:39

considering that the reward not using

play10:42

back row and then with back probably

play10:44

have a much smoother much faster

play10:46

optimization trajectory so they looked

play10:51

at this and they they come to some some

play10:55

conclusions first of all so here's

play10:58

here's back prop basically saying back

play11:00

prop as we said you have the forward

play11:03

pass

play11:04

and there you simply compute these

play11:08

weighted averages and you you also pass

play11:14

them usually through some sort of

play11:16

nonlinear activation right and the cool

play11:20

thing about this is in artificial neural

play11:24

networks is that once the error comes in

play11:27

you can exactly reverse that so you can

play11:30

do a backward pass of errors where you

play11:33

can propagate these errors through

play11:34

because you know it's kind of invertible

play11:38

the function doesn't have to be

play11:40

invertible but that the gradients will

play11:43

flow backwards if you know how the

play11:45

forward pass was computed so first of

play11:50

all they go into a discussion of back

play11:53

prop in the brain how can we even expect

play11:57

that and one cool piece of evidence is

play12:01

where I find is that they cite several

play12:05

examples where they use artificial

play12:09

neural networks to learn the same tasks

play12:12

as humans right and or as as animal

play12:18

brains and then I have no clue how how

play12:21

they measure any of this but then they

play12:23

compare the hidden representations of

play12:26

the living neural networks and the

play12:30

artificial neural networks and it turns

play12:32

out that the these the networks that

play12:36

were trained with backpropagation match

play12:45

more closely than networks that were not trained with

play12:48

backprop so basically that means if you

play12:50

train a network with backprop it matches

play12:54

the biological networks much closer in

play12:58

how they form their hidden

play12:59

representations and they they do a

play13:02

number they cite the number of

play13:04

experiments here that show this so this

play13:06

gives you very good evidence that if the

play13:10

hidden representations they look as if

play13:13

they had been computed by backprop and

play13:15

not by any of these scalar update

play13:18

algorithms so it is conceivable that we

play13:23

find backprop in the brain that's why

play13:28

they go here next they go into problems

play13:31

with backprop so basically why why

play13:34

would we why so far have we believed

play13:37

that back prop isn't happening in the

play13:40

brain

play13:41

so now let's I want to highlight two

play13:45

factors here that I find, I think,

play13:48

suffice to state they have more but first

play13:51

of all back prop demands synaptic

play13:53

symmetry in the forward and backward

play13:55

paths right so basically if you have a

play13:59

neuron and it has output to another

play14:02

neuron what you need to be able to do is

play14:05

to pass back information along that

play14:08

neuron so it kind of has to be a

play14:10

symmetric connection idea of the forward

play14:15

and the backward pass and these need to

play14:17

be exact right and this is just not if

play14:21

you know how neurons are structured they

play14:23

have kind of input dendrites and then

play14:25

there's this axon and the action potential

play14:28

and along the axon the signal travels

play14:32

and the back traveling of the signal

play14:35

just I think is very is very very very

play14:38

slow if even possible and so it's

play14:42

generally not invertible or inverse

play14:45

compute capable so this is one reason

play14:49

why backprop seems unlikely and then

play14:52

the second reason here is error signals

play14:55

are signed and potentially extreme

play14:57

valued and i want to add to that they

play15:00

also just talk about this somewhere that

play15:04

error signals are of a different type

play15:06

right that's a different type

play15:12

so first let's see what signed error

play15:16

signals are signed yes we need to be

play15:18

able to adjust neurons in a specific

play15:21

directions right if you look at again

play15:23

what we've drawn before here we said

play15:27

here this is how these neurons must must

play15:31

update

play15:31

so the first neuron must must decrease

play15:36

by two this must decrease by three and

play15:38

this must increase by four now in

play15:41

background we need this but in if if we

play15:47

assume that there is something like a

play15:49

reverse computation or signaling here

play15:53

happening then we still have the problem

play15:57

that usually these output signals are in

play16:00

the form of spiking rates which means

play16:03

that over time right so if a neuron

play16:07

wants to if a neuron has zero activation

play16:11

there's just no signal but if a neuron

play16:13

has a high activation it spikes a lot if

play16:17

has a low activation it kind of spikes

play16:20

sometimes well what he can do is

play16:23

negative spike right like zero is as low

play16:26

as it goes so the the thought that there

play16:29

are signed information in in the

play16:32

backward pass is inconceivable even if

play16:36

you have something like a second so you

play16:37

can imagine here instead of this

play16:39

backward connection because of the

play16:41

symmetry problem we have some kind of

play16:43

second neural network that goes in this

play16:45

direction still you'd have the problem

play16:47

that here you can only have positive

play16:50

signal or a zero and they might be

play16:55

extreme valued which okay it can't be

play16:58

really encoded with the spiking because

play17:00

they are they're limited in the range

play17:02

they can assume but they are also of a

play17:06

different type and I'm what I mean by

play17:09

that is basically if you think of this

play17:10

as a programming problem then the

play17:14

forward passes here are our activations

play17:17

right and the backward passes here they

play17:20

are deltas so in the backward passes

play17:24

view either propagate deltas or you

play17:27

propagate kind of directions so the

play17:32

activations are sort of impulses whereas

play17:39

the backward signals are this isn't how

play17:42

you need to change their their gradients

play17:44

ultimately

play17:45

so it's fundamentally a different type

play17:47

of data that is propagated along would

play17:51

be propagated along these directions and

play17:53

that makes it very unlikely because we

play17:56

are not aware as this paper says that

play17:59

the that neural networks that neurons

play18:02

can kind of switch the data type that

play18:06

they're they're transmitting all right

play18:09

so then the paper goes into their NGRAD

play18:14

hypothesis and what this is the

play18:17

hypothesis basically states that the

play18:20

brain could implement something like

play18:22

neural networks by using by using an

play18:27

approximate backprop-like algorithm

play18:29

based on autoencoders and I want to jump

play18:32

straight into the algorithm no actually

play18:37

first they do talk about autoencoders

play18:40

which which I find very interesting so

play18:43

if you think of autoencoders what is an

play18:45

autoencoder an autoencoder is a network

play18:48

that basically starts out with an input

play18:52

layer and then has a bunch of hidden

play18:54

layers and at the end it tries to

play18:58

reconstruct its own input right so you

play19:01

feed a data in here you get data out

play19:05

here and then your error the error

play19:08

signal it will be your difference to

play19:11

your original input now the usually when

play19:22

we train autoencoders in deep learning

play19:23

we also train this by back prop right we

play19:25

see then this error here and this goes

play19:27

back but if you just think of single

play19:29

layer autoencoders so um let's let's go

play19:33

over here single layer auto-encoder with

play19:36

let's say the the same number of the

play19:41

same number of units in this in this

play19:46

layer what you'll have is so this this

play19:49

is input this is output and this is the

play19:55

hidden layer right you'll have a weight

play19:58

matrix here

play19:59

and you'll probably have some sort of

play20:01

nonlinear function and then you have

play20:04

another weight matrix here and they call

play20:06

them W and B another way to draw this is

play20:09

I have weight matrix going up then I

play20:12

have a nonlinear function going

play20:15

transforming this into this signal and

play20:18

then I have the be going back right so

play20:24

I'm drawing I'm drawing it in two

play20:27

different ways up here or over here and

play20:30

with the second way you can see that it

play20:32

is kind of a forward backward algorithm

play20:35

where now the error if you look at what

play20:39

is the error here the error is the

play20:41

difference between this and this and the

play20:44

difference between this and this and the

play20:47

difference between this and this right

play20:50

and you can train an autoencoder

play20:53

simply by saying W please make sure that

play21:01

the that the the the input here gets

play21:07

mapped closer to the output and to be

play21:10

the same thing this will become clear in

play21:14

a second so but basically sorry this I

play21:23

mean the the hidden representations

play21:25

you'll see basically the idea is that

play21:28

you can train an autoencoder only by

play21:32

using local update rules you don't have

play21:35

to do back prop and that's what this

play21:37

algorithm is proposing namely if you

play21:40

think of a stack of autoencoders this

play21:43

this this transforming one hidden

play21:46

representation into the next right this

play21:48

is the feed-forward function what you

play21:51

can do is you first of all you can

play21:55

assume that for each of these functions

play21:58

here you have a perfect inverse right

play22:01

you can you can perfectly compute the

play22:03

inverse function that's this this G here

play22:07

of course this doesn't exist but assume

play22:10

you have it

play22:12

what you then could do is you could if

play22:17

if you knew in one layer and on the top

play22:22

layer of course you know if you knew

play22:24

that okay I got this from my forward

play22:28

pass but I would like to have this this

play22:30

is my desired output right so in the

play22:33

output layer you get this this is your

play22:35

error signal if you knew you you you

play22:41

could compute an error right here this

play22:43

is what you do in the output right now

play22:45

in back prop we would back propagate

play22:48

this error along the layers but now we

play22:51

don't do this instead of what we do is

play22:53

we use this G function to invert the F

play22:58

function right and by that what we'll

play23:03

say is what hidden representation in

play23:08

layer two what should the hidden

play23:11

representation have been that in order

play23:14

for us to obtain this thing right so the

play23:19

the claim here is if in layer two we had

play23:23

had H two as a hidden representation

play23:26

then we would have landed exactly where

play23:30

we want it right that's what this G

play23:33

function does because here we use F so

play23:36

had we had H2 and used F on it we

play23:41

would be exactly where we want instead

play23:43

we had h2 here and used F on it and then

play23:47

we landed here where we don't want so

play23:50

this is where we want we would want to

play23:54

be in layer two and this is where we

play23:57

were so again we can compute an error

play24:00

here again instead of back propagating

play24:03

that error what we'll do is we'll use

play24:05

the inverse of the forward function in

play24:08

order to back propagate our desired

play24:11

hidden representation and you can see

play24:15

there is of course a relationship to the

play24:17

true back prop here but the the

play24:20

important distinction is we are not

play24:22

trying to back propagate the error

play24:24

signal we're trying to

play24:26

invert the desired hidden states of the

play24:29

network and then in each layer we can

play24:32

compute from the forward pass we can

play24:35

compute the difference to the desired

play24:38

hidden state and thereby compute an

play24:41

error signal and now we have achieved

play24:43

what we wanted we want an algorithm that

play24:46

doesn't do back prop that only uses

play24:49

local information in order to compute

play24:54

the error signal that it needs to adjust

play24:57

and by local I mean information in the

play24:59

same layer and also the data type that

play25:03

is propagated by F is activations right

play25:07

of hidden representations and by G is

play25:10

also activations of hidden

play25:12

representations both of them are always

play25:15

positive can be encoded by spiking

play25:17

neurons and so on so this algorithm

play25:20

achieves what we want they go bit into

play25:23

detail how the actual error update here

play25:27

can be achieved and apparently neurons

play25:30

can achieve you know in the same layer

play25:32

to to adjust themselves to a given

play25:36

desired activation so this algorithm

play25:41

achieves it of course we don't have this

play25:43

G we don't have it and therefore we need

play25:46

to go a bit more complicated what they

play25:50

introduces the this following algorithm

play25:53

the goals are the same but now we assume

play25:56

we do not have a perfect inverse but we

play25:58

have something that is a bit like an

play26:03

inverse so we have an approximate

play26:05

inverse and they basically suggest if we

play26:08

have an approximate inverse we can do

play26:10

the phone so G G is now an approximate

play26:12

inverse to F what we can do is this is

play26:15

our input signal right we use F to map

play26:18

it forward to this and so on all the way

play26:22

up until we get our true or error right

play26:26

here this is our error from the

play26:28

environment right this is the nail being

play26:30

wrong and then we do two applications of

play26:35

G right so this is an application of F

play26:38

we do two applications

play26:39

of G we apply G to this to what we got

play26:45

in the forward pass right and this now

play26:50

gives us a measure of how bad our

play26:52

inverse is right so if G is now an

play26:55

approximate inverse and this now we see

play26:58

here oh okay we we had a ch2 in the

play27:01

forward pass and we basically forward

play27:05

passed and then went through our inverse

play27:07

and we didn't land quite exactly where

play27:09

we started but we know that okay this

play27:13

this is basically the difference between

play27:16

our our inverse our forward inverse H

play27:19

and our true H and then we also back

play27:25

project using G again the desired

play27:30

outcome so we invert the desired outcome

play27:34

here now before we have adjusted

play27:37

directly these two right because we said

play27:40

this is what we got this is what we want

play27:43

but now we include for the fact that G

play27:47

isn't a perfect inverse and our

play27:49

assumption is that G here probably makes

play27:53

about the same mistakes as G here so

play27:56

what we'll do is we'll take this vector

play27:59

right here and apply it here in order to

play28:03

achieve this thing and this thing is now

play28:06

the corrected thing our corrected to

play28:10

desired hidden representation correct

play28:12

for the fact that we don't have a

play28:13

perfect inverse and now again we have

play28:16

our error here that we can locally

play28:19

adjust again all the signals propagated

play28:22

here here and here are just neural

play28:26

activations and all the information

play28:29

required to update a layer of neurons is

play28:31

now contained within that layer of

play28:34

neurons right and and this goes back

play28:38

through the network so this is how they

play28:41

achieve how they achieve this this is a

play28:46

bit of a close-up look and here are the

play28:50

computations to do this so basically

play28:53

for the forward updates you want to

play28:56

adjust W into the direction of the H

play29:01

minus the H tilde and the H tilde in

play29:04

this case would be this the the hidden

play29:07

representation that you would like to

play29:09

have so you will update your forward

play29:12

forward weights into the direction such

play29:15

that your hidden representations are

play29:16

closer sorry that your forward haven

play29:19

representation is closer to your

play29:21

backward hidden representation and the

play29:24

backward updates now your goal is to get

play29:27

a better G so W here

play29:33

is our W or the weight of F and B or the

play29:39

weights of G so in the backward updates

play29:42

your goal is to make G a better inverse

play29:44

right so what you'll do is again you'll

play29:48

take the difference between now you see

play29:52

the difference here here here right not

play29:55

the same error so here you will you in

play29:58

the W update use what we labeled error

play30:02

here in the G update you use this error

play30:07

here so this is the error of G so when

play30:13

you update the function G you want to

play30:15

make these two closer together such that

play30:19

G becomes a better inverse right because

play30:22

you're dealing with an approximate

play30:23

inverse you still need to obtain that

play30:25

approximate inverse and this here is

play30:28

how you learn it this algorithm now

play30:32

achieves what we wanted right

play30:35

local updates data types check signed

play30:38

check and so on
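
(Written out as a sketch, with our own names for the local quantities; the two update directions just described look like this:)

```python
import numpy as np

rng = np.random.default_rng(4)
n_in, n_hid = 3, 2
W = rng.normal(scale=0.1, size=(n_hid, n_in))    # forward weights (f)
B = rng.normal(scale=0.1, size=(n_in, n_hid))    # backward weights (g)
eta = 0.1

x = rng.normal(size=n_in)
h = W @ x                                        # forward hidden representation
h_tilde = h + rng.normal(scale=0.1, size=n_hid)  # stand-in for the corrected target

# forward update: pull this layer's output toward its target
W += eta * np.outer(h_tilde - h, x)

# backward update: make g a better inverse of f (compare x with g(f(x)))
B += eta * np.outer(x - B @ h, h)
```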

play30:42

I hope this was clear enough in essence it's pretty simple but

play30:46

it's pretty cool how they work around

play30:49

this they call this difference target

play30:51

propagation now these these

play30:57

kind of papers I don't think they

play30:59

invented this maybe I'm not sure maybe

play31:06

they did

play31:07

maybe they didn't and this paper just

play31:09

kind of frames it in this hypothesis it

play31:13

is unclear to me I am not familiar with

play31:17

this kind of papers so sorry if I miss

play31:20

attribute something here all right

play31:23

then they go into into how could these

play31:27

things be implemented biologically and

play31:29

they go for some evidence and they also

play31:32

state that we used to look at neurons

play31:34

basically in this way where you had

play31:36

input and feedback here very simple

play31:42

simplistic view of neurons whereas

play31:45

nowadays even the computational

play31:48

community views neurons in a more

play31:51

differentiated way where you have for

play31:54

example different regions here on the

play31:57

soma that can be separated from each

play32:01

other and you have inter neuron

play32:03

interference and so on I'm not qualified

play32:05

too much to comment on this stuff but I

play32:11

invite you to read it for yourself if

play32:13

you want alright so this was my take on

play32:16

this paper I find the algorithm they

play32:19

propose pretty cool if you I hope you

play32:22

liked it and check it out bye bye


Related Tags
Backpropagation, Brain Learning, Neural Networks, Geoffrey Hinton, Hebbian Learning, Error Signals, Autoencoders, Algorithm, Biological Plausibility, Machine Learning