Geoffrey Hinton Unpacks The Forward-Forward Algorithm
Summary
TLDR: In this video transcript, Craig Smith interviews deep learning pioneer Geoffrey Hinton about his forward-forward algorithm. Hinton has long been driven by curiosity about how the brain processes information, and the algorithm is intended to model how the cerebral cortex might learn. It splits learning into an online (wake) phase and an offline (sleep) phase: in the online phase the network tries to have high activity on real data, while in the offline phase it generates its own data and tries to have low activity on it. Hinton argues this fits the brain better than back propagation, because it requires neither a perfect model of the forward system nor propagating backwards through time. He also discusses how to generate negative data and how to learn effectively with this scheme. The conversation is highly technical, but even listeners unfamiliar with the details will find Hinton's insights valuable.
Takeaways
- 🤖 Geoffrey Hinton, a pioneer of neural networks and deep learning, has proposed a new learning algorithm, the forward-forward algorithm, which he believes may be a more plausible model of how the cerebral cortex learns.
- 🧠 Hinton argues that back propagation does not fully explain how the brain processes information, since there is no evidence that the brain propagates information backwards in the way back propagation requires.
- 🔁 The forward-forward algorithm has an online phase (corresponding to being awake) and an offline phase (corresponding to sleep), used for real data and generated data respectively.
- 🌐 In the online phase the network tries to make every layer highly active on real data; in the offline phase it generates its own data and tries to make every layer's activity low.
- 📉 A key feature of the algorithm is that it does not require a perfect model of the forward system, unlike back propagation, which needs an exact model of the forward system to compute weight updates.
- 🚀 Hinton discusses treating a static image as a "video" so that top-down effects can play out over the time dimension, which helps with processing dynamic visual information.
- 👶 Hinton notes that babies learn structure-from-motion 3D perception within days, suggesting the brain learns the relationship between 3D structure and motion very quickly.
- 🔧 Hinton has run extensive experiments in Matlab to test and validate the forward-forward algorithm, even though Matlab is not the most efficient tool for large systems.
- 📈 Hinton believes that if we can understand how the brain works and replicate it, models with reasoning abilities are possible, but he is cautious about the nature of consciousness.
- 🤔 Hinton stresses that our understanding of consciousness may resemble the "vital force" concept of a century ago: an attempt to explain a complex mechanism with a simple essence, which may be inaccurate.
- 💡 Hinton says much of his knowledge comes from talking to people rather than reading, which he considers an efficient way to acquire knowledge.
Q & A
What is the forward-forward algorithm in deep learning?
- The forward-forward algorithm is a new learning algorithm proposed by Geoffrey Hinton, which he considers a more plausible model than back propagation for how the cerebral cortex learns. It splits learning into an online (wake) phase and an offline (sleep) phase. In the online phase the network takes in real data and tries to keep activity high in every layer so it can distinguish real data from fake data; in the offline phase the network generates its own data and tries to keep activity low in every layer, thereby learning a generative model.
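The two-phase structure described above can be sketched in a few lines. This is a minimal illustration, not Hinton's implementation: the single ReLU layer, layer sizes, and learning rate are all assumptions made for the example; the only faithful part is the sign flip between the wake and sleep phases.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4, 8))  # one hidden layer: 4 inputs -> 8 units

def goodness(x, W):
    """Sum of squared activities of the hidden layer for input x."""
    h = np.maximum(W.T @ x, 0.0)  # ReLU activities
    return np.sum(h ** 2), h

def ff_update(x, W, positive, lr=0.01):
    """One forward-forward step: push goodness up for real data
    (positive/wake phase), down for generated data (negative/sleep phase)."""
    g, h = goodness(x, W)
    sign = 1.0 if positive else -1.0
    # gradient of sum(h^2) w.r.t. W: 2 * x h^T (zero where ReLU is inactive,
    # which is already handled because h is zero there)
    grad = 2.0 * np.outer(x, h)
    return W + sign * lr * grad

x_real = rng.normal(size=4)
g_before, _ = goodness(x_real, W)
W2 = ff_update(x_real, W, positive=True)
g_after, _ = goodness(x_real, W2)
assert g_after >= g_before  # the wake phase raises goodness on real data
```

A real model would stack several such layers, each with its own local objective, which is the point Hinton emphasizes: no gradient ever has to flow backwards between layers.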
Why doesn't Geoffrey Hinton believe back propagation is how the brain processes information?
- Hinton notes that back propagation requires a perfect model of the forward system, and there is no evidence in the brain of information flowing backwards in the way back propagation requires. Moreover, implementing back propagation in recurrent networks runs into temporal problems: when processing video, for example, you cannot simply stop and run backwards through time.
How is negative data generated in the forward-forward algorithm?
- Negative data is the data fed to the network while it runs in the negative phase, with the aim of producing low activity in all hidden layers. It can be generated by the model itself, or supplied by hand, for example by pairing an image with an incorrect label. The ultimate goal is for the model to generate its own negative data; once the model is good enough, negative data looks just like real data and learning is finished.
How do the "online phase" and "offline phase" of the forward-forward algorithm work?
- In the online phase the network receives real data as input, and the goal is to make every layer's activity high enough to distinguish real data from fake data. In the offline phase the network tries to generate its own data, and wants every layer's activity to be low when given its own generated data as input. This trains the network to learn a generative model and to distinguish real data from fake data.
Why does Hinton think the forward-forward algorithm may suit low-power computer architectures?
- The forward-forward algorithm does not need a perfect model of the forward system; it only needs a good enough model of an individual neuron's behaviour to change the incoming weights so the neuron becomes more or less active. The algorithm can therefore learn in systems containing unknown black boxes, which gives it potential on low-power hardware.
How does Geoffrey Hinton develop and test his new learning algorithm?
- Hinton develops and tests his new learning algorithm by writing small-scale models in Matlab. He spends time thinking through a concept and then implements the idea in Matlab to see whether it works. Most raw ideas turn out to be wrong, and Matlab's convenience lies in quickly proving them wrong.
Why does Geoffrey Hinton think consciousness is not a simple concept?
- Hinton regards consciousness as an extremely complicated notion that people conflate, trying to explain a complex mechanism with a simple "essence". He compares discussions of consciousness to the old notion of a "vital force": a vague, undefinable concept invoked to explain why living things are alive. He stresses that reports of conscious perception and experience are really statements about hypothetical states of the world, not about some special inner essence.
How does the network handle static images and video data in the forward-forward algorithm?
- A static image can be treated as a special case of video in which the frames do not change over time. With video, the network does forward passes through the time dimension, allowing predictions from higher layers to influence representations in lower layers and so to interpret dynamic scenes.
How does Geoffrey Hinton exchange ideas and learn new things?
- Hinton learns by talking to experts in different fields, among them cognitive scientists, neuroscientists, and psychologists. Because he reads slowly and gets stuck on equations, he finds conversation an effective way to acquire knowledge.
Why does Hinton think it is currently premature to scale the forward-forward algorithm to large systems?
- Hinton argues that since the algorithm requires particular tricks to implement, which are trickier than those of back propagation, scaling it up before the basic properties and best practices of the algorithm are well understood and validated would be premature.
Does Hinton believe the forward-forward algorithm could eventually surpass models based on back propagation?
- Hinton is not sure. He thinks back propagation can compress more knowledge into a given number of connections, whereas the forward-forward algorithm may be less effective with very many connections. But he also points out that the brain's main problem is not compressing knowledge; it is using experience efficiently to acquire information.
How does the network distinguish real data from fake data in the forward-forward algorithm?
- The network uses the activity level of each layer. In the online phase, each layer tries to keep activity high while processing real data; in the offline phase, each layer tries to keep activity low on the data the network generates itself (fake data). In this way the network learns to tell real data from fake data.
Outlines
🤖 Deep learning pioneer Geoffrey Hinton's forward-forward algorithm
This section introduces Geoffrey Hinton, one of the field's leading scientists, his contributions to neural networks and deep learning, and the forward-forward algorithm he has proposed. Hinton believes traditional back propagation cannot fully model how the brain processes information, so he proposed a new learning algorithm intended as a more plausible model of how the cerebral cortex might learn.
🔍 The online and offline learning phases of the forward-forward algorithm
The second section describes the online (wake) and offline (sleep) learning phases in detail. In the online phase the network processes input data and tries to distinguish real data from fake data. In the offline phase it generates its own data and tries to distinguish its own fake data from real data. Hinton also discusses training the network through the contrast between positive and negative data, and the similarities between this process and generative adversarial networks (GANs).
📈 Generating positive and negative data, and the learning signal
In the third section Hinton explores the notions of positive and negative data and how negative data is used in the network's negative phase to lower hidden-layer activity. He explains how a model can generate its own negative data and how the statistical difference between positive and negative data serves as a learning signal. He also discusses supplying negative data by hand in supervised learning, and how the network is trained this way.
🔢 A simple learning model for digit recognition and prediction
The fourth section uses a simple recognition and prediction task to illustrate how the forward-forward algorithm works. Hinton describes using the positive phase to raise hidden-layer activity and using those activities to predict the next character. In the negative phase the network tries to lower the hidden-layer activity caused by the predicted characters, thereby refining the model.
🕒 The time dimension in neural networks
The fifth section discusses the role of the time dimension in neural networks, especially for video data. Hinton explains how the passage of time is simulated in the network and how this supports processing dynamic images. He also mentions how fast-changing input challenges the network's ability to settle, and explores the complexity of processing information over time.
🧠 Combining capsule networks with the forward-forward algorithm
In the sixth section Hinton discusses the idea of capsule networks and how they can be combined with the forward-forward algorithm. He describes how capsules can represent different kinds of objects and build object representations at different levels, how they can handle 3D structure, and how they can predict what an object looks like from different viewpoints.
🌐 3D perception and the future of the forward-forward algorithm
In the seventh section Hinton expresses his hopes for the forward-forward algorithm in 3D perception. He believes that if the algorithm successfully models how the cerebral cortex processes information, perception of depth and the 3D world should emerge naturally. He also discusses training networks to learn 3D structure from video and from changes in viewpoint.
⚙️ Optimizing the algorithm and matching it to hardware
In the eighth section Hinton discusses the importance of algorithmic optimization and hardware. He mentions using different objective functions to find either features or constraints in the data, and discusses the potential of applying the forward-forward algorithm in large systems. He also mentions the need for new kinds of computers and how to exploit the natural properties of hardware for more efficient computation.
🧐 Consciousness, perception, and the future of AI
In the final section Hinton explores the concepts of consciousness and perception and how they relate to the future of artificial intelligence. He criticizes current understandings of consciousness, arguing that consciousness is a complex mechanism rather than a simple concept, and discusses how personal feelings and experiences can be conveyed by describing hypothetical situations that would cause a particular brain state.
Keywords
💡Deep learning
💡Back propagation
💡Forward-forward algorithm
💡Wake phase
💡Sleep phase
💡Features and constraints
💡Capsule networks
💡Generative model
💡Gradient
💡Cognitive science
💡Consciousness
Highlights
Craig Smith introduces Geoffrey Hinton, a pioneer in neural networks and deep learning, who has proposed the forward-forward algorithm as a more plausible model of how the cerebral cortex learns.
Hinton does not believe back propagation explains how the brain processes information; his forward-forward algorithm splits learning into an online phase and an offline phase.
In the online phase the network's goal is to distinguish real data from fake data, with every layer highly active on real data.
In the offline phase the network generates its own data and tries to make every layer's activity low on fake data, which requires learning a generative model.
Hinton discusses the algorithm's advantage on temporal data such as video, since it allows pipelined processing of information.
The positive phase aims for high activity and the negative phase for low activity, similar to the discriminator and generator in generative adversarial networks.
Hinton notes that although the forward-forward algorithm is technically challenging, it needs no perfect model of the forward system, which makes hardware implementations more flexible.
He stresses the importance of distinguishing real from fake data in a learning algorithm and explains how activity levels achieve this.
Hinton discusses strategies for switching between the positive and negative phases, and how to learn effectively without interleaving the two.
He explains the concept of negative data: the data used in the negative phase to try to lower hidden-layer activity, which can be generated by the model itself or supplied by hand.
Hinton gives an example of a simple learning exercise showing how positive and negative data are used to train a network.
He discusses how the forward-forward algorithm might apply to tasks such as natural language, visual processing, and common-sense reasoning.
Hinton believes that if we can understand how the brain works and replicate it, models with reasoning abilities become possible.
His discussion of consciousness suggests it may not be the simple concept people imagine, but a mixture of many different notions.
Hinton shares his work in Matlab, including his experiments with and exploration of the forward-forward algorithm.
He highlights how the success of deep learning models in computer vision challenged the field's traditional research paradigm.
Hinton hopes that sharing his Matlab code will encourage more people to work on and implement the forward-forward algorithm.
He discusses phenomena observed in infant learning and how they inspired his research on the algorithm.
Hinton shares his thoughts on how information might be integrated into the brain efficiently, and the potential connection to the forward-forward algorithm.
He looks ahead to future computer architectures, in particular low-power computers for natural language and vision tasks.
Transcripts
seeing a pink elephant notice the words
pink and elephant refer to things in the
world
so what's actually happening is I'd like
to tell you what's going on inside my
head hi I'm Craig Smith and this is Eye
on AI
[Music]
Geoffrey Hinton a Pioneer in neural
networks and the man who coined the term
deep learning has been driven throughout
his career to understand the brain
well his application of the back
propagation of error algorithm to deep
Networks
set off a revolution in artificial
intelligence he doesn't believe that it
explains how the brain processes
information
late last year he introduced a new
learning algorithm
which he calls the forward forward
algorithm that he believes is a more
plausible model for how the cerebral
cortex might learn a lot has been
written about the forward forward
algorithm in recent weeks but here Jeff
gives us a deep dive into the algorithm
and the journey that led him to it the
conversation is Technical and assumes a
lot of knowledge on the part of
listeners
but my advice for those that don't have
that knowledge is to let the technical
stuff wash over you and listen instead
for Jeff's insights before we begin I'd
like to mention our sponsor clearml an
open source end-to-end MLOps solution
you can try it for free at clear.ml
that's
c-l-e-a-r dot ml
tell them Eye on AI sent you now here's
Jeff I hope you find the conversation as
fascinating as I did
to listeners forward forward networks
and why you're looking for something
Beyond back propagation despite its
tremendous success let me start with
explaining why I don't believe the brain
is doing back propagation one thing about
back propagation is you need to have a
perfect model of the forward system
that is in back propagation it's easiest
to think about for a layered net but it
also works for recurrent Nets
for a layered net you do a forward pass
where the input comes in at the bottom
and goes through these layers so the
input might be pixels and what comes out
the top might be a classification of is
it a cat or a dog
you go forwards through the layers
and then you look at the error in the
output
if it says cat when it should say dog
that's wrong and you'd like to figure
out how to change all the weights in the
forward pass so that
next time it's more likely to say the
right category rather than the wrong one
so you have to figure out how a change
in a weight would affect
how much it gives the right answer
and then you want to go off and change
all the weights in proportion to how
much they help in getting the right
answer
and back propagation is a way of
figuring out that gradient we're
figuring out how much a change in the
weight would make the system have less
error and then you change the weight in
proportion to how much it helps and
obviously if it hurts you change it in
the opposite direction
now back propagation
looks like the forward pass but it goes
backwards it has to use the same
connectivity pattern with the same
weights but in the backwards Direction
and it has to go backwards through the
non-linearity of the neuron there's no
evidence that the brain is doing that
and there's lots of evidence it's not
doing that
so the worst case is if you're doing
back propagation in a recurrent net
because then you run the recurrent net
forwards in time
and it outputs an answer at the end of
running forwards in time
and then you have to run it backwards
through time
in order to get all these derivatives
you need to change the weights
and that's particularly problematic if
for example if you're trying to process
video
you can't stop and go backwards in time
so combined with the fact that there's
no evidence the brain does it
well no good evidence
there's the problem that just for
technology it's a mess it interrupts the
pipelining of stuff through so you'd
really like something like video there's
been multiple stages of processing and
you'd like to just pipeline the inputs
through those multiple stages and just
keep pipelining it through
and so the idea of the Ford algorithm
is that if you can divide
the learning the process of getting the
gradients you need into two separate
phases you can do one of them online and
one of them offline
and the way you do online can be very
simple and will allow you to just
pipeline stuff through
so the online phase which is meant to
correspond to wake
you put input into the network
and let's take the recurrent version
input keeps coming into the network
and what you're trying to do
for each layer at each time step
you're trying to make
the layer of high activity
or rather high enough activity so that
it can
figure out that this is real data
so the underlying idea is for Real data
you want every layer to have high
activity and for fake data what comes
out we get that later you'd like every
layer to have low activity
and the task of the network or the thing
it's trying to achieve
is not to give the correct label as in
back propagation is trying to achieve
this property but being able to tell the
difference between real data and fake
data at every layer by each layer having
high activity for real data and no
activity for fake data
so each layer has its own objective
function
in fact to be more precise we take the
sum of the squares of the activities of
the units in a layer
we subtract off some thresholds
and then we feed that to a logistic
function that simply decides what's the
probability that this is real data
as opposed to fake data
and if the logistical function gets a
lot of input it will say it's definitely
real data
and so there's no need to change
anything if it's getting lots of input
you won't learn on that example because
it's already getting it right
and that explains how you can run lots
of positive examples without running any
negative examples which are fake data
because it'll just saturate on positive
examples it's getting right
so that's what it does in the positive
phase it tries to get high sum of
squared activities in every layer so
that it can tell high enough so it can
tell that it's real data
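The per-layer test described here (sum of squared activities, minus a threshold, through a logistic function) is simple enough to sketch directly. The layer sizes and the threshold value below are illustrative choices, not values from the discussion.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def layer_goodness(activities, threshold=2.0):
    """Probability that the input producing these activities is real data.

    `activities` is the vector of unit activities in one hidden layer;
    the threshold is a free parameter (2.0 here is an arbitrary choice).
    """
    return sigmoid(np.sum(activities ** 2) - threshold)

# High activity -> confidently "real"; low activity -> confidently "fake".
high = layer_goodness(np.array([1.5, 2.0, 1.0]))
low = layer_goodness(np.array([0.1, 0.2, 0.0]))
assert high > 0.95 and low < 0.2
```

This also shows why positive examples saturate: once the logistic gets a lot of input the probability is pinned near 1, the derivative is near zero, and that example produces essentially no weight change.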
in the negative phase
which is run Offline that is during
sleep
the network needs to generate its own
data
and try and get given its own data as
input
it wants to have low activity in every
layer
so the network has to learn a generative
model
and what it's trying to do is
discriminate between real data and fake
data produced by its generative model
obviously if it can't discriminate at
all
then what's going to happen is the
derivatives that it gets for real data
and the derivatives we get for fake data
will be equal and opposite so it won't
learn anything learning will have
finished then if you can't tell the
difference between what it generates and
real data
this is very like again if you know
about generative adversarial Networks
except that the discriminative net
that's trying to tell the difference
between real and fake and the generative
model that's trying to generate fake
data use the same hidden units and so
they use the same hidden representations
that overcomes a lot of the problems
that a GAN has
on the other hand because it's not doing
back propagation to learn the generative
model it's harder to learn a good
generative model
that's a rough overview of the algorithm
let me ask one a couple of questions on
the awake and sleep cycle are you
cycling
quickly between them
okay so most of the research what I
would do is
the preliminary research cycle quickly
between them because that's the obvious
thing to do
and later on I discovered
well I've known for some time that with
contrastive learning you can separate
the phases
and later on I discovered it worked
pretty well to separate the phases
recent experiments I've done with
predicting characters
You can predict
you can have it predict about a quarter
of a million characters so it's running
on real data trying to predict the next
character it's making predictions it's
running with mini batches so after
making quite a large number of
predictions it updates the
weights and then it sees more positive
examples and updates the weights again so in all
those phases it's just trying to get
higher activity
in the hidden layers
but only if it's not already got high
activity
and you can predict like quarter of a
million characters in the positive phase
and then switch to the negative phase
where the Network's generating its own
string of characters
and
it you're now trying to get
low activity in the hidden layers for
the characters it's predicting
it's looking a little window characters
and then you run for quarter of a
million characters like that and it
doesn't actually have to be the same
number anymore with Boltzmann
machines it's very important to have the
same number of things in the positive
phase and negative phase but with this
it isn't
the most remarkable is
that up to a few hundred thousand
predictions
it works almost as well if you separate
the phases
as opposed to interleave
and that's quite surprising
in human learning
certainly in the wake sleep cycle for
complicated Concepts that you're
learning but there's learning going on
all the time that
doesn't require a sleep phase well there
is in this too if you're just running on
positive examples
it's changing the weights
for all the examples where it's not
completely obvious that this is a
positive data
so it will do a lot of it does a lot of
learning in the positive phase
but if you go on too long it fails
catastrophically
and people seem to be the same if you
deprive people of sleep for a week you'll go
completely psychotic
and get hallucinations and you may never
recover
can you explain I think one thing that
people are having trouble
non-practitioners are having trouble
understanding is the concept of negative
data I've seen a few articles where they
just put it in quotation marks out of
your paper
which indicates that they don't
understand it
okay
what I mean by negative data is data
that you give to the system
when it's running in the negative phase
that is when it's trying to get low
activity in all the hidden layers
and there are many ways of generating
negative data in the end you'd like the
model itself to generate the negative
data
so this is just like it was in Boltzmann
machines the data that the model itself
generates
is negative data and real data is what
you're trying to model
and once you've got a really good model
the negative data looks just like the
real data so no learning takes place
but negative data doesn't have to be
produced by the model so for example
you can train it to do supervised
learning
by inputting both an image and the label
so now the label's part of the input not
part of the output
and what you're asking it to do is
when I input an image with the correct
label that's going to be the positive
data you want to have high activity you
want to input an image with the
incorrect label which I just put in by
hand that's an
incorrect label that's negative data
now it works best if you get the model
to predict the label and you put in
the model's best prediction when it's
not correct
because then you're giving it the
mistakes it's most likely to
make as negative data
but you can put in negative data by hand
and it works fine
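The supervised variant described here, where the label is part of the input rather than the output, can be sketched as follows. The one-hot encoding and the concatenation order are illustrative assumptions; the transcript only says the label goes into the input.

```python
import numpy as np

def make_examples(image, correct_label, num_classes=10):
    """Build one positive and one negative example for supervised
    forward-forward training: the label is part of the *input*,
    concatenated here as a one-hot vector (an illustrative encoding
    choice, not prescribed by the transcript)."""
    rng = np.random.default_rng()
    one_hot = np.zeros(num_classes)
    one_hot[correct_label] = 1.0
    positive = np.concatenate([one_hot, image])       # image + right label

    # A randomly chosen wrong label; Hinton notes it works even better to
    # use the model's own most confident wrong prediction instead.
    wrong = int(rng.choice([c for c in range(num_classes) if c != correct_label]))
    wrong_hot = np.zeros(num_classes)
    wrong_hot[wrong] = 1.0
    negative = np.concatenate([wrong_hot, image])     # image + wrong label
    return positive, negative

img = np.random.rand(784)  # a flattened 28x28 image
pos, neg = make_examples(img, correct_label=3)
assert pos.shape == neg.shape == (794,)
assert pos[3] == 1.0 and neg[3] == 0.0
```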
and the
the reconciliation then at the end
is it
as in boltzmann machines where you're
subtracting the negative data from the
positive data
but in Boltzmann machines what you do is you
give it positive data real data
and you let it settle to equilibrium
which you don't have to do with the
forward-forward algorithm
well not exactly anyway
and
once it's settled to equilibrium you measure
the pairwise statistics that is how
often two units that are connected are
on together
and then in the negative phase you do
the same thing
you just let the model settle
producing data itself
and you measure the same statistics
and you take the difference of those
pairwise statistics and that is the
correct learning signal for a Boltzmann
machine but the problem is you have to
let the model Settle yeah and there just
isn't time for that also you have to
have all sorts of other conditions like
the connections have to be symmetric
there's no evidence Connections in the
brain are symmetric
can you give a concrete example of
of positive and negative data in a very
simple
learning exercise you were working on
digits
in this example I think is if you're
predicting a string of characters the
positive data you'd see a little window
of characters
and you'd have some hidden layers
and because that's a positive window of
characters you try and make the activity
high in all the hidden layers
but also from those hidden layers from
the activity in those hidden layers
you would try to predict the next
character
that's a very simple generative model
but notice the generative model isn't
having to learn its own representation
so representations are learned just to
make positive strings of characters give
you high activity in all the hidden
units that's the objective of the
learning the objective isn't to predict
the next character
but having done that learning you've got
the right representations for these
strings of characters these windows of
characters you also learn to predict the
next character
and that's what you're doing in the
positive phase
seeing Windows of characters you're
changing the weights so that all the
hidden layers have high activity for
those windows or characters
but you're also changing
top down weights that are trying to
predict the next character from the
activity in the hidden layers that's
what's sometimes called a linear
classifier
so that's the positive phase in the
negative phase
you as input you use characters that
have been predicted already so you've
got this window and you're going along
and just predicting the next character
and then moving the window along one to
include the next character you predicted
and to drop off the oldest character you
just keep going like that
and for each of those frames
you try and get low activity in the
hidden layers because it's negative data
and I think you can see that if your
predictions were perfect
and you start from a string a real
string then
the what's happening in the negative
phase will be exactly like what's
happening in the positive phase right
and so the two will cancel out
but if there's a difference
then you'll be
learning to make things more like the
positive phase and less like the
negative phase
and so it'll get better and better at
predicting
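The negative-phase loop just described, where the network's own predictions are fed back in as input, can be sketched as below. `predict_next` stands in for the linear classifier on the top hidden activities; the trivial placeholder body is purely for illustration.

```python
def predict_next(window):
    """Placeholder predictor: in the real model this would come from a
    linear classifier on the hidden-layer activities."""
    return window[-1]  # dummy rule: just repeat the last character

def generate_negative_string(seed_window, length):
    """Self-generate a string of 'negative data' characters by repeatedly
    predicting the next character and sliding the window along by one."""
    window = list(seed_window)
    out = []
    for _ in range(length):
        nxt = predict_next(window)
        out.append(nxt)
        window = window[1:] + [nxt]  # drop oldest char, append prediction
    return "".join(out)

neg = generate_negative_string(list("abc"), 5)
assert neg == "ccccc"
```

Each window produced this way is treated as negative data: the network tries to get low hidden-layer activity on it, and if the predictions were perfect the negative phase would exactly cancel the positive phase, so learning stops.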
as I understood back propagation on
static data
there are inputs there's an output and
you calculate the error
and then you run backwards through the
network and
correct the weights and then do it again
and that's not a good model for the
brain because there's no evidence of
information flowing backward through the
neurons that's not exactly
right what there's no good
evidence of is
derivative information flowing backwards
these error gradients flowing
backwards okay obviously the brain has
top down connections if you look at the
perceptual system there's a kind of
forward direction that goes from
the thalamus up to inferotemporal
cortex where you recognize things and
the thalamus is a sort of where the
input comes in from the eyes
and there's Connections in the backward
Direction but the connection in the
backward Direction don't look at all
like what you'd need for back
propagation for example in two cortical
areas the connection is coming back
don't go to the same cells as
connections going forward come from
it's not reciprocal in that sense yeah
there's a loop between the cortical
areas but information in one cortical
area goes through about six different
neurons before it gets back to where it
started
and so it's a loop it's not uh it's not
like a mirrored system
okay but my question is you talk about
turning the static image into a boring
video that allows you to have top-down
effects that's right yeah so you have to
think of the being a forward Direction
which is going from lower layers to
higher layers
and then orthogonal to that was the time
dimension
and so if I have a video even if it's a
video of just a single thing that stays
still
I can be going up and down through the
layers as I go forwards in time
and that's what's allowing you to have
top down effects
okay I understood that yeah each layer
can receive inputs from a higher layer
in the previous time step exactly yeah
so what a layer is doing it's receiving
input from higher layers
and lower layers at the previous time
step and from itself at the previous
time step
and if you've got static input
that whole process over time looks like
a network settling down
that's a bit more like a Boltzmann
machine settling down
and the idea is that
the time that you're using for that is
the same as the time you're using for
processing video
and because of that
if I give you fast input that's changing
too fast you can never settle down to
interpret it
so I discovered this nice phenomenon if
you take an irregularly shaped object
like a potato for example a nice
irregularly shaped potato
and you throw it up in the air rotating
slowly at one or two revolutions per
second
you cannot see what shape it is you just
can't see the shape of it
you don't have time to settle on a 3D
interpretation
because it's the very same
time steps that you're using for processing
videos you're using for settling with a
static image
and what I found fascinating about and
maybe this is something that that is
already
in the literature but this idea of going
up and down in the layers
As you move through time
but it's that's always been in recurrent
nets so to begin with recurrent nets we
just had one hidden layer so typical
LSTMs and so on would have one hidden
layer and then Alex Graves introduced
the idea of having multiple hidden
layers and showed that it was a winner
so that idea has been around but it's
always been paired with back propagation
as the learning algorithm and in that
case it was back propagation through
time which was completely unrealistic
but
and the Brain real life is not static so
you're not perceiving in a truly static
fashion how much of this grew out of
SimCLR's contrastive learning or
NGRADs' activity differences
a couple of years ago I got very excited
because I was trying to make a more
biologically plausible version of things
like SimCLR there's a whole bunch of
things like SimCLR and SimCLR wasn't the
first of them
in fact it's something a bit like
SimCLR that Sue Becker and I published
in about 1992 in Nature
but we didn't use negative examples we
tried to analytically compute the
negative phase and that wasn't
there was a mistake it just that would
never work
um
once you start using negative examples
then you get things like SimCLR
and I discovered that you could separate
the phases that they didn't and that got
me very excited a few years ago because
it seemed like I only had an explanation
for what sleep was for
one big difference is
SimCLR is taking two different patches
from the same image
and if they're from the same image it's
trying to make them have a similar
representation if they're from different
images it's trying to make them have
different representations sufficiently
different once they're different it
doesn't try and make them more different
and
when you think how to say this
SimCLR involves looking at two
representations and seeing how similar
they are
and that's one way to measure agreement
and in fact if you think about the
squared difference between two vectors
that decomposes into three terms
there's a term to do with the square of the
first vector
there's something to do with the square
of the second vector
and then there's the
scalar product of the two vectors
and the scalar product of the two
vectors is the only
interactive term
and so it turns out that
squared difference is very like a scalar
product
a big Square difference
is like a small scalar product
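The decomposition Hinton is sketching is just the standard expansion of the squared Euclidean distance between two vectors:

```latex
\|\mathbf{a}-\mathbf{b}\|^{2} \;=\; \|\mathbf{a}\|^{2} \;+\; \|\mathbf{b}\|^{2} \;-\; 2\,\mathbf{a}\cdot\mathbf{b}
```

Only the scalar-product term couples the two vectors, so for representations of roughly fixed length a big squared difference corresponds to a small scalar product, which is the equivalence he draws.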
now there's a different way to measure
agreement
which is to take the things you'd like
to agree and feed them into one set of
neurons
and now if two sources coming into that
set of neurons agree
you'll get high activity in those
neurons it's like positive interference
between light waves
and if they disagree you'll get low
activity
and if you measure agreement just by the
activity in a layer of neurons
you're measuring an agreement between
the inputs then you don't have to have
two things you can have as many things
as you like you don't have to
divide the input into two patches and
say to the representation of the two
patches agree you can just say I've got
a hidden layer does this hidden layer
get highly active
and it seems to me that's a better way
to measure agreement it's easier for the
brain to do
and it's particularly interesting if you
have spiking neurons
because what I'm using at present
doesn't use spiking neurons it just says
a hidden layer is really asking
are my inputs agreeing with each other
in which case I'll be highly active or
are they disagree in which case I won't
but if the inputs arrive at specific
times very precise times like spikes do
then you can ask not just are the
neurons being stimulated
but are they being stimulated at exactly
the same time
and that's a much sharper way to measure
agreement so spiking neurons seem
particularly good for measuring
agreement which is what I need
that's the objective function to get
agreement in the positive phase is not
in the negative phase
and
I'm thinking about ways of trying to
implement spiking neurons to make this
work better but that's one big
difference from SimCLR that you're not
taking two things and saying do they
agree you're just taking all the inputs
coming into a layer and saying do all
those inputs agree
when you talk about the activity that's
similar to what you were doing with
NGRADs where
you're comparing top-down predictions
and bottom-up predictions okay okay okay
this when you do the recurrent version
of the forward algorithm
at each time step
neurons in a layer are getting top down
input and bottom-up input right
and
they'd like them to agree
and if your objective function is to
have high activity
they'd like to make things highly active
there's another version of the forward
algorithm where the objective is to have
low activity
and then you want the top down to cancel
out the bottom up
and then it looks much more like
predictive coding it's not quite the
same but it's very similar but let's
stick with the version where you're
going for high activity you want the top
down and bottom up to agree and give you
high activity
but notice that
it's not like the top down is a
derivative
so in attempts to
implement back prop in neural nets
you try and have top down things which
are like derivatives
and bottom-up things which are like
activities
and you try and use temporal differences
to give you the derivatives
and that's somewhat different
here everything's activities you're
never propagated derivatives
and this algorithm also
does away with the idea of dynamic
routing that you talked about with
stacked capsule autoencoders yeah yes so
with capsules I moved on from the
dynamic routing to having what are
called Universal capsules
capsule would be a small collection of
neurons
and in the original capsules models that
collection of neurons would only be able
to represent one type of thing like a
nose and a different kind of capsule
would represent a mouth
in Universal capsules what you'd have is
that each capsule
could represent any type of thing so it
would have different activity patterns
to represent the different kinds of
things that might be there the capsule
would be dedicated to a location in the
image so a capsule will be representing
what kind of thing you have at that
location at a particular level of
the part-whole hierarchy
so it might be representing you that at
the part level you have a nose
um and then at a higher level you'd have
other capsules that are representing
other at the object level you have a
face or something
but when you get rid of the dedication
of a bunch of neurons to a particular
type of thing you don't need to do
routing anymore
and in the forward fold algorithm
I'm not doing routing and one of the
diagrams in the forward-forward paper
is actually taken from my paper on
part-whole hierarchies my last paper on
capsule models
so I had a system called GLOM an
imaginary system and the problem with it
was I never had a plausible learning algorithm
for it and the forward forward algorithm is a
plausible learning algorithm for GLOM it's
something that's neurally reasonable
what was fascinating to me at least
about capsules is that they captured the
3D nature of reality right lots of
neural Nets are now doing that
so NeRF models neural radiance field
models
now giving you very good 3D models in
neural Nets so you can see something
from a few different viewpoints
and then
produce an image of what it would look
like from a new viewpoint
that's very good for example making
smooth videos
from frames that are taken a quite long
time intervals but in the forward
forward algorithm what's your intuition
that that this is the if indeed
everything works out that this is a
model for information processing in the
cerebral cortex
and that
perception of depth and the 3D nature of
reality would emerge
yeah
yeah in particular if I'm showing you a
video
and the Viewpoint is changing during the
video
then
what you'd want is that the hidden
layers should represent 3D structure
that's all pie in the sky at present we
haven't reached that stage yet but yeah but with
capsules because I think you you
referred to pixels having depth
so that if one object moved in front of
another the system understood that
it was behind
the thing in front of it
do you capture that with forward
you would want it to learn to deal with
that yes yeah I wouldn't wire that in
but it's an obvious feature video that
it should learn about with babies
they learn in just a few days to get
structure from motion that is if I take
a static scene
and I move the Observer
or if I take keep the Observer
stationary
and the experiments were done with a
piece of paper folded into a w
and if you see it the wrong way around
it looks weird
and so
experiments done by Elizabeth Spelke and
other people use the idea that
you can tell a lot about the perception
of a baby by seeing what they're
interested in because they're interested
in things that look odd and so they'll
pay more attention to things that look
odd and within a few days
they learn to deal with how 3D structure
ought to be related to motion and if you
make it related wrong they think it's
weird
so they learn that very fast whereas it
takes them like at least six months I
think to learn to do stereo
to get it from the two eyes it's just
much easier to get from video than from
stereo but from evolutionary point of
view if something's really easy to learn
there's not much Point wiring it in
you've been working in Matlab famously
now
on toy problems are you starting to
scale are you still refining
I'm doing a bit of scaling I'm using a
GPU to make these go a bit faster but
I'm still at the stage where there's
very basic properties of the algorithm
I'm exploring in particular how to
generate negative data effectively from
the model
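Until the model can generate its own negative data, a simple stopgap is the one Hinton describes for his MNIST experiments: embed the label in the image and pair real images with incorrect labels. A minimal NumPy sketch, where the array shapes and helper names are my illustrative assumptions rather than his Matlab code:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_negative_labels(labels, num_classes=10):
    """Derange each label to a different, incorrect class."""
    offsets = rng.integers(1, num_classes, size=labels.shape)
    return (labels + offsets) % num_classes

def embed_label(images, labels, num_classes=10):
    """Overwrite the first `num_classes` pixels with a one-hot label,
    so the label travels in the same input as the image."""
    out = images.copy()
    out[:, :num_classes] = 0.0
    out[np.arange(len(labels)), labels] = 1.0
    return out

# positive data: image with correct label; negative data: same image, wrong label
images = rng.random((4, 784))          # stand-in for flattened MNIST digits
labels = np.array([3, 1, 4, 1])
pos = embed_label(images, labels)
neg = embed_label(images, make_negative_labels(labels))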
and until I've got the sort of basic
stuff working nicely
I think it's silly to scale it up as
soon as you scale it up it's slower to
investigate changes in the basic
algorithm and I'm still at the stage
where there's lots and lots of different
things I want to investigate for example
here's just one little thing that I
haven't had time to investigate yet you
can use
as your objective function to have high
activity
in the positive phase and low activity
in the negative phase
and if you do that it'll find nice
features in the hidden units
or you can have as your objective
function to have low activity in the
positive phase and high in the negative
if you do that it'll find nice
constraints
if you think about what physicists do
they try and understand nature
by finding apparently different things
that add up to zero
another way of saying it is that they're
equal and opposite but
if you take force and you subtract mass
times acceleration you get zero
but that's a constraint
okay
so if you have two sorts of information
one of which is force and the other
which is mass times acceleration
you'd like to
have hidden units that see both those
inputs and that say zero
no activity
and then when they see things that don't
fit the physics
they'll have high activity they'll be
the negative data
so that's called a constraint
and so if you make your objective
function be to have low activity for real
things and high activity for
things that aren't real you'll find
constraints in the data as opposed to
features
so features are things that have high
variance and constraints of things that
have low variance
a feature is something that's got higher
variance than it should and a constraint
has lower variance than it should now
there's no reason why you shouldn't
have two types of neurons one's looking
for features and one's looking for
constraints
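The two objectives he contrasts fit in a few lines. This is a hedged sketch of a single forward-forward layer, not Hinton's Matlab code: "goodness" is the sum of squared activities, a logistic of goodness minus a threshold scores real versus fake, and flipping the sign of the objective turns feature detectors into constraint detectors. The function name, the threshold, and the toy data are my assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def goodness(h):
    # "goodness" of a layer: sum of squared activities per example
    return (h ** 2).sum(axis=1)

def ff_layer_step(W, x_pos, x_neg, theta=2.0, lr=0.03, mode="features"):
    """One local forward-forward update of a single layer.

    mode="features":    high activity on real data, low on negative data.
    mode="constraints": the reverse -- low activity on real data.
    """
    def grad(x, positive):
        h = np.maximum(x @ W, 0.0)                      # ReLU activities
        logit = np.clip(goodness(h) - theta, -50, 50)
        p = 1.0 / (1.0 + np.exp(-logit))                # P(input is "positive")
        coef = (1.0 - p) if positive else -p            # d log-likelihood / d goodness
        return x.T @ (2.0 * h * coef[:, None]) / len(x)
    if mode == "features":
        W = W + lr * (grad(x_pos, True) + grad(x_neg, False))
    else:
        W = W + lr * (grad(x_pos, False) + grad(x_neg, True))
    return W

# toy check: real data lies along one direction, negative data is noise
d, k = 20, 10
W = 0.1 * rng.standard_normal((d, k))
v = rng.standard_normal(d)
x_pos = v + 0.1 * rng.standard_normal((256, d))
x_neg = rng.standard_normal((256, d))
for _ in range(300):
    W = ff_layer_step(W, x_pos, x_neg)
```

After training in features mode the layer's goodness on the real data sits well above its goodness on the noise; running with mode="constraints" would instead drive units to be silent on data that obeys the regularity and active on data that violates it.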
and we know with just linear models
that
a method like principal components
analysis
looks for the directions in the space
with the highest variance they're like
features
and it's very stable
there's other methods like minor
components analysis that look for
directions in the space that have the
lowest variance they're looking for
constraints
they're less numerically stable
but we know that it pays to have both
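The linear contrast is easy to demonstrate: principal components find the highest-variance directions (features), minor components find the lowest-variance ones (constraints). A small NumPy sketch with made-up data obeying one linear constraint, x3 ≈ x1 + x2, analogous to F minus ma equalling zero:

```python
import numpy as np

rng = np.random.default_rng(2)

# 3-D data obeying one linear constraint: x3 is (almost exactly) x1 + x2
a = rng.standard_normal(1000)
b = rng.standard_normal(1000)
X = np.stack([a, b, a + b + 0.01 * rng.standard_normal(1000)], axis=1)

cov = np.cov(X.T)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order

principal = eigvecs[:, -1]   # highest-variance direction: a feature
minor = eigvecs[:, 0]        # lowest-variance direction: the constraint
```

The minor component comes out proportional to (1, 1, -1): projecting any data point onto it gives almost zero, exactly the "apparently different things that add up to zero". It is also the numerically shakier quantity, since its variance is nearly zero, which is the stability point made above.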
and so that for example is a direction
that might make things work better but
there's lots
there's about 20 things like that I need
to investigate
and my feeling is until I've got a good
recipe for whether you should use
features or constraints or both
what's the most effective way to
generate negative data and so on
it's premature to investigate really big
systems
with regard to really big systems one of
the things you talk about is the need
for a new kind of computer and I've seen
confusion about this too in the Press
I've seen people talk about how you talk
about getting rid of the von Neumann architecture
yeah you obviously want computers where
the hardware and software are separate
yeah and you want them to do things like
keep track of your bank account
this is for things where we want
computers to be like people to process
natural language to process vision all
those things that
some years ago Bill Gates said computers
couldn't do like they're blind and deaf
they're not blind and deaf anymore but
for processing natural language or doing
motor control or doing Common Sense
reasoning
we probably want a different kind of
computer if we want to do it at very low
energy
we need to make much better use of all
the properties of the hardware your
interest is understanding the brain well
I have a side interest in getting low
energy computation going and the point
about the forward forward is it works
when you don't have a good model of the
hardware so if for example I take a
neural net and I insert a black box so I
have a layer that's just a black box I
have no idea how it works
it does stochastic things
I don't know what's going on
the question is can the whole system
learn with that black box in there
and it has absolutely no problem it's
doing something different because the
black box is changing what happens on
the forward pass
but the point is it's changing it in
exactly the same way for both forward
passes so it all cancels out
whereas in back propagation you're
completely sunk by this black box the
best you can do is try and learn a
differentiable model of the black box
and that's not going to be very good if
the black box is wandering in its
Behavior
so the forward-forward algorithm doesn't need to
have a perfect model of the forward system
it needs to have a good enough model of
what one neuron is doing so that it can
change the incoming weights of that
neuron to make it more active or less
active but that's all it needs it
doesn't need to be able to
invert the forward pass
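That black-box experiment is easy to mock up. Below is a hedged Python sketch, my own construction rather than Hinton's code: an opaque, non-differentiable stage sits between the input and a layer trained with a local forward-forward objective. Because the black box transforms the positive and negative phases identically, the layer after it still learns to separate them, and no derivative is ever taken through the box:

```python
import numpy as np

rng = np.random.default_rng(3)

def black_box(x):
    """An opaque, non-differentiable stage we have no model of
    (here: a fixed projection followed by a hard sign)."""
    B = np.cos(np.arange(x.shape[1])[:, None] * np.arange(16)[None, :])
    return np.sign(x @ B)

def goodness(h):
    return (h ** 2).sum(axis=1)

def ff_update(W, x, positive, theta=2.0, lr=0.05):
    """Local update: uses only the layer's own input and activities,
    so it never needs to differentiate through black_box."""
    h = np.maximum(x @ W, 0.0)
    p = 1.0 / (1.0 + np.exp(-np.clip(goodness(h) - theta, -50, 50)))
    coef = (1.0 - p) if positive else -p
    return W + lr * x.T @ (2.0 * h * coef[:, None]) / len(x)

# real data clusters along one direction; negative data is noise
d = 20
v = rng.standard_normal(d)
x_pos = v + 0.1 * rng.standard_normal((256, d))
x_neg = rng.standard_normal((256, d))

# the trainable layer sits downstream of the black box
W = 0.1 * rng.standard_normal((16, 10))
for _ in range(300):
    W = ff_update(W, black_box(x_pos), True)
    W = ff_update(W, black_box(x_neg), False)
```

The same setup would sink backpropagation, which needs a differentiable path through every stage; here the box's effect is identical in both phases, so it cancels out of the comparison, which is the point made above.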
and you're not talking about replacing
back propagation which has obviously had
enormous success there's plenty of
compute plenty of power then
backprop is fine but and this is
speculative I understand where you are
in the research but can you imagine if
you had low power
computer
architecture that could handle forward-forward
algorithms and you scale them imagine
that it would be great I've actually
been talking to someone called Jack
Kendall who works for a company called
Rain who is very insightful about what
you can do with analog hardware using
the natural properties of the
electrical circuits
um initially
that was very interesting for doing a
form of Boltzmann machine learning
but it's also going to be very
interesting for the forward-forward algorithm so
I can imagine it's scaling up very well
but there's a lot of work to be done to
make that happen
and if it did scale up very well to the
degree that large language models have
been successful do you think that its
abilities would Eclipse those of models
based on back propagation I'm not at all
sure I think they may not so I think
back propagation might be a better
algorithm in the sense that a given
number of connections you can get more
knowledge into those connections using
back propagation than you can with the
forward-forward algorithm
so networks using forward-forward work
better if they're somewhat bigger than
the best size networks for back
propagation
it's not good at squeezing a lot of
information into a few connections
back propagation will squeeze lots of
information into a few connections if
you force it to
it is much more happy not having to do
that but it'll do it if you force it to
and the forward-forward algorithm isn't good at
that
so if you take these large language
models
so take something with a trillion
connections
which is about the largest language
model that kind of size
that's about a cubic centimeter of
Cortex
and our cortex is like we got a thousand
times that much cortex
so these large language models that
actually know a lot more facts than you
or I do
because they've read everything on the
web not everything but an awful lot yeah
the sense in which they know them is a
bit dodgy but
if you had a sort of general knowledge
quiz
I think even GPT-3 would beat me at a
general knowledge quiz
there'd be all sorts of people it knows
about and when they were born and what
they did that I don't know about and
it all fits in a cubic centimeter cortex
if you measure by connections
so it's got much more knowledge than me
with much less brain
so I think backprop is much better at
squeezing information
but that's not the brain's main problem
for brains we've got plenty of
synapses the question is how do you
effectively get information into them
how do you make good use of experience
David Chalmers talked about the
possibility of Consciousness and you're
certainly interested in
the possibility if you understand how
the brain works and you can
replicate it this kind of a model let's
imagine that it scales beautifully
do you see the potential for reasoning
and
oh I see the potential for reasoning
sure
but Consciousness is a different kind of
question so I think people
I'm amazed that anybody thinks they
understand what they're talking about
when they talk about consciousness
they talk about it as if we can define it
and it's really a jumble of a whole
bunch of different concepts yeah and
they're all mixed together into this
attempt to explain a really complicated
mechanism in terms of an essence
yeah so we've seen that before like 100
years ago if you asked philosophers what
makes something alive or even if you ask
biologists what makes something alive
they say Well it has vital force but if
you say what is vital force and can we
make machines have vital force
they can't really Define vital force
other than saying is what makes people
alive and as soon as you start
understanding biochemistry
you give up on the notion of vital force
you understand about biochemical
processes that are stable and things
breaking down and
so it's not that we cease to have vital
force we've got as much vital force as
we had before it's just that it's not a
useful concept because in an attempt to
explain something complicated in terms
of some simple essence
so
another model like that is
so sports cars have oomph and some have
a lot of it
like an Aston Martin with big noisy
exhausts and lots of acceleration and
bucket seats
has lots of oomph
and
oomph is an intuitive concept you can
ask does an Aston Martin have more
oomph than my Toyota Corolla and it
definitely has a lot more oomph so we
really need to find out what oomph is
because oomph is what it's all
about if you're interested in cars or
fast cars anyway
but the concept of oomph is a perfectly
good concept but it doesn't really
explain much but if you want to know why
is it that when I press the accelerator
it goes very fast the concept of oomph
isn't going to help you you need to get
into the mechanics of it
how it actually works
and that that's a good analogy because
what I was going to say is it doesn't
really matter what Consciousness is it
matters whether
we as humans perceive something as
having Consciousness and I think there's
a lot to I think there's a lot to be
said for that yes yeah so if this
forward-forward in a large model that scaled
with relatively low power consumption
if it can reason there'll always be
philosophers that say yeah but it's not
conscious
but it doesn't really matter if you
can't tell the difference it matters to
the philosophers I think it would be
nice to show them the way out of
their trap they make for themselves
which is I think most people have a
radical misunderstanding of how terms
about perception and experience and
sensation and feelings actually work
of how the language works
if for example I say I'm seeing a pink
elephant notice the words pink and
elephant refer to things in the world
so what's actually happening is I'd like
to tell you what's going on inside my
head yeah
but telling you what the neurons are
doing won't do you much good
particularly since all our brains are
wired slightly differently it's just no
use to you to tell you what the neurons
are doing
but I can tell you that whatever it is
my neurons are doing it's the kind of
thing that's normally caused by a pink
Elephant being out there if I was doing
veridical perception the cause of my
brain state would be a pink elephant I
can tell you that and that doesn't mean
a pink elephant exists in some spooky
thing inside my head or it's just a
mental thing what it really tells you is
I'm giving you a counterfactual I'm
saying the world doesn't really contain
a pink elephant but if it did contain a
pink elephant
that would explain my brain state
that plus normal perceptual causation
will explain my brain state
so when I say I'm having the experience
of a pink elephant the word experience
many people think experience refers to
some funny internal goings on it's an
experience it's some internal no what
I'm denoting when I use the word
experience is that it's not real
I'm giving you a hypothetical
statement but if this hypothetical thing
were out there in the world that would
explain this brain State and so I'm
giving you insight into my brain state
by talking about a hypothetical world
what's not real about experience is that
it's a hypothetical I'm giving you it's
not that it lives in some other Spooky
World
and it's the same for feelings
if I say I feel like hitting you
what I'm doing is I'm giving you
a sense of what's going on in my head
via what it would normally cause so in
perception it's the world causing a
perceptual state with feelings it's the
internal State causing an action and I'm
giving you insight into my internal
state by telling you what kind of action
it would cause now
I might feel like hitting you or anybody
else or kicking the cat or whatever in
which case I instead of giving you any
one of those actions I just use a term
like angry
but really that shorthand for all those
angry actions
so I'm giving you
I'm giving you a way of seeing what's
going on in my head via describing
actions I might do but they're just
hypothetical actions
and that's what the word feel means when
I say I feel
typically if I say I feel and then say I
feel like blah
it's not that there's some special
internal Essence that's feeling and
computers don't have it computers are
just transistors they don't have feeling
you have to have a soul to have feeling
or something
no I'm describing my internal State via
the actions it would cause if I were to
disinhibit it
from another human's point of view if
you were a machine and you were saying
things like that I would perceive it
as you having feelings
right so let's take the perception cases
it's slightly simpler I think suppose we
make a
big complicated neural network that can
do perception and can also produce
language we have those now yeah
and so you can show them an image and
they can give you a description of what's
there
and suppose we now take one of those
networks and we say
I want you to just imagine something
and okay so it imagines something
and then it tells you what it's
imagining so it says I'm experiencing a
pink elephant
that's experiencing the Pink Elephant
just as much as a person is when they
say they experience a pink elephant
it's got an internal perceptual state
that would normally be caused by a pink
elephant but in this case it's not
caused by a pink elephant and so it uses
the word experience to denote that there
you go I think it's got just as much
perceptual Sensations as we have
although in their current state large
language models don't exhibit that kind
of cohesive internal logic you know but
they will they will you you think they
will oh yeah
yeah I don't think it's like that people
treat Consciousness like it's the sound
barrier that
you're either below the speed of sound
or you're above the speed of sound
you've either got a model that hasn't
yet got Consciousness or you've got
there it's not like that at all
I think a lot of people were impressed
by you talking about using Matlab
I'm not sure impressed is the right word
they were interested they were surprised
but what is your day-to-day work like
you have other responsibilities but you
spend more time on conceptualizing and
that could happen while taking a walk or
taking a shower or do you spend more
time on
experimenting like on Matlab or do you
spend more time on running large
experiments
okay
it varies a lot over time so I'll often
spend a long time like when I wrote that
paper about GLOM I spent a long time
thinking about how to organize a
perceptual system that was more neurally
realistic and could deal with part-whole
hierarchies without having to do
dynamic setting up of connections
and so I spent many months just thinking
about how to do that and writing a paper
about that I spent a lot of time trying
to think about more biologically
possible learning algorithms yes and
then programming little systems in
Matlab and discovering why they don't
work so the point about most original
ideas is they're wrong
and matlab's very convenient for quickly
showing that they're wrong and very
small toy problems like recognizing
handwritten digits I'm very familiar
with that task I can very quickly test
out an idea to see if it works and I've
got I've probably got on my computer
thousands of programs that didn't work
well that I programmed in an afternoon
and
an afternoon was sufficient to decide
that okay that's not going to work
probably that's probably not going to
work you never know for sure because
there might be some little trick you
didn't think of and then there will be
periods when I think I've got onto
something that does work
and I'll spend
several weeks programming and running
things to see if it works
yeah I've been doing that recently with
the forward-forward let me say why I use
Matlab I learned lots of languages when
I was young I learned
POP-2 which was an Edinburgh language
UCSD Pascal Lisp
Common Lisp Scheme
all sorts of Lisps and
vanilla Matlab which is ugly in some
ways but if you're dealing with vectors
and matrices it's what you want it makes
it convenient and I became fluent in
Matlab
and I should have learned Python and I
should have learned all sorts of other
things but when you're old you're much
slower learning language and I'd learned
plenty of them and I figured since I'm
fluent in Matlab and I can test out
little ideas in Matlab and then other
people can test them out in their own big
systems I would just stick with testing
out things on Matlab
so a lot of that is just history that
shaped me but it's also very
convenient and you talk a lot about
learning in toddlers and
is that knowledge base something you
accumulated
years ago or are you continuing to read
and talk to people in different fields I
talk to a lot of people and I learned
most things from talking to people I'm
not very good at reading I read very
slowly and when I come to equations they
slow me up a lot so I've learned most of
what I know from talking to people
and I'm lucky I've got lots of good
people to talk to like I talk to Terry
Sejnowski and he tells me about all sorts
of Neuroscience things I talked to Josh
Tenenbaum and he tells me about all
sorts of cognitive science things
I talk to James Howell and he tells me
lots of psychology
things
so I get most of my knowledge just from
talking to people
Yann LeCun you mentioned yeah he
corrected my pronunciation of his name
LeCun why did you reference him in
that talk
oh because for many years he was pushing
convolutional neural networks oh okay
and the vision Community said okay
they're fine for little things like
handwritten digits but they'll never
work for real images
and
there was a
famous paper submitted to a conference
by him and his co-workers
where he actually did better than any
other system on a particular Benchmark I
think it was segmenting pedestrians but
I'm not quite sure it was something like
that and the paper got rejected even
though it had the best results
and one of the referees said
the reason they were rejecting the
paper was because
the system learned everything so it
taught us nothing about vision and this
is a wonderful example of a paradigm and
the Paradigm for computer vision was you
study the task that has to be performed
the computation that has to be performed you
figure out an algorithm that'll do that
computation and then you figure out how
to implement it
efficiently
and so the knowledge is all explicit the
knowledge that it's using to do the
vision
is explicit you have to sort it out
mathematically and then implement it and
it's sitting there in the program
and they just assumed that's the way
that computer vision is going to work
and because computer vision has to work
that way if someone comes along and just
learns everything
so they're no use to you because they
haven't said what the knowledge is what
is the heuristic you're using
and so it's okay maybe it works but
that's just good luck in the end we're
bound to work better than that because
we're using real knowledge shouldn't we
understand what's going on
so they completely failed to get the
main message which was that it learned
everything
not quite everything because you wire
in the convolutions
but the machine Learning Community they
respected him because he's obviously a
smart guy but they thought he was on
completely the wrong path and they
dismissed his work for years and years
and then when Fei-Fei Li and her
collaborators produced the ImageNet
competition
finally we had a big enough data set
to show that neural networks would
really work well
and
Yann actually tried to get several
different students to
make a serious attempt to do ImageNet
with convolutional nets but he
couldn't find a student who was
interested in doing it at the same time
Ilya became very interested in doing it
and I was interested in doing it and
Alex Krizhevsky was a superb programmer
who put a lot of hard work
into making it work really well
so it was very unfortunate for Yann that
it wasn't
his group
that finally convinced the computer
vision Community actually this stuff
works much better than what you're doing
you've now
put this paper out there are you hoping
to ignite sort of an army of people
yeah and are you going to put some simple Matlab
code out there too
yeah because there's a bunch of little
things you have to do otherwise it won't
work and
the code needs to be out there it's more
picky than backprop with back propagation
you just show people the equations and
anybody can go and implement it
and it doesn't need a lot of tricks for
it to work quite well to work really
well it needs lots of tricks but to
work quite well it's fine with the
forward-forward you need a few tricks
for it to work at all
the tricks are quite reasonable tricks
but once you put them in there then it
works and I want to put that Matlab code
out there so other people can get it to
work
but I didn't want to put my very
primitive Matlab code out there because
it's disgusting
thank you
that's it for this week's podcast I want
to thank Jeff for his time I also want
to thank clear ml for their support
we're looking for more sponsors so if
you are interested in supporting the
podcast
please email me at craig c r a i g at
eye-on.ai that's
e-y-e hyphen o-n dot a i
as always you can find a transcript of
this episode on our website
eye-on.ai
I encourage you to read the transcript
if you're serious about understanding
the forward forward algorithm
in the meantime remember
The Singularity may not be near but AI
is about to change your world so pay
attention