But what is a neural network? | Chapter 1, Deep learning

3Blue1Brown
5 Oct 2017 · 18:39

Summary

TLDR: This video explores the basic concepts and workings of neural networks, in particular how they recognize handwritten digits. It opens with a low-resolution image of the digit 3 and the observation that a human brain recognizes different renderings of a digit effortlessly, even when their pixel values differ widely. It then poses the challenge of writing a program that takes a 28x28 pixel grid and outputs a number between 0 and 10, which motivates machine learning and neural networks. The goal is to help viewers understand the structure of a neural network and visualize the math behind it, rather than treat it as a buzzword. Building a simple network, the video explains how handwritten digits are recognized through several layers of neurons, each holding a value called an activation. Each output neuron represents a digit, and the network "learns" by adjusting the weights and biases of the connections between neurons. The video also discusses how the hidden layers might pick up on subcomponents of digits, such as edges and loops. Finally, it covers the role of the sigmoid function and ReLU (rectified linear unit) in neuron activation, and previews the next video, which will dig into how the network learns appropriate weights and biases from data.

Takeaways

  • 🧠 The brain effortlessly recognizes digits in low-resolution images, showing the remarkable power of human visual recognition.
  • 📈 Writing a program that recognizes digits in 28x28-pixel images is a daunting task, which highlights the importance of machine learning and neural networks.
  • 🌟 The structure and learning process of neural networks are presented from a mathematical point of view to help viewers understand them.
  • 🔍 The design of neural networks is inspired by the brain, particularly by how neurons are linked together.
  • 🔢 The number of neurons in the input layer matches the number of pixels; each neuron holds the grayscale value of one pixel.
  • 📊 The output layer has 10 neurons, one per digit; each activation indicates how strongly the system thinks the image corresponds to that digit.
  • 🤔 Exactly what the hidden layers do is not yet clear, but they play a key role in recognizing digits.
  • 🔗 The activations in one layer determine the activations in the next, which is the heart of the network as an information-processing mechanism.
  • 📉 Through training, the network can learn to combine pixels into edges, edges into patterns, and patterns into digits.
  • 🎯 Weights and biases are the key parameters of the network; they determine how it responds to an input image.
  • 🤝 Learning means adjusting thousands of weights and biases to find the setting that best solves the problem at hand.
  • 📚 Linear algebra, especially matrices and matrix-vector multiplication, is key to understanding neural networks.

Q & A

  • Why can the human brain easily recognize the digit 3 across different resolutions and styles?

    - The human visual cortex is extremely good at recognizing and resolving visual patterns. Even when the pixel values and the particular light-sensitive cells that fire differ greatly from one image to the next, the brain still resolves them as the same digit 3.

  • Why is it so difficult to write a program that recognizes handwritten digits in a 28x28 pixel grid?

    - Because it requires designing an algorithm that understands pixel patterns and maps them to a number between 0 and 10, which involves a huge number of parameters and ideas from deep learning.

  • How is the structure of a neural network inspired by the brain?

    - The structure mimics the brain's network of neurons: each neuron is a unit that holds a number between 0 and 1, and information is processed through layers, loosely analogous to how groups of neurons fire in the brain.

  • How many neurons are in the first layer of the network, and what do they represent?

    - The first layer contains 784 neurons, one for each of the 28x28 pixels of the input image, and each holds that pixel's grayscale value.

  • The output layer has 10 neurons; what do they represent?

    - Each of the 10 output neurons represents one digit, from 0 to 9. Its activation indicates how strongly the system thinks a given image corresponds to that digit.

  • What role do the hidden layers play in a neural network?

    - Hidden layers are the intermediate layers of the network. They process the information coming from the input layer and provide higher-level feature representations to the output layer, helping the network recognize more complex patterns.

  • How can the influence of one layer's activations on the next be expressed mathematically?

    - Multiply the weight matrix by the vector of activations, add the bias vector, and then pass the result through the sigmoid function, which squishes it into the range between 0 and 1.

  • Why do modern networks rarely use the sigmoid function?

    - Because sigmoid turned out to be difficult to train with; modern networks tend to use ReLU (rectified linear unit) instead, which is easier to train and works better for deep networks.

  • How do weights and biases determine a neuron's activation?

    - The weights determine which activation pattern in the previous layer the neuron responds to, and the bias sets how high the weighted sum must be before the neuron becomes meaningfully active. Adjusting the weights and biases trains the network to pick up on specific patterns.

  • Why is training a neural network described as finding a good setting of weights and biases?

    - Because the network's performance depends entirely on these parameters. Adjusting them against training data lets the network recognize and predict its targets better and solve the problem at hand.

  • Why is understanding matrices and matrix-vector multiplication important for learning machine learning?

    - Matrix-vector multiplication is the basic operation by which information is passed and transformed between the layers of a network. It is central to expressing and computing the layer-to-layer transitions, and many machine learning libraries heavily optimize matrix operations.

  • How can the complicated mathematical expression in a neural network be simplified for understanding and implementation?

    - By writing the weight matrix and bias vector as single symbols and combining matrix-vector multiplication with the sigmoid function, the full transition of activations from one layer to the next can be expressed very compactly and efficiently (see the sketch right after this Q&A list).
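
To make that compact notation concrete, here is a minimal NumPy sketch of a single layer transition, sigmoid(W·a + b), using the 784-to-16 shape of the video's first hidden layer. The weights and biases are random placeholders for illustration, not values from a trained network.

```python
import numpy as np

def sigmoid(x):
    # squishes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

a0 = np.random.rand(784)      # input-layer activations (pixel grayscale values, 0 to 1)
W = np.random.randn(16, 784)  # one row of weights per neuron in the next layer
b = np.random.randn(16)       # one bias per neuron in the next layer

a1 = sigmoid(W @ a0 + b)      # activations of the first hidden layer, each in (0, 1)
print(a1.shape)               # (16,)
```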

Outlines

00:00

😀 神经网络的基本概念和结构

The first section shows how the brain effortlessly recognizes the digit 3 across very different pixel values, motivating machine learning and neural networks. The author then lays out the structure of a neural network, including the input layer, hidden layers, and output layer, and the connections between them. In particular, each neuron holds a number between 0 and 1 called its activation, which represents a pixel's grayscale value or how strongly the image corresponds to a particular digit. Finally, the section notes that neural networks are inspired by the brain and that this network will be trained to recognize handwritten digits.

05:02

🧠 神经网络的工作原理和期望行为

The second section digs into how the network might work, in particular how the hidden layers could process information intelligently. The hope is that each neuron in a middle layer corresponds to a subcomponent of a digit, such as the loop at the top of a 9 or an 8. It also discusses how recognizing edges and small pieces of an image could build up, step by step, to recognizing the whole digit. The author stresses why this kind of layered structure matters and how it could carry over to intelligent tasks beyond image recognition.

10:02

🔍 神经网络的参数调整和训练过程

The third section explains how specific patterns in an image can be picked up by adjusting the network's weights and biases. Weights are the numbers attached to the connections between neurons, and a bias is a number added to the weighted sum before the activation function is applied. The author explains how the sigmoid function squishes the weighted sum into the range between 0 and 1, and how adding a bias shifts the threshold at which a neuron becomes active. Finally, all of these parameters can be organized into matrices and vectors, which makes them easier to reason about and to implement in code.

15:05

📚 神经网络的数学基础和未来展望

The fourth section emphasizes the mathematical foundations of neural networks, especially the importance of linear algebra in machine learning. The author describes the network as one elaborate function that takes in 784 input values and spits out 10 numbers. The question of how to learn appropriate weights and biases by looking at data is raised and deferred to the next video. The section also notes that the sigmoid function is rarely used in modern networks, having been replaced by ReLU (rectified linear unit) because it is easier to train. The author closes by thanking the Patreon supporters and previewing the upcoming video on how the network learns.

Keywords

💡 Neural network

A neural network is a computational model loosely modeled on the structure of neurons in the brain, used to process and recognize data. In the video, a neural network recognizes handwritten digits by mapping the pixel values of an input image to an output number between 0 and 10. The heart of the network is its layered structure, in which the activations in one layer determine the activations in the next, allowing it to recognize complex patterns.

💡 Activation

An activation is the number held by a neuron, between 0 and 1, representing how active that neuron is. In the video, the activations of the input layer correspond to the grayscale values of the image's pixels, while the activations of the output layer express how confident the system is that the image corresponds to each digit.

💡 Hidden layer

Hidden layers are the layers of the network that do not interact directly with the input or the output; they sit between the input and output layers. Their role is to extract and combine features into higher levels of abstraction. The video notes that, for the time being, exactly how the hidden layers work is a mystery, but they are crucial for recognizing handwritten digits.

💡 Weights

Weights are the parameters attached to the connections between neurons; they determine how strongly the signal from one neuron influences another. In the video, assigning weights to each layer's connections lets the network learn how to combine pixels into edges, edges into patterns, and patterns into digits.
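
As a rough illustration of the video's "weights as a little grid" picture, the sketch below builds one such grid by hand: positive weights in a band we care about and negative weights around it, so the weighted sum is largest when the middle pixels are bright and the surrounding pixels are dark. The specific region and values are invented for this example, not taken from a trained network.

```python
import numpy as np

weight_grid = np.zeros((28, 28))
weight_grid[13:15, 8:20] = 1.0    # positive weights: a short horizontal band
weight_grid[11:13, 8:20] = -1.0   # negative weights just above the band
weight_grid[15:17, 8:20] = -1.0   # negative weights just below the band

image = np.random.rand(28, 28)    # stand-in for a 28x28 grayscale digit image
weighted_sum = np.sum(weight_grid * image)  # largest for a bright edge in that region
print(weighted_sum)
```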

💡 Bias

A bias is a number added to a neuron's weighted sum to shift its activation threshold. In the video, the bias determines how high the weighted sum needs to be before the neuron starts getting meaningfully active, which helps the model distinguish different features.

💡 Sigmoid function

The sigmoid function, σ(x) = 1 / (1 + e^(−x)), maps any real number into the range between 0 and 1 and is commonly used in neural networks to turn a weighted sum into an activation. In the video, the sigmoid keeps each neuron's activation between 0 and 1, mimicking the idea of a biological neuron being inactive or active.

💡 ReLU

ReLU (Rectified Linear Unit) is an activation function widely used in modern neural networks: it outputs the input itself when the input is positive and 0 when the input is negative. ReLU is easier to train than sigmoid, which is why it is preferred in deep learning. In the video, sigmoid is presented as common in earlier networks, while modern networks tend to use ReLU.
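
A minimal sketch comparing the two activation functions mentioned in the video; nothing here depends on the video's specific network, only on the standard definitions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # smooth squish into (0, 1)

def relu(x):
    return np.maximum(0.0, x)        # identity for positive inputs, 0 otherwise

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid(x))  # close to 0 for very negative inputs, close to 1 for very positive ones
print(relu(x))     # [ 0.  0.  0.  1. 10.]
```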

💡 Matrix-vector multiplication

Matrix-vector multiplication is a basic operation from linear algebra used in neural networks to compute the weighted sums between layers efficiently. In the video, organizing the weights into a matrix and the activations into a vector lets the entire forward pass of the network be written compactly.
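
To show how the whole network reduces to iterated matrix-vector products, here is a minimal sketch assuming the 784-16-16-10 layout described in the video, with random placeholder parameters that a real network would instead learn.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

layer_sizes = [784, 16, 16, 10]
weights = [np.random.randn(m, n) for n, m in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.random.randn(m) for m in layer_sizes[1:]]

def forward(pixels):
    a = pixels
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)   # one layer transition per step
    return a                     # 10 output activations, one per digit

output = forward(np.random.rand(784))
print(output.argmax())           # index of the brightest output neuron
```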

💡 Backpropagation

Backpropagation is the algorithm used to train a neural network: it computes the gradient of a loss function with respect to the network's parameters and uses those gradients to update the weights and biases. The video does not mention backpropagation directly, but it is the key step of the learning process that adjusts the parameters so the network recognizes handwritten digits correctly.

💡 Feature extraction

Feature extraction is the process of identifying and using the key information in data for classification or other tasks. In the video, feature extraction happens in the hidden layers: earlier hidden layers might recognize edges in the image, while later ones might recognize the more complex patterns those edges form.

💡 Parameters

Parameters are the variables that define the model, namely its weights and biases. In the video, the network has roughly 13,000 parameters, which are adjusted through the learning process so that the network can recognize handwritten digits accurately. This adjustment is driven by large amounts of data and training.
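
As a quick sanity check of the roughly 13,000 figure, this short sketch counts the weights and biases for the 784-16-16-10 layout used in the video.

```python
layer_sizes = [784, 16, 16, 10]
n_weights = sum(n * m for n, m in zip(layer_sizes[:-1], layer_sizes[1:]))  # 784*16 + 16*16 + 16*10
n_biases = sum(layer_sizes[1:])                                            # 16 + 16 + 10
print(n_weights + n_biases)  # 13002
```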

Highlights

The brain recognizes the digit 3 in a low-resolution image effortlessly, showing the power of the human visual cortex.

Even when the pixel values differ widely, the brain recognizes different images as the same digit 3.

Writing a program to recognize digits in a 28x28 pixel grid turns from comically trivial to dauntingly difficult.

The relevance and importance of machine learning and neural networks to the present and the future.

An introduction to what a neural network actually is, treated as a piece of math rather than just a buzzword.

Over two videos, a neural network is put together that can learn to recognize handwritten digits.

Neural networks are inspired by the brain, but how exactly do they work?

The network starts from the 784 pixels of the input image, one neuron per pixel.

The output layer has 10 neurons, one for each digit.

The hidden layers are the unknown part of the network; how they help recognize digits is, for now, a mystery.

The network operates by letting the activations in one layer determine the activations in the next.

A trained network recognizes digits, meaning an input image triggers a specific pattern of activations.

The middle layers might pick up on subcomponents of digits, such as the top loop of a 9 or the two loops of an 8.

The hope is that the network detects edges and patterns, which would also be useful for other image-recognition tasks.

Designing the network requires choosing the weights and biases of the connections between layers.

Weights and biases are the adjustable parameters of the network, roughly 13,000 in total.

Learning means getting the computer to find a valid setting for all of those weights and biases that solves the problem at hand.

Modern networks rarely use the sigmoid function; ReLU (rectified linear unit) is more common.

Transcripts

play00:04

This is a 3.

play00:06

It's sloppily written and rendered at an extremely low resolution of 28x28 pixels,

play00:10

but your brain has no trouble recognizing it as a 3.

play00:14

And I want you to take a moment to appreciate how

play00:16

crazy it is that brains can do this so effortlessly.

play00:19

I mean, this, this and this are also recognizable as 3s,

play00:23

even though the specific values of each pixel is very different from one

play00:27

image to the next.

play00:28

The particular light-sensitive cells in your eye that are firing when

play00:32

you see this 3 are very different from the ones firing when you see this 3.

play00:37

But something in that crazy-smart visual cortex of yours resolves these as representing

play00:42

the same idea, while at the same time recognizing other images as their own distinct

play00:47

ideas.

play00:49

But if I told you, hey, sit down and write for me a program that takes in a grid of

play00:54

28x28 pixels like this and outputs a single number between 0 and 10,

play00:59

telling you what it thinks the digit is, well the task goes from comically trivial to

play01:04

dauntingly difficult.

play01:07

Unless you've been living under a rock, I think I hardly need to motivate the relevance

play01:10

and importance of machine learning and neural networks to the present and to the future.

play01:15

But what I want to do here is show you what a neural network actually is,

play01:19

assuming no background, and to help visualize what it's doing,

play01:22

not as a buzzword but as a piece of math.

play01:25

My hope is that you come away feeling like the structure itself is motivated,

play01:28

and to feel like you know what it means when you read,

play01:31

or you hear about a neural network quote-unquote learning.

play01:35

This video is just going to be devoted to the structure component of that,

play01:38

and the following one is going to tackle learning.

play01:40

What we're going to do is put together a neural

play01:43

network that can learn to recognize handwritten digits.

play01:49

This is a somewhat classic example for introducing the topic,

play01:52

and I'm happy to stick with the status quo here,

play01:54

because at the end of the two videos I want to point you to a couple good

play01:57

resources where you can learn more, and where you can download the code that

play02:00

does this and play with it on your own computer.

play02:05

There are many many variants of neural networks,

play02:07

and in recent years there's been sort of a boom in research towards these variants,

play02:12

but in these two introductory videos you and I are just going to look at the simplest

play02:16

plain vanilla form with no added frills.

play02:19

This is kind of a necessary prerequisite for understanding any of the more powerful

play02:23

modern variants, and trust me it still has plenty of complexity for us to wrap our minds

play02:28

around.

play02:29

But even in this simplest form it can learn to recognize handwritten digits,

play02:33

which is a pretty cool thing for a computer to be able to do.

play02:37

And at the same time you'll see how it does fall

play02:39

short of a couple hopes that we might have for it.

play02:43

As the name suggests neural networks are inspired by the brain, but let's break that down.

play02:48

What are the neurons, and in what sense are they linked together?

play02:52

Right now when I say neuron all I want you to think about is a thing that holds a number,

play02:58

specifically a number between 0 and 1.

play03:00

It's really not more than that.

play03:03

For example the network starts with a bunch of neurons corresponding to

play03:08

each of the 28x28 pixels of the input image, which is 784 neurons in total.

play03:14

Each one of these holds a number that represents the grayscale value of the

play03:19

corresponding pixel, ranging from 0 for black pixels up to 1 for white pixels.

play03:25

This number inside the neuron is called its activation,

play03:28

and the image you might have in mind here is that each neuron is lit up when its

play03:32

activation is a high number.

play03:36

So all of these 784 neurons make up the first layer of our network.

play03:46

Now jumping over to the last layer, this has 10 neurons,

play03:49

each representing one of the digits.

play03:52

The activation in these neurons, again some number that's between 0 and 1,

play03:56

represents how much the system thinks that a given image corresponds with a given digit.

play04:03

There's also a couple layers in between called the hidden layers,

play04:06

which for the time being should just be a giant question mark for

play04:09

how on earth this process of recognizing digits is going to be handled.

play04:14

In this network I chose two hidden layers, each one with 16 neurons,

play04:17

and admittedly that's kind of an arbitrary choice.

play04:21

To be honest I chose two layers based on how I want to motivate the structure

play04:24

in just a moment, and 16, well that was just a nice number to fit on the screen.

play04:28

In practice there is a lot of room for experiment with a specific structure here.

play04:33

The way the network operates, activations in one

play04:35

layer determine the activations of the next layer.

play04:39

And of course the heart of the network as an information processing mechanism comes down

play04:43

to exactly how those activations from one layer bring about activations in the next layer.

play04:49

It's meant to be loosely analogous to how in biological networks of neurons,

play04:53

some groups of neurons firing cause certain others to fire.

play04:58

Now the network I'm showing here has already been trained to recognize digits,

play05:01

and let me show you what I mean by that.

play05:03

It means if you feed in an image, lighting up all 784 neurons of the input layer

play05:08

according to the brightness of each pixel in the image,

play05:11

that pattern of activations causes some very specific pattern in the next layer

play05:16

which causes some pattern in the one after it,

play05:18

which finally gives some pattern in the output layer.

play05:22

And the brightest neuron of that output layer is the network's choice,

play05:26

so to speak, for what digit this image represents.

play05:32

And before jumping into the math for how one layer influences the next,

play05:36

or how training works, let's just talk about why it's even reasonable

play05:40

to expect a layered structure like this to behave intelligently.

play05:44

What are we expecting here?

play05:45

What is the best hope for what those middle layers might be doing?

play05:48

Well, when you or I recognize digits, we piece together various components.

play05:54

A 9 has a loop up top and a line on the right.

play05:57

An 8 also has a loop up top, but it's paired with another loop down low.

play06:01

A 4 basically breaks down into three specific lines, and things like that.

play06:07

Now in a perfect world, we might hope that each neuron in the second

play06:11

to last layer corresponds with one of these subcomponents,

play06:15

that anytime you feed in an image with, say, a loop up top,

play06:18

like a 9 or an 8, there's some specific neuron whose activation is going to be close to 1.

play06:24

And I don't mean this specific loop of pixels,

play06:26

the hope would be that any generally loopy pattern towards the top sets off this neuron.

play06:32

That way, going from the third layer to the last one just requires

play06:36

learning which combination of subcomponents corresponds to which digits.

play06:41

Of course, that just kicks the problem down the road,

play06:43

because how would you recognize these subcomponents,

play06:45

or even learn what the right subcomponents should be?

play06:48

And I still haven't even talked about how one layer influences the next,

play06:51

but run with me on this one for a moment.

play06:53

Recognizing a loop can also break down into subproblems.

play06:57

One reasonable way to do this would be to first

play06:59

recognize the various little edges that make it up.

play07:03

Similarly, a long line, like the kind you might see in the digits 1 or 4 or 7,

play07:08

is really just a long edge, or maybe you think of it as a certain pattern of several

play07:13

smaller edges.

play07:15

So maybe our hope is that each neuron in the second layer of

play07:18

the network corresponds with the various relevant little edges.

play07:23

Maybe when an image like this one comes in, it lights up all of the

play07:27

neurons associated with around 8 to 10 specific little edges,

play07:31

which in turn lights up the neurons associated with the upper loop

play07:35

and a long vertical line, and those light up the neuron associated with a 9.

play07:40

Whether or not this is what our final network actually does is another question,

play07:44

one that I'll come back to once we see how to train the network,

play07:47

but this is a hope that we might have, a sort of goal with the layered structure

play07:52

like this.

play07:53

Moreover, you can imagine how being able to detect edges and patterns

play07:56

like this would be really useful for other image recognition tasks.

play08:00

And even beyond image recognition, there are all sorts of intelligent

play08:04

things you might want to do that break down into layers of abstraction.

play08:08

Parsing speech, for example, involves taking raw audio and picking out distinct sounds,

play08:12

which combine to make certain syllables, which combine to form words,

play08:16

which combine to make up phrases and more abstract thoughts, etc.

play08:21

But getting back to how any of this actually works,

play08:24

picture yourself right now designing how exactly the activations in one layer

play08:28

might determine the next.

play08:30

The goal is to have some mechanism that could conceivably combine pixels into edges,

play08:36

or edges into patterns, or patterns into digits.

play08:39

And to zoom in on one very specific example, let's say the

play08:43

hope is for one particular neuron in the second layer to pick

play08:46

up on whether or not the image has an edge in this region here.

play08:51

The question at hand is what parameters should the network have?

play08:55

What dials and knobs should you be able to tweak so that it's expressive

play08:59

enough to potentially capture this pattern, or any other pixel pattern,

play09:03

or the pattern that several edges can make a loop, and other such things?

play09:08

Well, what we'll do is assign a weight to each one of the

play09:11

connections between our neuron and the neurons from the first layer.

play09:16

These weights are just numbers.

play09:18

Then take all of those activations from the first layer

play09:21

and compute their weighted sum according to these weights.

play09:27

I find it helpful to think of these weights as being organized into a

play09:31

little grid of their own, and I'm going to use green pixels to indicate positive weights,

play09:35

and red pixels to indicate negative weights, where the brightness of

play09:38

that pixel is some loose depiction of the weight's value.

play09:42

Now if we made the weights associated with almost all of the pixels zero

play09:46

except for some positive weights in this region that we care about,

play09:50

then taking the weighted sum of all the pixel values really just amounts

play09:53

to adding up the values of the pixel just in the region that we care about.

play09:59

And if you really wanted to pick up on whether there's an edge here,

play10:02

what you might do is have some negative weights associated with the surrounding pixels.

play10:07

Then the sum is largest when those middle pixels

play10:10

are bright but the surrounding pixels are darker.

play10:14

When you compute a weighted sum like this, you might come out with any number,

play10:18

but for this network what we want is for activations to be some value between 0 and 1.

play10:24

So a common thing to do is to pump this weighted sum into some function

play10:28

that squishes the real number line into the range between 0 and 1.

play10:32

And a common function that does this is called the sigmoid function,

play10:35

also known as a logistic curve.

play10:38

Basically very negative inputs end up close to 0,

play10:41

positive inputs end up close to 1, and it just steadily increases around the input 0.

play10:49

So the activation of the neuron here is basically a

play10:52

measure of how positive the relevant weighted sum is.

play10:57

But maybe it's not that you want the neuron to

play10:59

light up when the weighted sum is bigger than 0.

play11:02

Maybe you only want it to be active when the sum is bigger than say 10.

play11:06

That is, you want some bias for it to be inactive.

play11:11

What we'll do then is just add in some other number like negative 10 to this

play11:15

weighted sum before plugging it through the sigmoid squishification function.

play11:20

That additional number is called the bias.

play11:23

So the weights tell you what pixel pattern this neuron in the second

play11:27

layer is picking up on, and the bias tells you how high the weighted

play11:31

sum needs to be before the neuron starts getting meaningfully active.

play11:36

And that is just one neuron.

play11:38

Every other neuron in this layer is going to be connected to

play11:42

all 784 pixel neurons from the first layer, and each one of

play11:46

those 784 connections has its own weight associated with it.

play11:51

Also, each one has some bias, some other number that you add

play11:54

on to the weighted sum before squishing it with the sigmoid.

play11:58

And that's a lot to think about!

play11:59

With this hidden layer of 16 neurons, that's a total of 784 times 16 weights,

play12:06

along with 16 biases.

play12:08

And all of that is just the connections from the first layer to the second.

play12:12

The connections between the other layers also have

play12:14

a bunch of weights and biases associated with them.

play12:18

All said and done, this network has almost exactly 13,000 total weights and biases.

play12:23

13,000 knobs and dials that can be tweaked and

play12:26

turned to make this network behave in different ways.

play12:31

So when we talk about learning, what that's referring to is

play12:34

getting the computer to find a valid setting for all of these

play12:37

many many numbers so that it'll actually solve the problem at hand.

play12:42

One thought experiment that is at once fun and kind of horrifying is to imagine sitting

play12:47

down and setting all of these weights and biases by hand,

play12:50

purposefully tweaking the numbers so that the second layer picks up on edges,

play12:54

the third layer picks up on patterns, etc.

play12:56

I personally find this satisfying rather than just treating the network as a total

play13:01

black box, because when the network doesn't perform the way you anticipate,

play13:04

if you've built up a little bit of a relationship with what those weights and biases

play13:09

actually mean, you have a starting place for experimenting with how to change the

play13:13

structure to improve.

play13:14

Or when the network does work but not for the reasons you might expect,

play13:18

digging into what the weights and biases are doing is a good way to challenge

play13:22

your assumptions and really expose the full space of possible solutions.

play13:26

By the way, the actual function here is a little cumbersome to write down,

play13:30

don't you think?

play13:32

So let me show you a more notationally compact way that these connections are represented.

play13:37

This is how you'd see it if you choose to read up more about neural networks. Organize all of the activations from one layer into a column as a vector.

play13:41

Then organize all of the weights as a matrix, where each row of that matrix corresponds

play13:50

to the connections between one layer and a particular neuron in the next layer.

play13:58

What that means is that taking the weighted sum of the activations in

play14:02

the first layer according to these weights corresponds to one of the

play14:05

terms in the matrix vector product of everything we have on the left here.

play14:14

By the way, so much of machine learning just comes down to having a

play14:17

good grasp of linear algebra, so for any of you who want a nice visual

play14:21

understanding for matrices and what matrix vector multiplication means,

play14:24

take a look at the series I did on linear algebra, especially chapter 3.

play14:29

Back to our expression, instead of talking about adding the bias to each one of

play14:33

these values independently, we represent it by organizing all those biases into a vector,

play14:38

and adding the entire vector to the previous matrix vector product.

play14:43

Then as a final step, I'll wrap a sigmoid around the outside here,

play14:46

and what that's supposed to represent is that you're going to apply the

play14:50

sigmoid function to each specific component of the resulting vector inside.

play14:55

So once you write down this weight matrix and these vectors as their own symbols,

play15:00

you can communicate the full transition of activations from one layer to the next in an

play15:05

extremely tight and neat little expression, and this makes the relevant code both a lot

play15:10

simpler and a lot faster, since many libraries optimize the heck out of matrix

play15:14

multiplication.

play15:17

Remember how earlier I said these neurons are simply things that hold numbers?

play15:22

Well of course the specific numbers that they hold depends on the image you feed in,

play15:27

so it's actually more accurate to think of each neuron as a function,

play15:31

one that takes in the outputs of all the neurons in the previous layer and spits out a

play15:36

number between 0 and 1.

play15:39

Really the entire network is just a function, one that takes in

play15:43

784 numbers as an input and spits out 10 numbers as an output.

play15:47

It's an absurdly complicated function, one that involves 13,000 parameters

play15:51

in the forms of these weights and biases that pick up on certain patterns,

play15:55

and which involves iterating many matrix vector products and the sigmoid

play15:59

squishification function, but it's just a function nonetheless.

play16:03

And in a way it's kind of reassuring that it looks complicated.

play16:07

I mean if it were any simpler, what hope would we have

play16:09

that it could take on the challenge of recognizing digits?

play16:13

And how does it take on that challenge?

play16:15

How does this network learn the appropriate weights and biases just by looking at data?

play16:20

Well that's what I'll show in the next video, and I'll also dig a little

play16:23

more into what this particular network we're seeing is really doing.

play16:27

Now is the point I suppose I should say subscribe to stay notified

play16:30

about when that video or any new videos come out,

play16:33

but realistically most of you don't actually receive notifications from YouTube, do you?

play16:38

Maybe more honestly I should say subscribe so that the neural networks

play16:41

that underlie YouTube's recommendation algorithm are primed to believe

play16:44

that you want to see content from this channel get recommended to you.

play16:48

Anyway, stay posted for more.

play16:50

Thank you very much to everyone supporting these videos on Patreon.

play16:54

I've been a little slow to progress in the probability series this summer,

play16:57

but I'm jumping back into it after this project,

play16:59

so patrons you can look out for updates there.

play17:03

To close things off here I have with me Lisha Li who did her PhD work on the

play17:07

theoretical side of deep learning and who currently works at a venture capital

play17:10

firm called Amplify Partners who kindly provided some of the funding for this video.

play17:15

So Lisha one thing I think we should quickly bring up is this sigmoid function.

play17:19

As I understand it early networks use this to squish the relevant weighted

play17:23

sum into that interval between zero and one, you know kind of motivated

play17:26

by this biological analogy of neurons either being inactive or active.

play17:30

Exactly.

play17:30

But relatively few modern networks actually use sigmoid anymore.

play17:34

Yeah.

play17:34

It's kind of old school right?

play17:35

Yeah or rather ReLU seems to be much easier to train.

play17:39

And ReLU, ReLU stands for rectified linear unit?

play17:42

Yes it's this kind of function where you're just taking a max of zero

play17:47

and a where a is given by what you were explaining in the video and

play17:52

what this was sort of motivated from I think was a partially by a

play17:56

biological analogy with how neurons would either be activated or not.

play18:01

And so if it passes a certain threshold it would be the identity function but if it did

play18:06

not then it would just not be activated so it'd be zero so it's kind of a simplification.

play18:11

Using sigmoids didn't help training or it was very difficult

play18:15

to train at some point and people just tried ReLU and it happened

play18:20

to work very well for these incredibly deep neural networks.

play18:25

All right thank you Lisha.
