But what is a neural network? | Chapter 1, Deep learning

3Blue1Brown
5 Oct 2017 | 18:39

Summary

TL;DR: The video explains the structure and function of neural networks in an accessible way. It starts by marveling at the human brain's ability to effortlessly recognize handwritten digits, even in low-resolution images, then introduces neural networks as a mathematical model loosely inspired by the brain, consisting of layers of interconnected 'neurons' that hold numeric values. It walks through the network's architecture, weights, biases, and activation functions, illustrating how they work together to process input data and recognize patterns, and it promises to cover the learning process of neural networks in a follow-up video.

Takeaways

  • Neural networks are inspired by the human brain and are structured as layers of interconnected neurons (nodes that hold values).
  • The input layer neurons represent the pixel values of an image, the hidden layers perform computations meant to detect patterns, and the output layer neurons indicate the predicted digit.
  • Each connection between neurons has an associated weight, and each neuron has a bias; together these determine how activations in one layer influence the next.
  • Activations are computed as weighted sums passed through an activation function such as the sigmoid, allowing the network to build up complex representations (see the sketch just after this list).
  • For a 28x28 pixel input image and 10 output digits, the example network has around 13,000 weights and biases to be learned.
  • The goal is for the hidden layers to learn to detect relevant features like edges, patterns, and components that can be combined to recognize digits.
  • Linear algebra concepts like matrices and matrix-vector multiplication provide a compact way to represent and compute the network's operations.
  • Early neural networks used sigmoid activation functions, but modern networks often use ReLU (rectified linear unit) activations, which are easier to train.
  • Neural networks are just very complicated functions that map input data (like images) to output predictions (like digit classifications).
  • The next video will cover how these networks can 'learn' the appropriate weights and biases from training data to perform the desired task.
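
Below is a minimal sketch of the forward pass these takeaways describe, assuming NumPy and randomly initialized (i.e. untrained) parameters; the layer sizes match the video's example network, but the variable names are purely illustrative.

    import numpy as np

    def sigmoid(z):
        # Squish any real number into the range (0, 1).
        return 1.0 / (1.0 + np.exp(-z))

    # Layer sizes from the video: 784 input pixels, two hidden layers of 16, 10 output digits.
    sizes = [784, 16, 16, 10]

    rng = np.random.default_rng(0)
    # One weight matrix and one bias vector per transition between consecutive layers.
    weights = [rng.standard_normal((n_out, n_in)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
    biases = [rng.standard_normal(n_out) for n_out in sizes[1:]]

    def feedforward(pixels):
        # pixels: 784 grayscale values in [0, 1], i.e. a flattened 28x28 image.
        a = pixels
        for W, b in zip(weights, biases):
            # Each layer's activations are the sigmoid of a weighted sum plus a bias.
            a = sigmoid(W @ a + b)
        return a  # 10 numbers between 0 and 1; the largest is the network's "choice"

    image = rng.random(784)  # stand-in for a real digit image
    scores = feedforward(image)
    print(scores.argmax(), scores.round(3))

With untrained parameters the output is meaningless; "learning", the topic of the next video, is the process of finding good values for these roughly 13,000 weights and biases.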

Q & A

  • What is the purpose of this video series?

    -The purpose of the video series is to provide an introduction to neural networks, explaining their structure and how they learn, using the example of a neural network that can recognize handwritten digits.

  • What is a neuron in the context of neural networks?

    -In the context of neural networks, a neuron is a simple computational unit that holds a number between 0 and 1, representing its activation level.

  • How is the activation of a neuron in one layer determined by the activations of the previous layer?

    -The activation of a neuron is determined by taking a weighted sum of the activations from the previous layer (each connection has its own weight) and adding the neuron's bias. That sum is then passed through an activation function, such as the sigmoid, which squishes the result into the range between 0 and 1.
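
    As a concrete illustration of this answer, here is the computation for a single hypothetical neuron with just three inputs; the numbers are made up purely for the example:

        import math

        def sigmoid(z):
            return 1.0 / (1.0 + math.exp(-z))

        # Activations from the previous layer and the weights on their connections.
        prev_activations = [0.9, 0.2, 0.7]
        weights = [2.0, -1.0, 0.5]
        bias = -1.0

        weighted_sum = sum(w * a for w, a in zip(weights, prev_activations)) + bias
        activation = sigmoid(weighted_sum)  # always lands between 0 and 1
        print(round(weighted_sum, 2), round(activation, 3))  # 0.95 0.721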

  • What is the purpose of the weights and biases in a neural network?

    -The weights determine the strength of the connections between neurons in adjacent layers, controlling how activations in one layer influence the next. Each bias is an extra number added to a neuron's weighted sum before the activation function, effectively setting the threshold that sum must exceed before the neuron becomes meaningfully active.

  • How many total weights and biases are there in the neural network discussed in the video?

    -The neural network discussed in the video has a total of almost 13,000 weights and biases.
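
    That figure can be checked directly from the layer sizes used in the video (784, 16, 16, 10); a quick tally:

        sizes = [784, 16, 16, 10]

        # Weights: one per connection between consecutive layers.
        n_weights = sum(n_in * n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))
        # Biases: one per neuron outside the input layer.
        n_biases = sum(sizes[1:])

        print(n_weights, n_biases, n_weights + n_biases)
        # 12960 weights + 42 biases = 13002 parameters, i.e. "almost exactly 13,000"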

  • What is the role of the hidden layers in a neural network?

    -The hidden layers in a neural network are responsible for detecting and combining low-level features (such as edges) into more complex patterns (such as loops or digits), enabling the network to learn hierarchical representations of the input data.

  • What is the motivation behind the layered structure of neural networks?

    -The layered structure of neural networks is motivated by the idea that intelligent tasks, such as image recognition or speech parsing, can be broken down into layers of abstraction, where each layer builds upon the representations learned in the previous layer.

  • What is the purpose of the sigmoid function in neural networks?

    -The sigmoid function is used to squish the weighted sum of activations from the previous layer into the range between 0 and 1, ensuring that the activations of the neurons in the current layer fall within the desired range.

  • What is the advantage of representing neural network computations using matrix notation?

    -Representing neural network computations using matrix notation allows for more compact and efficient calculations, as many libraries optimize matrix multiplication operations. This notation also makes it easier to communicate and understand the transformations happening between layers.
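
    Concretely, for this network the transition from the 784 input activations to the first hidden layer of 16 neurons is the single expression a' = σ(W·a + b), where W is a 16x784 weight matrix, a is the 784-entry activation vector, b is a 16-entry bias vector, and the sigmoid σ is applied to each entry of the result.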

  • What is mentioned about the ReLU (Rectified Linear Unit) activation function?

    -The video mentions that relatively few modern neural networks use the sigmoid activation function anymore; the ReLU activation function is commonly used instead because networks built with it are easier to train, especially deep ones.
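
    A minimal sketch of the two activation functions mentioned, assuming NumPy; the sample inputs are arbitrary:

        import numpy as np

        def sigmoid(z):
            # The older choice: squishes any input into the range (0, 1).
            return 1.0 / (1.0 + np.exp(-z))

        def relu(z):
            # Rectified linear unit: max(0, z), applied element-wise.
            return np.maximum(0.0, z)

        z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
        print(sigmoid(z).round(3))  # values near 0, then 0.5, then near 1
        print(relu(z))              # 0 for negative inputs, the input itself otherwise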

Outlines

00:00

Neural Networks: The Basics

This section introduces the task of recognizing handwritten digits, emphasizing how hard it is to program a computer to identify images represented by a 28x28 pixel grid. It sets the stage for discussing the relevance of machine learning and neural networks, presenting neural networks not as buzzwords but as mathematical constructs that can learn to recognize patterns. The focus is on building a basic understanding of a plain neural network that learns to recognize handwritten digits, a classic example in machine learning and a prerequisite for understanding the more complex variants explored in contemporary research.

05:02

Neural Network Layers and Recognition Process

This section dives into the layered structure of neural networks, explaining how each layer's activations influence the next, culminating in the network's ability to recognize digits from the brightness of the input pixels. It outlines the hope that the hidden layers will identify components of digits (like loops and lines) through layers of abstraction, breaking complex patterns down into simpler, recognizable pieces, and in doing so motivates the role each layer plays in processing the data.

10:02

Mechanics of Neural Network Functionality

This paragraph explains the technical workings of neural networks, particularly how each activation is computed as a weighted sum of the previous layer's activations plus a bias, passed through the sigmoid function to keep the value between 0 and 1. It outlines the work involved in configuring these weights and biases across layers so the network can recognize patterns, such as edges combining into loops. With its thousands of parameters, the network illustrates what 'learning' means in practice: adjusting those parameters until the network recognizes the patterns in the data.

15:05

Understanding Neural Networks as Functions

This section conceptualizes the entire neural network as one complex function that transforms 784 input numbers (pixel values) into 10 output numbers (digit scores) via repeated matrix-vector products and the sigmoid function. It emphasizes that learning means adjusting the network's thousands of parameters to achieve accurate digit recognition, and closes with a note on how network design has shifted from sigmoid activations to ReLU, which makes deeper networks easier to train.

Keywords

Neural Networks

Neural networks are computing systems vaguely inspired by the biological neural networks that constitute animal brains. The video discusses neural networks in the context of machine learning, emphasizing their ability to recognize handwritten digits by processing input through layers of artificial neurons. These networks are capable of learning and making intelligent decisions based on data, illustrating the power and complexity of even the simplest form of neural networks in understanding and classifying images.

Machine Learning

Machine learning is a subset of artificial intelligence that focuses on the development of algorithms and statistical models that enable computers to perform specific tasks without explicit instructions, relying instead on patterns and inference. The script highlights the relevance and importance of machine learning in modern technology, with neural networks being a prime example of a machine learning model that learns from data to recognize patterns, such as handwritten digits.

Activation

In the context of neural networks, an activation refers to the value that a neuron holds, which is usually a function of its input. The video explains that neurons in the input layer of a neural network hold values representing the grayscale intensity of pixels in an image. Activations are crucial for the network's operation, as they propagate through the layers and determine the final output, showcasing the neural network's decision-making process.

Weights and Biases

Weights and biases are fundamental components of neural networks that adjust during the learning process. Weights control the strength of the connection between neurons in different layers, and biases are added to the weighted sum before it is passed through an activation function. The video emphasizes their role in determining how activations of one layer influence those of the next, effectively shaping the network's ability to recognize complex patterns like handwritten digits.

Sigmoid Function

The sigmoid function is a type of activation function used in neural networks to convert values into a range between 0 and 1, mimicking the way neurons are activated in the brain. The video describes its application in neural networks to ensure that neuron activations stay within this bounded range, facilitating the modeling of probabilities and decisions.
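
For reference, since the formula itself is not spelled out above: the logistic sigmoid is σ(x) = 1 / (1 + e^(-x)), so for example σ(-4) ≈ 0.02, σ(0) = 0.5, and σ(4) ≈ 0.98, which is the 'squishing' into the range between 0 and 1 described here.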

Learning

Learning in neural networks refers to the process of adjusting the weights and biases based on the data provided during training. The goal is to minimize the difference between the network's predictions and the actual target outcomes. The video plans to explore this concept in depth in a follow-up, highlighting how networks evolve and improve their ability to recognize handwritten digits through exposure to more data.

Handwritten Digit Recognition

Handwritten digit recognition is a task that involves identifying the numbers from 0 to 9 written by hand, serving as a classic example of a pattern recognition problem tackled by neural networks. The video uses this application to illustrate the process and challenges of designing neural networks capable of learning from pixel data to accurately classify these digits.

Layers

Layers in neural networks refer to the tiers of neurons structured in a sequence from input to output, including hidden layers in between. Each layer processes the activations from the previous layer and passes its output to the next. The video explains how these layers collectively work to abstract and interpret complex patterns in data, like identifying the features of handwritten digits.

ReLU

ReLU, or Rectified Linear Unit, is mentioned in the video as a modern alternative to the sigmoid function for activation in neural networks. It is simpler and often more effective in practice, being defined as the maximum of zero and the input. ReLU is highlighted for its role in facilitating the training of deep neural networks by providing a more efficient and less computationally expensive model of neuron activation.

Pattern Recognition

Pattern recognition is the ability to detect arrangements of characteristics or data that signal a relationship to specific conditions or attributes. The video discusses neural networks' capacity for pattern recognition, particularly in the context of identifying handwritten digits. By learning to recognize patterns in pixel values that correspond to specific numbers, neural networks demonstrate their utility in various applications that require the classification or interpretation of complex data.

Highlights

Brains can effortlessly recognize handwritten digits, even at low resolutions and with variations, which is an astonishingly complex task for computers.

Neural networks are inspired by the brain: each neuron holds a number between 0 and 1 (its activation), and the activations in one layer are determined by the activations of the neurons in the previous layer.

The hope is that neurons in hidden layers can learn to detect edges, patterns, and subcomponents that make up digits, with higher layers combining these features to recognize full digits.

Each connection between neurons has a weight and each neuron has a bias; these are the parameters that get adjusted during training so the network captures relevant patterns.

The network has thousands of weights and biases that need to be tuned to make it perform the desired task, akin to tweaking various knobs and dials.

The transition of activations from one layer to the next can be represented compactly using matrix-vector notation and operations like matrix multiplication.

The entire network is essentially a function that takes in input data and produces output, albeit a highly complex function with many parameters.

The sigmoid function is used to squish the weighted sum of activations into the range between 0 and 1, mimicking the biological analogy of neurons being inactive or active.

Modern networks often use the ReLU (Rectified Linear Unit) activation function instead of sigmoid, since networks built with it are easier to train and perform well even when very deep.

Understanding the meaning of weights and biases can provide insights into how the network is solving the problem and guide further improvements.

The structure of the neural network, with its layers and connections, is motivated by the goal of breaking down the problem into simpler subproblems and combining solutions hierarchically.

The number of layers and neurons in each layer is a design choice that can be experimented with, as there is no definitive rule for the optimal structure.

Training the network involves finding the right settings for all the weights and biases to solve the problem, which is a complex optimization task.

Linear algebra concepts, such as matrices, vectors, and matrix-vector multiplication, are fundamental to understanding and working with neural networks.

While the network structure may seem complicated, its complexity is necessary to tackle challenging problems like digit recognition.

Transcripts

play00:04

This is a 3.

play00:06

It's sloppily written and rendered at an extremely low resolution of 28x28 pixels,

play00:10

but your brain has no trouble recognizing it as a 3.

play00:14

And I want you to take a moment to appreciate how

play00:16

crazy it is that brains can do this so effortlessly.

play00:19

I mean, this, this and this are also recognizable as 3s,

play00:23

even though the specific values of each pixel is very different from one

play00:27

image to the next.

play00:28

The particular light-sensitive cells in your eye that are firing when

play00:32

you see this 3 are very different from the ones firing when you see this 3.

play00:37

But something in that crazy-smart visual cortex of yours resolves these as representing

play00:42

the same idea, while at the same time recognizing other images as their own distinct

play00:47

ideas.

play00:49

But if I told you, hey, sit down and write for me a program that takes in a grid of

play00:54

28x28 pixels like this and outputs a single number between 0 and 10,

play00:59

telling you what it thinks the digit is, well the task goes from comically trivial to

play01:04

dauntingly difficult.

play01:07

Unless you've been living under a rock, I think I hardly need to motivate the relevance

play01:10

and importance of machine learning and neural networks to the present and to the future.

play01:15

But what I want to do here is show you what a neural network actually is,

play01:19

assuming no background, and to help visualize what it's doing,

play01:22

not as a buzzword but as a piece of math.

play01:25

My hope is that you come away feeling like the structure itself is motivated,

play01:28

and to feel like you know what it means when you read,

play01:31

or you hear about a neural network quote-unquote learning.

play01:35

This video is just going to be devoted to the structure component of that,

play01:38

and the following one is going to tackle learning.

play01:40

What we're going to do is put together a neural

play01:43

network that can learn to recognize handwritten digits.

play01:49

This is a somewhat classic example for introducing the topic,

play01:52

and I'm happy to stick with the status quo here,

play01:54

because at the end of the two videos I want to point you to a couple good

play01:57

resources where you can learn more, and where you can download the code that

play02:00

does this and play with it on your own computer.

play02:05

There are many many variants of neural networks,

play02:07

and in recent years there's been sort of a boom in research towards these variants,

play02:12

but in these two introductory videos you and I are just going to look at the simplest

play02:16

plain vanilla form with no added frills.

play02:19

This is kind of a necessary prerequisite for understanding any of the more powerful

play02:23

modern variants, and trust me it still has plenty of complexity for us to wrap our minds

play02:28

around.

play02:29

But even in this simplest form it can learn to recognize handwritten digits,

play02:33

which is a pretty cool thing for a computer to be able to do.

play02:37

And at the same time you'll see how it does fall

play02:39

short of a couple hopes that we might have for it.

play02:43

As the name suggests neural networks are inspired by the brain, but let's break that down.

play02:48

What are the neurons, and in what sense are they linked together?

play02:52

Right now when I say neuron all I want you to think about is a thing that holds a number,

play02:58

specifically a number between 0 and 1.

play03:00

It's really not more than that.

play03:03

For example the network starts with a bunch of neurons corresponding to

play03:08

each of the 28x28 pixels of the input image, which is 784 neurons in total.

play03:14

Each one of these holds a number that represents the grayscale value of the

play03:19

corresponding pixel, ranging from 0 for black pixels up to 1 for white pixels.

play03:25

This number inside the neuron is called its activation,

play03:28

and the image you might have in mind here is that each neuron is lit up when its

play03:32

activation is a high number.

play03:36

So all of these 784 neurons make up the first layer of our network.

play03:46

Now jumping over to the last layer, this has 10 neurons,

play03:49

each representing one of the digits.

play03:52

The activation in these neurons, again some number that's between 0 and 1,

play03:56

represents how much the system thinks that a given image corresponds with a given digit.

play04:03

There's also a couple layers in between called the hidden layers,

play04:06

which for the time being should just be a giant question mark for

play04:09

how on earth this process of recognizing digits is going to be handled.

play04:14

In this network I chose two hidden layers, each one with 16 neurons,

play04:17

and admittedly that's kind of an arbitrary choice.

play04:21

To be honest I chose two layers based on how I want to motivate the structure

play04:24

in just a moment, and 16, well that was just a nice number to fit on the screen.

play04:28

In practice there is a lot of room for experiment with a specific structure here.

play04:33

The way the network operates, activations in one

play04:35

layer determine the activations of the next layer.

play04:39

And of course the heart of the network as an information processing mechanism comes down

play04:43

to exactly how those activations from one layer bring about activations in the next layer.

play04:49

It's meant to be loosely analogous to how in biological networks of neurons,

play04:53

some groups of neurons firing cause certain others to fire.

play04:58

Now the network I'm showing here has already been trained to recognize digits,

play05:01

and let me show you what I mean by that.

play05:03

It means if you feed in an image, lighting up all 784 neurons of the input layer

play05:08

according to the brightness of each pixel in the image,

play05:11

that pattern of activations causes some very specific pattern in the next layer

play05:16

which causes some pattern in the one after it,

play05:18

which finally gives some pattern in the output layer.

play05:22

And the brightest neuron of that output layer is the network's choice,

play05:26

so to speak, for what digit this image represents.

play05:32

And before jumping into the math for how one layer influences the next,

play05:36

or how training works, let's just talk about why it's even reasonable

play05:40

to expect a layered structure like this to behave intelligently.

play05:44

What are we expecting here?

play05:45

What is the best hope for what those middle layers might be doing?

play05:48

Well, when you or I recognize digits, we piece together various components.

play05:54

A 9 has a loop up top and a line on the right.

play05:57

An 8 also has a loop up top, but it's paired with another loop down low.

play06:01

A 4 basically breaks down into three specific lines, and things like that.

play06:07

Now in a perfect world, we might hope that each neuron in the second

play06:11

to last layer corresponds with one of these subcomponents,

play06:15

that anytime you feed in an image with, say, a loop up top,

play06:18

like a 9 or an 8, there's some specific neuron whose activation is going to be close to 1.

play06:24

And I don't mean this specific loop of pixels,

play06:26

the hope would be that any generally loopy pattern towards the top sets off this neuron.

play06:32

That way, going from the third layer to the last one just requires

play06:36

learning which combination of subcomponents corresponds to which digits.

play06:41

Of course, that just kicks the problem down the road,

play06:43

because how would you recognize these subcomponents,

play06:45

or even learn what the right subcomponents should be?

play06:48

And I still haven't even talked about how one layer influences the next,

play06:51

but run with me on this one for a moment.

play06:53

Recognizing a loop can also break down into subproblems.

play06:57

One reasonable way to do this would be to first

play06:59

recognize the various little edges that make it up.

play07:03

Similarly, a long line, like the kind you might see in the digits 1 or 4 or 7,

play07:08

is really just a long edge, or maybe you think of it as a certain pattern of several

play07:13

smaller edges.

play07:15

So maybe our hope is that each neuron in the second layer of

play07:18

the network corresponds with the various relevant little edges.

play07:23

Maybe when an image like this one comes in, it lights up all of the

play07:27

neurons associated with around 8 to 10 specific little edges,

play07:31

which in turn lights up the neurons associated with the upper loop

play07:35

and a long vertical line, and those light up the neuron associated with a 9.

play07:40

Whether or not this is what our final network actually does is another question,

play07:44

one that I'll come back to once we see how to train the network,

play07:47

but this is a hope that we might have, a sort of goal with the layered structure

play07:52

like this.

play07:53

Moreover, you can imagine how being able to detect edges and patterns

play07:56

like this would be really useful for other image recognition tasks.

play08:00

And even beyond image recognition, there are all sorts of intelligent

play08:04

things you might want to do that break down into layers of abstraction.

play08:08

Parsing speech, for example, involves taking raw audio and picking out distinct sounds,

play08:12

which combine to make certain syllables, which combine to form words,

play08:16

which combine to make up phrases and more abstract thoughts, etc.

play08:21

But getting back to how any of this actually works,

play08:24

picture yourself right now designing how exactly the activations in one layer

play08:28

might determine the next.

play08:30

The goal is to have some mechanism that could conceivably combine pixels into edges,

play08:36

or edges into patterns, or patterns into digits.

play08:39

And to zoom in on one very specific example, let's say the

play08:43

hope is for one particular neuron in the second layer to pick

play08:46

up on whether or not the image has an edge in this region here.

play08:51

The question at hand is what parameters should the network have?

play08:55

What dials and knobs should you be able to tweak so that it's expressive

play08:59

enough to potentially capture this pattern, or any other pixel pattern,

play09:03

or the pattern that several edges can make a loop, and other such things?

play09:08

Well, what we'll do is assign a weight to each one of the

play09:11

connections between our neuron and the neurons from the first layer.

play09:16

These weights are just numbers.

play09:18

Then take all of those activations from the first layer

play09:21

and compute their weighted sum according to these weights.

play09:27

I find it helpful to think of these weights as being organized into a

play09:31

little grid of their own, and I'm going to use green pixels to indicate positive weights,

play09:35

and red pixels to indicate negative weights, where the brightness of

play09:38

that pixel is some loose depiction of the weight's value.

play09:42

Now if we made the weights associated with almost all of the pixels zero

play09:46

except for some positive weights in this region that we care about,

play09:50

then taking the weighted sum of all the pixel values really just amounts

play09:53

to adding up the values of the pixel just in the region that we care about.

play09:59

And if you really wanted to pick up on whether there's an edge here,

play10:02

what you might do is have some negative weights associated with the surrounding pixels.

play10:07

Then the sum is largest when those middle pixels

play10:10

are bright but the surrounding pixels are darker.

play10:14

When you compute a weighted sum like this, you might come out with any number,

play10:18

but for this network what we want is for activations to be some value between 0 and 1.

play10:24

So a common thing to do is to pump this weighted sum into some function

play10:28

that squishes the real number line into the range between 0 and 1.

play10:32

And a common function that does this is called the sigmoid function,

play10:35

also known as a logistic curve.

play10:38

Basically very negative inputs end up close to 0,

play10:41

positive inputs end up close to 1, and it just steadily increases around the input 0.

play10:49

So the activation of the neuron here is basically a

play10:52

measure of how positive the relevant weighted sum is.

play10:57

But maybe it's not that you want the neuron to

play10:59

light up when the weighted sum is bigger than 0.

play11:02

Maybe you only want it to be active when the sum is bigger than say 10.

play11:06

That is, you want some bias for it to be inactive.

play11:11

What we'll do then is just add in some other number like negative 10 to this

play11:15

weighted sum before plugging it through the sigmoid squishification function.

play11:20

That additional number is called the bias.

play11:23

So the weights tell you what pixel pattern this neuron in the second

play11:27

layer is picking up on, and the bias tells you how high the weighted

play11:31

sum needs to be before the neuron starts getting meaningfully active.

play11:36

And that is just one neuron.

play11:38

Every other neuron in this layer is going to be connected to

play11:42

all 784 pixel neurons from the first layer, and each one of

play11:46

those 784 connections has its own weight associated with it.

play11:51

Also, each one has some bias, some other number that you add

play11:54

on to the weighted sum before squishing it with the sigmoid.

play11:58

And that's a lot to think about!

play11:59

With this hidden layer of 16 neurons, that's a total of 784 times 16 weights,

play12:06

along with 16 biases.

play12:08

And all of that is just the connections from the first layer to the second.

play12:12

The connections between the other layers also have

play12:14

a bunch of weights and biases associated with them.

play12:18

All said and done, this network has almost exactly 13,000 total weights and biases.

play12:23

13,000 knobs and dials that can be tweaked and

play12:26

turned to make this network behave in different ways.

play12:31

So when we talk about learning, what that's referring to is

play12:34

getting the computer to find a valid setting for all of these

play12:37

many many numbers so that it'll actually solve the problem at hand.

play12:42

One thought experiment that is at once fun and kind of horrifying is to imagine sitting

play12:47

down and setting all of these weights and biases by hand,

play12:50

purposefully tweaking the numbers so that the second layer picks up on edges,

play12:54

the third layer picks up on patterns, etc.

play12:56

I personally find this satisfying rather than just treating the network as a total

play13:01

black box, because when the network doesn't perform the way you anticipate,

play13:04

if you've built up a little bit of a relationship with what those weights and biases

play13:09

actually mean, you have a starting place for experimenting with how to change the

play13:13

structure to improve.

play13:14

Or when the network does work but not for the reasons you might expect,

play13:18

digging into what the weights and biases are doing is a good way to challenge

play13:22

your assumptions and really expose the full space of possible solutions.

play13:26

By the way, the actual function here is a little cumbersome to write down,

play13:30

don't you think?

play13:32

So let me show you a more notationally compact way that these connections are represented.

play13:37

This is how you'd see it if you choose to read up more about neural networks. Organize all of the activations from one layer into a column as a vector.

play13:41

Then organize all of the weights as a matrix, where each row of that matrix corresponds

play13:50

to the connections between one layer and a particular neuron in the next layer.

play13:58

What that means is that taking the weighted sum of the activations in

play14:02

the first layer according to these weights corresponds to one of the

play14:05

terms in the matrix vector product of everything we have on the left here.

play14:14

By the way, so much of machine learning just comes down to having a

play14:17

good grasp of linear algebra, so for any of you who want a nice visual

play14:21

understanding for matrices and what matrix vector multiplication means,

play14:24

take a look at the series I did on linear algebra, especially chapter 3.

play14:29

Back to our expression, instead of talking about adding the bias to each one of

play14:33

these values independently, we represent it by organizing all those biases into a vector,

play14:38

and adding the entire vector to the previous matrix vector product.

play14:43

Then as a final step, I'll wrap a sigmoid around the outside here,

play14:46

and what that's supposed to represent is that you're going to apply the

play14:50

sigmoid function to each specific component of the resulting vector inside.

play14:55

So once you write down this weight matrix and these vectors as their own symbols,

play15:00

you can communicate the full transition of activations from one layer to the next in an

play15:05

extremely tight and neat little expression, and this makes the relevant code both a lot

play15:10

simpler and a lot faster, since many libraries optimize the heck out of matrix

play15:14

multiplication.

play15:17

Remember how earlier I said these neurons are simply things that hold numbers?

play15:22

Well of course the specific numbers that they hold depends on the image you feed in,

play15:27

so it's actually more accurate to think of each neuron as a function,

play15:31

one that takes in the outputs of all the neurons in the previous layer and spits out a

play15:36

number between 0 and 1.

play15:39

Really the entire network is just a function, one that takes in

play15:43

784 numbers as an input and spits out 10 numbers as an output.

play15:47

It's an absurdly complicated function, one that involves 13,000 parameters

play15:51

in the forms of these weights and biases that pick up on certain patterns,

play15:55

and which involves iterating many matrix vector products and the sigmoid

play15:59

squishification function, but it's just a function nonetheless.

play16:03

And in a way it's kind of reassuring that it looks complicated.

play16:07

I mean if it were any simpler, what hope would we have

play16:09

that it could take on the challenge of recognizing digits?

play16:13

And how does it take on that challenge?

play16:15

How does this network learn the appropriate weights and biases just by looking at data?

play16:20

Well that's what I'll show in the next video, and I'll also dig a little

play16:23

more into what this particular network we're seeing is really doing.

play16:27

Now is the point I suppose I should say subscribe to stay notified

play16:30

about when that video or any new videos come out,

play16:33

but realistically most of you don't actually receive notifications from YouTube, do you?

play16:38

Maybe more honestly I should say subscribe so that the neural networks

play16:41

that underlie YouTube's recommendation algorithm are primed to believe

play16:44

that you want to see content from this channel get recommended to you.

play16:48

Anyway, stay posted for more.

play16:50

Thank you very much to everyone supporting these videos on Patreon.

play16:54

I've been a little slow to progress in the probability series this summer,

play16:57

but I'm jumping back into it after this project,

play16:59

so patrons you can look out for updates there.

play17:03

To close things off here I have with me Lisha Li who did her PhD work on the

play17:07

theoretical side of deep learning and who currently works at a venture capital

play17:10

firm called Amplify Partners who kindly provided some of the funding for this video.

play17:15

So Lisha one thing I think we should quickly bring up is this sigmoid function.

play17:19

As I understand it early networks use this to squish the relevant weighted

play17:23

sum into that interval between zero and one, you know kind of motivated

play17:26

by this biological analogy of neurons either being inactive or active.

play17:30

Exactly.

play17:30

But relatively few modern networks actually use sigmoid anymore.

play17:34

Yeah.

play17:34

It's kind of old school right?

play17:35

Yeah or rather ReLU seems to be much easier to train.

play17:39

And ReLU, ReLU stands for rectified linear unit?

play17:42

Yes it's this kind of function where you're just taking a max of zero

play17:47

and a where a is given by what you were explaining in the video and

play17:52

what this was sort of motivated from I think was a partially by a

play17:56

biological analogy with how neurons would either be activated or not.

play18:01

And so if it passes a certain threshold it would be the identity function but if it did

play18:06

not then it would just not be activated so it'd be zero so it's kind of a simplification.

play18:11

Using sigmoids didn't help training or it was very difficult

play18:15

to train at some point and people just tried ReLU and it happened

play18:20

to work very well for these incredibly deep neural networks.

play18:25

All right thank you Lisha.