Simple explanation of convolutional neural network | Deep Learning Tutorial 23 (Tensorflow & Python)

codebasics
14 Oct 2020 · 23:54

Summary

TL;DR: This script offers a simplified explanation of Convolutional Neural Networks (CNNs), ideal for beginners. It illustrates how CNNs can recognize patterns like handwritten digits and complex images by using filters to detect features like edges and shapes. The script clarifies the concept of feature maps, the role of ReLU for non-linearity, and the importance of pooling to reduce dimensions and computation. It also touches on the self-learning capability of CNNs to adjust filters during training, making it an intuitive and powerful tool for computer vision tasks.

Takeaways

  • 🧠 Convolutional Neural Networks (CNNs) are designed to recognize patterns in images, such as handwritten digits, by using a grid of numerical values representing pixel intensities.
  • πŸ” Traditional neural networks struggle with image recognition due to their inability to handle variations in image positioning and the immense computational load for large images.
  • 🌟 CNNs utilize filters or kernels to detect specific features within an image, such as edges or shapes, by applying a convolution operation that scans the image in a sliding window fashion.
  • 🔑 The convolution operation involves multiplying the filter values with the corresponding image section and summing them up to create a feature map, which highlights areas of the image that match the filter's pattern.
  • 👀 Human brains recognize images by detecting features like eyes, nose, and ears, which is similar to how CNNs use different filters to identify features in images.
  • 📈 CNNs reduce computational complexity through parameter sharing, where the same filter parameters are applied across the entire image, and through pooling layers that reduce the spatial dimensions of the feature maps.
  • 🔄 Pooling layers, such as max pooling, help in making the CNN invariant to small translations and distortions in the image by selecting the most prominent features within a region.
  • 🔧 The use of the ReLU (Rectified Linear Unit) activation function introduces non-linearity into the CNN, which is essential for solving complex pattern recognition tasks.
  • 🤖 CNNs learn the optimal filters during the training phase through backpropagation, without the need for manual filter selection, allowing the network to automatically adapt to the features present in the training data.
  • 🔄 Data augmentation techniques, such as rotating or scaling images, can be used to increase the robustness of CNNs to variations like rotation and scaling in the input images.
  • 📚 The script is an educational resource provided by Dhaval Patel, who offers tutorials on data science, machine learning, Python programming, and career guidance on his YouTube channel.

Q & A

  • What is the main issue with using a grid of numbers to represent an image for a computer?

    -The main issue is that it is too hard-coded and sensitive to shifts or variations in the image. For example, a slight shift in the position of a handwritten digit can change the representation, causing the computer to fail in recognizing the digit.

  • Why is a dense neural network not efficient for handling larger images like the one of a koala?

    -A dense neural network would require an enormous number of weights to be calculated between the input and hidden layers, leading to a high computational cost that is impractical for large images with many pixels and RGB channels.
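The scale of this problem is easy to verify with quick arithmetic. The sketch below uses the hypothetical 4-million-neuron hidden layer mentioned in the video:

```python
# Rough parameter count for a dense network on a 1920x1080 RGB image.
# The hidden-layer size is the hypothetical figure used in the video.
width, height, channels = 1920, 1080, 3
inputs = width * height * channels   # one input per pixel per channel
hidden = 4_000_000                   # hypothetical hidden-layer size
weights = inputs * hidden            # weights between input and hidden layer
print(f"{inputs:,} inputs, {weights:,} weights in the first layer alone")
```

Even a single fully connected layer at this resolution needs trillions of weights, which is why dense networks alone are impractical for large images.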

  • How do convolutional neural networks (CNNs) address the issue of local features in images?

    -CNNs use filters or convolution operations to detect local features in images. These filters act as feature detectors that can identify patterns regardless of their position in the image, thus addressing the issue of locality.

  • What is the purpose of a feature map in CNNs?

    -A feature map is the result of applying a convolution operation with a filter. It highlights areas in the image where the specific feature the filter is designed to detect is present, effectively capturing the presence of that feature throughout the image.
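The sliding-window multiply-and-sum described above can be sketched in a few lines of numpy. The toy image, filter, and the averaging by the filter size follow the video's -1/1 grid example; the specific values are illustrative:

```python
import numpy as np

# A minimal "valid" convolution producing a feature map: slide the filter
# over the image, multiply element-wise, sum, and average (divide by the
# number of filter cells, as in the video's example).
def convolve2d(image, kernel):
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(window * kernel) / kernel.size
    return out

image = np.array([[ 1,  1, 1, -1],
                  [ 1, -1, 1, -1],
                  [ 1,  1, 1, -1],
                  [-1, -1, 1, -1]])
edge = np.array([[1,  1, 1],
                 [1, -1, 1],
                 [1,  1, 1]])   # a toy "loopy pattern" detector
fmap = convolve2d(image, edge)
print(fmap)  # values near 1 mark where the pattern is detected
```

The top-left entry of the feature map comes out near 1 because that region of the toy image matches the filter exactly.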

  • How does the stride parameter affect the size of the feature map?

    -The stride determines the step size the filter moves across the image. A larger stride results in a smaller feature map because fewer positions are covered by the filter.
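The size relationship can be stated as a formula: for an n×n image, f×f filter, and stride s (no padding), the feature map is ((n − f) / s + 1) per side. A quick sketch with toy numbers:

```python
# Feature-map side length for a "valid" (no-padding) convolution:
# out = (n - f) // s + 1, with image size n, filter size f, stride s.
def output_size(n, f, s):
    return (n - f) // s + 1

print(output_size(7, 3, 1))  # 7x7 image, 3x3 filter, stride 1 -> 5
print(output_size(7, 3, 2))  # same filter, stride 2 -> smaller map
```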

  • What is pooling, and what are its benefits in CNNs?

    -Pooling is an operation that reduces the dimensions of a feature map, typically by taking the maximum (max pooling) or average (average pooling) value within a certain window. It benefits CNNs by reducing computational load, mitigating overfitting, and making the model more tolerant to variations and distortions.
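Max pooling can be sketched directly in numpy. The first window below reuses the video's example numbers (5, 1, 8, 2, from which the maximum 8 is kept); the rest of the map is made up:

```python
import numpy as np

# 2x2 max pooling with stride 2: keep only the largest value in each
# window, halving each spatial dimension of the feature map.
def max_pool(fmap, size=2, stride=2):
    h = (fmap.shape[0] - size) // stride + 1
    w = (fmap.shape[1] - size) // stride + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            window = fmap[i * stride:i * stride + size,
                          j * stride:j * stride + size]
            out[i, j] = window.max()
    return out

fmap = np.array([[5, 1, 3, 2],
                 [8, 2, 1, 9],
                 [1, 1, 3, 1],
                 [2, 2, 2, 2]])
print(max_pool(fmap))  # 16 numbers reduced to 4
```

Average pooling, also mentioned in the video, would replace `window.max()` with `window.mean()`.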

  • How does the ReLU (Rectified Linear Unit) activation function introduce non-linearity into a CNN?

    -The ReLU activation function introduces non-linearity by setting all negative values in the feature map to zero, while keeping positive values unchanged. This simple operation allows the model to learn complex patterns in the data.
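The operation is one line of numpy; the feature-map values below are made up for illustration:

```python
import numpy as np

# ReLU on a feature map: negative values become zero, positives pass
# through unchanged.
def relu(x):
    return np.maximum(x, 0)

fmap = np.array([[-0.5,  1.0],
                 [ 0.3, -2.0]])
print(relu(fmap))
```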

  • What is the role of the fully connected layer in a CNN after the convolution and pooling layers?

    -The fully connected layer serves as the classification part of the CNN. It takes the flattened output from the convolution and pooling layers and uses it to make predictions about the image, handling the variety in inputs to classify them effectively.
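A minimal sketch of that classification head, with random weights standing in for values a real network would learn during training (the shapes and class count are illustrative):

```python
import numpy as np

# Flatten the pooled feature maps into a 1D vector, then apply one dense
# layer followed by softmax to get class probabilities.
rng = np.random.default_rng(0)
feature_maps = rng.standard_normal((2, 4, 4))  # two 4x4 feature maps
x = feature_maps.flatten()                     # 2*4*4 = 32 inputs

num_classes = 10                               # e.g. digits 0-9
W = rng.standard_normal((num_classes, x.size)) # stand-in for learned weights
logits = W @ x
probs = np.exp(logits - logits.max())
probs /= probs.sum()                           # softmax: probabilities sum to 1
print("predicted class:", probs.argmax())
```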

  • How does a CNN learn the filters during the training process?

    -During training, a CNN uses backpropagation to adjust the filters based on the training data. The network starts with random filters and learns the optimal filter values through the training process to effectively detect features in the images.

  • What is data augmentation, and how does it help in training CNNs to handle variations like rotation and scaling?

    -Data augmentation is a technique where new training samples are artificially created by applying transformations like rotation, scaling, and translation to the existing data. This helps the CNN to learn to recognize features under various conditions and improves its ability to generalize across different image variations.
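A few such transformations can be sketched with plain numpy (real TensorFlow pipelines would typically use Keras preprocessing utilities instead; the toy image here is illustrative):

```python
import numpy as np

# Data augmentation sketch: each transform yields an extra training
# sample from one existing image.
image = np.arange(16).reshape(4, 4)

rotated = np.rot90(image)              # 90-degree rotation
flipped = np.fliplr(image)             # horizontal flip
shifted = np.roll(image, 1, axis=1)    # shift one pixel to the right

augmented = [rotated, flipped, shifted]
print(f"{len(augmented)} extra samples generated from one image")
```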

Outlines

00:00

🧠 Introduction to Convolutional Neural Networks

This paragraph introduces the concept of Convolutional Neural Networks (CNNs) in a simplified manner, suitable for high school students. It discusses the challenge of recognizing handwritten digits like '9' using a computer, and the limitations of hard-coded grids and RGB values. The paragraph explains how traditional Artificial Neural Networks (ANNs) struggle with large images due to the computational complexity of millions of weights. It also touches upon the human brain's ability to recognize features in images, such as the distinct parts of a koala, and sets the stage for CNNs, which mimic this feature-detection capability.

05:02

πŸ” The Role of Filters in CNNs

This section delves into the function of filters in CNNs, using the example of recognizing the digit '9'. It describes how filters, or kernels, are used to detect specific features in an image, such as the loopy circle pattern at the top, the vertical line in the middle, and the diagonal line at the end. The paragraph explains the convolution operation, which involves applying these filters to the original image to create a feature map that highlights areas where the specific features are detected. It also discusses the importance of stride and the size of the filter, and how the feature map serves as a detector for specific features, making the model location invariant.

10:02

🌐 Advanced CNN Concepts: Feature Maps and Pooling

Building upon the foundation of filters, this paragraph introduces the concept of feature maps and pooling layers in CNNs. It explains how multiple feature maps can be generated by applying different filters to detect various features of an object, such as the eyes, nose, and ears of a koala. The paragraph also describes how pooling operations, specifically max pooling, reduce the dimensionality of the feature maps, thus decreasing computational load and preventing overfitting. It highlights the benefits of pooling, including feature invariance to small variations and distortions in the image.

15:05

🔧 The Mechanics of CNNs: Convolution, Activation, and Pooling

This section provides a deeper understanding of the mechanics within a CNN, emphasizing the iterative process of applying convolution, activation functions like ReLU, and pooling layers. It explains how these components work together to progressively reduce the spatial dimensions of the data while extracting increasingly complex features. The paragraph also touches on the concept of parameter sharing in convolution, which contributes to the efficiency of CNNs, and the role of fully connected layers in classification after feature extraction.

20:05

🛠 Handling Complex Variations in CNNs

This paragraph addresses the limitations of CNNs in handling variations such as rotation and scale, and introduces the concept of data augmentation as a solution. It explains how training a CNN with a diverse set of samples, including rotated and scaled images, can help the network learn to recognize features despite these variations. The paragraph also summarizes the key components and benefits of CNNs, such as connection sparsity, location-invariant feature detection, and parameter sharing, and hints at the self-learning capability of CNNs during training.

📚 Summary and Future Outlook of CNNs

The final paragraph provides a concise summary of the entire explanation of CNNs, outlining the steps involved in processing an image with a CNN, from convolution and activation to pooling and classification. It emphasizes the network's ability to learn filters automatically during training, a process facilitated by backpropagation. The paragraph also introduces the author, Dhaval Patel, and invites viewers to follow his YouTube channel for further tutorials on data science, machine learning, and deep learning, including practical coding applications of CNNs.

Keywords

💡 Convolutional Neural Network (CNN)

A Convolutional Neural Network is a type of deep learning algorithm widely used in computer vision tasks. It is designed to process data with a grid-like topology, such as images. In the video, CNNs are explained as a way to handle the variability in image recognition by using filters to detect features like edges and shapes, which is crucial for tasks like recognizing handwritten digits or identifying objects in photos.

💡 Feature Map

A feature map is the result of applying a convolution operation to an input image with a specific filter. It highlights regions in the image where the filter's pattern is detected. In the context of the video, feature maps are used to detect and visualize features like the round eyes or fluffy ears of a koala, playing a key role in object recognition within a CNN.

💡 Filter or Kernel

In the video, a filter or kernel is a small matrix used in convolution operations to scan the input image and detect specific features. The script explains that these filters can be learned by the CNN during training to automatically identify patterns like the loopy circle pattern of the digit '9' or the eyes of a koala.

💡 Stride

Stride refers to the step size the filter moves across the image during the convolution process. The script uses the term to explain how the feature map's resolution can be controlledβ€”smaller strides result in a more detailed feature map, while larger strides can reduce the spatial dimensions of the output.

💡 ReLU (Rectified Linear Unit)

ReLU is an activation function used in neural networks, including CNNs, to introduce non-linearity into the model. The video script describes ReLU as a function that replaces all negative values in a feature map with zero, simplifying the model and making it faster to train without losing the essential information.

💡 Pooling

Pooling is an operation used in CNNs to reduce the spatial size of the feature maps, which in turn reduces the number of parameters and computation in the network. The video mentions max pooling, where the maximum value in a region is taken, as a way to make the model invariant to small translations and distortions in the input image.

💡 Fully Connected Layer

A fully connected layer, as mentioned in the script, is a traditional neural network layer where each neuron is connected to every neuron in the next layer. In the context of CNNs, these layers are typically placed after the convolution and pooling layers to perform classification based on the features extracted by the CNN.

💡 Backpropagation

Backpropagation is the algorithm used to train neural networks, including CNNs, by adjusting the weights of the network to minimize the error in predictions. The video script highlights that during training, CNNs use backpropagation to learn the optimal filters for feature detection automatically.

💡 Parameter Sharing

Parameter sharing is a concept where the same filter parameters are applied across the entire input image in a CNN. The script explains this as an advantage because it reduces the number of parameters the network needs to learn, simplifying the model and making it more efficient.

💡 Data Augmentation

Data augmentation is a technique used to increase the diversity of the training dataset by creating modified versions of the input data, such as rotated or scaled images. The video script uses this term to explain how CNNs can be trained to handle variations like rotation and scaling, which are common in real-world images.

💡 Overfitting

Overfitting occurs when a model learns the training data too well, including its noise and details, to the extent that it negatively impacts the model's performance on new, unseen data. The video script discusses how techniques like pooling and ReLU can help reduce overfitting by making the model more generalizable.

Highlights

A simple explanation of convolutional neural networks (CNNs) is provided, making it accessible to high school students.

CNNs are used to recognize patterns such as handwritten digits, overcoming the issue of varying digit placement.

Traditional methods of image recognition using RGB values are too hard-coded and lack flexibility for variations.

Artificial neural networks (ANNs) are introduced to handle the variety in handwritten digit recognition.

ANNs face challenges with larger images due to the computational complexity of millions of weights.

The locality of image recognition is emphasized, as the position of features like a koala's face matters.

Neuroscience insights are applied to CNNs by mimicking how humans recognize images through distinct features.

Filters or feature detectors are used in CNNs to identify small features like edges and loops in digits.

The convolution operation is explained as a process to apply filters to an image to create a feature map.

Stride and filter size are discussed as parameters that influence the convolution operation.

Feature maps are highlighted as the result of convolution, showing the activation of certain features.

The concept of location invariance in feature detection is introduced, allowing for detection regardless of feature position.

Pooling operations, such as max pooling, are explained to reduce dimensions and computation in CNNs.

Benefits of pooling include reduced overfitting, dimension reduction, and tolerance to distortions.

The combination of convolution, ReLU (Rectified Linear Unit), and pooling forms the foundation of CNNs.

The fully connected dense neural network is used in CNNs for classification after feature extraction.

Data augmentation is suggested as a technique to handle rotation and scaling in CNNs.

The self-learning capability of CNNs is emphasized, as networks learn the optimal filters during training.

A summary of CNNs is provided, outlining the process from input image to classification.

The presenter, Dhaval Patel, introduces himself and his educational content on data science, machine learning, and programming.

Transcripts

play00:00

i will give you a very simple

play00:01

explanation of convolutional neural

play00:04

network without using much mathematics

play00:06

so that even a high school student can

play00:08

understand it easily

play00:10

let's say you want the computer to

play00:12

recognize the handwritten digit

play00:14

9. the way computer looks at this is

play00:18

as a grid of numbers here i'm using -1

play00:22

and 1.

play00:22

in reality it will use rgb numbers

play00:26

from 0 to 255.

play00:29

the issue with this presentation is that

play00:32

this is too much hard-coded

play00:35

if you have a little shift in digit 9

play00:37

for example

play00:38

9 here was in the middle but in this

play00:40

case it is in the left

play00:43

and the representation of numbers just

play00:46

changes

play00:47

it doesn't match with our

play00:49

original

play00:51

number grid and computer will not be

play00:53

able to

play00:54

recognize that this is number nine

play00:57

there could be a variation since it is a

play00:59

handwritten digit

play01:01

there could be variation in how you

play01:03

write it

play01:04

which will change the two-dimensional

play01:07

representation of numbers

play01:08

and again you will not be able to match

play01:10

it with the original grid

play01:14

so we use artificial neural network

play01:17

for this kind of case to handle the

play01:19

variety

play01:21

in this deep learning series we have

play01:23

already looked at

play01:24

artificial neural network video on

play01:27

handwritten digits

play01:28

recognition if you have not seen that

play01:31

video please make sure you see it so

play01:33

that

play01:33

your fundamentals on artificial neural

play01:35

networks are clear

play01:37

in that we created a one-dimensional

play01:40

array by flattening the

play01:42

two-dimensional representation of our

play01:45

hand

play01:45

written digit number and then we build a

play01:48

neural network with

play01:49

one hidden layer and output layer

play01:52

and this dense neural network will work

play01:54

okay for a simple

play01:56

image like handwritten digit but when

play01:59

you have a bigger image

play02:00

let's see this little cute looking koala

play02:04

the image size is 1920 by 1080

play02:07

we have three as rgb channel here

play02:11

one for red green and blue in this case

play02:15

the first layer neuron itself will be

play02:18

six million

play02:19

if you have let's say hidden layer with

play02:22

4 million neurons

play02:23

you're talking about 24 million

play02:27

weights to be calculated just between

play02:30

the input and hidden layer

play02:32

and remember deep neural networks have

play02:35

many hidden layers so this can go

play02:38

easily into like 500 million or 1

play02:40

billion

play02:41

of weights that you have to compute and

play02:44

that's too much computation for your

play02:46

little computer

play02:47

see my rabbits are getting electrical

play02:50

shock

play02:51

because it's just too much to do

play02:55

so the disadvantages of using ann or

play02:58

artificial neural network for image

play03:00

classification is

play03:01

too much computation it also treats

play03:04

local pixels same as pixels far apart

play03:07

if you have koala's face in a left

play03:09

corner versus right corner

play03:12

it is still a koala doesn't matter where

play03:14

the face is located

play03:16

so the image recognition task is

play03:19

centered around the locality

play03:23

okay so if the pixels are moved around

play03:26

it should still be able to detect the

play03:29

object in an image but with ann it's

play03:32

hard

play03:33

so how does human recognize this image

play03:37

so easily so let's go into the

play03:40

neuroscience little bit

play03:41

and try to see how we as humans

play03:44

recognize

play03:45

any image so easily when we look at

play03:48

koala's

play03:49

image we look at the little features

play03:53

like this round eyes

play03:54

this black prominent flat nose

play03:58

this fluffy ears and we detect these

play04:01

features

play04:02

one by one in our brain

play04:05

there are different set of neurons

play04:07

working on these different

play04:09

features and they're firing they're

play04:11

saying yeah i found koala's ears

play04:14

yes i found koala's nose and so on

play04:18

then these neurons are connected to

play04:20

another set of neurons

play04:22

which will aggregate the results it will

play04:24

say

play04:25

if in the image you are seeing koalas

play04:27

eye nose and ears

play04:29

it means there is a koala's face in the

play04:32

image

play04:33

similarly if there is koala's hands and

play04:36

legs

play04:36

it means there is koala's body and there

play04:40

are different set of neurons which are

play04:41

connected

play04:42

to these neurons which will again

play04:44

aggregate the results saying that

play04:46

if the image has koala's head and body

play04:49

it means it is koala's image

play04:54

same thing with handwritten digit nine

play04:57

there are these little edges

play04:59

which come together and form a loopy

play05:02

circle pattern

play05:03

which is kind of like a head of digit

play05:05

nine

play05:06

in the middle you have a vertical line

play05:08

at the bottom you have a diagonal line

play05:11

sometimes you don't have diagonal line

play05:13

at all but

play05:15

we know that whenever there is a loopy

play05:18

circle

play05:18

pattern at the top vertical line in the

play05:21

middle

play05:22

diagonal line in the end that means

play05:24

digit nine

play05:27

so how can we make computers recognize

play05:29

these

play05:30

tiny features we use the concept of

play05:34

filter in case of nine

play05:38

we have three filters the first one is

play05:41

the

play05:41

head which is a loopy circle pattern

play05:45

in the middle you have vertical line in

play05:48

the end you have diagonal filter

play05:54

so we take our original image and

play05:58

we will apply a convolution operation

play06:01

or a filter operation so here i have a

play06:04

loopy circle pattern or a head filter

play06:08

this filter right here

play06:13

the way convolution operation works is

play06:16

you take three by three grid from your

play06:19

original image

play06:21

and multiply individual numbers with

play06:23

this filter so this minus 1 is

play06:25

multiplied with this one

play06:27

this one is multiplied with this one and

play06:29

so on

play06:31

in the end you get a result and then you

play06:34

find the average

play06:35

which is divided by 9 because there are

play06:36

total 9 numbers

play06:39

and whatever number you get you put it

play06:41

here

play06:42

now this particular thing is called a

play06:44

feature map so by doing this convolution

play06:47

operation you are creating a feature map

play06:49

so you do it for the second round of

play06:53

three by three grid here i'm taking a

play06:55

stride of

play06:56

one you can take a stride of two or

play06:59

three

play07:00

also you don't need to have three by

play07:03

three filter

play07:04

you can have four by four or five by

play07:06

five filter

play07:08

and then you keep on doing this

play07:12

for your entire number and in the end

play07:16

what you get is called a feature map now

play07:19

the benefit here is

play07:21

wherever you see number one or a number

play07:24

that is close to one

play07:25

it means you have a loopy circle pattern

play07:28

so this is

play07:29

detecting a feature in the case of koala

play07:32

this would be eye or a nose

play07:34

because for koala eyes nose ears are the

play07:37

features

play07:38

so by applying loopy pattern detector i

play07:41

got this one here in my feature map

play07:45

i also call it the feature is activated

play07:48

you know

play07:49

it got activated here

play07:52

for number six it will be activated in

play07:55

the bottom in this area

play07:58

if you have two loopy patterns the

play08:01

feature will be activated at top and

play08:03

bottom

play08:04

if your number like this it might be

play08:06

activated in different area

play08:08

in summary when you apply this filter or

play08:11

a convolution operation

play08:13

you are generating a feature map

play08:17

that has that particular feature

play08:19

detected

play08:20

so in a way filters are nothing but the

play08:22

feature detectors

play08:25

for koala's case you can have eye

play08:27

detector

play08:28

and when you apply convolution operation

play08:30

in the result see

play08:31

you got these two eyes at this location

play08:35

if the eyes are at a different location

play08:37

it will still detect because you are

play08:39

moving the filter

play08:40

throughout the image

play08:43

and they are location invariant which

play08:46

means doesn't matter

play08:47

where the eyes are in the image these

play08:50

filters

play08:51

will detect those eyes and it will

play08:53

activate those particular regions

play08:57

here i have six eyes from three

play08:59

different koalas

play09:01

and they are activated accordingly great

play09:06

the hand of koala is in this particular

play09:08

region

play09:09

therefore when i apply hands

play09:11

detector

play09:12

it will activate here

play09:16

now for number nine and i'm just moving

play09:19

between number nine and koala so that

play09:22

the presentation is simple enough and

play09:24

you still get an idea

play09:27

in case of nine we saw that we need to

play09:30

apply three filters

play09:32

the head the middle part and the tail

play09:36

and when you apply those you get three

play09:38

feature maps

play09:40

so i apply three filters i got three

play09:43

feature maps

play09:44

and this is how these feature maps are

play09:47

represented if you're reading any online

play09:49

article

play09:50

or a book they are kind of stacked

play09:52

together

play09:53

and they almost form a 3d volume

play09:58

in case of koala my eye nose and

play10:02

ear filters will produce three different

play10:05

feature maps

play10:06

and i can apply convolution operation

play10:09

again and let's say this time the filter

play10:13

is to detect head

play10:15

by the way the filter doesn't have to be

play10:17

2d

play10:18

it can be three dimensional as well

play10:21

so just imagine this first dimension

play10:25

is representing eyes and the second

play10:27

slice is representing nose and third

play10:29

slice

play10:30

representing ears and by doing that

play10:33

filter

play10:33

you can say that koala's head in

play10:37

is in this particular region of an image

play10:40

so you are aggregating this result using

play10:43

a different filter for head

play10:45

and now this becomes a koala head

play10:48

detector

play10:49

similarly there could be koala body

play10:52

detector

play10:53

and now we got these two new feature

play10:56

maps

play10:57

where this feature map is saying that

play10:58

koala's head is at this location and

play11:01

koala's body is at this particular

play11:02

location

play11:04

then we flatten these numbers see in the

play11:06

end these are like

play11:08

two dimensional numbers so we can

play11:10

flatten them

play11:11

so to convert 2d array into 1d array

play11:15

and then when you get these two array

play11:18

just join them together after you join

play11:22

you can make a fully connected

play11:26

dense neural network for your

play11:28

classification

play11:30

now why do we need this

play11:33

fully connected network here well you

play11:37

can have a different image of koala see

play11:39

my koala is sleeping he's tired

play11:42

so now his eyes and ears are at a

play11:45

different location look at his ears

play11:47

see they're here for previous image

play11:51

the ears were in a different location

play11:54

so that generates a different type of

play11:58

flattened array here

play11:59

and you all know if you know basics

play12:02

about neural network that

play12:04

neural networks are used to handle the

play12:08

variety in your inputs

play12:10

such that it can classify those variety

play12:14

of inputs in a

play12:15

generic way here the first part where we

play12:19

use

play12:20

convolutional convolution operation

play12:23

is feature extraction part and the

play12:26

second portion where we are using dense

play12:28

neural network is called classification

play12:31

because the first part is detecting all

play12:33

the features ears nose eyes head and

play12:35

body etc

play12:37

and the second part is responsible for

play12:40

classification

play12:42

we also perform a relu operation so

play12:45

so this is not a complete convolutional

play12:48

neural network

play12:49

there are two other components one is

play12:51

relu

play12:53

which is nothing but if you have seen my

play12:55

activation

play12:56

video on the same deep learning tutorial

play12:59

series

play13:01

we use relu activation to bring

play13:04

non-linearity in our model so what it

play13:07

will do is

play13:08

it will take your feature map and

play13:10

whatever negative values are there

play13:12

it just replaces that with zero it is so

play13:16

easy

play13:17

and if the value is more than zero it

play13:19

will keep it as it is

play13:21

so you see just look at the values it's

play13:23

pretty straightforward

play13:25

relu helps with making the model

play13:28

non-linear because you are

play13:31

picking bunch of values and making them

play13:33

zero so if you see my previous videos in

play13:36

this deep learning tutorial series

play13:38

you will get an idea on why it brings

play13:41

the non-linearity especially see the

play13:43

video on

play13:44

the activations in the same tutorial

play13:46

series

play13:47

the link of this playlist is in the

play13:50

in the video description below

play13:51

so you'll understand why relu makes it

play13:53

non-linear

play13:55

but we did not address the issue of too

play13:57

much computation yet

play13:59

my rabbits are still getting electrical

play14:02

shock

play14:02

do something because see for this image

play14:06

size

play14:07

if you are applying convolution let's

play14:10

say with some padding

play14:12

you're still getting same size of image

play14:15

you did not reduce the

play14:16

image size sometimes people don't use

play14:19

padding so they

play14:20

reduce the image size but only little

play14:22

bit

play14:24

so pooling is used to reduce the size so

play14:27

main purpose of pooling is to reduce the

play14:29

dimensions so that

play14:31

my computer doesn't get this shock you

play14:33

know

play14:34

so the first pooling operation is um

play14:38

the max pooling so here you take a

play14:41

window of 2x2

play14:43

and you pick the maximum number from

play14:45

that window and put it here

play14:47

so here check this yellow window 5 1 8 2

play14:49

what is the maximum number 8

play14:51

so put 8 here here what is the maximum

play14:54

number 9 so put 9 here

play14:57

similarly here maximum number in green

play14:59

window is three so put three

play15:01

so you take the feature map apply your

play15:04

convol

play15:05

your pulling and generate a new feature

play15:08

map

play15:09

after the pulling but the new feature

play15:12

map

play15:12

is half the size if you look at the

play15:14

numbers you know you have reduced your

play15:16

16 numbers into four

play15:18

so it's too much or saving in your

play15:20

computation
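The 2x2 window with a stride of 2 can be sketched in plain Python. The 4x4 input below is a made-up example, arranged so the top-left window contains the transcript's values 5, 1, 8, 2:

```python
def max_pool_2x2(m):
    """Pick the maximum of each non-overlapping 2x2 window (stride 2)."""
    return [
        [max(m[i][j], m[i][j + 1], m[i + 1][j], m[i + 1][j + 1])
         for j in range(0, len(m[0]), 2)]
        for i in range(0, len(m), 2)
    ]

fmap = [
    [5, 1, 7, 9],
    [8, 2, 4, 6],
    [1, 3, 0, 2],
    [2, 1, 1, 0],
]
print(max_pool_2x2(fmap))  # [[8, 9], [3, 2]]
```

Sixteen numbers go in, four come out — each surviving number is the strongest response in its window.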

play15:23

So how will it look for our digit 9 case when you apply max pooling? Well, you can do a stride of one in this case. Earlier we did a 2x2 window with a stride of two — stride of two means once we are done with this window, we move two pixels forward. In this case we can do a stride of one — see, this is one stride, you get the idea — and we keep on taking the max. And this is what we get when our number is shifted: this is the original number where we got this max pooling map, and when the number is shifted, you get this pooling map. You are still detecting the loopy pattern at the top. So max pooling along with convolution helps you with position-invariant feature detection — it doesn't matter where the eyes or ears are in the image, it will detect that feature for you.

play16:29

There is average pooling also — instead of the max, you just take an average. See: 5 plus 1 is 6, plus 2 is 8, plus 8 is 16, and 16 divided by 4 is 4. Max pooling is more commonly used, but sometimes people use average pooling also.
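The same sketch works for average pooling — only the window reduction changes. Using the window from the transcript (5, 1, 2, 8):

```python
def avg_pool_2x2(m):
    """Average each non-overlapping 2x2 window (stride 2)."""
    return [
        [(m[i][j] + m[i][j + 1] + m[i + 1][j] + m[i + 1][j + 1]) / 4
         for j in range(0, len(m[0]), 2)]
        for i in range(0, len(m), 2)
    ]

window = [[5, 1],
          [2, 8]]
print(avg_pool_2x2(window))  # [[4.0]]  -> 5+1+2+8 = 16, 16/4 = 4
```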

play16:47

So, the benefits of pooling: number one, obviously, it reduces your dimensions and computation. The second benefit is it reduces overfitting, because there are fewer parameters. And the third one is the model becomes tolerant towards variation and distortion, because if there is a distortion and you're picking just the maximum number, you are capturing the main feature and filtering out the noise.

play17:15

So this is how our complete convolutional neural network looks. In it you will typically have a convolution and ReLU layer, then pooling, then another convolution, ReLU, and pooling — there could be n number of convolution and pooling layers — and in the end you will have a fully connected dense neural network. In this particular case the first convolution layer is detecting eyes, nose, and ears. Many times you will start with little edges — you don't even start with eyes and nose, but here for simplicity I have put them. Usually you start with edges, then you go to eyes, nose, and ears, then you go to head and body, and then you do flattening.

play18:02

Again, anything on the left-hand side of this vertical line is feature extraction. So the main idea behind a convolutional neural network is feature extraction, because the second part is the same — it is a simple artificial neural network. But by doing this convolution you are detecting the features, and you are also reducing the dimensions.

play18:23

There are three benefits of the convolution operation. The first one is connection sparsity, which reduces overfitting. Connection sparsity means not every node is connected with every other node, like in an artificial neural network, where we call that a dense network. Here we have a filter which we move around the image, and at a time we are looking only at a local region, so we are not affecting the whole image. The second benefit is that the convolution and pooling operations combined give you location-invariant feature detection, which means the koala's eye could be in the left corner, in the right corner, anywhere — we will still detect it. Third is parameter sharing: once you learn the parameters for a filter, you can apply them across the entire image.

play19:25

The benefit of ReLU is that it introduces non-linearity, which is essential because the problems we solve with deep learning are non-linear by nature. It also speeds up training, and it is fast to compute — remember, with ReLU you are just doing one check: whether the number is greater than zero or not. If it is greater than zero, keep the number; if it is less than zero, make it zero.

play19:54

The benefit of pooling is that it reduces dimensions and computation, it reduces overfitting, and it makes the model tolerant towards small distortions.

play20:05

How about rotation and thickness? By itself, a CNN cannot handle rotation and thickness, so you need to have training samples which include some rotated and scaled samples — some thick samples, some thin samples. And if you don't, you can use the data augmentation technique. What is data augmentation? Let's say for handwritten digits, you take your original data set, then you pick a few samples and rotate them manually, or make them larger or smaller, thicker or thinner, and you generate new samples. By doing that you can handle rotation and scale in a convolutional neural network.
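A toy sketch of the idea, on a tiny "image" represented as a plain Python grid: shifting a sample one pixel sideways produces a new training sample from an existing one. A real pipeline would also rotate, scale, and vary stroke thickness — for example with Keras's `ImageDataGenerator`.

```python
def shift_right(img, fill=0):
    """Return a copy of the image shifted one pixel to the right,
    padding the vacated column with a fill value (background)."""
    return [[fill] + row[:-1] for row in img]

digit = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
]
print(shift_right(digit))  # [[0, 0, 1], [0, 0, 1], [0, 0, 1]]
```

One original sample, two training samples — that is the whole trick behind augmentation.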

play20:58

Once again, here is a quick summary of what a convolutional neural network is — you can take a screenshot of this image and put it at your desk if you are trying to learn CNNs and computer vision. To summarize: you take your input image, then you apply the convolution operation and ReLU, then you apply pooling; again convolution, ReLU, pooling — and you can do this n number of times. After that, the second stage is classification, where you use a densely connected neural network.
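In tf.keras, that two-stage summary can be sketched as below. This is not the video's code — the filter counts and sizes here are illustrative hyperparameters, assuming 28x28 grayscale inputs and 10 classes (as in the handwritten-digit example):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    # Stage 1: feature extraction — convolution + ReLU, then pooling, twice
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu",
                           input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    # Stage 2: classification — flatten, then a dense network
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.summary()
```

You specify only the number of filters (32, 64) and their size (3x3); the values inside the filters are learned during training.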

play21:31

Now, a very important thing to mention here: the network will learn these filters on its own. In the previous presentation we saw that we applied those filters by hand, but this is the beauty of a convolutional neural network — it will automatically learn these filters on its own, and that is part of the training. When the CNN is training — because you're supplying thousands of koala images here — it will use backpropagation and figure out the values in these filters. That is part of the learning, the backpropagation. As hyperparameters you will specify how many filters you want to have and what the size of each filter is — that's it. You do not specify the exact values within these filters; the network will learn those on its own, and that is the most fascinating part about neural networks in general.

In the next few videos we will be doing coding using convolutional neural networks and solving a variety of computer vision problems. So I hope you liked this explanation. If you don't know me, I'm Dhaval Patel — I teach data science, machine learning, Python programming, and career guidance on my YouTube channel. If you are starting machine learning and looking for very basic, beginner-level tutorials, then I have a complete playlist: you can start with very basic Python and pandas knowledge on that playlist and learn machine learning in a very easy-to-understand manner. Then gradually in the playlist I try to cover data science and machine learning projects as well. I'm continuing my deep learning tutorial series right now, and my goal is to finish all the topics in deep learning, including convolutional neural networks, RNNs, language models, and so on. So please stay tuned, watch my videos, and if you have any comments or feedback, please let me know in the comments below.
