Pooling and Padding in Convolutional Neural Networks and Deep Learning

Nicolai Nielsen
23 Feb 2021 · 20:17

Summary

TL;DR: This tutorial video dives into pooling in convolutional neural networks (CNNs), a crucial component for feature extraction and dimensionality reduction. It explains the concept of pooling, focusing on max pooling and average pooling and their respective roles in identifying the most significant features within an image. The video also covers the use of padding to maintain image dimensions during the pooling process, which is essential for preserving detail for further feature extraction in deeper layers of the CNN. Practical examples using TensorFlow and Keras illustrate how to implement max pooling with and without padding, and how to adjust stride parameters to control the dimensionality of the output. The presenter emphasizes the importance of understanding pooling and padding for optimizing neural network performance and efficiency, especially in complex applications.

Takeaways

  • πŸ“š The video provides an introduction to pooling in convolutional neural networks (CNNs), focusing on max pooling and its application after convolutional layers.
  • 🌟 Max pooling is used to extract the most important features from a given region in the image, typically by selecting the maximum value within a defined area, such as 2x2 or 3x3.
  • πŸ”’ Average pooling is an alternative method that smoothens the information by taking the average value within the specified area, which can be useful when preserving background information is important.
  • 🌐 Global pooling operates on the entire image, either using max or average pooling to produce a single value that can be used for classification tasks without a fully connected layer.
  • πŸ“‰ Max pooling reduces the spatial dimensions of the image, which can help in downscaling while retaining important features, thus simplifying the network and improving computational efficiency.
  • πŸ”„ Padding is a technique used to maintain the dimensionality of the image after pooling by adding zeros around the image's border, which can be particularly useful in preserving the resolution for subsequent layers.
  • πŸ› οΈ The video demonstrates how to implement max pooling and padding in a CNN using TensorFlow and Keras, showcasing how to adjust pool size, strides, and padding within the model architecture.
  • ↔️ Striding is a parameter that determines how the pooling window moves across the image; a stride of one moves the window element by element, whereas a larger stride covers more ground with each step.
  • ↕️ The choice between using padding or not depends on the desired outcome: using 'same' padding maintains the image dimensions, while 'valid' reduces them, which can be useful for feature extraction at lower resolutions.
  • πŸ” The video emphasizes the importance of understanding how pooling and padding affect the CNN's performance, as they influence the network's ability to learn from and make predictions based on image features.
  • πŸ“ˆ By reducing the image dimensions and complexity through pooling, the network can focus on the most significant features, which can enhance training and prediction speeds.

Q & A

  • What is the main focus of the video?

    -The video focuses on explaining pooling in convolutional neural networks, specifically discussing different pooling methods such as max pooling, average pooling, and global pooling, as well as the use of padding in convolutional neural networks.

  • Why is pooling often used after convolutional layers in CNNs?

    -Pooling is used after convolutional layers to downscale the image while retaining the most important features. It helps to reduce the complexity of the network, decrease the number of parameters, and make the network more robust to variations in the input data.

  • What is the purpose of max pooling?

    -Max pooling is used to extract the most important pixels or features from a given region in the image. It selects the maximum value within the pooling window and is useful for keeping only the most significant information while reducing the spatial dimensions of the representation.
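As a sketch of the operation (the 5Γ—5 matrix below is made up for illustration, not taken from the video), max pooling slides a window over the input and keeps only each window's largest value; a 3Γ—3 window with stride 1 turns a 5Γ—5 input into a 3Γ—3 output:

```python
def max_pool(img, pool=3, stride=1):
    """Slide a pool x pool window over img (a list of rows) and keep each window's max."""
    h, w = len(img), len(img[0])
    return [[max(img[i + di][j + dj] for di in range(pool) for dj in range(pool))
             for j in range(0, w - pool + 1, stride)]
            for i in range(0, h - pool + 1, stride)]

img = [[1, 3, 2, 0, 1],
       [4, 5, 1, 2, 0],
       [0, 2, 3, 1, 4],
       [1, 0, 2, 5, 2],
       [3, 1, 0, 2, 1]]

pooled = max_pool(img)   # 5x5 input, 3x3 window, stride 1 -> 3x3 output
print(pooled)            # [[5, 5, 4], [5, 5, 5], [3, 5, 5]]
```

Note how the value 5, the strongest response in the top-left region, survives into the output while the weaker values around it are discarded.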

  • How does average pooling differ from max pooling?

    -Average pooling calculates the average value of all pixels within the pooling window, which helps to smooth out the information in the image. It is used when one wants to retain a more comprehensive representation of the image rather than just the most significant features.

  • What is global pooling and when might it be used?

    -Global pooling is a type of pooling that considers the entire feature map at once. It reduces each feature map to a single value, producing one vector per image, which can be useful when fully connected layers are not needed, or when a single feature is to be extracted from the entire image.

  • Why is padding used in max pooling layers?

    -Padding is used to maintain the dimensions of the image after pooling. It adds zeros around the image, which allows the pooling operation to be performed at the edges of the image without losing spatial resolution.

  • What is the effect of using strides in max pooling?

    -Strides determine how many pixels the filter moves over the image for each iteration. A larger stride value means the pooling operation will cover more pixels in each step, which can reduce the spatial dimensions of the output image more quickly.

  • How does the choice between 'same' and 'valid' padding affect the output dimensions of the image?

    -Using 'same' padding keeps the output dimensions the same as the input dimensions after pooling, while 'valid' padding does not add any padding and thus reduces the output dimensions based on the pool size and stride.
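The shape arithmetic behind this answer can be sketched in plain Python. Keras computes 'same' output sizes as ceil(input / stride) and 'valid' sizes as floor((input βˆ’ pool) / stride) + 1; these formulas are standard, though the video does not state them explicitly:

```python
import math

def out_size(in_size, pool, stride, padding):
    if padding == "same":
        return math.ceil(in_size / stride)   # dimensions preserved when stride == 1
    return (in_size - pool) // stride + 1    # "valid": no padding, output shrinks

for padding in ("valid", "same"):
    print(padding, out_size(5, pool=2, stride=2, padding=padding))
# valid 2
# same 3

# With stride 1, 'same' keeps the input size exactly:
print(out_size(5, pool=3, stride=1, padding="same"))   # 5
print(out_size(5, pool=3, stride=1, padding="valid"))  # 3
```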

  • What is the role of a fully connected layer in a CNN?

    -A fully connected layer is typically the last layer in a CNN, where the high-level features extracted by the convolutional and pooling layers are used to make predictions or classifications. It takes the flattened output from the previous layers and applies a dense network structure for the final task.
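A minimal sketch of the bookkeeping behind the flatten step that feeds the fully connected layer (the 3Γ—3Γ—64 feature-map size and 10 output classes are illustrative, not from the video):

```python
# A 3x3 feature map with 64 channels, flattened into one long vector.
h, w, c = 3, 3, 64
flat_len = h * w * c
print(flat_len)  # 576 values, each connected to every neuron in the dense layer

# Parameter count of a dense layer with 10 output classes on top of it:
params = flat_len * 10 + 10   # weights + biases
print(params)  # 5770
```

This is why pooling matters: without the downscaling done earlier in the network, `flat_len` (and with it the dense layer's parameter count) would be far larger.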

  • How does the complexity of CNNs differ from that of a standard ANN?

    -CNNs are more complex than standard ANNs due to the additional layers and the presence of a large number of trainable parameters, such as the various filters used in convolutional layers, which the network learns during training.

  • What are some applications of CNNs mentioned in the video?

    -The video mentions that CNNs are used in applications like autonomous driving by companies like Tesla, where they are utilized for tasks such as recognizing objects and making decisions based on visual input.

  • How can one join the community for discussing neural networks and deep learning?

    -The video encourages viewers to join a Discord server, which is linked in the video description, to discuss topics related to neural networks, deep learning, computer vision, and to seek help or inspiration for their projects.

Outlines

00:00

πŸ“š Introduction to Pooling in Convolutional Neural Networks

The video begins with an introduction to pooling layers in convolutional neural networks (CNNs). It discusses the importance of pooling after convolutional layers to reduce dimensionality and focus on the most important features. The presenter also mentions joining a Discord server for community discussions and support. A recap of the previous video on CNNs is provided, highlighting the use of convolution and pooling to extract features from images. Different pooling methods, such as max pooling and average pooling, are introduced, along with the concept of padding to maintain image dimensions during pooling.

05:01

πŸ” The Role of Max Pooling in CNNs

This paragraph delves into the specifics of max pooling, explaining how it extracts the most important pixels from a given area in an image. The use of max pooling in downscaling images while retaining significant information is emphasized. It also contrasts max pooling with average pooling, which smooths the information and is less commonly used. The paragraph includes an example of applying max pooling to a 5x5 image, resulting in a downsampled 3x3 image that retains the most important features. The potential issue of applying max pooling too many times is discussed, along with the solution of using padding to maintain image dimensions.

10:02

πŸ–ΌοΈ Padding in Pooling Layers to Preserve Image Dimensions

Padding is introduced as a technique to apply zeros around an image before max pooling, which helps preserve the original dimensions of the image after pooling. The difference between padding's effect on max pooling (where zeros do not affect the result) and average pooling (where zeros can lower the average value) is explained. The paragraph also discusses when to use padding and when not to use it, depending on the desired outcome. An example of applying padding to a 5x5 image and then max pooling is provided, resulting in an image that retains its original dimensions but with the most significant values extracted.
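The padding-then-pooling step described above can be sketched in plain Python (the 5Γ—5 matrix is made up for illustration). As the video notes, zeros around the border never win a max comparison on non-negative pixel values, so 'same'-style padding preserves the dimensions without distorting the result:

```python
def pad_zeros(img, width=1):
    """Add `width` rings of zeros around a matrix (a list of rows)."""
    w = len(img[0]) + 2 * width
    zeros = [[0] * w for _ in range(width)]
    body = [[0] * width + row + [0] * width for row in img]
    return zeros + body + zeros

def max_pool(img, pool=3, stride=1):
    h, w = len(img), len(img[0])
    return [[max(img[i + di][j + dj] for di in range(pool) for dj in range(pool))
             for j in range(0, w - pool + 1, stride)]
            for i in range(0, h - pool + 1, stride)]

img = [[1, 3, 2, 0, 1],
       [4, 5, 1, 2, 0],
       [0, 2, 3, 1, 4],
       [1, 0, 2, 5, 2],
       [3, 1, 0, 2, 1]]

out = max_pool(pad_zeros(img))   # pad 5x5 -> 7x7, pool 3x3 stride 1 -> 5x5
print(len(out), len(out[0]))     # 5 5: original dimensions preserved
```

For average pooling the same trick is less innocent: the zeros are counted in the mean, so padding drags the border averages down, which is the asymmetry the video points out.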

15:03

πŸ› οΈ Specifying Pooling Parameters in Neural Network Models

The technical aspects of specifying pooling parameters in neural network models are covered in this paragraph. It discusses the different classes for pooling, such as average pooling, max pooling, and global pooling, and their application in 1D, 2D, and 3D contexts. The paragraph explains how to import the necessary modules in TensorFlow and Keras for implementing these layers. It also details how to specify the pool size, strides, and padding within the max pooling layer. An example using Keras to create a sequential model with convolutional and max pooling layers is provided, including how to adjust the stride and padding to control the dimensionality of the image as it passes through the network.
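The video builds this with tf.keras layers. As a framework-free sketch, the same shape bookkeeping Keras performs can be traced in plain Python; the layer stack and the 28Γ—28 input below are illustrative (e.g. an MNIST-sized image), not the video's exact model:

```python
def layer_out(size, kernel, stride=1, padding="valid"):
    if padding == "same":
        return -(-size // stride)            # ceil(size / stride)
    return (size - kernel) // stride + 1     # "valid" convolution/pooling

# Illustrative stack: Conv2D(3x3) -> MaxPool2D(2x2) -> Conv2D(3x3) -> MaxPool2D(2x2)
layers = [("conv 3x3",    3, 1, "valid"),
          ("maxpool 2x2", 2, 2, "valid"),
          ("conv 3x3",    3, 1, "valid"),
          ("maxpool 2x2", 2, 2, "valid")]

size = 28
for name, k, s, p in layers:
    size = layer_out(size, k, s, p)
    print(f"{name:12s} -> {size}x{size}")
```

Swapping any layer's padding to "same", or changing a pooling stride, changes only its row of this trace, which is exactly the tuning knob the video demonstrates in Keras.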

20:04

πŸ”§ Tuning Neural Networks with Max Pooling and Padding

The final paragraph focuses on the practical application of max pooling and padding in tuning neural networks. It explains how these techniques can be used to reduce the complexity of the network, improve training and prediction speeds, and enhance the network's ability to learn important features from images. The importance of understanding how max pooling and padding affect the network is emphasized, as it allows for better optimization and feature extraction. The video concludes with a reminder to subscribe and enable notifications for more content, and a teaser for upcoming tutorials on computer vision and deep learning.

🎢 Closing with Music

The video script ends with a short musical interlude, indicated by the [Music] tag. This suggests a pause or transition to the end of the video content.

Keywords

πŸ’‘Pooling

Pooling is a technique used in convolutional neural networks (CNNs) to reduce the spatial dimensions of the feature maps generated by convolutional layers. It simplifies the network by reducing the number of parameters, which can lead to a more efficient and less complex model. In the video, pooling is discussed as a method applied after convolutional layers to downscale images and extract the most important features, using operations like max or average pooling.

πŸ’‘Convolutional Networks

Convolutional networks, a type of deep learning model, are designed to process data with grid-like topology, such as images. They consist of a series of convolutional layers that apply filters to input data to extract features. The video focuses on how convolutional networks utilize pooling layers to further process the feature maps for tasks like image classification.

πŸ’‘Padding

Padding in the context of CNNs refers to the technique of adding rows and columns of zeros, or 'padding', around the input image before applying convolutional filters. This can help preserve the dimensions of the image after pooling, maintaining the resolution needed for subsequent layers to extract more features. The video explains that padding can be specified in the max pooling layer to control the dimensionality of the output.

πŸ’‘Max Pooling

Max pooling is a type of pooling operation that selects the maximum value from a window of input values. It is used to reduce the spatial dimensions of the input volume for the next layer in a CNN. In the video, max pooling is discussed as a common method to downscale images while retaining the most important information, such as the highest values within a region.

πŸ’‘Average Pooling

Average pooling is another pooling operation that calculates the average value of a window of input values. It is used to smooth out the information in the feature maps, which can be useful when the exact location of a feature is not as important as its presence. The video contrasts average pooling with max pooling, noting that it can be used when a more generalized feature representation is desired.

πŸ’‘Global Pooling

Global pooling is a pooling operation that considers the entire feature map as a single window, collapsing it into a single vector element. This is typically used in the final layers of a CNN where the goal is to aggregate information across the entire feature map. The video mentions global pooling as an alternative to fully connected layers for certain applications.
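What the global pooling layers compute per channel can be sketched with plain Python lists (the 4Γ—4Γ—3 feature maps below are synthetic, indexed as maps[channel][row][col]):

```python
# A fake stack of three 4x4 feature maps: maps[c][i][j] = c*10 + i + j
maps = [[[c * 10 + i + j for j in range(4)] for i in range(4)] for c in range(3)]

# Global max pooling: one value per channel -- the max over the whole map.
global_max = [max(max(row) for row in fm) for fm in maps]
# Global average pooling: the mean over the whole map.
global_avg = [sum(sum(row) for row in fm) / 16 for fm in maps]

print(global_max)  # [6, 16, 26]
print(global_avg)  # [3.0, 13.0, 23.0]
```

Each feature map collapses to a single number, so the spatial dimensions disappear entirely; this is why a global pooling layer can feed a classifier directly, without a flatten step.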

πŸ’‘Strides

Strides in CNNs determine how the convolutional filter or pooling window moves across the input image. A stride of one means the window moves one pixel at a time, while a larger stride value means the window moves more pixels at a time, which can reduce the output dimensions. The video discusses how specifying strides can affect the dimensionality of the pooled feature maps.

πŸ’‘Fully Connected Layer

A fully connected layer in a neural network is a layer where each neuron is connected to every neuron in the subsequent layer. In the context of CNNs, it is usually placed after the convolutional and pooling layers to perform classification based on the extracted features. The video explains that after downscaling and feature extraction, a fully connected layer can classify the input, such as identifying whether an image contains a cat, dog, or car.

πŸ’‘Feature Extraction

Feature extraction is the process of automatically identifying and extracting relevant features from input data, such as images. In CNNs, this is typically done through convolutional layers that apply filters to detect patterns or structures. The video emphasizes the importance of feature extraction in CNNs, where pooling layers like max pooling help to refine the most important features for the task at hand.

πŸ’‘Convolution

Convolution is a mathematical operation that is widely used in image processing and computer vision. In the context of CNNs, it involves the application of a convolutional filter to the input image to detect specific features. The video script explains that convolution is the process by which CNNs apply filters to images, which is a precursor to pooling operations.

πŸ’‘Sequential Model

A sequential model in deep learning frameworks like Keras is a linear stack of layers that creates a neural network. It is called 'sequential' because the layers are added one by one in a sequence. The video demonstrates how to create a sequential model with convolutional and max pooling layers, highlighting how the model's layers are specified and how they interact to process input data.

Highlights

Introduction to pooling and convolutional networks, discussing the importance of pooling layers in CNNs.

Explanation of padding in pooling layers and its role in maintaining the dimensions of the feature maps.

Differentiation between max pooling and average pooling, and their respective advantages in feature extraction.

The concept of global pooling as an alternative to fully connected layers for certain applications.

Demonstration of how max pooling can downscale images while retaining important features.

The use of padding to maintain the dimensionality of an image after applying max pooling.

How strides in max pooling affect the dimensionality of the output and the concept of striding.

Practical example of applying max pooling to an image and the resulting effect on image resolution.

Importance of choosing between max pooling and average pooling based on the specific needs of a project.

The impact of applying max pooling multiple times on the resolution and utility of the image data.

Use of padding to preserve image dimensions when using max pooling in consecutive layers.

Explanation of how to specify pooling parameters such as pool size and strides in a neural network model.

Live coding demonstration of implementing max pooling and padding in a Keras sequential model.

How to adjust the model to reduce image dimensions using different stride values and padding options.

The significance of reducing image complexity and dimensions for improving training and prediction speed in CNNs.

Overview of the different types of pooling and their utility in tuning a neural network for optimal performance.

Encouragement for viewers to subscribe and stay updated with future content on computer vision and deep learning.

Transcripts

00:00

Hey guys, welcome to a new video in this neural networks and deep learning tutorial. In this video we're going to talk about pooling in convolutional neural networks, and we're also going to cover padding: why we add padding and what padding is when we're doing pooling. In convolutional neural networks we often use pooling after convolutional layers, so we'll have these convolutional layers in our network, and each convolutional layer will be followed up by a pooling layer, where we can do different kinds of operations depending on our application or the purpose of our neural network. But first of all, remember to join the Discord server; I'll link it down in the description so you can join the community, where we talk about different things within neural networks, deep learning, and computer vision. Also, if you have problems with some of your projects, you can ask about them in there and chat with other people, or just get some inspiration for your own projects. So make sure to join the Discord server and come chat with us.

00:53

So let's jump into the first slide, where we're going to have a short recap of the last video, which was an introduction to convolutional neural networks. This is what we're going to focus on from now on in this tutorial. Previously we mainly focused on artificial neural networks, where we talked about the different layers and neurons, the different parameters that we can tune, and how we can fine-tune our neural network, apply transformations, and so on. In the last video we talked about convolution: what convolution is and how we apply it to the images we pass through our neural network, by applying filters that we convolve with the image; then we do different operations and train the neural network to extract different features in the different layers of our convolutional neural network. We also talked about how, after a convolutional layer, we often use a pooling layer, which is what we're going to focus on in this video. We're going to cover the different methods we can use for pooling, and we can also add something called padding to our pooling layers, which gives us different behaviour that can be helpful in certain situations. At the end of the convolutional neural network we can have a fully connected layer, an artificial neural network, where we pass in the flattened output from the feature-extraction region; there we can do classification, getting out whether an image is a cat, a dog, a car, or something like that, or a probability for some specific situation. We also talked about applications of convolutional neural networks, like how Tesla and Waymo use convolutional neural networks and deep learning in general for, for example, self-driving vehicles. And we talked about the trainable parameters in convolutional neural networks, because convolutional neural networks are a lot more complex than a normal artificial neural network: we have more trainable parameters, like the various filters, and the neural network tries to find the best values for those while training. We went over all of that in the previous videos, so make sure to check them out before you continue with this video if you're not familiar with convolutional neural networks and what they are.

03:06

But I'll jump to the first slide in this video, where we're going to talk about what pooling is in convolutional layers, why we use it, why it can be useful, and how we can do it. Pooling is when we apply some filter, or some operation, on top of our image. Often we extract features in our convolutional layer and then follow it up with a pooling layer. The idea behind the pooling layer is that we look at a region in our image; in this case we specify a three-by-three region, so we look at the nine pixels, or nine elements, in this array, which could represent an image, for example, though we can also do this on other kinds of matrices. So we look at this three-by-three region in the image, and then we can apply different pooling methods to it. We have these different types of pooling: max pooling, average pooling, and global pooling. When we're looking at a three-by-three area in our image or matrix, we can apply max pooling or average pooling to that local area. The idea behind max pooling is that we look at all the elements in the area and just pick the one with the highest value; in this case, if we apply max pooling to this region, the resulting value would be 5. Then we slide this kernel, or filter, through the whole image, moving it pixel by pixel, or element by element, and take the max value from each area, going through the whole image or matrix. But we can also apply something called average pooling, where we sum all the values and then take their average. There are advantages and disadvantages to both, but we often use max pooling after the convolutional layer, because then we extract only the most important features and the most important pixels in the area we're looking at. We can also use average pooling, which will smooth out the information, or the things we want to extract, in our image. Let's say we have a digit image, for example from the MNIST dataset of handwritten digits. If we just want to recognize, say, the digit one, then we can use max pooling, because we can downscale our image and keep only the most important information, which is just a straight line down with high values in the image. So we can use max pooling, take the maximum values, and end up with a low-resolution, heavily downscaled image; we extract only the most important information and don't need to store a lot of values, which has some very nice advantages when we're working with convolutional neural networks in more complex applications and projects. We can also use something called global pooling, which is similar; we can use either average or max pooling in global pooling as well, but in global pooling we look at the whole image. We take the whole image and apply, for example, global max pooling, which takes the highest value in the entire image, array, or matrix. We can use this, for example, instead of having a fully connected layer at the end with a single output neuron: we can apply global pooling at the end layer and get one value out, if that's what we want from our application or project.

play06:51

so the most used type of pooling and why

play06:53

it's used and like how it affects the

play06:55

convolution neon networks is probably

play06:56

like max pooling so we often as i

play06:58

already said we use uh convolutional

play07:00

layer and then it is followed up by max

play07:02

only layers where we just look at a

play07:04

region and then we go through the whole

play07:06

and go and then go through the whole

play07:07

image

play07:08

element by element and then we just take

play07:10

the max element from the area that we're

play07:12

looking at

play07:13

so this is very often used because we

play07:15

only want to extract the most important

play07:17

information um when we're going layer by

play07:19

layer in our convolutional layer like a

play07:21

neural network but if you want it in

play07:22

your application you don't only want to

play07:24

like extract the most important

play07:26

information and you still want to keep

play07:27

like like for example some of the

play07:28

background or like some of the pixels

play07:30

around the most important things then

play07:32

you can actually like use average

play07:33

pooling instead of max pooling but it

play07:35

really depends on your application and

play07:36

your project that you're doing so now

play07:38

we're going to see an example of like

play07:40

how we can use max pooling on an image

play07:41

for example so we have this image here

play07:44

to the left where we have like this

play07:45

matrix here where we're represented in

play07:47

pixels here or like values for each of

play07:49

the pixels and then we want to apply max

play07:51

pulling which is a three by three in

play07:53

this case here so we look at this three

play07:54

by three area or region here on the

play07:56

image and then we just take the max

play07:58

value from this from distribution here

play08:00

which is in this case five here and then

play08:02

we apply it to the first element up here

play08:04

because this is like the first region

play08:05

that we're looking at in the image from

play08:07

top uh top left corner here and then we

play08:09

just go element by element and do the

play08:11

exact same thing here where we're

play08:12

looking at at the region and finding the

play08:14

max value so we can see when we're

play08:16

applying max pooling on our image here

play08:18

we actually like downscale our image so

play08:20

beforehand we have this five by five

play08:22

image here and then when you apply max

play08:24

pulling this three by three here then we

play08:26

actually like downscale our image here

play08:27

to o2 so we only have like a three by

play08:29

three image so we actually like

play08:32

downscaler image so we don't have that

play08:33

much complexity and then we extract the

play08:35

most important features uh from our

play08:37

image by only taking the max values and

play08:39

then we downscale it to like compress

play08:41

our image and only get the most

play08:43

important information

play08:45

But if we apply max pooling too many times, the dimensions, the resolution of the image, can become too low to extract any more useful information. In that case we can apply something called padding when we're doing max pooling on the image. I'll talk about that on the next slide and show how we can keep the dimensions of the image even though we're applying max pooling with a given window size. For now we just continue doing the same thing across the image.

With max pooling we also have striding: we can specify how many strides the window takes each time we iterate over the image. With a stride of one, we move the pooling window by one element, look at the max value inside that region, and put it in the next output element, which here is a 6. On the next iteration we take the next area and its max value, and when we reach the end of the row we slide one step down, start again from the left, take that area and its max value, and keep doing that until we have been over the whole image and applied max pooling across all of it. Then we have the downscaled image, only 3x3, with the most significant values, the most important information from the image.
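The sliding-window pass just described can be sketched in plain Python, with no framework needed. The pixel values below are made up for illustration; a 3x3 window moved with stride 1 over a 5x5 input yields a 3x3 output of local maxima, exactly as in the slide.

```python
def max_pool2d(image, pool_size, stride=1):
    """Slide a pool_size x pool_size window over `image` (a list of rows)
    and keep the maximum value inside each window position."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(0, rows - pool_size + 1, stride):
        out_row = []
        for c in range(0, cols - pool_size + 1, stride):
            window = [image[r + i][c + j]
                      for i in range(pool_size)
                      for j in range(pool_size)]
            out_row.append(max(window))
        out.append(out_row)
    return out

image = [
    [1, 3, 2, 0, 1],
    [4, 5, 1, 2, 0],
    [2, 1, 0, 6, 3],
    [0, 2, 4, 1, 2],
    [1, 0, 3, 2, 5],
]

pooled = max_pool2d(image, pool_size=3, stride=1)
# The 5x5 input shrinks to 3x3; the top-left window's max is 5.
```

The output here is `[[5, 6, 6], [5, 6, 6], [4, 6, 6]]`: only the strongest response in each neighbourhood survives.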

When we're doing max pooling we can also apply something called padding. What's meant by padding is that we add zeros around the image: a row and a column of zeros on each side. This means that when we apply max pooling, the window looks at the padded area instead of only the original one, and we still just take the max values. With max pooling it's fine to use padding, because taking the maximum over those zeros won't affect the result. But if we're using average pooling, the zeros will affect the result: a border window now covers several zeros from the padding, so when we take the average over the whole region we get a lower number than if we took the average over the unpadded region. So we'd actually be smoothing out the image more with the average pooling approach.
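The max-versus-average contrast above can be sketched in plain Python. This adds one ring of zeros around a region and shows that the maximum is untouched while the average is pulled down; the values are illustrative, not from the video.

```python
def pad_with_zeros(image):
    """Surround `image` (a list of rows) with one row/column of zeros."""
    width = len(image[0]) + 2
    return ([[0] * width]
            + [[0] + row + [0] for row in image]
            + [[0] * width])

region = [[4, 5, 1],
          [2, 1, 6],
          [0, 2, 3]]
padded = pad_with_zeros(region)

flat_region = [v for row in region for v in row]
flat_padded = [v for row in padded for v in row]

max_unpadded = max(flat_region)                      # 6
max_padded = max(flat_padded)                        # still 6: zeros never win
avg_unpadded = sum(flat_region) / len(flat_region)   # 24 / 9, about 2.67
avg_padded = sum(flat_padded) / len(flat_padded)     # 24 / 25 = 0.96, much lower
```

This is why padding pairs naturally with max pooling, while with average pooling the zero border systematically darkens the result.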

Here, though, we just take the max value. We have a 5x5 image, we apply padding by adding the zeros around it, and when we then apply max pooling we keep the same dimensions as the original image. So we extract the most important features from the image but still keep the same dimensions, which we can then pass further into the network to extract more important features in the upcoming layers, where we may apply max pooling again. In the first couple of convolutional layers of the network, though, we often apply max pooling without padding, because there we do want to downscale the image so we don't have as much complexity while still extracting the most important features. But as we approach a really low resolution, we usually want to keep the dimensions, so that we still have a reasonable size as we extract more and more features going through the layers of the convolutional neural network.
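The size bookkeeping behind "downscale" versus "keep the dimensions" can be written as a small formula, following the convention Keras uses for its `padding` argument (a sketch; the specific sizes are the ones quoted in this video).

```python
import math

def pooled_size(n, pool, stride, padding="valid"):
    """Spatial output size of a pooling layer along one dimension.

    padding='valid' (no padding):   floor((n - pool) / stride) + 1
    padding='same'  (zero padding): ceil(n / stride)
    """
    if padding == "same":
        return math.ceil(n / stride)
    return (n - pool) // stride + 1

print(pooled_size(5, 3, 1))            # 3   (the 5x5 slide example)
print(pooled_size(224, 2, 1, "same"))  # 224 (dimensions kept)
print(pooled_size(224, 2, 2))          # 112 (halved, as in the Colab demo later)
```

So stride 1 with 'same' padding is the dimension-preserving configuration, and stride 2 without padding is the halving one.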

Now let's talk about pooling and padding in Keras and how to specify them; after that we'll go into Google Colab and see how to specify this in an actual neural network in code. Keras has pooling classes of three kinds, average pooling, max pooling, and global pooling, each available in 1D, 2D, and 3D variants, but since we're looking at images we'll use the 2D max pooling layer, which we can get from the TensorFlow Keras layers module. For that layer we need to specify the pool size, which is the size of the pooling window we want to apply; here it's 2x2, while the examples we went over in the slides were actually 3x3. We can also specify the strides: an integer, a tuple of two integers, or None. The stride values specify how far the pooling window moves for each pooling step, each iteration, and if set to None they default to the pool size. So with strides we control how far the window moves per iteration. The slide examples used strides of one, going element by element, but we could specify a larger stride so the window skips a row or a column while striding over the image. If we leave strides as the default None with a pool size of 2x2, the layer makes strides of two, which halves the dimensions of the image. If instead we use strides equal to one and add padding to the max pooling layer, we keep the dimensions, which we're going to see in code as well. Finally, we can specify whether or not we want padding in the max pooling layer.

Now we're jumping into Google Colab, and I'll show you how to use the max pooling and padding I've just shown in the slides and apply them in a sequential convolutional neural network model in Keras. First we import TensorFlow with Keras and the layers we need: the activation, the Flatten layer, the convolutional layer, and the max pooling layer used in this video. Running this block of code imports everything we need; if you want to use average pooling, global max pooling, and so on, you need to add them to the imports as well, but in this case we only import the two-dimensional max pooling layer. Then we can go down to our sequential model and see what's going on. We have convolutional layers in the sequential model, and each convolutional layer is followed by a max pooling layer. Inside the max pooling layer, as we saw in the Keras documentation, we specify the pool size, the region we're going to look at in the image, then the strides taken on each iteration as we move over the image passing through the network, and at the end we specify that we want padding in this max pooling layer; if we don't want padding, we specify 'valid' instead of 'same'.

We do this for each of the convolutional layers: we apply max pooling after each convolutional layer with the same parameters, strides equal to one and padding set to 'same', and we also add padding to the convolutional layers themselves. If we run this, it creates the sequential model with convolutional and max pooling layers; at the end we flatten the output and pass it to a dense, fully connected layer, so we can do a classification task. If we take the summary of the model to see what's actually going on, we can see that we start with a 224x224 image, with the number of filters as the last dimension, and that we keep the spatial dimensions of the image as we pass it through the whole network. That's because we're using padding equal to 'same' and strides of one, going one element per iteration, so the dimensions are preserved through all the convolutional and max pooling layers.
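A sketch of that kind of Sequential model, assuming TensorFlow 2.x; the filter counts and layer count are illustrative, not the exact ones from the Colab notebook. Each Conv2D is followed by MaxPooling2D with `strides=1` and `padding='same'`, so the 224x224 spatial dimensions survive the whole conv/pool stack.

```python
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),
    Conv2D(32, (3, 3), activation="relu", padding="same"),
    MaxPooling2D(pool_size=(2, 2), strides=1, padding="same"),
    Conv2D(64, (3, 3), activation="relu", padding="same"),
    MaxPooling2D(pool_size=(2, 2), strides=1, padding="same"),
    Flatten(),
    Dense(10, activation="softmax"),
])

# Run the conv/pool stack on a dummy batch to check the spatial size.
h = tf.zeros((1, 224, 224, 3))
for layer in model.layers[:-2]:   # stop before Flatten/Dense
    h = layer(h)
# h keeps the (224, 224) spatial dimensions, with 64 feature maps.
```

`model.summary()` would show the same thing: every conv and pooling layer reports a 224x224 output, only the channel count changes.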

But if we actually want to reduce the dimensions of the images, which we often do want at the start of a convolutional neural network, we can specify that here. For example, we can drop the padding by specifying the 'valid' keyword, and we can specify that we want to take two strides instead of one when the max pooling window strides over the output of the convolutional layer. In this case, specifying strides equal to two is the same as the None keyword, because None is the default and makes the strides equal to the pool size we specified. So let's run a new sequential model where the first max pooling layer takes two strides instead of one and doesn't apply padding, which will reduce the dimensions of the image. If we run this new model and take a new summary, with strides of two in the first max pooling layer and no padding, we're halving the dimensions: instead of a 224x224 image we only have 112x112 in the spatial dimensions, and we pass that through the rest of the layers of the network, which keep their dimensions because they use strides equal to one and add padding. If we want to reduce the dimensions even more, we can set the strides to some other number in the next max pooling layer, or drop the padding there as well, and it will reduce the dimensions further if we still don't have only the most important information that we want to extract from the images as we pass them through the network.
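The halving variant above can be sketched in a few lines, again assuming TensorFlow 2.x: the first pooling layer uses `strides=2` with `padding='valid'`, which takes 224x224 down to 112x112 before the rest of the network runs.

```python
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, MaxPooling2D

x = tf.zeros((1, 224, 224, 3))  # dummy batch of one 224x224 RGB image

# 'same' padding on the conv keeps 224x224...
h = Conv2D(32, (3, 3), activation="relu", padding="same")(x)

# ...then stride-2 max pooling with no padding halves it to 112x112.
h = MaxPooling2D(pool_size=(2, 2), strides=2, padding="valid")(h)

print(h.shape)  # (1, 112, 112, 32)
```

Every subsequent layer then works on a quarter of the original pixel count, which is where the training- and inference-speed savings come from.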

So this is how we can use max pooling and padding in a sequential model in Keras. It can be used for a lot of different things, and it's really useful for tuning a neural network and reducing its complexity, because often we don't want to pass the image through at its full 224x224 dimensions. We want to reduce the dimensions and the complexity of the image and extract only the most important information, so the convolutional neural network can learn the most important features in the image and make predictions on them, while also improving the training speed, the prediction speed, and in general the complexity of the network. This is what we can do with max pooling and padding, and it's why it's so important to know how they affect the network, how to specify them, why and when we want to use max pooling versus average pooling, and when we want to use padding or no padding.

That's pretty much it for this video, guys. We've been over the different components of convolutional neural networks, with max pooling, global pooling, average pooling, and a lot of other things that are really useful when we're creating neural networks. So thank you for watching, and remember the subscribe button and bell notification under the video; also like this video if you like the content and want more in the future, because it really helps me and the YouTube channel out in a massive way. I'm currently doing a computer vision tutorial on OpenCV in both C++ and Python, and later on we're going to combine it with deep learning so we can see how computer vision and deep learning work together and go hand in hand. If you're interested in that tutorial, I'll link to it up here; otherwise I'll see you in the next video. Bye for now.
