Gender Classification From Vocal Data (Using 2D CNNs) - Data Every Day #090

Gabriel Atkin
8 Nov 2020 · 25:37

Summary

TLDR: In this video, the creator explores gender recognition from vocal features. A traditional two-hidden-layer neural network is first trained to predict gender from acoustic properties of recorded voices, reaching 98% accuracy and a 0.99 AUC. For fun, the creator then pads each feature vector and reshapes it into a small 2D 'image' so a convolutional neural network (CNN) can be applied. The CNN produces comparable results, trading a few points of accuracy for a marginally higher AUC, though the simpler model remains the more practical choice. The video walks through both techniques and encourages viewers to experiment with unconventional approaches.

Takeaways

  • 🎙️ The dataset used in this project is for gender recognition based on vocal data, where statistics such as vocal ranges are provided.
  • 📊 The dataset includes various features like means, mins, maxes, and ranges, created from recorded voice samples to identify male or female voices based on acoustic properties.
  • 🧠 The model used starts with a traditional neural network architecture, which includes two hidden layers for prediction.
  • 📉 The labels (male and female) were encoded using `LabelEncoder`, mapping 'male' to 1 and 'female' to 0.
  • 🧪 The data is split into training and test sets, and features are standardized using `StandardScaler` to give each column a mean of 0 and unit variance.
  • 💻 A simple dense neural network model was created using TensorFlow, with a 20-feature input layer, 64 neurons in two hidden layers, and a single output neuron using a sigmoid activation function for binary classification.
  • 🏆 The model achieved a high accuracy of 98% and an AUC of 0.99, indicating excellent performance in classifying the voices.
  • 🧪 For experimentation, the data was reshaped into 5x5 matrices and passed through a Convolutional Neural Network (CNN), though this was mainly for curiosity.
  • 📊 Even though the CNN approach was unconventional, it yielded decent results, with slightly lower accuracy (95%) but a marginally higher AUC than the baseline, showing potential.
  • 🚀 The creator suggests further exploration of CNNs and other tweaks to optimize results, but recognizes the simple two-hidden-layer neural network as the most practical solution for this dataset.

Q & A

  • What is the purpose of the dataset in the video?

    -The dataset is designed for gender recognition by voice, where acoustic properties of recorded voice samples are analyzed to identify if a voice is male or female.

  • How does the video creator preprocess the labels in the dataset?

    -The video creator uses the `LabelEncoder` from `sklearn` to transform the 'label' column, converting 'male' and 'female' into numerical values (0 for female, 1 for male).

  • What is the main model used for gender prediction in the video?

    -The main model used is a two-hidden-layer neural network implemented with TensorFlow and Keras. It takes vocal features as input and outputs a probability estimate for predicting gender.

  • Why does the video creator scale the data before training the model?

    -The creator scales the data using `StandardScaler` to normalize the features, ensuring they have a mean of 0 and unit variance. This makes it easier for the neural network to learn by putting all features on a similar scale.

  • What results did the creator achieve with the initial neural network model?

    -The initial neural network model achieved 98% accuracy and an AUC (Area Under the Curve) score of 0.99 on the test set.

  • What alternative approach did the video creator try, and why?

    -The creator tried using a Convolutional Neural Network (CNN) by reshaping the vocal data into 2D 'image' matrices. This approach was mostly for experimentation and curiosity, as CNNs are typically used for image data.

  • How did the video creator handle the fact that the vocal feature vectors had only 20 elements when trying to use a CNN?

    -Since 20 is not a perfect square, the creator padded the vectors with zeros to make them 25 elements long, which can then be reshaped into a 5x5 matrix for input into a CNN.

  • What was the result of using the CNN approach on this dataset?

    -The CNN approach yielded slightly lower accuracy (95%) but a marginally higher AUC score than the simpler neural network, indicating it performed well but did not surpass the simpler model in accuracy.

  • Why does the creator include an early stopping callback during training?

    -Early stopping is used to monitor the validation loss and stop training when the loss stops improving for a few epochs, preventing overfitting and saving the best weights during training.

  • What visualization technique did the creator use to display the newly structured data as 'images'?

    -The creator used `matplotlib` to display the reshaped 5x5 matrices as images. These images represented the vocal data after padding, with zero values displayed as a solid color.
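
For reference, here is a minimal sketch of how such a 3x3 grid of 5x5 'images' could be drawn with matplotlib, assuming `X_img` is the padded-and-reshaped feature array built later on this page (the variable name is illustrative):

```python
import matplotlib.pyplot as plt
import numpy as np

# X_img is assumed to be the (n, 5, 5, 1) array produced by padding each
# 20-value feature vector to 25 values and reshaping it (see the later sketch).
plt.figure(figsize=(12, 12))
for i in range(9):
    plt.subplot(3, 3, i + 1)
    plt.imshow(np.squeeze(X_img[i]))  # drop the channel axis; imshow wants 2D
    plt.axis("off")
plt.show()
```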

Outlines

00:00

🎤 Introduction to Gender Recognition by Voice Dataset

The video begins with an introduction to a gender recognition task using a dataset that includes vocal statistics. The dataset is composed of acoustic properties of voice samples, which are analyzed to classify them as either male or female. The speaker explains that the goal is to use vocal features, such as means, minimums, and maximums, to predict gender, and outlines the plan to experiment with two neural networks: a traditional multi-layer neural network and a convolutional neural network (CNN).

05:02

🛠️ Preparing the Dataset and Encoding Labels

The second section dives into data preprocessing. The speaker uses Pandas to load the dataset and examines the columns, noting that the data is already numerical. They highlight that the label column needs encoding and use the LabelEncoder from Scikit-learn to convert gender labels (male and female) into binary values (0 for female, 1 for male). Additionally, they confirm that there are no missing values in the dataset and demonstrate how to map encoded labels back to their original categories.
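
A minimal sketch of this preprocessing step, assuming the CSV loads into a DataFrame with a string column named 'label' (the file name is the one shown in the video; adjust the path as needed):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

data = pd.read_csv("voice.csv")   # file name as shown in the video
print(data.isna().sum())          # confirm there are no missing values

encoder = LabelEncoder()
data["label"] = encoder.fit_transform(data["label"])

# LabelEncoder sorts classes alphabetically, so 'female' -> 0 and 'male' -> 1.
print(dict(enumerate(encoder.classes_)))  # {0: 'female', 1: 'male'}
```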

10:03

📊 Scaling and Splitting the Dataset

This part discusses scaling the data to make it easier for the model to learn. A StandardScaler is applied to ensure all features have mean 0 and unit variance. The data is then split into training and test sets using the train_test_split function from Scikit-learn, with 70% of the data used for training and the rest for testing. The speaker also confirms that the feature set consists of 20 features with 3,168 examples in total.
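
A sketch of the scaling and splitting described above. Note that, as in the video, the scaler is fit on the full dataset; fitting it only on the training split would avoid a small amount of leakage:

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

y = data["label"].copy()
X = data.drop("label", axis=1).copy()

scaler = StandardScaler()
X = scaler.fit_transform(X)  # each column now has mean 0 and unit variance

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=42
)
print(X_train.shape, X_test.shape)  # roughly a 70/30 split of the 3,168 rows
```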

15:08

🧠 Building a Basic Neural Network

In this section, the speaker outlines the process of building a traditional neural network using TensorFlow. The model consists of an input layer (20-dimensional feature vector), two hidden dense layers with 64 neurons each, and an output layer with a single neuron using a sigmoid activation function to predict gender. The speaker compiles the model using the Adam optimizer, binary cross-entropy loss function, and accuracy and AUC as metrics. The model is trained for 100 epochs, with early stopping to prevent overfitting.
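
A sketch of the baseline model and training loop described above, written with the Keras functional API as in the video:

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(X.shape[1],))                  # 20 features per example
x = tf.keras.layers.Dense(64, activation="relu")(inputs)
x = tf.keras.layers.Dense(64, activation="relu")(x)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)   # probability of 'male'

model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy", tf.keras.metrics.AUC(name="auc")],
)

history = model.fit(
    X_train, y_train,
    validation_split=0.2,
    batch_size=32,
    epochs=100,
    callbacks=[
        tf.keras.callbacks.EarlyStopping(
            monitor="val_loss", patience=3, restore_best_weights=True
        )
    ],
)
```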

20:10

🎯 Evaluating the First Model

After training the neural network, the model is evaluated on the test set, achieving an impressive 98% accuracy and 0.99 AUC. The speaker expresses satisfaction with the performance, noting that further tweaks might slightly improve the model. Despite the excellent results, the speaker decides to experiment with a CNN, acknowledging that the existing model is already highly effective.
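
Evaluation is then a single call on the held-out test set; exact numbers will vary slightly between runs:

```python
loss, accuracy, auc = model.evaluate(X_test, y_test)
print(f"accuracy={accuracy:.3f}  auc={auc:.3f}")  # the video reports ~0.98 / ~0.99
```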

25:10

🖼️ Experimenting with 2D CNN by Reshaping Data

Here, the speaker explores the idea of using a 2D Convolutional Neural Network (CNN) by reshaping the 20-dimensional feature vectors into 5x5 matrices (padding the vectors with zeros to create square images). This approach, inspired by a Kaggle competition, allows the model to treat the data as images. After resolving some technical issues with padding and data types, the speaker successfully reshapes the data into 5x5 matrices, ready for use in a CNN.
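
A sketch of the padding-and-reshaping trick, using the scaled feature matrix `X` and labels `y` from the earlier steps. The explicit float dtype matters: `pad_sequences` defaults to integer output, which silently discards the decimal values (the issue the video runs into):

```python
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split

X_img = tf.keras.preprocessing.sequence.pad_sequences(
    X, maxlen=25, padding="post", dtype="float64"  # append 5 zeros to each row
)
X_img = X_img.reshape(-1, 5, 5)        # (3168, 5, 5): each row becomes a square
X_img = np.expand_dims(X_img, axis=3)  # (3168, 5, 5, 1): add a channel axis

X_train, X_test, y_train, y_test = train_test_split(
    X_img, y, train_size=0.7, random_state=42
)
```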

🏗️ Building the CNN Model

A new CNN model is built using two Conv2D layers with 16 and 32 filters, each followed by a max-pooling layer. The first attempt used kernel sizes that were too large for the tiny 5x5 inputs, producing negative output dimensions, so the speaker reduces the kernel sizes (to 2 and 1) and keeps just two convolutional blocks. After flattening the convolutional output, a final dense layer produces the prediction, and the model is constructed and compiled successfully.
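
A sketch of a CNN along these lines. The kernel and pooling sizes are kept tiny because the 'images' are only 5x5; larger kernels shrink the feature maps to nothing, which is the error the video hits on its first attempt:

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(5, 5, 1))
x = tf.keras.layers.Conv2D(16, kernel_size=2, activation="relu")(inputs)  # -> 4x4x16
x = tf.keras.layers.MaxPooling2D(pool_size=2)(x)                          # -> 2x2x16
x = tf.keras.layers.Conv2D(32, kernel_size=1, activation="relu")(x)       # -> 2x2x32
x = tf.keras.layers.MaxPooling2D(pool_size=2)(x)                          # -> 1x1x32
x = tf.keras.layers.Flatten()(x)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)

cnn_model = tf.keras.Model(inputs=inputs, outputs=outputs)
cnn_model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy", tf.keras.metrics.AUC(name="auc")],
)
cnn_model.summary()
```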

🧪 Evaluating the CNN and Comparing Results

After training the CNN, the speaker evaluates its performance. The CNN reaches a slightly lower accuracy (95%) than the traditional neural network but a marginally higher AUC. The speaker reflects on how the CNN approach is unconventional yet effective, and considers further experiments, such as using a rectangular 4x5 image instead of padding to a square. They acknowledge that the simpler neural network still wins on accuracy but appreciate the exploratory value of the CNN approach.

👋 Wrapping Up and Final Thoughts

In the final section, the speaker summarizes the key takeaways from the video. They express excitement about the results from both models, particularly the excellent performance of the basic neural network. Despite the exploratory nature of the CNN experiment, it also yielded promising results. The speaker encourages viewers to subscribe for more content and thanks them for watching, concluding with a positive farewell.

Keywords

💡Gender recognition by voice

This refers to the task of identifying a person's gender based on vocal characteristics. In the video, the data used consists of acoustic features such as pitch, vocal range, and frequency, which are analyzed to predict whether a voice belongs to a male or female. The project revolves around building models to perform this classification using machine learning techniques.

💡Neural network

A neural network is a machine learning model inspired by the human brain's structure, consisting of layers of interconnected nodes or neurons. The video describes the use of two neural networks to classify gender from vocal data, a traditional neural network with two hidden layers and a convolutional neural network (CNN). These networks learn patterns in the data to make accurate predictions.

💡Convolutional neural network (CNN)

A CNN is a type of deep learning model often used for image recognition, but in this video, it is applied to structured data. The CNN attempts to transform vocal features into a 2D image format to process the data in a unique way. This network uses convolutional layers to extract spatial hierarchies, although it's not expected to outperform the traditional neural network in this context.

💡Label encoding

Label encoding is a preprocessing technique that converts categorical data, like 'male' and 'female', into numerical values. In the video, this process is done using sklearn's LabelEncoder, which maps 'female' to 0 and 'male' to 1, making the data usable for machine learning models.

💡Data scaling

Scaling refers to the process of standardizing the range of independent variables or features. In the video, StandardScaler from sklearn is used to ensure that all features have mean zero and unit variance, helping the neural network learn more effectively by preventing some features from dominating others.

💡Train-test split

Train-test split refers to dividing the dataset into a training set, used to train the model, and a test set, used to evaluate its performance. The video uses the sklearn function 'train_test_split' to create a 70/30 split for training and testing the model, ensuring that the model's generalization performance is measured accurately.

💡Binary cross-entropy

Binary cross-entropy is a loss function commonly used in binary classification tasks. It measures the difference between the predicted probability and the actual label. In the video, this loss function is used to train the neural network to classify gender, as the output is a probability between 0 and 1 (male or female).
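
To make the loss concrete, here is a hand-rolled version on a few toy values; in practice Keras computes it via the 'binary_crossentropy' loss used in the video:

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1, 0, 1, 0])
y_pred = np.array([0.9, 0.2, 0.6, 0.4])
print(binary_cross_entropy(y_true, y_pred))  # ~0.34
```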

💡Early stopping

Early stopping is a technique used to prevent overfitting by halting training when the model's performance on a validation set stops improving. In the video, the early stopping callback monitors validation loss and stops training after three epochs of no improvement, restoring the best model weights for final evaluation.

💡AUC (Area Under the Curve)

AUC refers to the area under the ROC curve, which measures a model's ability to distinguish between classes. A higher AUC score indicates better performance. In the video, the neural network achieves a high AUC of 0.99, showing that it is very effective at differentiating between male and female voices.
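
The same metric can be sanity-checked outside of Keras with scikit-learn, using the model's predicted probabilities (assuming `model`, `X_test`, and `y_test` from the sketches above):

```python
from sklearn.metrics import roc_auc_score

probs = model.predict(X_test).ravel()  # sigmoid outputs in [0, 1]
print("test AUC:", roc_auc_score(y_test, probs))
```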

💡Max pooling

Max pooling is a down-sampling technique used in CNNs to reduce the spatial dimensions of the input, retaining only the most important features. In the video, max pooling is applied after convolutional layers to reduce the size of the feature maps and help the network focus on the most relevant information.
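
A tiny illustration of what max pooling does: each 2x2 window of the input is reduced to its maximum value, halving both spatial dimensions:

```python
import numpy as np
import tensorflow as tf

x = np.arange(16, dtype="float32").reshape(1, 4, 4, 1)  # batch, height, width, channels
pooled = tf.keras.layers.MaxPooling2D(pool_size=2)(x)
print(pooled.numpy().squeeze())
# [[ 5.  7.]
#  [13. 15.]]
```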

Highlights

Introduction of a gender recognition dataset based on vocal range data.

The dataset includes vocal statistics such as means, mins, maxes, and ranges derived from recorded voice samples.

Plan to predict gender using neural networks, including a two-hidden-layer dense network and a convolutional neural network (CNN).

Initial setup includes loading essential libraries like NumPy, Pandas, Matplotlib, and TensorFlow for model building.

Data preprocessing: Encoding labels (male/female) into binary form (0 or 1) using Scikit-learn’s LabelEncoder.

Scaling features using StandardScaler to ensure all columns have a mean of 0 and unit variance for easier model learning.

Initial model: A simple two-hidden-layer neural network using TensorFlow, achieving 98% accuracy in gender prediction.

The architecture of the initial neural network is explained, with a focus on dense layers and sigmoid activation to predict probabilities.

Introduction of early stopping with TensorFlow’s callback to avoid overfitting by monitoring validation loss.

The first neural network achieves outstanding results: 98% accuracy and an AUC of 0.99, highlighting the efficiency of the model.

For experimentation, the plan is to transform the feature vectors into 2D image-like data to apply CNNs.

Reshaping the 20-feature vector into a 5x5 matrix, using padding to create a square matrix for CNN input.

Building a CNN with two convolutional layers and pooling layers to see how the model performs on the transformed data.

Results of the CNN approach show a 95% accuracy and slightly higher AUC compared to the traditional neural network.

The CNN approach, despite being unconventional for this type of data, shows promising results with near-excellent performance.

Transcripts

play00:02

[Music]

play00:11

hi guys

play00:11

welcome back to data everyday um today

play00:14

we're looking at

play00:15

a gender recognition by voice data set

play00:19

i mean this is really the task the data

play00:21

set is

play00:23

um they're records from different people

play00:27

and these are statistics about

play00:30

vocal ranges and

play00:33

it's basically vocal data you can see we

play00:36

have

play00:37

a lot of uh like means mins max's

play00:42

ranges it says here this database was

play00:45

created to identify a voice as male or

play00:47

female based upon acoustic properties

play00:50

of the voice and speech so it actually

play00:53

comes from

play00:54

uh recorded voice samples which were

play00:57

then analyzed

play00:58

and built create these features were

play01:01

created from them

play01:03

so let's get into the notebook uh what

play01:04

i'm going to try to do is like

play01:07

um as as it would like us to do

play01:10

try to predict the gender of a given

play01:12

person based on

play01:14

these vocal features and we're going to

play01:17

use

play01:17

i wrote two different neural networks um

play01:21

one a cnn and this is mainly just for

play01:24

fun

play01:24

uh what we're going to do is uh we're

play01:27

first going to just use a traditional

play01:28

two hidden layer

play01:29

neural network and then we're going to

play01:31

try something else we're going to

play01:32

restructure the data

play01:33

into like two dimensional images

play01:36

and we're going to try to use a

play01:37

convolutional neural network on that

play01:40

alright so let's get started i have

play01:42

numpy pandas and matplotlib

play01:45

the essentials then i have for

play01:48

pre-processing

play01:49

label encoder standard scaler and the

play01:51

train test split function from sklearn

play01:54

and then i have tensorflow which i'm

play01:55

also going to use a number of uh

play01:59

functions from keras module

play02:03

tensorflow all right let's load in the

play02:06

data

play02:06

using pandas.readcsv

play02:11

and we get the file path up here

play02:13

voice.csv

play02:14

just copy that paste it in and take a

play02:17

look

play02:18

and you notice first thing is that we

play02:21

can't see all the columns so i'm going

play02:23

to go into the console

play02:24

and write pandas dot set option

play02:27

max columns

play02:31

none and that will give us all the

play02:34

columns

play02:36

and you can see they're already in

play02:37

numerical form because

play02:39

these features were created from voice

play02:41

samples

play02:44

and we just noticed that the label

play02:48

column needs to be encoded

play02:51

before we go any further we should check

play02:52

if we have any missing values although i

play02:54

doubt we will

play02:56

yeah no missing values 3,168

play02:59

entries and you can see there's not no

play03:01

non-nulls

play03:02

sorry no nulls in any of the columns

play03:06

all right let's encode the labels

play03:10

so i'm just going to use sklearns a

play03:12

label encoder for this

play03:15

which uh we just create a new object and

play03:17

then

play03:19

the column we want to encode is called

play03:20

label so data sub

play03:22

label equals label encoder

play03:26

dot fit transform

play03:30

data sub label and that will just

play03:33

change so change it so that uh male and

play03:36

female are assigned zero or one

play03:39

and we'll run that and we can actually

play03:40

take a look at um

play03:44

you can take a look at which values were

play03:46

mapped to which

play03:47

by enumerating label encoder

play03:52

dot classes underscore and when we

play03:56

enumerate it we can then turn it into a

play03:57

dictionary to get a mapping

play03:59

of what went to which and you can see

play04:01

zero went to female one went to male

play04:03

other way around female into zero male 1

play04:06

to one

play04:07

so you can see if we were to look at

play04:09

data now

play04:11

we have ones and zeros whereas before we

play04:13

had male and female

play04:16

okay let's split and scale the data

play04:21

so we're going to split it into x and y

play04:24

y is what we're trying to predict

play04:25

just so just our label column data sub

play04:28

label

play04:28

and i'll make a deep copy of it and x is

play04:31

going to be everything except

play04:33

label so we're going to drop it from

play04:35

axis one

play04:36

make a copy of that and now we have

play04:39

split our data into two

play04:41

sections y is just a vector x is a

play04:44

matrix

play04:46

and let's create a scaler

play04:50

standard scaler this is

play04:53

a scaler from sklearn that will give

play04:56

each column

play04:57

in x mean 0 and unit variance

play05:01

so all of the columns will take on a

play05:03

similar range of values

play05:05

it makes it easier for our model to

play05:06

learn

play05:08

so x equals scaler dot fit transform

play05:15

x simple as that

play05:18

now if we look at x we no longer have a

play05:20

data frame but you can see

play05:21

the values have been scaled so that they

play05:24

all take around

play05:25

they all have mean zero and most of the

play05:28

values lie

play05:29

in the negative one to one range

play05:32

okay so now we'll split it

play05:36

uh horizontally

play05:40

or vertically i don't know how you said

play05:41

would you want to call it but uh what i

play05:43

mean is let's get a training test set

play05:46

so uh x train and x test y train y test

play05:51

equals uh train test split x y so this

play05:54

function from escape learn we'll just

play05:56

split our xy into a training test set

play05:59

we'll give a train size of 70

play06:03

why don't i include a random state as

play06:05

well how about uh

play06:07

42

play06:10

all right now we have four different

play06:11

sets for data

play06:13

and we can begin modeling and training

play06:17

so let's take a look at our feature data

play06:20

so

play06:20

we have 20 features and 3168 examples

play06:26

i'm going to start building a tensorflow

play06:28

neural network just the most standard

play06:30

architecture which is start with

play06:35

a dense layer sorry an input

play06:40

and we'll pass in the shape of

play06:44

a single feature sorry single feature

play06:46

vector will be

play06:48

20 a vector of length 20. so i can

play06:52

access that with

play06:52

x dot shape sub 1 and put the combo to

play06:56

indicate a vector

play06:58

then x equals tf.keras.layers.dense

play07:01

there's a dense layer we'll give it 64

play07:04

activations

play07:05

and a relu activation function

play07:08

pass it in inputs and i'm going to copy

play07:10

that and make a second one but pass an

play07:12

x so it's going to go through two hidden

play07:14

layers very standard

play07:16

and then given outputs which will be

play07:19

another dense layer but it will only

play07:21

output one

play07:21

value which will be

play07:25

a probability estimate so sigmoid scales

play07:28

are between zero and one

play07:31

so that we get a probability for how

play07:34

likely a given person is male

play07:39

all right we'll create our model which

play07:40

will be tf.model

play07:43

and we're passing inputs and outputs all

play07:45

right

play07:47

so let's take a look we'll use

play07:48

model.summary to see

play07:50

what our how our shape is changing we

play07:53

start off with a feature

play07:54

vector of length 20.

play07:57

it gets uh it goes

play08:01

to the dense layer the first dense layer

play08:03

which has 64

play08:04

nodes and then those 64 nodes get

play08:06

connected to the next 64 nodes

play08:08

and those final 64 nodes all

play08:11

are there's a linear combination that

play08:13

returns a single value

play08:15

from all 64. and that single value is

play08:19

if it's over 0.5 we'll say it's male and

play08:21

if it's under 0.5 we'll say female

play08:25

so let's uh compile our model

play08:29

so we'll give an optimizer of atom

play08:33

uh for loss we'll give binary cross

play08:35

entropy

play08:38

and metrics how about we include

play08:41

accuracy and auc auc is just uh

play08:45

much better at uh

play08:48

it considers performance within each

play08:51

class rather than just

play08:52

pure how much how well did we do

play08:57

across all examples so we'll give that a

play09:00

name auc

play09:03

all right then i'm going to fit the

play09:05

model and store it store the history of

play09:07

the fit

play09:08

in history it's a model.fit

play09:12

we're fitting on the train set so

play09:14

x-train and y-train

play09:16

i'll give it a validation split of 20

play09:19

percent

play09:20

a batch size of 32 and we'll train for

play09:23

100 epochs

play09:25

because i choose such a high number of

play09:27

epochs because i'm also going to include

play09:29

a callback function

play09:32

just tf.keras dot callbacks

play09:36

dot early stopping this allows us to

play09:40

monitor a value in this case validation

play09:43

loss

play09:44

and when we notice that the loss stops

play09:46

improving or stops decreasing

play09:49

we will wait for a certain number of

play09:52

epochs

play09:53

say three and if it's still

play09:56

still not decreasing after three epochs

play09:58

we're going to stop the training

play10:00

and restore the weights from the best

play10:03

epoch

play10:04

so restore best weights equals true

play10:08

all right we'll run that and should stop

play10:12

after some number

play10:13

think nine let's see how we did

play10:17

model dot evaluate x test

play10:20

y test so we evaluate on the test set we

play10:24

have an accuracy of 98 percent

play10:26

and an auc of 0.99 so absolutely

play10:28

fantastic

play10:30

um really couldn't hope for a better

play10:33

performance than this

play10:34

perhaps we could actually improve if we

play10:36

just tweak a few things

play10:38

make it perfect but i'm not going to

play10:40

spend too much time this video doing

play10:42

that

play10:42

i would like to actually try a different

play10:44

approach

play10:46

now i do not expect i'll just say this

play10:48

before i do it

play10:49

i do not expect this approach to yield

play10:52

greater performance this is absolutely

play10:54

fantastic i have no reason to change

play10:56

this

play10:57

if i were really caring about getting

play10:58

the best performing model i'd probably

play11:00

keep something simple like this

play11:02

but i want to try to use 2d cnn's just

play11:05

for

play11:05

just for fun just to see if we can do

play11:08

this

play11:09

and what i mean is so a two-dimensional

play11:12

convolutional

play11:14

layer takes in an image essentially

play11:18

or a matrix of pixel

play11:21

data it doesn't have to be pixel data

play11:23

actually it just has to be

play11:24

a two-dimensional matrix and

play11:29

uh if you don't know the math behind

play11:31

convolutional layers you should go

play11:33

check them out very cool basically it

play11:35

just slides this little

play11:36

um it it

play11:40

it takes data from the image

play11:43

from little sections of the image i'm

play11:46

not going to go into it

play11:47

in detail but um basically what we're

play11:50

going to do is we're going to

play11:51

reformat our ex our sequences

play11:54

well here let me show you what i mean

play11:57

here's x right

play11:59

uh if i do it like a date as a data

play12:02

frame

play12:02

let me just take a look better look at

play12:04

it it the

play12:05

x is basically each example

play12:09

is a sequence of values of 20 values

play12:13

right and it's in a one-dimensional

play12:16

vector

play12:17

now we could reshape this vector so that

play12:21

we can stack it into

play12:22

like a square and then use that square

play12:25

as the two-dimensional

play12:27

image that we can feed into our

play12:29

convolutional network

play12:31

i got this idea from someone on kaggle

play12:33

was talking about how they

play12:34

they got um some interesting results

play12:37

using this in a competition i can't

play12:40

remember exactly but

play12:42

this is just an idea i had let's see

play12:46

let's see how it goes

play12:48

so how am i going to do this so we have

play12:51

to work with x

play12:52

x is our our feature information but

play12:55

currently it's in this

play12:56

this format of just these long uh

play12:59

vectors

play13:00

so each example is a vector of length

play13:02

20.

play13:04

so what i want to think of is if i

play13:06

wanted to make

play13:07

that into a square uh i'm going to need

play13:10

it to be like

play13:13

for example yes it will be the length of

play13:16

the original vector has to be a perfect

play13:17

square

play13:19

and the next highest perfect square from

play13:21

20 is 25

play13:24

obviously if we need the we need it to

play13:26

be of equal dimension for it to be a

play13:28

square

play13:29

so if i look at the shape of x

play13:33

currently it's of length 20 and what i can

play13:35

do is pad

play13:36

the sequences using

play13:45

tf.keras.preprocessing.sequence.pad

play13:46

sequences

play13:48

and this uh function will take uh

play13:51

let's pass an x and we'll set a max

play13:54

length

play13:54

to 25 and i'm going to say padding

play13:58

equals post

play14:00

what this will do is take all of our um

play14:03

our 20 or our vectors of length 20

play14:07

and add five zeros to end of each one

play14:11

so if i look at the shape of this you

play14:14

can see it's the same but there's

play14:15

five extra values at the end and if i

play14:18

wanted to

play14:19

look at this as a data frame so i'll

play14:22

take shape off the end

play14:24

you can see uh

play14:27

oh very interesting hmm

play14:30

so i didn't know about this actually

play14:32

we're losing some information here

play14:35

i had no idea it it turns them into

play14:38

integers and that we can't have that

play14:40

absolutely can't um so we're gonna have

play14:43

to figure out how

play14:44

to avoid this is there a way let me look

play14:46

this up

play14:47

pad sequences

play14:51

is there a way to keep it from doing

play14:52

that

play14:54

d type oh yeah let's specify d type

play14:58

d type equals numpy.float

play15:08

okay that's good alright we're good to

play15:10

go

play15:11

awesome so maybe i'll get some better

play15:13

performance than than what i had before

play15:14

because i didn't realize that it was uh

play15:16

stripping off the decimal values all

play15:20

right

play15:21

so this is going to be it's exactly the

play15:24

same as before but we have these five

play15:25

extra zeros at the end

play15:27

and the reason for adding those five is

play15:30

because now

play15:31

let's just take this data frame view off

play15:35

so let's make that the new x so now x

play15:38

has these extra zeros at the end

play15:40

we can reshape it now

play15:44

to keep the same number of elements in

play15:46

the first dimension

play15:47

but change the other dimensions to be

play15:49

five by five

play15:51

and if you look at that um let's just

play15:54

get the shape of that

play15:56

it is now instead of uh 3168

play16:00

by 25 the 25 has been restructured into

play16:03

5x5

play16:04

arrays so um

play16:09

all right let's let's see

play16:12

there's one last thing usually um so

play16:15

that's

play16:16

let's make that new x take shape off

play16:20

and then the last thing to do is um

play16:22

usually

play16:23

image data has an extra dimension to

play16:26

represent the number of color channels

play16:28

so right now the shape of x is uh

play16:31

3168 by five by five i want to make a

play16:35

3168 by five

play16:36

by five by one and to do that i can

play16:40

use numpy dot expand dimensions

play16:43

expand dims

play16:46

on x and the axis we want to expand

play16:49

across is 3

play16:50

which will just be the fourth axis there

play16:54

so run that now if we look at the shape

play16:57

we have this extra dimension

play16:59

and we have a nice image format

play17:03

let me just put these same block

play17:07

and we have this uh yeah okay so let's

play17:11

actually take a look at these as images

play17:14

so because they're in an image form we

play17:16

can actually view them

play17:17

as images so

play17:21

let's create a new map plot lib figure

play17:23

give it a fix size

play17:25

i don't know it could be anything to a

play17:26

12 by 12 sounds good

play17:30

and then for i in range nine so i'm just

play17:32

going to display nine of the images

play17:34

we're going to create a new subplot in a

play17:36

three by three grid

play17:38

indexed by i plus one and then

play17:41

we'll use plt dot image show or i am

play17:44

show

play17:46

of x sub i

play17:49

and that will give us the first image of

play17:52

of

play17:53

size five by five by one actually i'm

play17:56

pretty sure

play17:57

we need to squeeze this i don't i think

play18:00

image show doesn't like the extra one

play18:02

so let's do numpy.squeeze

play18:06

just to get rid of that extra one we're

play18:08

still going to use it in the original x

play18:10

but uh

play18:11

for the image show function we want to

play18:12

get rid of it and then i'll just turn

play18:14

off the axis

play18:16

on the side the axis marks

play18:19

all right and then plt.show

play18:23

and these are our new

play18:26

feature images and you'll notice the

play18:29

line across the bottom

play18:30

is always zero so we always have it as a

play18:34

as a solid color because these are

play18:37

our pad zeros um

play18:41

so this is fantastic right

play18:44

uh

play18:48

hmm what does it mean

play18:51

so this is actually just a uh

play18:54

the the brighter the color the higher

play18:57

the value i believe

play18:59

um actually so zero is like the solid

play19:02

color

play19:02

and then darker colors are negative and

play19:04

brighter colors are positive

play19:07

uh and so each one of these squares

play19:10

actually

play19:10

is represented it is representing

play19:13

one of these values so there's like 20

play19:17

different values like this across and

play19:20

each one of those is represented by a

play19:21

new square now

play19:24

so it looks like colors to us but to the

play19:28

to the algorithm it's still just

play19:30

numerical data so

play19:31

we'll be able to feed this into our

play19:33

model

play19:36

all right so now we have a new x uh

play19:38

let's create a new x train x test

play19:42

y train y test

play19:46

train test split x y

play19:49

train size of seventy 70 and same random

play19:52

state as before

play19:54

42. all right and now let's build

play19:58

a new model so before our model looks

play20:01

like this right let's copy this down

play20:03

into here and oops

play20:07

instead of using just two hidden layers

play20:10

like that

play20:11

i'm going to use the standard cnn

play20:14

architecture which is

play20:16

going to use a we'll do a convolutional

play20:20

so conv 2d layer

play20:23

that takes a

play20:27

the number of filters which will make 16

play20:30

and then the size of a kernel which will

play20:32

make three

play20:33

and an activation function which will be

play20:36

relu

play20:37

this will take in inputs and then we'll

play20:40

have

play20:41

a max pooling layer

play20:46

max pooling 2d

play20:49

and we'll pass an x to that and so i'm

play20:52

going to copy this three times

play20:55

whoops

play20:59

i make sure these are x and this is

play21:01

going to be 32

play21:02

and this will be 64.

play21:05

then when we're done we'll flatten it

play21:08

with a flattened layer

play21:13

and we'll feed it through a final dense

play21:15

layer at the end

play21:20

all right so let's see how this goes

play21:25

uh we have problem incompatible

play21:32

oh i didn't pass anything in here

play21:35

no we're still getting a problem conv

play21:38

2d2 is incompatible with layer

play21:42

oh i specified the wrong shape here so

play21:46

no longer are we dealing with just a

play21:48

single vector we

play21:49

need three values five by five by one

play21:53

so shape would be five by five by one

play21:57

but i'll just represent it in terms of x

play21:59

dot shape

play22:00

x dot shape sub one x dot shape sub

play22:05

two and x dot shape

play22:08

sub three

play22:12

oh what is this negative dimension

play22:17

uh

play22:22

oh wait

play22:29

let me try just copy and pasting what i

play22:31

have before

play22:39

okay um give me one second

play22:43

okay i figured i figured out what i did

play22:45

wrong um i just had

play22:46

um i my kernel sizes were just way too

play22:49

big

play22:50

um for the pooling that was going on

play22:54

we are our images are so small that we

play22:56

can't we were trying to pool into

play22:58

negative dimensions

play22:59

so uh don't worry about that we're just

play23:02

going to keep it with two convolutional

play23:04

layers

play23:04

a kernel size of two here kernel size of

play23:06

one here and see how that works

play23:09

uh so right so we're taking in this

play23:11

image and we're going to run through the

play23:13

convolutional layer max pooling layer

play23:15

convolutional layer max pooling layer then

play23:17

we'll flatten it out into a single

play23:18

vector

play23:19

apply it through a last dense layer and

play23:21

then give our output

play23:23

so if we look at uh here you can see the

play23:26

shape

play23:26

converges down to it starts like this

play23:29

sort of goes up a little comes back down

play23:32

into

play23:33

one all right so uh let's

play23:37

let's now train okay so i'm going to

play23:39

grab

play23:40

this code and put it over here

play23:44

i just make sure everything's sort of

play23:46

similar i don't think we have to change

play23:48

anything

play23:49

it should be the same all right i'll run

play23:51

that

play23:52

and let's evaluate the model when we're

play23:54

done

play24:00

and we actually get some pretty good

play24:02

results

play24:03

um so before this is just a standard

play24:07

uh way we got a 98 accuracy and 0.997

play24:12

auc

play24:13

here we actually have a higher auc and

play24:16

the lower accuracy by

play24:17

only three percent um which is you know

play24:20

that's pretty good

play24:21

like um this is higher than i got before

play24:25

i guess because i didn't realize i was

play24:27

cutting out the uh float

play24:29

values here um i realize

play24:32

we may also be able to do this without

play24:34

padding the zeros on the bottom

play24:36

i'm not sure if that will contribute

play24:40

uh well that will give us better results

play24:42

we could just keep a rectangular image

play24:44

and instead of making it a perfect

play24:46

square

play24:48

uh maybe a four by five image

play24:51

but yeah so i mean this from some pretty

play24:53

good results especially

play24:55

uh considering how sort of like

play24:58

unconventional this method

play24:59

seems it seems like this still wins

play25:04

just a better performance but

play25:08

i would love to look into this more and

play25:10

figure out if there is a way to make

play25:11

this

play25:11

more effective than the standard two

play25:14

hidden layer

play25:15

boring sequential model but you know

play25:18

this is very cool

play25:20

i hope you think it is also this is

play25:22

going to wrap up today's video

play25:23

thank you so much for watching i hope

play25:26

you enjoyed the video

play25:27

if you did make sure to subscribe and

play25:28

hit the bell for more content

play25:30

and leave any comments you have in the

play25:31

section below i'll see you guys tomorrow

play25:33

have a fantastic day


Related Tags
Gender Recognition, Voice Data, Neural Networks, CNN, Data Science, TensorFlow, Machine Learning, Python Coding, Deep Learning, Model Training