Image classification + feature extraction with Python and Scikit learn | Computer vision tutorial
Summary
TLDR: In this advanced computer vision tutorial, Felipe builds an image classifier on top of feature extraction, using the 'image-to-vec with pytorch' repository as the feature extractor. The tutorial uses a weather dataset with four categories: cloudy, rain, shine, and sunrise. Felipe walks through data preparation, training a RandomForestClassifier from scikit-learn, testing performance on validation data, and saving the model. The video ends with a demonstration of inference with the saved model. The classifier reaches 94.4% accuracy on the validation set, and viewers are encouraged to compare this result with the previous tutorials that used YOLO v8 and Teachable Machine.
Takeaways
- 😀 Felipe introduces an advanced computer vision tutorial focused on image classification and feature extraction.
- 🌐 The tutorial uses a weather dataset with four categories: cloudy, rain, shine, and sunrise, which has been used in previous videos.
- 📁 The dataset is organized into 'train' and 'val' directories for training and validation data respectively.
- 🛠 Felipe demonstrates using a GitHub repository called 'image-to-vec with pytorch' for feature extraction from images.
- 💻 The process begins in PyCharm by creating a new Python project and setting up a virtual environment with Python 3.8.
- 📦 Necessary Python packages are installed: 'img2vec_pytorch' (the package form of the feature-extraction repository), 'scikit-learn', and 'pillow'.
- 🔍 The script iterates over images in the dataset, using 'os' to list files and 'PIL' to open each image, and computes features with 'Img2Vec'.
- 📝 Features and labels are stored in lists and then saved in a dictionary with keys for training and validation data and labels.
- 🌳 A 'RandomForestClassifier' from 'scikit-learn' is chosen for the machine learning model, which is trained using the training data and labels.
- 📊 The model's performance is tested using the validation data, achieving a 94.4% accuracy score.
- 🔒 The trained model is saved using 'pickle' for future use, and a new script 'infer.py' demonstrates how to load and make predictions with the model.
Q & A
What is the main topic of the video?
-The main topic of the video is building an image classifier that relies on feature extraction, a common computer vision approach.
What dataset is used in the tutorial?
-The tutorial uses a weather dataset that includes four categories: cloudy, rain, shine, and sunrise.
What are the two main directories in the dataset?
-The two main directories in the dataset are 'train' for training data and 'val' for validation data.
What is the GitHub repository mentioned for feature extraction?
-The GitHub repository mentioned for feature extraction is called 'image-to-vec with pytorch'; it is used to extract features from images.
Which Python packages are installed for this tutorial?
-The Python packages installed for this tutorial are 'img2vec_pytorch', 'scikit-learn', and 'pillow'.
What is the first step in the process described in the video?
-The first step in the process is to prepare the data for training the image classifier.
How are the features extracted from the images?
-Features are extracted from the images using the 'Img2Vec' object from the 'img2vec_pytorch' package by calling its 'get_vec' method on the opened image.
What classifier from scikit-learn is used in the tutorial?
-The tutorial uses the 'RandomForestClassifier' from scikit-learn for the image classification model.
How is the model's performance evaluated?
-The model's performance is evaluated using the accuracy score computed with the validation data, which is unseen data for the model.
What is the final accuracy achieved by the model in the video?
-The final accuracy achieved by the model in the video is 94.4 percent.
How is the trained model saved for later use?
-The trained model is saved using the 'pickle' library by opening a file named 'model.p' in write-binary mode and calling 'pickle.dump' with the model object.
How can the saved model be used for making predictions?
-The saved model can be loaded using 'pickle.load' by opening the 'model.p' file in read-binary mode and then used to make predictions by calling the 'predict' method on new feature data.
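One detail worth spelling out for the last answer: scikit-learn's `predict` expects a 2D input of shape (n_samples, n_features), so a single feature vector has to be wrapped into a batch of one. The following is a minimal sketch under stated assumptions: the training data is made up, the 2-dimensional vectors stand in for the much longer vectors a real feature extractor returns, and `random_state` is set only for reproducibility.

```python
from sklearn.ensemble import RandomForestClassifier

# Made-up training set: 2-dimensional stand-in "features" for two categories.
X_train = [[i * 0.01, i * 0.01] for i in range(20)] + [
    [1.0 - i * 0.01, 1.0 - i * 0.01] for i in range(20)
]
y_train = ["cloudy"] * 20 + ["sunrise"] * 20

model = RandomForestClassifier(n_estimators=10, random_state=0)
model.fit(X_train, y_train)

# One feature vector, as an extractor would return for a single image.
features = [0.05, 0.05]

# Wrap the single vector in a list to form a batch of one sample.
prediction = model.predict([features])
print(prediction[0])
```

`prediction` is an array with one entry per input sample, so `prediction[0]` is the predicted category for the single image.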
Outlines
📚 Introduction to Image Classification and Feature Extraction Tutorial
Felipe introduces his channel and the video's objective to build an image classifier using feature extraction. He mentions using a weather dataset with four categories: cloudy, rain, shine, and sunrise. Felipe provides a quick overview of the dataset and its structure, consisting of training and validation directories. He also introduces the 'image-to-vec with pytorch' GitHub repository, which will be used for feature extraction, and outlines the steps for the tutorial: preparing data, training the model, testing its performance, and saving the model.
🔍 Preparing the Data for Image Classification
This part sets up the training and validation directories with Python's 'os' module, then iterates through each category and each image within it, extracting features from every image with the 'Img2Vec' feature extractor. The extracted features and their labels (the category directory names) are stored in lists, which are then organized into a dictionary with keys for training and validation data and labels.
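The data-preparation loop described above could be sketched roughly as follows. This is not the video's exact code: `build_dataset` and `extract_features` are hypothetical names, the dictionary key spellings (with underscores) are an assumption, and the feature extractor is passed in as a plain callable so the directory walk can be read and run without PyTorch.

```python
import os


def build_dataset(data_dir, extract_features):
    """Walk data_dir/train and data_dir/val and collect features and labels.

    extract_features is any callable mapping an image path to a feature
    vector. In the tutorial it would be something like (assumption):
        lambda p: img2vec.get_vec(Image.open(p))
    with img2vec = Img2Vec() from the img2vec_pytorch package.
    """
    splits = {"train": "training", "val": "validation"}
    data = {}
    for split_dir, prefix in splits.items():
        # Fresh lists per split so train and val never get mixed.
        features, labels = [], []
        root = os.path.join(data_dir, split_dir)
        # Each sub-directory name (cloudy, rain, ...) is the label.
        for category in sorted(os.listdir(root)):
            category_dir = os.path.join(root, category)
            if not os.path.isdir(category_dir):
                continue
            for file_name in sorted(os.listdir(category_dir)):
                image_path = os.path.join(category_dir, file_name)
                features.append(extract_features(image_path))
                labels.append(category)
        data[prefix + "_data"] = features
        data[prefix + "_labels"] = labels
    return data
```

Creating the two lists inside the outer loop matters: if they were created once before it, the validation entries would also contain every training sample.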
🌳 Training the Image Classifier with RandomForestClassifier
After preparing the data, Felipe moves on to training the image classifier using the RandomForestClassifier from scikit-learn. He explains the simplicity of the training process, which involves fitting the model with training data and labels. The script also includes importing necessary libraries and creating an instance of the RandomForestClassifier, emphasizing the ease of use and robustness of this classifier.
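The training step could look roughly like the sketch below. The feature vectors here are synthetic stand-ins (two well-separated clusters of 8 numbers each) rather than real image features, and `random_state` is fixed only so the sketch is reproducible; neither comes from the video.

```python
import random

from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for extracted features: two well-separated clusters
# of 8-dimensional vectors (real image feature vectors are much longer).
rng = random.Random(0)
training_data = (
    [[rng.gauss(0.0, 0.1) for _ in range(8)] for _ in range(20)]
    + [[rng.gauss(1.0, 0.1) for _ in range(8)] for _ in range(20)]
)
training_labels = ["cloudy"] * 20 + ["shine"] * 20

# Create the classifier and fit it on the features and their labels.
model = RandomForestClassifier(random_state=0)
model.fit(training_data, training_labels)

print(model.predict([[0.0] * 8, [1.0] * 8]))
```

String labels such as the category directory names can be passed to `fit` directly; the fitted model exposes them in `model.classes_`.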
📊 Testing the Classifier's Performance with Validation Data
The script describes how to test the trained classifier's performance using unseen validation data. It outlines the use of the accuracy score from scikit-learn to measure the model's accuracy. Felipe executes the script to demonstrate the successful training and testing process, achieving a high accuracy rate of 94.4 percent, and suggests comparing this result with previous tutorials using different classifiers.
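For intuition, the accuracy score is simply the fraction of validation samples whose predicted label matches the true label. The sketch below computes it in plain Python; `accuracy` is a hypothetical helper, and the labels are made up for illustration rather than taken from the video. In scikit-learn the equivalent call is `accuracy_score(y_true, y_pred)` from `sklearn.metrics`.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Made-up labels for illustration only (not results from the video).
validation_labels = ["cloudy", "rain", "shine", "sunrise", "cloudy"]
y_pred = ["cloudy", "rain", "shine", "sunrise", "rain"]

score = accuracy(validation_labels, y_pred)
print(score)  # 4 of 5 correct -> 0.8
```

Plain accuracy is symmetric in its two arguments, so passing the predictions first and the true labels second, as in the video, gives the same number.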
💾 Saving and Loading the Trained Model for Inference
Felipe explains how to save the trained model using the pickle library and provides a step-by-step guide to save the model as 'model.p'. He then demonstrates how to load the saved model for making predictions on new images. The script includes creating a new file 'infer.py' for inference, loading the model, and using it to predict the category of a validation image, which successfully matches the expected 'cloudy' category.
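The save and load steps can be sketched as a round trip with `pickle`. This is a sketch under stated assumptions: `save_model` and `load_model` are hypothetical helper names, a plain dictionary stands in for the fitted classifier so the example runs without scikit-learn, and the file is written to a temporary directory instead of the project folder.

```python
import os
import pickle
import tempfile


def save_model(model, path):
    """Serialize any picklable object (e.g. a fitted classifier) to path."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Load an object written by save_model. Only unpickle trusted files."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in object so the sketch runs without scikit-learn; in the tutorial
# this would be the fitted RandomForestClassifier.
stand_in_model = {"classes": ["cloudy", "rain", "shine", "sunrise"]}

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "model.p")
    save_model(stand_in_model, model_path)
    restored = load_model(model_path)

print(restored == stand_in_model)
```

The `with open(...)` block closes the file automatically, so no explicit close call is needed. Because unpickling can execute arbitrary code, `pickle.load` should only be used on files from a trusted source.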
Keywords
💡Image Classification
💡Feature Extraction
💡Computer Vision
💡Dataset
💡Training Data
💡Validation Data
💡Random Forest Classifier
💡Accuracy Score
💡Pickle
💡Inference
Highlights
Introduction to an advanced computer vision tutorial on image classification and feature extraction.
Use of the weather dataset with four categories: cloudy, rain, shine, and sunrise for image classification.
Demonstration of the dataset's structure with separate directories for training and validation data.
Utilization of the 'image-to-vec with pytorch' GitHub repository for feature extraction from images.
Explanation of setting up a new Python project in PyCharm with a virtual environment and Python 3.8.
Installation of necessary Python packages: img2vec_pytorch, scikit-learn, and pillow.
Coding process initiation in main.py for training an image classifier and performing feature extraction.
Description of the four-step process: data preparation, model training, performance testing, and model saving.
Importing the Img2Vec class from img2vec_pytorch for feature extraction.
Preparation of data by iterating through image categories and computing features using Img2Vec.
Creation of lists for features and labels, and their organization into a dictionary for training and validation sets.
Introduction of the RandomForestClassifier from scikit-learn for the image classification model.
Training the RandomForestClassifier using the prepared training data and labels.
Testing the classifier's performance using the validation data and calculating the accuracy score.
Achievement of a 94.4 percent accuracy score with the trained image classifier.
Invitation to compare the achieved accuracy with the previous tutorials that used YOLO v8 and Teachable Machine.
Saving the trained model using the pickle library for future use.
Demonstration of how to load the saved model and make predictions with it using infer.py.
Correct prediction of the image category 'cloudy' as an example of the model's inference capability.
Conclusion of the tutorial with an invitation to like, subscribe, and compare results with other methods.
Transcripts
Hey my name is Felipe and welcome to my channel. In this video we are going to work with image classification
and feature extraction we are going to build an image classifier which internally
uses feature extraction in order to classify images this is a very very very advanced computer
vision tutorial and now let's get started and now let me show you the data we will be using in
today's tutorial today we are going to work with the same weather dataset we already used in two
of my previous videos in my previous video where I showed you how to train an image classifier using
yolo v8 and also in my previous video where I showed you how to train an image classifier using
teachable machine, exactly the same weather dataset and now let me show you super quickly how this
dataset looks like you can see we have four different categories and these categories are
cloudy rain shine and sunrise now let me show you super quickly how each one of these categories
looks like and you can see this is the Cloudy category now if I show you the rain category you
can see we have many many many pictures of super super super rainy days and if I show you the shine
category this is how the images look like and if I show you the sunrise category you can see
we have many many many pictures of sunrises so this is the data we will be using in this
tutorial and you can see we have two directories one of them is train and this is where the training
data is located and then we have another directory which is called val and this is the validation
data now let me show you something else if I go to my browser this is the GitHub repository we
are going to use as a feature extractor because remember in this tutorial we are going to work
with image classification but we're also going to work with feature extraction and this is exactly
how we are going to extract features from our images this repository is called image to vec
with pytorch and this repository is available as a python package so this is exactly how we
are going to extract features from our images and this is the repository we are going to be using in
this tutorial now let's go to pycharm because now it's time to start working on this tutorial now
it's time to create a new python project so I go to file new project and I'm going to select
the directory where I want to create this project which in my case is something like
here tutorial okay and I'm going to create this new project using a virtual environment and python
3.8 I click on create, create from existing sources, this window, the python project has been created
and now let's install the requirements we need in order to work on this tutorial so I'm going
to file, settings, then project, python interpreter, I'm going to click on this plus button and the
first Library we are going to install is this one right we are going to work with this repository so
we are going to install this python package so we need to install image to vec slash pytorch so
let's find this Library image to vec and this is the one we need pytorch okay I'm going to
click here and then install package and that's pretty much all then we also need to install
two additional libraries one of them is scikit learn so I'm going to search for scikit learn
this one over here I'm going to press install package and then I also need to install pillow
and that's pretty much all so my python packages are being installed but in the meanwhile let's
start working in this tutorial let's start working on all the coding of all of this tutorial so
I'm going to press here on OK and then I am going to create a new file which is main.py
main dot py okay and this is the file we are going to use in order to train our image classifier in
order to do the feature extraction and also the image classification so I am going to enlarge it
a little and the first thing I'm going to do is to write the four steps we will take in order to
complete this process the first one is to prepare the data we are going to use in order to train
this classifier then we are going to train the model we are going to train a classifier
then we are going to test the performance of this classifier and finally we are going to
save the model into our computer right so these are the four steps of today's tutorial and now
let's get started and the first thing we need to do is to import from img2vec_pytorch import
Img2Vec okay and then we are going to create img2vec... we are going to create
a new object which will be the feature extractor we are going to use in order to extract features
from our images and now let's prepare the data so I am going to define a new variable which is data
directory and this is the directory where my data is located and my data is located in the current
directory, this is the directory where I have created this project this is the main.py file in which
we are currently working in and this is where my data is located it's in a directory which is
called data and within this directory there's another directory which is called weather dataset
so I am going to define the data dir as data weather dataset... then let's create two additional
variables one of them will be train directory and this is where my training data is located
which is here... this is something more like this... os path join data directory and then train okay
and then the validation directory which is going to be something like this
okay and I need to import os otherwise this is not going to work, so import...
os... okay and that's pretty much all remember within weather dataset we have
two directories one of them is called train and the other one is called val now let's continue
now let's walk through all the files in these two directories and this is how we're going to do: for
dir in
train directory, val directory
for category in os listdir directory right we are going to iterate in each one of these categories
and we are going to Define another variable we are going
to make another loop which will be for image path in os path
join dir category right so we are iterating in all the images which are within this
directory right so we are iterating in all the categories so we are iterating in each
one of these four directories we have over here and then for each one of these directories
we are going to iterate in all the images you can see over here right now we are going to
Define another variable which is going to be image path underscore and this is os path join
dir directory category and then image path okay
then we are going to make another import which is from PIL import image because we are going to
read each one of these images we're going to load each one of these images like this image dot open
image path underscore and this will be image okay and now we are going to compute all the
features from this image and this is how we're going to do we are going to call image to vec
dot get vec and we are going to input the image and that's it, this is all it takes to
compute all the features from this image and this is going to be something like features
okay now I am going to create two lists I'm going to do it over here
one of them will be features and the other one will be labels okay... and this object maybe
it's a better idea to call it something like image features right something like that
and now we are going to append features dot append image features right and then labels dot append
and the label will be something like the category right these are our labels... each one of these
directory names is our categories right these are our four categories so in order to append
the label we need to append the directory name we need to append this variable over here which
is called category now let's continue now I'm going to Define another variable which is data
this is going to be a dictionary and I'm going to do something like this I am going to walk not only
in the directories but I'm also going to Define another variable which is going to be j dir in enumerate
train dir val dir... something like this so now this is going to take these two values... dir is going
to be train dir in the first iteration and it's going to be val dir in the second
iteration and j will be 0 in the first iteration and it will be 1 in the second iteration so by
doing so we can do something like this now I'm going to say data training data
validation data
and this will be j equal to features and then
the same but with the labels training labels validation labels j and this will be labels right
remember that in the first iteration dir is training directory and j will be zero so if
we access the index number 0 from this list we are going to get training data and then in the
second iteration j will be one so we are going to access the second element which is validation
data so we are going to create this key according to the iteration we are in and the same is going
to happen for the labels right but in order to be more clear let me show you exactly how this
variable we have over here how this dictionary we have over here how it looks like so I'm just going
to execute the code as it is and at the end I'm going to say something like print data dot keys
right let's take a look at this dictionary we have over here and let's see exactly how it looks
like let's see exactly what are the keys for this dictionary so I'm just going to press play
okay the execution is now completed and these are the keys we have saved in this dictionary
training data and training labels validation data validation labels so I invite you to take a look
at this code I invite you to go through this code once and again until it's 100% clear for
you exactly what we are doing over here please notice we are iterating in all the images, in
all the training images and all the validation images we are opening these images we are
computing the features and then we are saving this data into these two lists we have over here and
then we are saving these lists in this dictionary under these very appropriate keys right so please
go through this code once and again until it is 100% clear for you but for now let's continue
we have saved all of our data into this dictionary and now it's time to train the model and in order
to train this model let's go to my browser super quickly because we are going to use
a model, a classifier from scikit learn and let me show you all the different classifiers we could
use from scikit learn, I also invite you to take a very close look at this website so
you are more familiar with all the different classifiers you could use from scikit learn in
this tutorial we are going to use random Forest classifier we are going to use this classifier
over here but we could also use any other classifier from here right the only reason
we are going to use a random forest classifier is because I like it I really like this classifier
I have used it many times in many projects I think it's very robust it's very easy to use
it's very easy to understand what's going on I really like this classifier but that's the
only reason why we are going to use a random forest classifier we could also use any other classifier
from here so let's go back to pycharm and now I'm going to make another import which is
from sklearn dot ensemble import random forest classifier and then I am going to take this
value random forest classifier and I'm going to define a new... another variable which is called model
and model will be RandomForestClassifier right we are creating a new instance of this classifier
and then in order to train the model the only thing we need to do is to call model.fit
we need to input the training data which is here
the training labels which are here
and that's it that's all that's all it takes in order to train this classifier you can see how
simple this is now let's continue now it's time to test the performance of the image classifier
we have just trained in order to do so I am going to make another import which is from sklearn
dot metrics import accuracy score and I'm going to make something like this model.predict
validation data
sorry data validation data we want to access our validation data which is saved here
and this will be something like y pred
then we are going to call accuracy score we're going to... we're
going to input y pred and we're also going to input data validation labels
okay so you can see we are computing the accuracy score with the validation data
which is completely and absolutely unseen data for our model, for our classifier, you
can see here we are training our classifier with the training data, the validation data
is completely unseen data, the classifier has never seen this data before so this
is exactly the type of data we should use in order to compute the accuracy score in
order to test the performance of our model so this is going to be something like score
and now I am going to print score and then we are going to save the model but let's take it one
step at a time let's execute this script let's see if everything is okay, okay
the execution is now completed we didn't have any error whatsoever and this is the accuracy we got a
94.4 percent accuracy which is a very very very high level of accuracy now remember I told you
this is exactly the same dataset we used in two of my previous videos in my previous video where I
showed you how to train an image classifier using yolo V8 and also my previous video where I showed
you how to train an image classifier using teachable machine this is exactly the same dataset so I
invite you to take a look at those two videos and to compare the accuracy we are getting here with
this classifier with the accuracy we got in those two videos right using yolo V8 and using teachable
machine I invite you to compare this accuracy with the accuracy we got in those two other
videos that's your homework that's your homework from today's tutorial and now let's continue so
the only thing we need to do now is to continue to the next step which is saving the model and
this is how we are going to save the model we are going to import another library which is pickle
and we are going to say something like this with open the file name will be something like model.p
this will be WB... as f... and we are going to call pickle.dump
then this will be the object we are going to save which in our case is model
and then f... and that's pretty much all then I'm going to close the file and now I'm going to run
this file again okay and now if we go to my file system you can see that now we will have a file
which is called model.p and this is the model we have just saved so everything is working just fine and
now let me show you how to take this model how to load this model to produce inferences with it
right so I'm going back to pycharm and I'm going to create a new file which is called infer.py
and this is where we are going to make our inference so I am going to take a
few lines from here I am going to import this object and I'm also going to import this other...
this other object from PIL and then this is going to be a very very similar process I'm going to
create this object over here which is image to vec this is the object we are going to
use in order to compute our features and then I'm just going to define an image path which will be
one of my images in the validation data I'm just going to take a random image something like this
cloudy 4.jpg a random image in this directory and then I am going to do something like image will be...
image.open...
image path and then I am going to compute features from this image doing something like image to vec
image to vec dot get vec image
this will be features okay then I am going to load the model or maybe
I can load the model over here so I'm going to say something like with open
model.p this will be RB as f... and model will be pickle dot load f and obviously
we need to import pickle otherwise this is not going to work
import pickle and that's pretty much all and now we are going to call model dot
predict and we are going to input the features and this is how we are going to do it and this
will be something like prediction okay and now let's print prediction and let's see what happens
and this is the result we are getting cloudy which is exactly this image category remember
we are taking this image from the Cloudy directory and absolutely all the images in
this directory belong to the cloudy category so everything seems to be working just fine and this
is going to be all for this video my name is Felipe I'm a computer vision engineer if you
enjoyed this video I invite you to click the like button and I also invite you to
subscribe to my channel this is going to be all for today and see you on my next video