All Machine Learning algorithms explained in 17 min
Summary
TL;DR: In this video, Tim, a seasoned data scientist, offers a comprehensive overview of essential machine learning algorithms, guiding viewers on selecting the right algorithm for their data challenges. The script covers supervised and unsupervised learning, detailing algorithms like linear regression, logistic regression, KNN, SVM, and neural networks. It also touches on clustering with K-means and dimensionality reduction with PCA, providing a foundational understanding of how these algorithms can be applied to real-world problems.
Takeaways
- The speaker, Tim, is an experienced data scientist who has taught various machine learning algorithms to students.
- The presentation aims to provide an overview of crucial machine learning algorithms within 17 minutes.
- Machine learning is a subset of AI that involves algorithms capable of learning from data and making predictions on new, unseen data.
- Machine learning is primarily divided into supervised and unsupervised learning, each with its own set of algorithms and applications.
- In supervised learning, algorithms are trained on labeled data to predict outcomes, such as house prices or image classifications.
- Unsupervised learning involves finding patterns in data without any pre-existing labels, like grouping emails into categories.
- Linear regression is a foundational algorithm in supervised learning, used to model linear relationships between variables.
- Logistic regression is used for classification tasks, predicting the probability of an outcome based on input variables.
- The K-nearest neighbors (KNN) algorithm makes predictions based on the 'K' closest data points in the feature space.
- Support Vector Machines (SVM) are used for classification and regression by finding the optimal boundary that separates different classes.
- Decision trees and their ensemble methods like Random Forests and boosting are powerful for handling complex decision-making processes.
- Neural networks, including deep learning, are capable of automatically learning complex features from data, making them versatile for a wide range of tasks.
Q & A
What is the main goal of the video presented by Tim, the data scientist?
-The main goal of the video is to provide an intuitive understanding of major machine learning algorithms, helping viewers decide which algorithm is suitable for their specific problem and to stop feeling overwhelmed by the field.
How does Tim define machine learning according to the script?
-Tim defines machine learning as a field of study in artificial intelligence that involves the development and study of statistical algorithms capable of learning from data and generalizing to unseen data, thus performing tasks without explicit instructions.
What are the two main subfields of machine learning mentioned in the script?
-The two main subfields of machine learning mentioned are supervised learning and unsupervised learning.
What is the difference between supervised and unsupervised learning as described in the script?
-Supervised learning involves a dataset with independent variables and a dependent variable that is supposed to be predicted, using known output values or labels for training. Unsupervised learning, on the other hand, involves no known truth about the data, and the algorithm groups data points by similarity without any further instructions.
What are the two broad categories within supervised learning according to the script?
-The two broad categories within supervised learning are regression and classification. Regression predicts a continuous numeric target variable, while classification assigns a discrete categorical label to data points.
Can you explain the concept of linear regression as presented in the script?
-Linear regression is a supervised learning algorithm that attempts to determine a linear relationship between an input variable and an output variable. It fits a linear equation to the data by minimizing the sum of the squares of the distances between data points and the regression line, aiming to minimize prediction errors for new data points.
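To make the mechanics concrete, here is a minimal sketch (not from the video) using scikit-learn's LinearRegression on made-up shoe-size and height numbers; the data and values are purely illustrative:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: shoe size (input feature) and height in cm (target).
X = np.array([[38], [39], [41], [42], [44], [45]])
y = np.array([160, 164, 170, 174, 181, 185])

model = LinearRegression()
model.fit(X, y)  # fits slope and intercept by minimizing the sum of squared errors

print(model.coef_[0], model.intercept_)  # slope and intercept of the regression line
print(model.predict([[43]]))             # predicted height for an unseen shoe size
```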
What is logistic regression and how does it differ from linear regression?
-Logistic regression is a variant of linear regression used for classification tasks. Unlike linear regression, which predicts a continuous output, logistic regression predicts the probability of a categorical output variable using input variables. It fits a sigmoid function to the data to estimate probabilities.
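A comparable hedged sketch, assuming made-up height/weight data and scikit-learn's LogisticRegression, shows how the fitted sigmoid returns class probabilities rather than a numeric value:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: [height_cm, weight_kg] with binary class labels.
X = np.array([[160, 55], [165, 60], [170, 68], [180, 80], [185, 85], [190, 95]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)

# The fitted sigmoid yields class probabilities rather than a numeric prediction.
print(clf.predict_proba([[178, 75]]))
print(clf.predict([[178, 75]]))
```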
How does the K-Nearest Neighbors (KNN) algorithm work, as described in the script?
-The KNN algorithm is a non-parametric algorithm used for both regression and classification. For a new data point, it predicts the target from its K nearest neighbors in the feature space: the average of their values for regression, or their majority class for classification. The choice of K, a hyperparameter, affects the model's performance, with smaller values leading to overfitting and larger values leading to underfitting.
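A brief illustrative sketch (data invented for the example) showing both uses of KNN with scikit-learn, majority vote for classification and neighbor averaging for regression:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# Hypothetical data: [height_cm, weight_kg] with binary class labels.
X = np.array([[160, 55], [165, 60], [170, 68], [180, 80], [185, 85], [190, 95]])
y = np.array([0, 0, 0, 1, 1, 1])

# Classification: majority vote among the K nearest training points.
knn_clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn_clf.predict([[175, 72]]))

# Regression: average target value of the K nearest neighbors.
heights = np.array([[160], [165], [170], [180], [185], [190]])
weights = np.array([55, 60, 68, 80, 85, 95])
knn_reg = KNeighborsRegressor(n_neighbors=3).fit(heights, weights)
print(knn_reg.predict([[175]]))
```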
What is the core concept of the Support Vector Machine (SVM) algorithm?
-The core concept of the SVM algorithm is to draw a decision boundary that separates data points of the training set as distinctly as possible. It maximizes the margin between different classes, making the decision boundary generalize well and be less sensitive to noise and outliers.
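As a rough illustration, assuming scikit-learn's SVC and invented cat/elephant measurements, a linear-kernel SVM can be fit and its support vectors inspected like this:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical data: [weight_kg, nose_length_cm]; 0 = cat, 1 = elephant.
X = np.array([[4, 2], [5, 2.5], [3.5, 1.8],
              [4000, 150], [5200, 180], [4800, 160]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel finds the maximum-margin straight-line boundary;
# kernel="rbf" or "poly" would allow nonlinear boundaries via the kernel trick.
clf = SVC(kernel="linear").fit(X, y)

print(clf.support_vectors_)   # the points that sit on the edge of the margin
print(clf.predict([[6, 3]]))  # a new animal classified by the boundary
```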
What is the difference between ensemble methods like Random Forests and Boosting as described in the script?
-Random Forests use an ensemble method called bagging, where multiple decision trees are trained on different subsets of the training data and vote on the classification by majority. Boosting, on the other hand, trains models sequentially, with each model focusing on correcting the errors of the previous model. Random Forests are less prone to overfitting and faster to train, while Boosting can achieve higher accuracies but is slower and more prone to overfitting.
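The contrast can be sketched with scikit-learn on synthetic data (an illustration, not a benchmark from the video): a bagged RandomForestClassifier next to a sequential GradientBoostingClassifier:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic classification data stands in for a real training set.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Bagging: many trees trained in parallel on bootstrapped subsets, majority vote.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Boosting: trees trained in sequence, each correcting the previous model's errors.
gb = GradientBoostingClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

print("random forest accuracy:", rf.score(X_te, y_te))
print("boosting accuracy:     ", gb.score(X_te, y_te))
```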
How does the concept of neural networks relate to the idea of feature engineering as presented in the script?
-Neural networks extend the idea of feature engineering by implicitly and automatically designing complex features for the model without human guidance. By adding layers of unknown variables, or hidden layers, the network learns to represent complex patterns in the data, such as recognizing shapes or features in images, without explicitly defining these features.
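A small sketch of this idea, assuming scikit-learn's MLPClassifier and its bundled 8x8 digits dataset (the layer sizes here are arbitrary choices for illustration):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# 8x8 digit images flattened to 64 pixel-intensity features, labels 0-9.
digits = load_digits()
X_tr, X_te, y_tr, y_te = train_test_split(digits.data, digits.target, random_state=0)

# Two hidden layers learn intermediate features (strokes, loops) automatically.
mlp = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=0)
mlp.fit(X_tr, y_tr)

print(mlp.score(X_te, y_te))  # accuracy on held-out images
```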
What is the primary goal of unsupervised learning algorithms like K-means clustering?
-The primary goal of unsupervised learning algorithms like K-means clustering is to find underlying structures in the data without any prior knowledge of the data's labels. K-means aims to partition the data into K distinct, non-overlapping subgroups or clusters based on the similarity of data points.
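A minimal sketch, assuming scikit-learn's KMeans and synthetic blob data, of asking for K = 3 clusters and letting the assign/recompute loop run to convergence:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Unlabeled 2-D data generated with three hidden groups.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# K (n_clusters) is a hyperparameter; the algorithm alternates between
# assigning points to the nearest center and recomputing the centers.
km = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = km.fit_predict(X)

print(km.cluster_centers_)   # final cluster centers
print(np.bincount(labels))   # number of points assigned to each cluster
```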
How does Principal Component Analysis (PCA) contribute to dimensionality reduction as described in the script?
-PCA contributes to dimensionality reduction by identifying the directions (principal components) along which the data varies the most and retaining these directions while discarding others that contribute less to the variance. This process reduces the number of features in the dataset, potentially removing redundancy and noise, and can improve the efficiency of machine learning models.
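A short illustrative sketch with scikit-learn's PCA on invented fish measurements, showing the first component absorbing almost all of the variance:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical fish measurements: [length_cm, height_cm] are strongly correlated.
X = np.array([[30.0, 10.1], [40.0, 13.2], [50.0, 16.8],
              [60.0, 20.1], [70.0, 23.3], [80.0, 26.7]])

pca = PCA(n_components=2)
X_transformed = pca.fit_transform(X)

# The first principal component (a combined "shape" direction) carries nearly
# all the variance, so the second can usually be dropped.
print(pca.explained_variance_ratio_)
print(X_transformed[:, :1])  # the data reduced to a single feature
```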
Outlines
Introduction to Machine Learning Algorithms
In the introduction, Tim, a data scientist with over 10 years of experience, presents an overview of machine learning (ML) algorithms, aiming to help viewers select the right algorithm for their problem. He highlights the importance of understanding key ML algorithms and distinguishes between supervised and unsupervised learning. Tim also sets the goal of providing an intuitive grasp of the major algorithms, focusing on how to apply them effectively.
Overview of Supervised and Unsupervised Learning
This section explains the two main types of machine learning: supervised and unsupervised. In supervised learning, a known dataset with input (features) and output (labels) is used to train models for prediction. Examples include house price prediction and object classification (e.g., cat or dog). In unsupervised learning, there are no labels, and the goal is to find patterns or groupings in the data, like clustering emails into categories without prior labels.
Supervised Learning: Regression and Classification
Supervised learning is broken down into two subfields: regression and classification. Regression predicts a continuous target variable, like house prices, while classification assigns categorical labels, such as labeling an email as spam or non-spam. Tim discusses how regression finds relationships between input features and a numeric output, while classification assigns discrete categories based on input variables.
Linear Regression: The Foundation of Machine Learning
Linear regression is introduced as the simplest and most fundamental supervised learning algorithm. It fits a linear equation to data points to predict an output based on input features, minimizing prediction errors. Tim explains the concept of fitting data to a regression line to find relationships, like predicting height based on shoe size. He also touches on how this basic idea extends to more complex models, such as neural networks.
Logistic Regression and K-Nearest Neighbors (KNN)
Logistic regression is introduced as a basic classification algorithm used to predict categorical outputs. The example given is predicting gender based on height and weight, using a sigmoid function to estimate probabilities. Tim then explains K-Nearest Neighbors (KNN), a non-parametric algorithm that classifies data based on the 'K' nearest neighbors. KNN is useful for both regression and classification, with examples of predicting weight or gender based on neighboring data points.
Support Vector Machines (SVM): Decision Boundaries
Support Vector Machines (SVM) are introduced as supervised algorithms primarily used for classification. The core idea is to create a decision boundary that separates data points into different classes. Tim explains how SVMs work by maximizing the margin between classes, making them effective in high-dimensional spaces. The use of kernel functions, which allow for complex nonlinear decision boundaries, is also highlighted as a powerful feature of SVMs.
Naive Bayes Classifier and Decision Trees
The Naive Bayes classifier is discussed, which uses Bayes' theorem to calculate probabilities and classify data based on word occurrences in tasks like spam detection. Although the algorithm assumes independence between features, it is computationally efficient for tasks like text classification. Tim then introduces decision trees, which use a series of binary decisions to classify data. The goal is to create 'pure' leaf nodes by splitting data in a way that minimizes misclassifications.
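Two tiny sketches (toy data invented for illustration, using scikit-learn) of the ideas in this paragraph: a word-count naive Bayes spam filter and a shallow decision tree:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier

# Naive Bayes on a toy spam filter: word counts per email, probabilities per class.
emails = ["win money now", "cheap money offer", "meeting at noon", "project status update"]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

vec = CountVectorizer()
nb = MultinomialNB().fit(vec.fit_transform(emails), labels)
print(nb.predict(vec.transform(["cheap offer now"])))

# Decision tree on toy numeric data: a short series of yes/no threshold splits.
X = [[45, 120], [60, 160], [35, 110], [70, 170]]  # [age, blood_pressure]
y = [0, 1, 0, 1]                                  # 1 = high risk
tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(tree.predict([[55, 150]]))
```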
Random Forests and Ensemble Learning
Tim explains how decision trees are combined into ensemble algorithms like Random Forests and boosting. In Random Forests, multiple decision trees are trained on random subsets of data, and their predictions are combined to improve accuracy. Boosting, on the other hand, trains models sequentially, with each new model trying to correct the errors of the previous ones. Boosted trees can achieve higher accuracy but are more prone to overfitting and require more training time.
Neural Networks: The Power of Deep Learning
Neural networks are introduced, starting with the concept of logistic regression, then extending to multilayer perceptrons. Neural networks use hidden layers of variables to extract features and make predictions, automating feature engineering. As more layers are added, this becomes deep learning, allowing the network to learn complex patterns and relationships, such as recognizing digits from pixel data in images. Tim emphasizes how neural networks excel at tasks where manual feature engineering is difficult.
Clustering: Unsupervised Learning and K-Means
Clustering is discussed as an unsupervised learning method that aims to find natural groupings in data without predefined labels. Tim uses examples to explain the difference between clustering and classification. The K-Means algorithm, which groups data by minimizing the distance to randomly chosen cluster centers, is introduced. The challenge of choosing the right number of clusters is highlighted, with references to other clustering methods like hierarchical clustering and DBSCAN.
Dimensionality Reduction and Principal Component Analysis (PCA)
Dimensionality reduction is introduced as a way to simplify datasets by reducing the number of features while retaining important information. Tim explains Principal Component Analysis (PCA), which identifies directions (principal components) that capture the most variance in the data. This can help reduce redundancy in large datasets and improve the efficiency and robustness of machine learning models. He uses the example of combining height and length into a single 'shape' feature to illustrate the process.
Conclusion and Machine Learning Cheat Sheet
In the final section, Tim wraps up the video by providing a summary of the algorithms discussed and a helpful cheat sheet from Scikit-learn to guide viewers in choosing the right algorithm for their specific problem. He also mentions a roadmap for learning machine learning, directing viewers to another video for further learning resources.
Keywords
Machine Learning
Supervised Learning
Unsupervised Learning
Linear Regression
Logistic Regression
K Nearest Neighbors (KNN)
Support Vector Machine (SVM)
Neural Networks
Ensemble Methods
Dimensionality Reduction
K Means Clustering
Highlights
Overview of the most important machine learning algorithms.
Simple strategy for picking the right algorithm for your problem.
Machine learning is divided into supervised and unsupervised learning.
Supervised learning involves predicting a target variable based on features.
Unsupervised learning involves finding patterns without known labels.
Regression predicts a continuous numeric target variable.
Classification assigns a discrete categorical label to data points.
Linear regression determines a linear relationship between input and output variables.
Logistic regression is used for predicting categorical output variables.
K-Nearest Neighbors (KNN) algorithm predicts based on the average of nearest neighbors.
Support Vector Machine (SVM) creates a decision boundary to classify data points.
Naive Bayes classifier is based on Bayes' theorem and is used for text classification.
Decision trees create a series of yes/no questions to partition data.
Ensemble methods like Random Forest and Boosting combine multiple models for better predictions.
Neural networks implicitly create complex features for better predictions.
Deep learning involves multiple layers of hidden features for complex data representation.
K-means clustering is a method for finding groups in unlabeled data.
Dimensionality reduction techniques like PCA help in reducing data complexity.
Practical advice on choosing the right algorithm for different types of machine learning problems.
Transcripts
in the next 17 minutes I will give you
an overview of the most important
machine learning algorithms to help you
decide which one is right for your
problem my name is Tim and I have been a
data scientist for over 10 years and
taught all of these algorithms to
hundreds of students in real life
machine learning boot camps there is a
simple strategy for picking the right
algorithm for your problem in 17 minutes
you will know how to pick the right one
for any problem and get a basic
intuition of each algorithm and how they
relate to each other my goal is to give
as many of you as possible an intuitive
understanding of the major machine
learning algorithms to make you stop
feeling overwhelmed according to
Wikipedia machine learning is a field of
study in artificial intelligence
concerned with the development and study
of statistical algorithms that can learn
from data and generalize to unseen data
and thus perform tasks without explicit
instructions much of the recent
advancements in AI are driven by neural
networks which I hope to give you an
intuitive understanding of by the end of
this video Let's divide machine learning
into its subfields generally machine
learning is divided into two areas
supervised learning and unsupervised
learning supervised learning is when we
have a data set with any number of
independent variables also called
features or input variables and a
dependent variable also called Target or
output variable that is supposed to be
predicted we have a so-called training
data set where we know the True Values
for the output variable also called
labels that we can train our algorithm
on to later predict the output variable
for new unknown data examples could be
predicting the price of a house the
output variable based on features of the
house say square footage location year
of construction Etc categorizing an
object as a cat or a dog the output
variable or label based on features of
the object say height weight size of the
ears color of the eyes Etc unsupervised
learning is basically any learning
problem that is not supervised so where
no truth about the data is known so
where a supervised algorithm would be
like showing a little kid what a typical
cat looks like and what a typical dog
looks like and then giving it a new
picture and asking it what animal it
sees an unsupervised algorithm would be
giving a kid with no idea of what cats
and dogs are a pile of pictures of
animals and asking it to group by
similarity without any further
instructions examples of unsupervised
problems might be to sort all of your
emails into three unspecified categories
which you can then later inspect and
name as you wish the algorithm will
decide on its own how it will create
those categories also called clusters
let's start with supervised learning
arguably the bigger and more important
branch of machine learning there are
broadly two subcategories in regression
we want to predict a continuous numeric
Target variable for a given input
variable using the example from before
it could be predicting the price of a
house given any number features of a
house and determining their relationship
to the final price of the house we might
for example find out that square footage
is directly proportional to the price
linear dependence but that the age of
the house has no influence on the price
of the house in classification we try to
assign a discrete categorical label also
called a class to a data point for
example we may want to assign the label
spam or no spam to an email based on its
content sender and so on but we could
also have more than two classes for
example junk primary social promotions
and updates as Gmail does by default now
let's dive into the actual algorithms
starting with the mother of all machine
learning algorithms linear regression in
general supervised learning algorithms
try to determine the relationship
between two variables we try to find the
function that Maps one to the other
linear regression in its simplest form
is trying to determine a linear
relationship between two variables
namely the input and the output we want
to fit a linear equation to the data by
minimizing the sum of squares of the
distances between data points and the
regression Line This simply minimizes
the average distance of the real data to
our predictive model in this case the
regression line and should therefore
minimize prediction errors for new data
points a simple example of a linear
relationship might be the height and
shoe size of a person where the
regression fit might tell us that for
every one unit of shoe size increase the
person will be on average 2 inches taller
you can make your model more complex and
fit multi-dimensional data to an output
variable in the example of the shoe size
you might for example want to include
the gender age and ethnicity of the
person to get an even better model many
of the very fancy machine learning
algorithms including neural networks are
just extensions of this very simple idea
as I will show you later in the video
logistic regression is a variant of
linear regression and probably the most
basic classification algorithm instead
of fitting a line to two numerical
variables with a presumably linear
relationship you now try to predict a
categorical output variable using
categorical or numerical input variables
let's look at an example we now want to
predict one of two classes for example
the gender of a person based on height
and weight so a linear regression
wouldn't make much sense anymore instead
of fitting a line to the data we now fit
a so-called sigmoid function to the data
which looks like this the equation will
now not tell us about a linear
relationship between two variables but
will now conveniently tell us the
probability of a data point falling into
a certain class given the value of the
input variable so for example the
likelihood of an adult person with a
height of 180 cm being a man would be
80% this is completely made up of course
the K nearest neighbors algorithm or KNN
is a very simple and intuitive algorithm
that can be used for both regression and
classification it is a so-called
non-parametric algorithm the name means
that we don't try to fit any equations
and thus find any parameters of a model
so no true model fitting is necessary
the idea of KNN is simply that for any
given new data point we will predict the
target to be the average of its K
nearest neighbors while this might seem
very simple this is actually a very
powerful predictive algorithm especially
when relationships are more complicated
than a simple linear relationship in a
classification example we might say that
the gender of a person will be the same
as the majority of the five people
closest in weight and height to the
person in question in a regression
example we might say that the weight of
a person is the average weight of the
three people closest in height and of
chest circumference this makes a ton of
intuitive sense you might realize that
the number three seems a bit arbitrary
and it is K is called a hyperparameter
of the algorithm and choosing the right
K is an art choosing a very small number
of K say one or two will lead to your
model predicting your training data set
very well but not generalizing well to
unseen data this is called overfitting
choosing a very large number say 1,000
will lead to a worse fit overall
this is called underfitting the best
number is somewhere in between and
Depends a lot on the problem at hand
methods for finding the right
hyperparameters include cross validation
but are beyond the scope of this video
support Vector machine is a supervised
machine learning algorithm originally
designed for classification tasks but it
can also be used for regression tasks
the core concept of the algorithm is to
draw a decision boundary between data
points that separates data points of the
training set as well as possible as the
name suggests a new unseen data point
will be classified according to where it
falls with respect to the decision
boundary let's take this arbitrary
example of trying to classify animals by
their weight and the length of their
nose in this simple case of trying to
classify cats and elephants the decision
boundary is a straight line the svm
algorithm tries to find the line that
separates the classes with the largest
margin possible that is maximizing the
space between the different classes this
makes the decision boundary generalize
well and less sensitive to noise and
outliers in the training data the
so-called support vectors are the data
points that sit on the edge of the
margin knowing the support vectors is
enough to classify new data points which
often makes the algorithm very memory
efficient one of the benefits of SVM is
that it is very powerful in high
Dimensions that is if the number of
features is large compared to the size
of the data in those higher dimensional
cases the decision boundary is called a
hyperplane another feature that makes
svms extremely powerful is the use of
so-called kernel functions which allow
for the identification of Highly complex
nonlinear decision boundaries kernel
functions are an implicit way to turn
your original features into new more
complex features using the so-called
kernel trick which is beyond the scope
of this video this allows for efficient
creation of nonlinear decision
boundaries by creating complex new
features such as weight divided by
height squared also called the BMI this
is called implicit feature engineering
neural networks take the idea of
implicit feature engineering to the next
level as I will explain later possible
kernel functions for svms are the linear
the polynomial the RBF and the sigmoid
kernel another fairly simple classifier
is the naive Bayes classifier that gets
its name from Bayes' theorem which looks like
this I believe it's easiest to
understand naive Bayes with an example use
case that it is often used for spam
filters we can train our algorithm with
a number of spam and non-spam emails and
count the occurrences of different words
in each class and thereby calculate the
probability of certain words appearing
in spam emails and non-spam emails we
can then quickly classify a new email
based on the words it contains by
using Bayes' theorem we simply multiply
the different probabilities of all words
in the email together this algorithm
makes the false assumption that the
probabilities of the different words
appearing are independent of each other
which is why we call this classifier
naive this makes it very computationally
efficient while still being a good
approximation for many use cases such as
spam classification and other text-based
classification tasks decision trees are
the basis of a number of more complex
supervised learning algorithms in its
simplest form a decision tree looks
somewhat like this the decision tree is
basically a series of yes no questions
that allow us to partition a data set in
several Dimensions here is an example
decision tree for classifying people
into high and lowrisk patients for heart
attacks the goal of the decision tree
algorithm is to create so-called Leaf
nodes at the bottom of the tree that are
as pure as possible meaning instead of
randomly splitting the data we try to
find splits that lead to the resulting
groups or leaves to be as pure as
possible which is to say that as few
data points as possible are
misclassified while this might seem like
a very basic and simple algorithm which
it is we can turn it into a very
powerful algorithm by combining many
decision trees together combining many
simple models to a more powerful complex
model is called an ensemble algorithm
one form of ensembling is bagging where
we train multiple models on different
subsets of the training data using a
method called bootstrapping
a famous version of this idea is called
a random Forest where many decision
trees vote on the classification of your
data by majority vote of the different
trees in the random Forest random
forests are very powerful estimators
that can be used both for classification
and regression the randomness comes from
randomly excluding features for
different trees in the forest which
prevents overfitting and makes it much
more robust because it removes
correlation between the trees another
type of Ensemble method is called
boosting where instead of running many
decision trees in parallel like for
random forests we train models in
sequence where each model focuses on
fixing the errors made by the previous
model we combine a series of weak models
in sequence thus becoming a strong model
because each sequential model tries to
fix the errors of the previous model
boosted trees often get to higher
accuracies than random forests but are
also more prone to overfitting its
sequential nature makes it slower to
train than random forests famous
examples of boosted trees are AdaBoost
gradient boosting and XGBoost the
details of which are beyond the scope of
this video now let's get to the reigning
king of AI neural networks to
understand neural networks let's look at
logistic regression again say we have a
number of features and are trying to
predict a target class the features
might be pixel intensities of a digital
image and the target might be
classifying the image as one of the
digits from 0 to 9 now for this
particular case you might see why this
might be difficult to do with logistic
regression because say the number one
doesn't look the same when different
people write it and even if the same
person writes it several times it will
look slightly different each time and it
won't be the exact same pixels
illuminated for every instance of the
number one all of the instances of the
number one have commonality however like
they all have a dominating vertical line
and usually no Crossing Lines as other
digits might have and usually there are
no circular shapes in the number one as
there would be in the number eight or
nine however the computer doesn't
initially know about these more complex
features but only the pixel intensities
we could manually engineer these
features by measuring some of these
things and explicitly adding them as new
features but artificial neural networks
similarly to using a kernel function
with a support Vector machine are
designed to implicitly and automatically
design these features for us without any
guidance from humans we do this by
adding additional layers of unknown
variables between the input and output
variables in its simplest form this is
called a single-layer perceptron which
is basically just a multi-feature
regression task now if we add a hidden
layer the hidden variables in the middle
layer represent some hidden unknown
features and instead of predicting the
target variable directly we try to
predict these hidden features with our
input features and then try to predict
the target variables with our new hidden
features in our specific example we
might be able to say that every time
several pixels are illuminated next to
each other they represent a horizontal
line which can be a new feature to try
and predict the digit in question even
though we never explicitly defined a
feature called horizontal line This is a
much simplified view of what is actually
going on but hopefully this gets the
point across we don't usually know what
the hidden features represent we just
train the neural network to predict the
final Target as well as possible the
hidden features we can Design This Way
are limited in the case of the single
hidden layer but what if we add a layer
and have the hidden layer predict
another hidden layer what if we now had
even more layers this is called Deep
learning and can result in very complex
hidden features so that might represent
all kinds of complex information in the
pictures like the fact that there is a
face in the picture however we will
usually not know what the hidden
features mean we just know that they
result in good predictions all we have
talked about so far is supervised
learning where we wanted to predict a
specific Target variable using some
input variables however sometimes we
don't have anything specific to predict
and just want to find some underlying
structure in our data that's where
unsupervised learning comes in a very
common unsupervised problem is
clustering it's easy to confuse
clustering with classification but they
are conceptually very different
classification is when we know the
classes we want to predict and have
training data with true labels available
shown as colors here like pictures of
cats and dogs clustering is when we
don't have any labels and want to find
unknown clusters just by looking at the
overall structure of the data and trying
to find potential clusters in the data
for example we might look at a
two-dimensional data set that looks like
this any human will probably easily see
three clusters here but it's not always
as straightforward as your data might
also look like this we don't know
how many clusters there are because the
problem is unsupervised the most famous
clustering algorithm is called K means
clustering just like for KNN K is a
hyperparameter and stands for the number
of clusters you are looking for finding
the right number of clusters again is an
art and has a lot to do with your
specific problem and some trial and
error in domain knowledge might be
required this is beyond the scope of
this video K means is very simple you
start by randomly selecting centers for
your K clusters and assigning all data
points to the cluster center closest to
them the Clusters here are shown in blue
and green you then recalculate the
cluster centers based on the data points
now assigned to them you can see the
centers moving closer to the actual
clusters you then assign the data points
again to the new cluster centers
followed by recalculating the cluster
centers you repeat this process until
the centers of the Clusters have
stabilized while K means is the most
famous and most common clustering
algorithm other algorithms exist
including some where you don't need to
specify the number of clusters like
hierarchical clustering and DBSCAN
which can find clusters of arbitrary
shape but I won't discuss them here the
last type of algorithm I will leave you
with is dimensionality reduction the
idea of dimensionality reduction is to
reduce the number of features or
dimensions of your data set keeping as
much information as possible usually
this group of algorithms does this by
finding correlations between existing
features and removing potentially
redundant Dimensions without losing much
information for example do you really
need a picture in high resolution to
recognize the airplane in the picture or
can you reduce the number of pixels in
the image as such dimensionality
reduction will give you information
about the relationships within your
existing features and it can also be
used as a pre-processing step in your
supervised learning algorithm to reduce
the number of features in your data set
and make the algorithm more efficient
and robust an example algorithm is
principal component analysis or PCA
let's say we are trying to predict types
of fish based on several features like
length height color and number of teeth
when looking at the correlations of the
different features we might find that
height and length are strongly
correlated and including both won't
help the algorithm much and might in
fact hurt it by introducing noise we can
simply include a shape feature that is a
combination of the two this is actually
extremely common in large data sets and
allows us to reduce the number of
features dramatically and still get good
results PCA does this by finding the
directions in which most variance in the
data set is retained in this example the
direction of most variance is a diagonal
this is called the first principal
component or PC and can become our new
shape feature the second principal
component is orthogonal to the first and
only explains a small fraction of the
variance of the data set and can thus be
excluded from our data set in this case
in large data sets we can do this for
all features and rank them by explained
variance and exclude any principal
components that don't contribute much to
the variance and thus wouldn't help much
in our ml model this was all common
machine learning algorithms explained if
you are overwhelmed and don't know which
algorithm you need here is a great cheat
sheet by scikit-learn that will help you
decide which algorithm is right for
which type of problem if you want a road
map on how to learn machine learning
check out my video on that