Deep Learning (CS7015): Lec 2.5 Perceptron Learning Algorithm
Summary
TLDR: The video script discusses the Perceptron Learning Algorithm, a foundational machine learning technique for binary classification. It revisits a movie rating example to illustrate the algorithm's process of learning weights and threshold from data. The script explains the concept of convergence and the perceptron's goal of correctly classifying data points as positive or negative. It delves into the algorithm's steps, including initializing weights randomly and adjusting them based on errors made during classification. The geometric interpretation of the perceptron's decision boundary and the importance of the dot product in determining the angle between data points and the weight vector are highlighted. The script also addresses potential issues with convergence and the assumption of linearly separable data.
Takeaways
- 📚 The lecture introduces the Perceptron Learning Algorithm, which is a method for learning weights and thresholds in a perceptron model.
- 🎬 The example of movie preferences is used to illustrate the learning process, where past movie data is used to predict whether a new movie will be liked or not.
- 🔢 The data includes various features (variables) that are not just Boolean but can be real numbers, such as a rating between 0 and 1.
- 🤖 The goal of the perceptron is to learn from the data and correctly classify new movies as liked (label 1) or not liked (label 0).
- 🧠 The Perceptron Learning Algorithm starts with random initialization of weights and adjusts them based on the training data to minimize errors.
- 🔄 Convergence in the algorithm is defined as the point when no more errors are made on the training data and predictions do not change.
- 📉 The algorithm involves picking a random data point and adjusting the weights if the prediction is incorrect (error is made).
- 📚 The adjustment of weights involves adding the data point vector to the weight vector if the point is misclassified.
- 📏 The decision boundary is represented by the equation w transpose x = 0, which divides the input space into two halves.
- 📐 The geometric interpretation of the algorithm is that the weight vector w is orthogonal to the decision boundary line.
- 🔍 The algorithm's effectiveness is based on the dot product and the angles between the weight vector and the data points, with adjustments made to ensure correct classification.
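The update loop described in these takeaways can be written out compactly. The following is a minimal NumPy sketch, not the lecture's own code; names such as `train_perceptron` and the AND-style toy labels are illustrative assumptions.

```python
import numpy as np

def train_perceptron(X, y, max_epochs=1000, seed=0):
    """X: (m, n) data matrix; y: labels in {0, 1}.
    Returns an (n+1)-dimensional weight vector, bias w0 first."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    Xa = np.hstack([np.ones((m, 1)), X])   # prepend x0 = 1 for the bias
    w = rng.standard_normal(n + 1)         # random initialization
    for _ in range(max_epochs):            # "while not converged"
        converged = True
        for i in rng.permutation(m):       # pick points in random order
            dot = w @ Xa[i]
            if y[i] == 1 and dot < 0:      # positive point misclassified
                w = w + Xa[i]
                converged = False
            elif y[i] == 0 and dot >= 0:   # negative point misclassified
                w = w - Xa[i]
                converged = False
        if converged:                      # a full pass with no corrections
            return w
    return w

# Usage on a tiny linearly separable set (AND-like labels):
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0, 0, 0, 1])
w = train_perceptron(X, y)
preds = (np.hstack([np.ones((4, 1)), X]) @ w >= 0).astype(int)
```

Since these labels are linearly separable, the loop is guaranteed to terminate (perceptron convergence theorem), and `preds` matches `y`.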
Q & A
What is the Perceptron Learning Algorithm?
-The Perceptron Learning Algorithm is a method used to train a perceptron, which is a type of linear classifier. It adjusts the weights of the perceptron iteratively to correctly classify the given data points into two categories, typically represented as positive and negative labels.
What is the purpose of revisiting the movie example in the script?
-The movie example is used to illustrate a more complex scenario where the perceptron has to learn from data that includes both binary and real-valued variables, such as ratings, to classify whether a movie is liked or not.
What does the perceptron model try to achieve with the given data?
-The perceptron model aims to learn the appropriate weights and threshold such that it can correctly classify new, unseen data points based on the training data provided.
How are the weights in the perceptron initialized in the learning algorithm?
-The weights in the perceptron are initialized randomly at the beginning of the learning algorithm. This includes both the weights for the input features and the bias term.
What is meant by 'convergence' in the context of the Perceptron Learning Algorithm?
-Convergence in this context means that the perceptron has learned the weights to an extent where it no longer makes any errors on the training data, or its predictions on the training data do not change with further iterations.
How does the Perceptron Learning Algorithm handle misclassified points?
-If a point is misclassified, i.e., the dot product of the weight vector and the input vector is of the wrong sign, the algorithm adjusts the weight vector by adding the input vector to it, effectively correcting the misclassification.
What is the geometric interpretation of the Perceptron Learning Algorithm?
-Geometrically, the perceptron is trying to find a decision boundary that separates the positive and negative points. This boundary is represented by the line where the dot product of the weight vector and any point on the line equals zero, indicating orthogonality.
Why is the dot product used in the Perceptron Learning Algorithm?
-The dot product is used because it provides a measure of the similarity between two vectors. In the context of the perceptron, it helps determine the classification of a point and the angle between the weight vector and the input vectors.
What is the relationship between the angle formed by the weight vector and the data points, and their classification?
-Data points classified as positive have an angle with the weight vector that is less than 90 degrees, indicating a positive dot product. Conversely, data points classified as negative have an angle greater than 90 degrees, indicating a negative dot product.
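This correspondence between the sign of the dot product and the angle can be checked directly; the vectors below are made up for illustration, not taken from the lecture.

```python
import numpy as np

def angle_deg(u, v):
    """Angle between two vectors in degrees, via the cosine formula."""
    cos_a = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(cos_a))

w = np.array([1.0, 1.0])        # hypothetical weight vector
x_pos = np.array([2.0, 0.5])    # point in the positive half-space
x_neg = np.array([-1.0, -2.0])  # point in the negative half-space

# Positive dot product <=> angle below 90 degrees, and vice versa.
assert w @ x_pos > 0 and angle_deg(w, x_pos) < 90
assert w @ x_neg < 0 and angle_deg(w, x_neg) > 90
```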
Why might the Perceptron Learning Algorithm not converge in some cases?
-The algorithm might not converge if the data is not linearly separable, meaning there is no single hyperplane that can separate all positive and negative points without error.
What is the final step in the Perceptron Learning Algorithm?
-The final step is to make a pass over the data and check if any further corrections are needed. If no corrections are made, indicating that all points are correctly classified, the algorithm has converged and can be stopped.
Outlines
🧠 Introduction to Perceptron Learning Algorithm
This paragraph introduces the Perceptron Learning Algorithm, a method for learning weights and thresholds in a perceptron model. It revisits a movie example with a list of m movies, each associated with a class indicating whether the movie is liked or not. The data includes n different variables such as ratings on a scale from 0 to 1, which are real numbers, not just Boolean. The goal is for the perceptron to learn from this data and predict the correct label (1 for liked, 0 for not liked) for any given movie. The learning problem involves adjusting the weights to separate positive (liked) from negative (not liked) examples. The Perceptron Learning Algorithm is presented, which starts with random weight initialization and iteratively adjusts the weights based on the error made on the training data, aiming for convergence when no more errors are made.
📚 Geometric Interpretation of Perceptron Learning
The second paragraph delves into the geometric interpretation of the Perceptron Learning Algorithm. It discusses the decision boundary represented by the equation w^T x = 0, which divides the input space into two halves. Points on this line satisfy the condition of being orthogonal to the weight vector w, forming a 90-degree angle. The paragraph explains that points in the positive half-space (labeled 1) have an angle less than 90 degrees with w, while points in the negative half-space (labeled 0) have an angle greater than 90 degrees. The algorithm's adjustments to the weight vector aim to correct these angles to ensure proper classification. The explanation uses the dot product and the cosine of the angle between vectors to illustrate the relationship between the weight vector and the data points.
🔄 Perceptron Algorithm Execution and Convergence
The final paragraph describes the execution of the Perceptron Learning Algorithm on a toy data set. It explains the process of randomly picking data points and adjusting the weight vector w based on whether the current classification is incorrect. The paragraph illustrates how the algorithm iteratively corrects the angle between the weight vector and the data points to ensure that positive points have an angle less than 90 degrees and negative points have an angle greater than 90 degrees. The convergence of the algorithm is discussed, noting that it will stop when no further corrections are needed and all data points are correctly classified. A student raises the worry that cyclic adjustments might keep undoing each other so that the algorithm never converges; the lecture counters that, under the linear-separability assumption, the algorithm is guaranteed to converge.
Keywords
💡Perceptron Learning Algorithm
💡Weights
💡Threshold
💡Convergence
💡Positive and Negative Inputs
💡Dot Product
💡Decision Boundary
💡Orthogonal
💡Half Space
💡Linearly Separable
Highlights
Introduction to the Perceptron Learning Algorithm.
Revisiting the movie example with a more complex dataset involving real-valued ratings.
Explanation of the perceptron's goal to match labels based on past movie preferences.
Description of the perceptron learning weights and threshold from data.
Algorithm's initialization of weights randomly without prior knowledge.
Definition of convergence in the context of the perceptron learning algorithm.
Algorithm's step-by-step process for adjusting weights based on positive and negative inputs.
Geometric interpretation of the perceptron's decision boundary and its relation to angles.
Explanation of the dot product and its role in determining the angle between vectors.
Clarification on why the perceptron algorithm adjusts weights to reduce angles to less than 90 degrees.
Illustration of the perceptron algorithm's convergence through a toy data set example.
Demonstration of the perceptron's iterative process to correct misclassified points.
Discussion on the perceptron's limitation and conditions for convergence.
Assumption of linear separability as a prerequisite for the perceptron algorithm's convergence.
Highlight of the perceptron's inability to converge in non-linearly separable cases.
Final summary of the perceptron learning algorithm's process and its geometric intuition.
Transcripts
We will now go to the next module which is the Perceptron Learning Algorithm.
We now see a more principled approach of learning these weights and threshold, but before that
we will just again revisit our movie example and make it slightly more complicated.
Now, the situation here is that we are given a list of m movies and a class associated
with each movie indicating whether we like the movie or not, right.
So, we are given some data on the past m movies that we have seen and whether we
liked each movie or not, and now instead of those three variables, we have n different
variables based on which we are making decisions.
And notice that some of these variables are real, right.
They are not Boolean anymore, right.
The rating could be any real number between 0 to 1, ok and now based on this data what
do we want is the perceptron to do actually.
So, I have given you some data.
These factors I have also given you the label 1 and 0.
So, if I tell you my perceptron has now learnt properly, what would you expect
it to do? A perfect match, right.
So, whenever I feed it one of these movies, it should give me the same label as was there
in my data, right. And again, there are some movies for which I have a label 1, which are
positive, and some movies for which I have a label 0.
So, I am once again looking to separate the positives from the negatives, right.
So, it should adjust the weights in such a way that I should be able to separate, right.
So, that is the learning problem that we are interested in, ok.
So, now with that I will give you the algorithm, ok.
This is the Perceptron Learning Algorithm.
We have certain positive inputs which had the label 1, we have certain negative inputs
which had the label 0 and now I don't know what the weights are and I have no prior knowledge
of what the weights are going to be.
I need to learn them from the data.
So, what I am going to do is, I am just going to initialize these weights randomly as I
am also going to pick up some random values for this.
So, this should be small n, and now, here is the algorithm: while not converged, do something.
So, before I tell you what to do, can you tell me what is meant by convergence?
When will you say that it has converged?
When it is not making any more errors on the training data, right, or its predictions are
not changing on the training data.
So, that is the definition of convergence, ok.
Now, here is the algorithm.
I pick up a random point from my data, which could either be positive or negative,
right.
So, it comes from the union of the positive and negative sets.
Basically, from all the data that I have, I pick up a random point.
If the point is positive and this condition happens, what does this
tell me?
If the point was positive, what did I actually want? w transpose x greater than or equal to 0,
but the condition says it is less than 0.
That means, I have made an error.
So, since I have made an error, I will just add x to w. I see a lot of thoughtful nodding
and I hope you are understanding what is happening.
Let us see.
So, what is w actually?
n dimensional?
No, n plus 1 dimensional, right, because w naught is also inside there.
So, actually there should be a w naught also here, right. And what is x? Again n plus 1 dimensional,
right, and that is why this addition is valid.
So, let us understand that w and x are both n plus 1 dimensional, ok.
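Concretely, the bias w0 is handled by pairing it with a constant input x0 = 1, so both vectors become n + 1 dimensional and the addition in the update rule is valid. A small illustrative sketch (the numbers are made up):

```python
import numpy as np

n = 3
x_raw = np.array([0.8, 0.1, 0.5])    # n input features (e.g. ratings in [0, 1])
x = np.concatenate([[1.0], x_raw])   # prepend x0 = 1 for the bias term
w = np.array([-0.5, 1.0, 0.2, 0.3])  # w0 (the bias) plus n weights

assert x.shape == w.shape == (n + 1,)  # same length, so w + x is valid
s = w @ x                              # w0*x0 + w1*x1 + ... + wn*xn
```

Here `s` works out to -0.5 + 0.8 + 0.02 + 0.15 = 0.47, so this point would be classified as positive.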
Now, let us look at the other if condition.
Can you guess what the other if condition is? If x belongs to N and...
The summation.
The summation is greater than or equal to 0, then? Then w becomes w minus x.
So, that means you have completely understood how this algorithm works.
Well, that is right.
So, now consider two vectors w and x.
So, remember, what we are trying to do is not prove, but actually get an intuition
for why this works, ok.
So, we will consider two vectors w and x, and this is what my vectors look like, very similar
to the case that we are considering: w0 to wn and x0 to xn.
So, again, x naught is just 1, right.
Ok. Now, this condition that I have been talking
about is nothing but the dot product.
How many of you have gone through the prerequisites for today's lecture?
Ok, good.
So, it is just a dot product, ok.
Now, we can just rewrite the perceptron rule as a dot product.
I mean, instead of using that summation, we can just say that it is a dot product.
Ok. Now, we are interested in finding the line
w transpose x equal to 0.
So, that is our decision boundary, right, which divides the input space into two halves, ok.
Now, every point on this line satisfies the equation w transpose x equal to 0.
What does that mean actually?
So, a simple example: if I have the line x1 plus x2 equal to 0, then all the
points which lie on the line satisfy this equation, right.
So, you could have (1, -1), (2, -2) and so on, but (2, 2) cannot be a point on this
line.
Every point lying on this line satisfies this equation.
So, every point lying on this line actually satisfies the equation w transpose x equal
to 0.
So, can you tell me what is the angle between w and any point on this line?
How many of you say perpendicular?
Ok.
Why?
Dot product is 0, right.
So, if the dot product is 0, they are orthogonal right.
So, that means if I take this line, then my vector w is orthogonal to it.
It is orthogonal to this point, to this point, to every point on the line, which
is just the same as saying that the vector is perpendicular to the line itself, right,
as simple as that.
So, the angle is 90 degrees because the dot product gives you cos alpha and that is
0, right, and since w is perpendicular, as I said, to every point of the line, it is just
perpendicular to the line itself, right.
So, this is what the geometric interpretation looks like.
This is our decision boundary w transpose x and the vector w is actually orthogonal
to this line and that is exactly the intuition that we have built so far.
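Both claims above, that points like (1, -1) and (2, -2) lie on x1 + x2 = 0 while (2, 2) does not, and that w is orthogonal to every point on the line, can be verified directly (an illustrative sketch):

```python
import numpy as np

w = np.array([1.0, 1.0])   # decision boundary: x1 + x2 = 0

# Every point (t, -t) on the line has zero dot product with w,
# i.e. the angle between w and the line is 90 degrees.
for t in [1.0, -2.0, 3.5]:
    assert np.isclose(w @ np.array([t, -t]), 0.0)

# A point off the line, such as (2, 2), does not satisfy w @ x = 0.
assert w @ np.array([2.0, 2.0]) != 0
```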
Ok. Now, let us consider some points which are
supposed to lie in the positive half space of this line, right.
That means these are the points for which the output is actually 1, ok.
Now, can you tell me what is the angle between any of these points and w? Or are you guys
actually trying to measure the angle? We have not got any measuring stuff, no.
So, I will give you three options, i.e., equal to 90, greater than 90, and less than 90.
Less than 90.
Less than 90 it is obvious from the figure, right.
Now, if I take any point which lies in the negative half space, what is the angle going
to be between them?
It is greater than 90, right.
Again, it is obvious, and it also follows from the fact that cos alpha is w transpose x
divided by a positive quantity (the product of the norms), and we know that for the
positive points w transpose x is greater than or equal to 0, right.
That means cos alpha would be greater than or equal to 0, which means the angle alpha would
be less than 90 degrees; and for the negative points w transpose x is actually less than
0.
That means cos alpha would be less than 0, which means alpha would be greater than 90
degrees, right.
So, it actually follows from the formula itself, but it is also clear from the figure.
So, keeping this picture in mind let us revisit the algorithm, ok.
So, this is the algorithm, ok.
Now, let us look at the first condition which was this.
Now, if x belongs to P and w transpose x is less than 0, that means the angle between
x and the current w is actually greater than 90 degrees, but what do we want it to be?
Less than 90 degrees. And our solution is to add x to w, but we still do not know why this
works.
Does anyone know why this works?
So, let us see why this works.
What is the new cos alpha going to be?
It is going to be proportional to w new transpose x, right.
I will just substitute what w new is, w plus x, fine.
That means, if cos alpha new is going to be greater than cos alpha, what is alpha new
going to be?
It will be less than alpha, and that is exactly what we wanted.
This angle was actually greater than 90 degrees.
So, you want to slowly move it such that it becomes less than 90 degrees.
It is not going to get solved in one iteration, and that is why we repeat till convergence.
So, we will keep doing this.
I will keep picking xs again and again till it reaches convergence.
That means, till we are satisfied with that condition, right.
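Numerically, the reason the positive-case update helps is that w_new transpose x equals w transpose x plus x transpose x, and x transpose x is always positive, so the dot product grows and the angle shrinks. A sketch with made-up vectors (strictly, the norm of w in the cosine's denominator also changes, a detail the lecture glosses over, but the effect holds here):

```python
import numpy as np

def cos_angle(u, v):
    """Cosine of the angle between two vectors."""
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

w = np.array([-1.0, 0.5])
x = np.array([1.0, 0.2])    # a positive point, currently misclassified

assert w @ x < 0            # wrong sign: the angle exceeds 90 degrees

w_new = w + x               # the perceptron correction for a positive point
# The dot product grows by exactly x @ x > 0 ...
assert np.isclose(w_new @ x, w @ x + x @ x)
# ... so the angle between w and x shrinks.
assert cos_angle(w_new, x) > cos_angle(w, x)
```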
Let us look at the other condition: x belongs to N and w transpose x was greater than or equal
to 0. That means the angle alpha is actually less than 90 degrees, and we want
it to be the opposite, ok.
I will just quickly skim over this: the correction here is w minus x, right.
Ok, I forgot to mention that x transpose x is actually a positive quantity, right.
I mean, that is why the result holds, ok.
That means cos alpha new is going to be less than cos alpha. There is a slight bit of
mathematical incorrectness I am doing here, but that does not affect the final result.
So, I will just gloss over that and you can go home and figure it out; still, it does
not take away from the final intuition and interpretation, right.
So, the new cos alpha is going to be less than the original cos alpha; that means the
angle is going to be greater, and that is exactly what we wanted, ok.
So, we will now see this algorithm in action for a toy data set.
So, this is the toy data set we have, and we have initialized w to a random value, and that
turns out to be this, right.
I just picked up some random value for w and ended up with this particular configuration
for w, ok. Now, we observe that currently w transpose
x is less than 0 for all the positive points and it is actually greater than or equal to 0
for all the negative points.
If you do not understand it in terms of w transpose x, it is just that all the positive points
actually have a greater than 90 degree angle with w and all the negative points actually have
a less than 90 degree angle.
So, this is exactly the opposite of the situation that we want, and now, from here on, we want
to actually run the perceptron algorithm, right, and try to fix this w.
How does it work?
Remember we randomly pick a point, ok.
So, say we pick the point p1. Do we need to apply a correction?
Yes.
Why? Because it is a positive point and the condition is violated, right.
So, now we set w equal to w plus x and we get this new w.
So, notice that we have a new w, ok.
We again repeat this, we again pick a new point and this time we have picked p2.
Do we need a correction?
Yes.
At least from the figure it looks like the angle is greater than 90, right.
So, we will again do a correction.
We will set w equal to w plus x, where this x is actually, sorry, p2, and this is where
we end up, ok.
Now, again we pick a point randomly, n1.
Do we need a correction?
So, this is what our w is, this line here, and this is n1, right.
So, we need a correction, ok.
Now, what is the correction going to be?
It will be w minus n1, and then w changes, ok.
Now, we pick another point n3.
Do we need a correction?
No, at least from the figure it seems like the angle is greater than 90, and we continue,
right.
For n2, we do not need a correction.
Now, for p3, again we do not need a correction; the angle looks less than 90.
Sorry, actually we do need a correction:
the angle is slightly greater than 90, and this is our correction. And now we keep cycling.
Now, as I keep cycling over the points, I realize that I no longer need any correction.
It should be obvious from the figure that for this particular value of w, now all my
positive points are making an angle less than 90 and all my negative points are actually
making an angle greater than 90.
That means, by definition now my algorithm has converged, right.
So, I can just stop it.
So, I can just make one pass over the data.
If nothing changes, I will just say it has converged, ok.
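That stopping criterion, one full pass over the data with no corrections, can be sketched as a standalone check. The points below are hypothetical, not the lecture's p1..p3 and n1..n3:

```python
import numpy as np

def has_converged(w, pos_points, neg_points):
    """One pass over the data: converged iff no point needs a correction."""
    ok_pos = all(w @ x >= 0 for x in pos_points)  # positives: angle under 90
    ok_neg = all(w @ x < 0 for x in neg_points)   # negatives: angle over 90
    return ok_pos and ok_neg

pos_points = [np.array([1.0, 1.0]), np.array([2.0, 0.5])]
neg_points = [np.array([-1.0, -1.0]), np.array([-0.5, -2.0])]

# A w that separates the two sets passes the check; a flipped w fails it.
assert has_converged(np.array([1.0, 1.0]), pos_points, neg_points)
assert not has_converged(np.array([-1.0, -1.0]), pos_points, neg_points)
```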
Now, does anyone see a problem with this?
Never converge?
Yes, it will never converge in some cases.
So, can someone tell me why?
We are considering only cases where the data is linearly separable, right.
That we already assumed.
So, what you are trying to tell me is that we are going over these points cyclically.
So, let me just rephrase and put words in your mouth: what you are actually trying to tell
me is that I take a point, I adjust w, but then for the next point I maybe go back
to the same w, because that point asked me to move it again, and I keep doing this again
and again and basically end up nowhere, right.
That is why this will never converge.
That is exactly what you are trying to tell me, right?
Now, that is exactly what I was forcing you to tell me.
But that is not the case, right.
This algorithm will converge.
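The cycling worry is real only when the linear-separability assumption fails. A quick illustrative experiment (our own sketch, using XOR as the classic non-separable labeling and AND as a separable one) shows the contrast:

```python
import numpy as np

def run_perceptron(X, y, epochs):
    """Cycle over the points deterministically; report whether a full
    clean pass (no corrections) was ever reached."""
    m, n = X.shape
    Xa = np.hstack([np.ones((m, 1)), X])   # x0 = 1 for the bias
    w = np.zeros(n + 1)
    for _ in range(epochs):
        corrected = False
        for i in range(m):
            dot = w @ Xa[i]
            if y[i] == 1 and dot < 0:      # misclassified positive
                w += Xa[i]; corrected = True
            elif y[i] == 0 and dot >= 0:   # misclassified negative
                w -= Xa[i]; corrected = True
        if not corrected:
            return w, True                 # converged
    return w, False                        # still correcting

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
_, converged_xor = run_perceptron(X, np.array([0, 1, 1, 0]), epochs=100)
_, converged_and = run_perceptron(X, np.array([0, 0, 0, 1]), epochs=200)
```

XOR never yields a clean pass, since no line separates its positives from its negatives, while the AND labels converge well within the epoch budget.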