Deep Learning (CS7015): Lec 2.5 Perceptron Learning Algorithm

NPTEL-NOC IITM
23 Oct 2018 · 13:35

Summary

TL;DR: The video script discusses the Perceptron Learning Algorithm, a foundational machine learning technique for binary classification. It revisits a movie rating example to illustrate the algorithm's process of learning weights and threshold from data. The script explains the concept of convergence and the perceptron's goal of correctly classifying data points as positive or negative. It delves into the algorithm's steps, including initializing weights randomly and adjusting them based on errors made during classification. The geometric interpretation of the perceptron's decision boundary and the importance of the dot product in determining the angle between data points and the weight vector are highlighted. The script also addresses potential issues with convergence and the assumption of linearly separable data.

Takeaways

  • 📚 The lecture introduces the Perceptron Learning Algorithm, which is a method for learning weights and thresholds in a perceptron model.
  • 🎬 The example of movie preferences is used to illustrate the learning process, where past movie data is used to predict whether a new movie will be liked or not.
  • 🔢 The data includes various features (variables) that are not just Boolean but can be real numbers, such as a rating between 0 and 1.
  • 🤖 The goal of the perceptron is to learn from the data and correctly classify new movies as liked (label 1) or not liked (label 0).
  • 🧠 The Perceptron Learning Algorithm starts with random initialization of weights and adjusts them based on the training data to minimize errors.
  • 🔄 Convergence in the algorithm is defined as the point when no more errors are made on the training data and predictions do not change.
  • 📉 The algorithm involves picking a random data point and adjusting the weights if the prediction is incorrect (error is made).
  • 📚 The adjustment of weights involves adding the data point vector to the weight vector if a positive point is misclassified (and subtracting it for a misclassified negative point); a minimal code sketch of this update follows the list.
  • 📏 The decision boundary is represented by the equation w transpose x = 0, which divides the input space into two halves.
  • 📐 The geometric interpretation of the algorithm is that the weight vector w is orthogonal to the decision boundary line.
  • 🔍 The algorithm's effectiveness is based on the dot product and the angles between the weight vector and the data points, with adjustments made to ensure correct classification.
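
A minimal Python sketch of the update step referenced in the takeaways above. The function name, the NumPy usage, and the convention that position 0 of w and x holds the bias term are illustrative assumptions, not something from the lecture slides.

```python
import numpy as np

def perceptron_update(w, x, label):
    """One step of the perceptron learning algorithm.

    w     : weight vector of shape (n + 1,), with w[0] playing the role of w0
    x     : input vector of shape (n + 1,), with x[0] = 1
    label : 1 for a positive point, 0 for a negative point
    """
    activation = np.dot(w, x)             # w^T x, the quantity compared with 0
    if label == 1 and activation < 0:     # positive point misclassified
        w = w + x                         # move w towards x so the angle tends below 90 degrees
    elif label == 0 and activation >= 0:  # negative point misclassified
        w = w - x                         # move w away from x so the angle tends above 90 degrees
    return w
```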

Q & A

  • What is the Perceptron Learning Algorithm?

    -The Perceptron Learning Algorithm is a method used to train a perceptron, which is a type of linear classifier. It adjusts the weights of the perceptron iteratively to correctly classify the given data points into two categories, typically represented as positive and negative labels.

  • What is the purpose of revisiting the movie example in the script?

    -The movie example is used to illustrate a more complex scenario where the perceptron has to learn from data that includes both binary and real-valued variables, such as ratings, to classify whether a movie is liked or not.

  • What does the perceptron model try to achieve with the given data?

    -The perceptron model aims to learn the appropriate weights and threshold such that it can correctly classify new, unseen data points based on the training data provided.

  • How are the weights in the perceptron initialized in the learning algorithm?

    -The weights in the perceptron are initialized randomly at the beginning of the learning algorithm. This includes both the weights for the input features and the bias term.

  • What is meant by 'convergence' in the context of the Perceptron Learning Algorithm?

    -Convergence in this context means that the perceptron has learned the weights to an extent where it no longer makes any errors on the training data, or its predictions on the training data do not change with further iterations.

  • How does the Perceptron Learning Algorithm handle misclassified points?

    -If a point is misclassified, i.e., the dot product of the weight vector and the input vector has the wrong sign, the algorithm adjusts the weight vector by adding the input vector to it (for a misclassified positive point) or subtracting it (for a misclassified negative point), nudging the boundary towards a correct classification.

  • What is the geometric interpretation of the Perceptron Learning Algorithm?

    -Geometrically, the perceptron is trying to find a decision boundary that separates the positive and negative points. This boundary is represented by the line where the dot product of the weight vector and any point on the line equals zero, indicating orthogonality.

  • Why is the dot product used in the Perceptron Learning Algorithm?

    -The dot product is used because it provides a measure of the similarity between two vectors. In the context of the perceptron, it helps determine the classification of a point and the angle between the weight vector and the input vectors.

  • What is the relationship between the angle formed by the weight vector and the data points, and their classification?

    -Data points classified as positive have an angle with the weight vector that is less than 90 degrees, indicating a positive dot product. Conversely, data points classified as negative have an angle greater than 90 degrees, indicating a negative dot product.

  • Why might the Perceptron Learning Algorithm not converge in some cases?

    -The algorithm might not converge if the data is not linearly separable, meaning there is no single hyperplane that can separate all positive and negative points without error.

  • What is the final step in the Perceptron Learning Algorithm?

    -The final step is to make a pass over the data and check if any further corrections are needed. If no corrections are made, indicating that all points are correctly classified, the algorithm has converged and can be stopped.
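
A rough sketch of that stopping criterion, reusing the hypothetical perceptron_update step shown after the takeaways above: the loop makes repeated passes over the data and stops once a full pass changes nothing, which only terminates if the data is linearly separable.

```python
import numpy as np

def train_perceptron(X, y, seed=0):
    """Train until a full pass over the data produces no correction.

    X : array of shape (m, n + 1), rows are inputs with x[0] = 1
    y : array of shape (m,), labels in {0, 1}
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])           # random initialisation, as in the lecture
    while True:
        changed = False
        for i in rng.permutation(len(X)):     # pick points in random order
            w_new = perceptron_update(w, X[i], y[i])
            if not np.array_equal(w_new, w):  # a correction was applied
                w, changed = w_new, True
        if not changed:                       # converged: no errors left on the training data
            return w
```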

Outlines

00:00

🧠 Introduction to Perceptron Learning Algorithm

This paragraph introduces the Perceptron Learning Algorithm, a method for learning weights and thresholds in a perceptron model. It revisits a movie example with a list of m movies, each associated with a class indicating whether the movie is liked or not. The data includes n different variables such as ratings on a scale from 0 to 1, which are real numbers, not just Boolean. The goal is for the perceptron to learn from this data and predict the correct label (1 for liked, 0 for not liked) for any given movie. The learning problem involves adjusting the weights to separate positive (liked) from negative (not liked) examples. The Perceptron Learning Algorithm is presented, which starts with random weight initialization and iteratively adjusts the weights based on the error made on the training data, aiming for convergence when no more errors are made.
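
As a concrete illustration of how one such data point might be represented, here is a small sketch; the feature names and all numeric values are made up for illustration and do not come from the lecture's dataset.

```python
import numpy as np

# One past movie with n = 3 made-up features, plus the constant bias input x0 = 1.
# Illustrative features: [rating_scaled_0_to_1, is_action, director_liked_before]
x_movie = np.array([1.0, 0.83, 1.0, 0.0])    # x[0] = 1 is the bias input
y_movie = 1                                  # label: 1 = liked, 0 = not liked

w = np.array([-0.5, 1.2, 0.3, 0.8])          # some current weight vector, w[0] = w0
prediction = 1 if np.dot(w, x_movie) >= 0 else 0   # perceptron decision: the sign of w^T x
```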

05:04

📚 Geometric Interpretation of Perceptron Learning

The second paragraph delves into the geometric interpretation of the Perceptron Learning Algorithm. It discusses the decision boundary represented by the equation w^T x = 0, which divides the input space into two halves. Points on this line satisfy the condition of being orthogonal to the weight vector w, forming a 90-degree angle. The paragraph explains that points in the positive half-space (labeled 1) have an angle less than 90 degrees with w, while points in the negative half-space (labeled 0) have an angle greater than 90 degrees. The algorithm's adjustments to the weight vector aim to correct these angles to ensure proper classification. The explanation uses the dot product and the cosine of the angle between vectors to illustrate the relationship between the weight vector and the data points.
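
A quick numeric check of this geometric picture; the weight vector and the two points below are arbitrary values chosen only for illustration. The sign of w^T x and the angle computed from cos α = w^T x / (‖w‖‖x‖) move together, as the lecture argues.

```python
import numpy as np

w = np.array([1.0, 2.0])         # an arbitrary weight vector
x_pos = np.array([2.0, 1.0])     # a point meant to lie in the positive half-space
x_neg = np.array([-1.0, -3.0])   # a point meant to lie in the negative half-space

for name, x in [("x_pos", x_pos), ("x_neg", x_neg)]:
    dot = np.dot(w, x)
    cos_alpha = dot / (np.linalg.norm(w) * np.linalg.norm(x))
    angle = np.degrees(np.arccos(cos_alpha))
    print(f"{name}: w.x = {dot:+.2f}, angle = {angle:.1f} degrees")

# Expected: x_pos has a positive dot product and an angle below 90 degrees,
# while x_neg has a negative dot product and an angle above 90 degrees.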

10:08

🔄 Perceptron Algorithm Execution and Convergence

The final paragraph describes the execution of the Perceptron Learning Algorithm on a toy data set. It explains the process of randomly picking data points and adjusting the weight vector w based on whether the current classification is incorrect. The paragraph illustrates how the algorithm iteratively corrects the angle between the weight vector and the data points to ensure that positive points have an angle less than 90 degrees and negative points have an angle greater than 90 degrees. The convergence of the algorithm is discussed, noting that it will stop when no further corrections are needed, and all data points are correctly classified. However, the paragraph also raises a potential issue with the algorithm never converging in some cases, which is attributed to cyclic adjustments that do not lead to a stable solution.
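
To tie this to the sketches above, here is a tiny made-up 2-D example; the coordinates are arbitrary, and train_perceptron is the illustrative helper sketched after the Q & A section, not code from the lecture.

```python
import numpy as np

# Made-up 2-D toy points with a bias input prepended: positives p1-p3, negatives n1-n3.
X = np.array([[1,  2.0,  1.0],   # p1
              [1,  1.5,  2.0],   # p2
              [1,  2.5,  2.5],   # p3
              [1, -1.0, -2.0],   # n1
              [1, -2.0, -1.0],   # n2
              [1, -1.5, -2.5]],  # n3
             dtype=float)
y = np.array([1, 1, 1, 0, 0, 0])

w = train_perceptron(X, y)
# After convergence every positive point satisfies w^T x >= 0 (angle below 90 degrees)
# and every negative point satisfies w^T x < 0 (angle above 90 degrees).
assert all((X @ w >= 0) == (y == 1))
```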

Keywords

💡Perceptron Learning Algorithm

The Perceptron Learning Algorithm is a foundational method in machine learning for training a perceptron model, which is a type of linear classifier. It involves adjusting the weights of the model iteratively to correctly classify data points. In the video's context, the algorithm is used to learn from a set of movies and their associated preferences, with the goal of separating liked (positive) from disliked (negative) movies. The script describes the algorithm's process, including initializing weights randomly and updating them based on whether the current classification is correct.

💡Weights

In the context of machine learning, 'weights' refer to the parameters of a model that are adjusted during the learning process to improve its performance. In the perceptron model, weights are associated with each feature of the input data and are used to compute the output, which in this case is a binary classification indicating whether a movie is liked or not. The script discusses how these weights are initially set randomly and then iteratively refined through the learning algorithm.

💡Threshold

The 'threshold' in the perceptron model is a value that helps determine the classification of the input data. It is used in conjunction with the weighted sum of the inputs to decide whether the output should be classified as positive or negative. The script does not explicitly define the threshold but implies its role in the decision-making process of the perceptron.

💡Convergence

In machine learning, 'convergence' refers to the state where the learning algorithm has made sufficient progress and no longer needs to make adjustments to the model's parameters. It indicates that the model has learned to classify the training data correctly, and further iterations will not improve its performance. The script explains that the perceptron learning algorithm continues to update weights until it reaches a state of convergence, meaning it no longer makes errors on the training data.

💡Positive and Negative Inputs

In the script, 'positive inputs' and 'negative inputs' refer to the labeled data used to train the perceptron. Positive inputs are those associated with a label of '1', indicating that the movie is liked, while negative inputs have a label of '0', indicating a disliked movie. The perceptron learning algorithm uses these labeled examples to adjust the weights such that it can correctly classify new, unseen data.

💡Dot Product

The 'dot product' is a mathematical operation that takes two vectors and returns a single number, which is the sum of the products of their corresponding entries. In the context of the perceptron, the dot product of the weight vector and the input vector is used to calculate the decision value that determines the output classification. The script explains that the perceptron rule can be expressed in terms of the dot product, which is central to understanding how the algorithm makes decisions.

💡Decision Boundary

The 'decision boundary' is a concept in machine learning that defines the line or hyperplane that separates different classes in the feature space. For the perceptron, this boundary is the set of points for which the dot product of the weight vector and the input vector equals zero. The script discusses the geometric interpretation of the decision boundary and how it divides the input space into positive and negative classes.

💡Orthogonal

In geometry, 'orthogonal' means perpendicular, i.e., meeting at a right angle. In the context of the script, the weight vector is said to be orthogonal to the decision boundary. This is because the dot product of the weight vector with any point on the decision boundary equals zero, which corresponds to a 90-degree angle between them. The script uses the concept of orthogonality to explain the geometric relationship between the weight vector and the decision boundary.

💡Half Space

In the context of the perceptron and decision boundaries, a 'half space' refers to one of the two regions divided by the boundary. Points on one side of the boundary have one classification (positive or negative), while points on the other side have the opposite classification. The script mentions the positive half space, which contains points that, when classified by the perceptron, result in a positive label.

💡Linearly Separable

Data is considered 'linearly separable' if it can be divided into different classes by a single straight line (in two dimensions) or hyperplane (in higher dimensions). The script implies that the perceptron learning algorithm works under the assumption that the data is linearly separable, meaning that there exists a decision boundary that can perfectly separate positive from negative examples.
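
A standard counterexample, not taken from the lecture, is XOR-style data: no single line separates the two classes, so the perceptron's update loop would keep cycling forever on it. The short argument is spelled out in the comments below.

```python
import numpy as np

# XOR-style data with a bias input x0 = 1 prepended; the label is 1 iff exactly one input is 1.
X = np.array([[1, 0, 0],
              [1, 0, 1],
              [1, 1, 0],
              [1, 1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

# A separating w = (w0, w1, w2) would need
#   w0 < 0 and w0 + w1 + w2 < 0          (the two negative points), and
#   w0 + w2 >= 0 and w0 + w1 >= 0        (the two positive points).
# Summing each pair gives 2*w0 + w1 + w2 < 0 and 2*w0 + w1 + w2 >= 0,
# a contradiction, so no weight vector separates this data and the
# perceptron's update loop would keep making corrections forever.
```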

Highlights

Introduction to the Perceptron Learning Algorithm.

Revisiting the movie example with a more complex dataset involving real-valued ratings.

Explanation of the perceptron's goal to match labels based on past movie preferences.

Description of the perceptron learning weights and threshold from data.

Algorithm's initialization of weights randomly without prior knowledge.

Definition of convergence in the context of the perceptron learning algorithm.

Algorithm's step-by-step process for adjusting weights based on positive and negative inputs.

Geometric interpretation of the perceptron's decision boundary and its relation to angles.

Explanation of the dot product and its role in determining the angle between vectors.

Clarification on why the perceptron algorithm adjusts weights to reduce angles to less than 90 degrees.

Illustration of the perceptron algorithm's convergence through a toy data set example.

Demonstration of the perceptron's iterative process to correct misclassified points.

Discussion on the perceptron's limitation and conditions for convergence.

Assumption of linear separability as a prerequisite for the perceptron algorithm's convergence.

Highlight of the perceptron's inability to converge in non-linearly separable cases.

Final summary of the perceptron learning algorithm's process and its geometric intuition.

Transcripts

play00:13

We will now go to the next module which is the Perceptron Learning Algorithm.

play00:17

We now see a more principled approach of learning these weights and threshold, but before that

play00:23

we will just again revisit our movie example and make it slightly more complicated.

play00:27

Now, here what the situation is that we are given a list of m movies and a class associated

play00:34

with each movie indicating whether we like the movie or not, right.

play00:37

So, now we have given some data of the past m movies that we have seen and whether we

play00:41

like this movie or not and now instead of these three variables, we have these n different

play00:45

variables based on which we are making decisions.

play00:48

And notice that some of these variables are real, right.

play00:51

They are not Boolean anymore, right.

play00:53

The rating could be any real number between 0 to 1, ok and now based on this data what

play00:59

do we want is the perceptron to do actually.

play01:01

So, I have given you some data.

play01:03

These factors I have also given you the label 1 and 0.

play01:06

So, if I tell you my perceptron has now learnt properly, what would you expect

play01:11

it to do? A perfect match, right.

play01:12

So, whenever I feed it, one of these movies it should give me the same label as was there

play01:17

in my data, right and again there are some movies for which I have a label 1 which are

play01:21

positive and some movies which I have a label 0.

play01:24

So, I am once again looking to separate the positives from the negatives, right.

play01:27

So, it should adjust the weights in such a way that I should be able to separate, right.

play01:31

So, that is the learning problem that we are interested in, ok.

play01:33

So, now with that I will give you the algorithm, ok.

play01:36

This is the Perceptron Learning Algorithm.

play01:38

We have certain positive inputs which had the label 1, we have certain negative inputs

play01:44

which had the label 0 and now I don't know what the weights are and I have no prior knowledge

play01:50

of what the weights are going to be.

play01:51

I need to learn them from the data.

play01:52

So, what I am going to do is, I am just going to initialize these weights randomly as I

play01:56

am also going to pick up some random values for this.

play01:59

So, this should be small n.

play02:03

So, this should be small n, and now here is the algorithm: while not converged, do something.

play02:09

So, before I tell you what to do, can you tell me what is meant by convergence?

play02:14

When will you say that it has converged?

play02:18

When it is not making any more errors on the training data, right, or its predictions are

play02:23

not changing on the training data.

play02:25

So, that is the definition of convergence, ok.

play02:27

Now, here is the algorithm.

play02:29

I pick up a random point from my data, which could either be positive or negative,

play02:33

right.

play02:34

So, it comes from the union of the positive and negative sets.

play02:35

Basically all the data that I have I pick up a random point from there.

play02:39

If the point is positive, right and this is the condition which happens what does this

play02:47

tell me?

play02:49

If the point was positive, what did I actually want greater than 0, but the condition is

play02:54

less than 0.

play02:55

That means, I have made an error.

play02:56

So, I have made an error, then I will just add x to w. I see a lot of thoughtful nodding

play03:05

and I hope you are understanding what is happening.

play03:07

Let us see.

play03:08

So, what is w actually?

play03:09

n dimensional.

play03:10

N dimension.

play03:11

N plus 1 dimensional, right, because w naught is also inside there.

play03:15

So, actually there should be w naught also here, right and what is x again n dimensional,

play03:24

right and that is why this addition is valid.

play03:26

So, let us understand that w and x both are n dimensional, ok.

play03:29

Now, let us look at the other if condition.

play03:31

Can you guess what the other if condition is? If x belongs to N and?

play03:36

Summation.

play03:37

Summation is greater than equal to 0, then?

play03:40

So, that means you have completely understood how this algorithm works.

play03:45

Well, that is right.

play03:46

So, now consider two vectors w and x.

play03:49

So, remember what we are trying to prove is or get an intuition.

play03:51

Not prove, actually get an intuition for why this works.

play03:54

ok

play03:55

So, we will consider two vectors w and x and this is what my vectors look like very similar

play03:58

to the case that we are considering: w0 to wn, and x0 to xn.

play04:02

So, this again x naught is just 1, right.

play04:06

Ok. Now, this condition that I have been talking

play04:09

about is nothing, but the dot product.

play04:13

How many of you have gone through the prerequisites for today's lecture?

play04:16

Ok good.

play04:17

So, it is just a dot product, ok.

play04:21

Now, we can just rewrite the perceptron rule like this, with the dot product.

play04:26

I mean instead of using that summation thing, we can just say that it is a dot product.

play04:30

Ok. Now, we are interested in finding the line

play04:34

w transpose x equal to 0.

play04:35

So, that is our decision boundary, right, which divides the input space into two halves, ok.

play04:40

Now, every point on this line satisfies the equation w transpose x equal to 0.

play04:45

What does that mean actually?

play04:48

So, just a simple example is that if I have the line x1 plus x2 equal to 0, then all the

play04:56

points which lie on the line satisfy this equation, right.

play04:58

So, you could have (1, -1), (2, -2) and so on, but (2, 2) cannot be a point on this

play05:04

line.

play05:05

And every point lying on this line satisfies this equation.

play05:07

So, every point lying on this line actually satisfies the equation w transpose x equal

play05:11

to 0.

play05:13

So, can you tell me what is the angle between w and any point on this line?

play05:18

How many of you say perpendicular?

play05:23

Ok.

play05:24

Why?

play05:26

Dot product is 0, right.

play05:27

So, if the dot product is 0, they are orthogonal right.

play05:30

So, that means if I take this line, then my vector w is orthogonal to this.

play05:35

It is orthogonal to this point, to this point, to every point on the line, which

play05:40

is just the same as saying that the vector is perpendicular to the line itself, right

play05:44

as simple as that.

play05:45

So, the angle is 90 degrees because the dot product gives you the cos alpha and that is

play05:51

0, right and since it is perpendicular as I said to every point of the line it is just

play05:56

perpendicular to the line itself, right.

play05:57

So, this is what the geometric interpretation looks like.

play06:00

This is our decision boundary w transpose x equal to 0, and the vector w is actually orthogonal

play06:07

to this line and that is exactly the intuition that we have built so far.

play06:11

Ok. Now, let us consider some points which are

play06:15

supposed to lie in the positive half space of this line, right.

play06:18

That means these are the points for which the output is actually 1, ok.

play06:23

Now, can you tell me what is the angle between any of these points and w? Or are you guys

play06:31

actually trying to measure the angle? We have not got any measuring stuff, no.

play06:35

So, I will give you three options i.e. equal to 90, greater than 90 and less than 90.

play06:41

Less than 90.

play06:42

Less than 90 it is obvious from the figure, right.

play06:43

Now, if I take any point which lies in the negative half space, what is the angle going

play06:51

to be between them?

play06:52

It is greater than 90, right.

play06:54

Again obvious, and it also follows from the fact that cos alpha is w transpose x divided by something

play07:02

and we know that for the positive points w transpose x is greater than equal to 0, right.

play07:10

That means, cos alpha would be greater than equal to 0, that means the angle alpha would

play07:14

be less than 90 degrees and for the negative points w transpose x is actually less than

play07:19

0.

play07:20

That means, cos alpha would be less than 0 that means alpha would be greater than 90

play07:23

degrees, right.

play07:24

So, it actually follows from the formula itself, but it is also clear from the figure.
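
For reference, the "something" in the denominator is the product of the vector norms; the standard formula for the angle $\alpha$ between $w$ and $x$ is

$$\cos\alpha = \frac{w^{T}x}{\lVert w \rVert \, \lVert x \rVert},$$

and since the norms are always positive, the sign of $\cos\alpha$, and hence whether $\alpha$ is below or above 90 degrees, is decided entirely by the sign of $w^{T}x$.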

play07:27

So, keeping this picture in mind let us revisit the algorithm, ok.

play07:32

So, this is the algorithm, ok.

play07:35

Now, let us look at the first condition which was this.

play07:39

Now, if x belongs to P and w transpose x is less than 0, then it means that the angle between

play07:47

x and the current w is actually greater than 90 degrees, but what do we want it to be?

play07:52

Less than 90 degrees, and our fix for this is to add x to w, but we still do not know why this

play07:58

works?

play07:59

Now, does anyone know why this works?

play08:00

So, let us see why this works.

play08:03

So, what is the new cos alpha going to be?

play08:06

It is going to be proportional to this, right.

play08:09

It is going to be proportional to this.

play08:11

I will just substitute what w new is, fine.

play08:17

That means, if cos alpha new is going to be greater than cos alpha, what is alpha new

play08:23

going to be?

play08:24

It will be less than alpha, and that is exactly what we wanted.

play08:27

This angle was actually greater than 90 degrees.

play08:29

So, you want to slowly move it such that it becomes less than 90 degrees.

play08:33

It is not going to get solved in one iteration, and that is why we repeat till convergence.

play08:36

So, we will keep doing this.

play08:37

I will keep picking xs again and again till it reaches convergence.

play08:41

That means, till we are satisfied with that condition, right.

play08:44

Let us look at the other condition x belongs to n and w transpose x was greater than equal

play08:50

to 0, then it means that the angle alpha is actually less than 90 degrees and we want

play08:57

it to be the opposite, ok.

play08:58

I will just quickly skim over this: w minus this x, right.

play09:04

Ok, I forgot to mention that this (the x transpose x term) is actually a positive quantity, right.

play09:07

I mean that is why that result holds, ok.

play09:10

That means, cos alpha new is going to be less than cos alpha and this slight bit of mathematical

play09:17

incorrectness I am doing here, but that does not affect the final result.

play09:22

So, I will just gloss over that and you can go home and figure it out, but still it does

play09:27

not take away from the final intuition and interpretation, right.

play09:30

So, now the new cos alpha is going to be less than the original cos alpha; that means, the

play09:35

angle is going to be greater and that is exactly what we wanted, ok.
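
A brief sketch of the step the lecture leaves as homework, ignoring (as the lecture does) the change in $\lVert w \rVert$ in the denominator. For a misclassified positive point the update is $w_{\text{new}} = w + x$, so

$$\cos\alpha_{\text{new}} \propto w_{\text{new}}^{T}x = (w + x)^{T}x = w^{T}x + x^{T}x \ge w^{T}x,$$

because $x^{T}x \ge 0$, and the angle can only decrease. For a misclassified negative point the update is $w_{\text{new}} = w - x$, so

$$w_{\text{new}}^{T}x = (w - x)^{T}x = w^{T}x - x^{T}x \le w^{T}x,$$

and the angle can only increase, which is exactly the behaviour described above.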

play09:39

So, we will now see this algorithm in action for a toy data set.

play09:43

So, this is the toy data set we have and we have initialized w to a random value and that

play09:49

turns out to be this, right.

play09:50

I just picked up some random value for w and ended up with this particular configuration

play09:55

for w, ok. Now, we observe that currently w transpose

play10:02

x is less than 0 for all the positive points and it is actually greater than equal to 0

play10:08

for all the negative points.

play10:09

If you do not understand w transpose x, it is just that all the positive points

play10:13

actually have a greater than 90 degree angle and all the negative points actually have

play10:17

a less than 90 degree angle.

play10:18

So, this is exactly the opposite of the situation that we want, and now from here on, we want

play10:23

to actually run the perceptron algorithm, right and try to fix this w.

play10:28

How does it work?

play10:30

Remember we randomly pick a point, ok.

play10:32

So, say we pick the point p1, do we need to apply a correction?

play10:37

Yes.

play10:38

Yes.

play10:39

Why, because it is a positive point and the condition is violated, right.

play10:42

So, now we add w equal to w plus x and we get this new w.

play10:48

So, notice that we have a new w, ok.

play10:50

We again repeat this, we again pick a new point and this time we have picked p2.

play10:56

Do we need a correction?

play10:57

Yes.

play10:58

At least from the figure it looks like the angle is greater than 90, right.

play11:00

So, we will again do a correction.

play11:01

We will add w is equal to w plus x, right.

play11:04

This x is actually p2, sorry, and this is where we end up, ok.

play11:10

Now, again we pick a point randomly n1.

play11:14

Do we need a correction?

play11:16

So, this is what our w is.

play11:18

This line here and n1, right.

play11:21

So, we need a correction, ok.

play11:24

Now, what is the correction going to be?

play11:25

It will be w minus x this time, and then the w changes, ok.

play11:30

Now, we pick another point n3.

play11:33

Do we need a correction?

play11:34

No; at least from the figure it seems like the angle is greater than 90, and we continue this,

play11:39

right.

play11:40

For n2, we do not need a correction.

play11:42

Now, for p3 again we do not need a correction.

play11:44

The angle looks less than 90.

play11:47

Sorry, actually we do need a correction.

play11:48

The angle is slightly greater than 90 and this is our correction, and now we keep cycling.

play11:53

Now, as I keep cycling over the points, I realize that I no longer need any correction.

play11:57

It should be obvious from the figure that for this particular value of w, now all my

play12:02

positive points are making an angle less than 90 and all my negative points are actually

play12:06

making an angle greater than 90.

play12:08

That means, by definition now my algorithm has converged, right.

play12:12

So, I can just stop it.

play12:14

So, I can just make one pass over the data.

play12:15

If nothing changes, I will just say it has converged, ok.

play12:19

Now, does anyone see a problem with this?

play12:23

Never converge.

play12:24

It will never converge in some cases.

play12:25

So, can someone tell me why?

play12:26

We are considering only cases where the data is linearly separable, right.

play12:31

That we already assumed.

play12:33

So, what you are trying to tell me is that you are going over these points cyclically.

play12:36

So, let me just rephrase and put words in your mouth that what you are trying to tell

play12:40

me actually is that I take a point, I adjust w, but now for the next point I maybe go back

play12:47

to the same w because that point asked me to move it again and I keep doing this again

play12:50

and again and basically, end up nowhere, right.

play12:53

That is why this will never converge.

play12:55

That is exactly what you are trying to tell me, right.

play12:57

Now, that is exactly what I am forcing you to tell me.

play13:00

So, that is not the case, right.

play13:02

This algorithm will converge.


Related Tags
Machine Learning, Perceptron Algorithm, Data Separation, Movie Rating, Weight Adjustment, Learning Process, Data Analysis, Neural Networks, Algorithm Tutorial, Linear Separability