Characteristics of Lasso regression

IIT Madras - B.S. Degree Programme
6 Oct 2022 · 15:28

Summary

TLDR: The video script discusses the advantages of Ridge regression over Lasso, despite Lasso's tendency to produce sparse solutions. It highlights the lack of a closed-form solution for Lasso due to the non-differentiability of the L1 penalty at zero, necessitating iterative methods like subgradient descent. The script also touches on special techniques for solving Lasso, such as Iteratively Reweighted Least Squares (IRLS), and concludes with an overview of linear regression, including least squares, stochastic gradient descent, and regularizers like Ridge and Lasso, emphasizing their applications in machine learning.

Takeaways

  • πŸ“š The script discusses the comparison between Lasso and Ridge regression, highlighting the benefits and drawbacks of each method.
  • πŸ” Lasso does not have a closed-form solution due to its non-differentiability at zero, unlike Ridge regression which has a straightforward closed-form solution.
  • πŸ›  To solve Lasso, subgradient methods are used because of this non-differentiability; these methods work even when the function is not differentiable at every point.
  • πŸ“‰ The concept of subgradients is introduced as a way to approximate the direction of steepest descent for non-differentiable points in optimization problems.
  • πŸ“Œ The script provides an intuitive explanation of subgradients, showing how they can be used to approximate the gradient in non-differentiable regions.
  • πŸ”’ The absolute value function's subgradients are demonstrated: at zero they can take any value between -1 and 1, each representing a slope that lower bounds the function.
  • πŸ”„ The definition of a subgradient is given, emphasizing its role in linearizing a function at a point and ensuring the function's value is always above this linearization.
  • πŸ”§ Subgradient descent is presented as a useful algorithm for minimizing convex functions, even when they are not differentiable, by moving in the direction of the negative subgradient.
  • πŸ”‘ The relevance of subgradients to Lasso is highlighted, as Lasso is a convex optimization problem that can benefit from subgradient descent methods.
  • πŸ›‘ The script concludes that while Lasso provides sparse solutions, it lacks a closed-form solution and requires optimization techniques like subgradient descent or specialized methods like Iteratively Reweighted Least Squares (IRLS).
  • πŸš€ The summary of the course content on regression is provided, covering least squares, Ridge regression, Lasso, and the use of various regularizers in machine learning models.

Q & A

  • Why might someone choose Ridge regression over Lasso despite Lasso's ability to push some coefficients to zero?

    -Ridge regression has a closed-form solution, which makes it computationally simpler and faster to solve. Lasso has no closed-form solution because its L1 penalty is not differentiable at zero, so it must be solved with iterative methods such as subgradient descent.

  • What is a closed-form solution and why is it beneficial?

    -A closed-form solution is an exact analytical expression that can be used to solve an equation or optimization problem. It is beneficial because it allows for direct computation of the solution without iterative methods, which can be more efficient and faster.

  • Why is Lasso not differentiable at zero and what are the implications for optimization?

    -The Lasso objective is not differentiable at zero (more precisely, wherever a coefficient equals zero) because the L1 penalty is a sum of absolute values, which are not smooth at zero. This non-differentiability means that standard gradient-based optimization methods cannot be applied directly to Lasso, necessitating the use of subgradient methods.

  • What are subgradient methods and how do they differ from gradient methods?

    -Subgradient methods are optimization techniques used for problems where the objective function is not differentiable at all points. They use subgradients, which are generalizations of gradients, to find a direction of descent even in non-differentiable regions. In contrast, gradient methods rely on the derivative of the function, which must exist at all points of interest.

  • Can you provide an example of a subgradient?

    -At a non-differentiable point of a piecewise linear function, a subgradient is the slope of any line that touches the function at that point and lies below it everywhere. For instance, at the vertex of a 'V'-shaped function, every slope between those of the two arms defines such a line, so there are multiple subgradients at that point.

  • What is the definition of a subgradient in the context of convex optimization?

    -A subgradient of a function f at a point x is a vector g such that for all z, the function value f(z) is greater than or equal to f(x) + g^T (z - x). This means that the function lies above the linear approximation defined by the subgradient at x.

  • Why are subgradients useful in optimization, especially for non-differentiable functions?

    -Subgradients are useful because they allow for the optimization of convex functions that may not be differentiable. By moving in the direction of the negative subgradient, one can still converge to the minimum of the function, provided it is convex.

  • What is the relationship between the L1 penalty and subgradients?

    -The L1 penalty is a sum of absolute values, and the absolute value function is not differentiable at zero but does have subgradients. At points other than zero the subgradient is the sign of the variable, and at zero any value between -1 and 1 is a valid subgradient.

  • What are some alternative methods to solve the Lasso problem besides subgradient descent?

    -Besides subgradient descent, other methods include the Iteratively Reweighted Least Squares (IRLS) method, which leverages the structure of the Lasso problem by solving a series of weighted least squares problems.

  • What is the significance of the Lasso problem being a convex optimization problem?

    -The convexity of the Lasso problem ensures that any local minimum is also a global minimum. This property allows optimization algorithms like subgradient descent to find the optimal solution reliably, even though the problem may not have a closed-form solution.

  • Can you summarize the main differences between Ridge and Lasso regression in terms of their solutions and properties?

    -Ridge regression uses an L2 penalty and has a closed-form solution, making it computationally efficient. It shrinks coefficients towards zero but does not set them exactly to zero. Lasso regression, on the other hand, uses an L1 penalty and does not have a closed-form solution. It can result in sparse solutions with some coefficients exactly at zero, but requires iterative optimization methods.

Outlines

00:00

πŸ” Lasso vs Ridge Regression and Subgradient Methods

This paragraph discusses the comparison between Lasso and Ridge regression, highlighting that Lasso does not have a closed-form solution due to its non-differentiability at zero caused by the L1 penalty. It explains the use of subgradient methods to solve such optimization problems where traditional gradient descent isn't applicable. The explanation includes an intuitive description of subgradients, using a piecewise linear function as an example to demonstrate how subgradients can provide a direction for optimization even when the gradient isn't defined. The paragraph concludes by emphasizing the iterative nature of solving Lasso regression problems.
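To make the contrast concrete, the Ridge solution referred to here can be computed with a single linear solve. The following is a minimal NumPy sketch, not code from the lecture; it assumes the usual rows-as-samples convention for X, and all variable names are illustrative:

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Closed-form Ridge solution w = (X^T X + lam * I)^{-1} X^T y.

    Assumes rows of X are samples and columns are features; scaling
    conventions for lam differ between texts.
    """
    d = X.shape[1]
    # Solve the regularized normal equations instead of forming an explicit inverse.
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Illustrative usage on synthetic data (not from the lecture):
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, 0.0, -2.0, 0.0, 0.5]) + 0.1 * rng.normal(size=100)
w_ridge = ridge_closed_form(X, y, lam=0.1)
```

No such one-shot formula exists for Lasso, which is why the discussion turns to iterative schemes.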

05:02

πŸ“‰ Understanding Subgradients in Optimization

The second paragraph delves deeper into the concept of subgradients, providing a formal definition and explaining their role in optimization, particularly for convex functions. It illustrates how subgradients can be used in place of gradients for optimization when the function isn't differentiable. The explanation includes a geometric interpretation of subgradients with respect to a one-dimensional absolute value function, showing multiple subgradients at points of non-differentiability. The paragraph also discusses the convergence properties of subgradient descent, a method applicable to convex functions, and its relevance to solving the Lasso problem, which is a convex optimization task.
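The sketch below turns this idea into code for the Lasso objective ½‖Xw − y‖² + λ‖w‖₁. It is an illustrative implementation under assumed notation, not the lecture's own code; np.sign(w) is used as one valid subgradient of the L1 term (it returns 0 at wᵢ = 0, which lies in [−1, 1]):

```python
import numpy as np

def lasso_subgradient_descent(X, y, lam, step=1e-3, iters=5000):
    """Minimize 0.5 * ||Xw - y||^2 + lam * ||w||_1 by subgradient descent.

    np.sign(w) is one element of the subdifferential of the L1 penalty;
    any choice in [-1, 1] at zero coordinates would also be valid.
    """
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        grad_loss = X.T @ (X @ w - y)        # gradient of the smooth squared-loss part
        subgrad_penalty = lam * np.sign(w)   # a subgradient of the non-smooth L1 part
        w = w - step * (grad_loss + subgrad_penalty)
    return w
```

In practice a diminishing step size (for example, step divided by the square root of the iteration count) is what gives the convergence guarantee the video alludes to; a fixed step only reaches a neighborhood of the minimum.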

10:05

πŸ›  Specialized Methods for Solving Lasso Regression

This paragraph summarizes the challenges and methods for solving the Lasso regression problem, emphasizing that there is no closed-form solution due to its L1 penalty. It mentions the iterative re-weighted least squares (IRLS) method as a specialized approach to tackle Lasso problems, leveraging the solvability of least squares as a 'black box'. The paragraph also touches on the possibility of using general optimization techniques like subgradient descent for Lasso, given its convexity, and acknowledges the existence of other specialized methods without going into detail. It concludes by contrasting the ease of solving Ridge regression with the more complex nature of Lasso.
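The video names IRLS without showing it; the following is a rough sketch of one common way the idea is realized for Lasso (the epsilon guard, warm start, and iteration count are assumptions for illustration, not details from the lecture). Each |wᵢ| is replaced by a quadratic surrogate built from the previous iterate, so every step reduces to a weighted ridge-style least-squares solve, reusing least squares as the 'black box' the summary mentions:

```python
import numpy as np

def lasso_irls(X, y, lam, iters=50, eps=1e-8):
    """IRLS sketch for 0.5 * ||Xw - y||^2 + lam * ||w||_1.

    At each step the L1 penalty is approximated by a weighted L2 penalty
    with weights 1 / (|w_i| + eps), giving a closed-form weighted solve.
    """
    d = X.shape[1]
    # Ridge-style warm start.
    w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    for _ in range(iters):
        D = np.diag(1.0 / (np.abs(w) + eps))             # per-coefficient reweighting
        w = np.linalg.solve(X.T @ X + lam * D, X.T @ y)  # weighted least-squares step
    return w
```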

15:05

🌟 Conclusion on Regression and Transition to Classification

The final paragraph wraps up the discussion on regression, providing a high-level summary of the concepts covered in the course, including least squares, stochastic gradient descent, maximum likelihood estimation, Bayesian linear regression, Ridge regression, and Lasso. It also mentions the existence of various regularizers and the possibility of incorporating domain-specific constraints into regularization. The paragraph concludes by signaling the transition to the next part of the course, which will focus on classification within the realm of supervised learning.

Keywords

πŸ’‘Lasso

Lasso, short for Least Absolute Shrinkage and Selection Operator, is a type of linear regression that includes an L1 penalty term. It is known for its ability to produce sparse models, where some coefficients can be exactly zero, which helps in feature selection. In the video, Lasso is compared with Ridge regression, and its non-differentiability at zero is discussed, which makes it challenging to solve with standard gradient descent methods.
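Written out (under one common scaling convention, with rows of X as samples), the Lasso objective described here is:

```latex
\min_{w \in \mathbb{R}^{d}} \; \tfrac{1}{2}\,\lVert Xw - y \rVert_2^2 \;+\; \lambda \lVert w \rVert_1,
\qquad \lVert w \rVert_1 = \sum_{i=1}^{d} \lvert w_i \rvert .
```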

πŸ’‘Ridge Regression

Ridge Regression is a linear regression technique that includes an L2 penalty term. It is used to analyze multiple regression data that has multicollinearity. Unlike Lasso, Ridge Regression does not produce sparse solutions; instead, it shrinks the coefficients towards zero but does not set them to zero. The script mentions that Ridge Regression has a closed-form solution, which is a significant advantage over Lasso.
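For comparison, the Ridge objective and the closed-form solution the transcript refers to can be written, in the usual rows-as-samples notation and with λ scaled to match, as:

```latex
\min_{w \in \mathbb{R}^{d}} \; \tfrac{1}{2}\,\lVert Xw - y \rVert_2^2 \;+\; \tfrac{\lambda}{2}\,\lVert w \rVert_2^2,
\qquad
w^{*} = \bigl(X^{\top} X + \lambda I\bigr)^{-1} X^{\top} y .
```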

πŸ’‘Closed-form Solution

A closed-form solution refers to a mathematical expression that can be used to solve an equation exactly, without the need for iterative methods. In the context of the video, it is mentioned that Ridge Regression has a closed-form solution, while Lasso does not, due to its non-differentiable penalty term at zero.

πŸ’‘Subgradient Methods

Subgradient methods are optimization techniques used for problems where the objective function is not differentiable at certain points. The video explains that since the Lasso penalty term is not differentiable at zero, subgradient methods are used to solve the optimization problem. These methods use subgradients, which are generalizations of gradients, to find the minimum of a function.
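As a tiny illustration (a sketch, not code from the video), one valid subgradient of the absolute value can be computed as follows; at zero, any value in [−1, 1] would do, and 0 is the conventional choice:

```python
def abs_subgradient(x: float) -> float:
    """Return one valid subgradient of f(x) = |x|.

    Away from zero the function is differentiable, so the subgradient is
    the ordinary derivative sign(x); at zero any value in [-1, 1] is valid.
    """
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    return 0.0  # conventional choice from the interval [-1, 1]
```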

πŸ’‘Gradient Descent

Gradient Descent is a first-order iterative optimization algorithm used to find the minimum of a function. In the video it is contrasted with subgradient methods: it cannot be applied directly to the Lasso objective because of the non-differentiable L1 penalty, whereas the Ridge objective is differentiable (and, in addition, has a closed-form solution).

πŸ’‘Convex Function

A convex function is one that never lies above the chord joining any two points on its graph; for twice-differentiable functions this is equivalent to a non-negative second derivative. In the video, it is noted that the Lasso problem is a convex optimization problem because the sum of a quadratic loss function and an L1 norm (which is convex) is itself convex. This property is important because it guarantees that any local minimum is also a global minimum.
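Formally, f is convex if it never lies above the chord between any two points (no differentiability is required):

```latex
f\bigl(\theta x + (1 - \theta) z\bigr) \;\le\; \theta f(x) + (1 - \theta) f(z)
\qquad \text{for all } x, z \text{ in the domain and all } \theta \in [0, 1].
```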

πŸ’‘L1 Norm

The L1 norm, also known as the Manhattan norm, is the sum of the absolute values of a vector's components. It is used in Lasso regression as a penalty term to encourage sparsity in the solution. The video discusses how the L1 norm makes the Lasso penalty non-differentiable at zero.

πŸ’‘Iteratively Reweighted Least Squares (IRLS)

IRLS is a method for solving regression problems with non-differentiable penalties, such as Lasso. The video mentions IRLS as a specialized method for solving Lasso problems by iteratively solving weighted least squares problems. This method takes advantage of the structure of the Lasso problem to find the optimal solution.

πŸ’‘Sparsity

Sparsity in the context of regression models refers to the property of having many coefficients set to zero. This is a desirable property in feature selection, as it simplifies the model and can improve interpretability. The video discusses how Lasso, unlike Ridge, can produce sparse solutions by driving some coefficients exactly to zero.
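A quick way to observe this difference in practice is to fit both models on the same synthetic data and count zero coefficients. This is an illustrative sketch using scikit-learn; the data, the alpha values, and the zero threshold are arbitrary choices, not from the video:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
true_w = np.zeros(20)
true_w[:3] = [2.0, -1.5, 0.7]                # only three informative features
y = X @ true_w + 0.1 * rng.normal(size=200)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

# Lasso typically drives many coefficients exactly to zero; Ridge merely shrinks them.
print("Ridge zero coefficients:", int(np.sum(np.abs(ridge.coef_) < 1e-8)))
print("Lasso zero coefficients:", int(np.sum(np.abs(lasso.coef_) < 1e-8)))
```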

πŸ’‘Regularization

Regularization is a technique used to prevent overfitting in machine learning models by adding a penalty term to the loss function. In the video, both Ridge and Lasso regression are discussed as examples of regularized linear regression, with Ridge using an L2 penalty and Lasso using an L1 penalty.

Highlights

Unlike Ridge regression, Lasso does not have a closed-form solution, which complicates solving the optimization problem.

Subgradient methods are used to solve Lasso due to its non-differentiability at zero.

A subgradient defines a linear approximation that lower bounds the function everywhere, and it can be used where the gradient does not exist.

Subgradients can be thought of as slopes in the one-dimensional case, giving a usable notion of slope at non-differentiable points.

The L1 penalty in Lasso is not differentiable at zero, but it has subgradients there: any slope between -1 and 1.

Subgradient descent is an algorithm that can be used for convex functions even if they are not differentiable, converging to the minimum.

The Lasso problem is convex, allowing the use of subgradient descent methods for optimization.

Iterative Re-weighted Least Squares (IRLS) is a specialized method for solving the Lasso problem.

The Lasso problem combines a linear regression with an L1 penalty, promoting sparsity in the solution.

There is no closed-form solution for Lasso, necessitating the use of general optimization techniques or special-purpose methods.

Ridge regression provides a closed-form solution and is easier to solve compared to Lasso.

Linear regression with squared loss is equivalent to maximum likelihood estimation when the problem is given a probabilistic twist with Gaussian noise.

Bayesian linear regression introduces a prior distribution for the weights; with a zero-mean Gaussian prior this leads to Ridge regression as the regularized version.

L1 regularization (Lasso) is preferred for promoting sparsity in the weight vector compared to L2 regularization (Ridge).

The elastic net penalty is a mixed regularizer that combines the Lasso (L1) and Ridge (L2) penalties.

Domain-specific constraints can be incorporated into regularizers to guide the optimization process towards expected solutions.

The course concludes with an overview of linear regression techniques and introduces the next part focusing on classification.

Transcripts

play00:00

foreign

play00:04

[Music]

play00:13

so now the question is well why don't we

play00:15

just do lasso right so if it's if it is

play00:18

giving us

play00:19

benefits in terms of uh you know the the

play00:23

pushing the values to exactly zero why

play00:26

would anyone even want to do Ridge when

play00:27

lasso is easy? Well, there are some

play00:30

advantages of doing Ridge still than

play00:32

lasso

play00:34

foreign

play00:35

I mean let me point out a few things one

play00:38

thing is that one points some points um

play00:43

Point number one is that lasso

play00:48

does not have

play00:51

a closed form solution

play00:57

remember our Ridge regression had a

play00:59

closed form solution ah the Ridge closed

play01:02

form solution was x x transpose plus

play01:05

Lambda I inverse X transpose y right so

play01:08

it was a closed form solution whereas

play01:10

lasso you cannot say that well I cannot

play01:12

take the derivative set it to 0

play01:14

simply because the problem itself is not

play01:17

you know exactly differentiable at zero

play01:20

right so your your penalty is not a

play01:22

differentiable function so you cannot

play01:23

take the gradient set it to 0 and solve

play01:25

for it

play01:26

um so how do we solve this problem

play01:29

well you can use what are called as

play01:33

subgradient methods

play01:39

are usually used

play01:45

to solve lasso

play01:50

remember our gradient methods were just

play01:53

following the negative gradient

play01:54

direction now

play01:57

for problems where differentiation is an

play01:59

issue right so where you cannot take the

play02:01

gradient or at all points you can still

play02:05

solve the problem using an iterative

play02:06

method If instead of the gradient

play02:08

something called as a sub gradient

play02:10

exists a subgradient is equivalent to a

play02:14

gradient if a gradient is present at a

play02:18

point if not it is it is a direction

play02:20

where the function is completely lower

play02:23

bounded in that direction right so um

play02:26

so I'll just give you some examples as

play02:28

to what subgradients are we won't get

play02:30

into too much detail about sub gradients

play02:32

but just to give give you some sense of

play02:34

how subgradients look like

play02:36

let's say you have a point you have some

play02:40

function like this

play02:43

right so this is uh this is a piecewise

play02:46

linear function right so you you see

play02:48

there are different pieces of this

play02:49

function at different points um

play02:52

now at this point

play02:55

right so at this point of the input um

play02:58

let us say this is X now there are two

play03:01

pieces which are intersecting at this

play03:03

point and the problem is there is no

play03:05

it's not differentiable at this point so

play03:09

there is no gradient that you can find

play03:11

for this point but then a sub gradient

play03:13

is like

play03:15

you know some

play03:18

line right so some way to approximate

play03:20

this function at this point which is

play03:23

completely lower bounding the function

play03:26

right so if if the function is

play03:28

completely lower bounded at this point

play03:30

um for example this is one subgradient

play03:32

maybe here is another subgradient as you

play03:36

as you can see there are multiples of

play03:38

gradients here for the same point

play03:40

um if

play03:42

if you can compute such sub gradients if

play03:45

your function has such subgradients at

play03:47

every point for example at this point

play03:50

its a different point

play03:52

the only subgradient that you will have

play03:55

at this point is this itself

play03:58

this line itself no other line can

play04:00

completely lower bound this function at

play04:02

every Point whereas at these meeting

play04:04

points there might be multiple lines

play04:06

right so these are called as sub

play04:08

gradients

play04:10

um the blue one is also called a

play04:11

subgradient but then because it's

play04:13

differentiable at this point it is also

play04:15

called as a gradient

play04:17

right so if there is only one

play04:19

subgradient well typically that's the

play04:21

gradient right so

play04:22

um that's kind of telling you how the

play04:24

function is slope you can think of it as

play04:26

a slope right so in one dimensional case

play04:29

um so why is it this relevant for us

play04:32

because the L1 penalty ah is

play04:36

like this right so it's an absolute

play04:37

value function absolute value function

play04:40

looks like this

play04:41

right so it it looks like this ah this

play04:44

is X this is absolute value of x and

play04:47

then you can think of it's like two

play04:48

pieces of linear functions type I mean

play04:52

attached at this point so it is not

play04:53

differentiable at this point uh but it

play04:56

has sub gradients right so for example

play04:57

this line is a sub gradient this line is

play04:59

a subgradient well this line itself is a

play05:02

sub gradient this line itself is a

play05:03

subgradient right so all these lines

play05:06

have slopes and all of these slopes are

play05:08

subgradients right so it has multiple

play05:10

sub gradients at this point in fact the

play05:12

subgradient are sub gradient will just

play05:15

be ah the slope for one dimensional

play05:18

function at 0 is any value between minus

play05:22

one and one minus 1 is this slope one is

play05:24

this slope so anything between minus one

play05:26

and one any line with such a

play05:29

slope will lower bound this function

play05:30

absolute value completely and its called

play05:32

as subgradient

play05:34

right so um so let me quickly give you

play05:38

the definition of sub gradient and then

play05:40

we'll close this discussion so

play05:42

subgradient uh I just intuitively said

play05:45

what subgradient is but um you can think

play05:48

of sub gradient as a vector

play05:51

some Vector in D Dimension is a

play05:55

subgradient

play06:00

of some function f which is r d to R

play06:05

at a point

play06:07

X

play06:09

in r t if

play06:12

for every value of Z if the function's

play06:16

value is greater than

play06:19

the value at f of x plus G transpose Z

play06:25

minus X which just means that if you if

play06:28

you linearize this function at X right

play06:31

so

play06:32

um then the function

play06:34

um

play06:35

ah linearized using the sub gradient G

play06:39

then the function is completely above

play06:41

this linearization right so that is what

play06:43

this means right so that is precisely

play06:45

what this means so this um again so if

play06:48

you want the example that we saw earlier

play06:50

right so we had a function like this and

play06:54

at this point it's not differentiable

play06:57

but then at any Z right so I can let's

play07:00

say I draw this function now I take some

play07:03

Z the function itself takes the value F

play07:06

of Z here so this is f of Z now if this

play07:09

is the slope is G then this value is

play07:13

just f of x plus G times Z minus X right

play07:18

so that's what this value would be and

play07:20

you can see that F of Z is above this

play07:22

right so it's always above this now no

play07:24

matter which Z I pick wherever I pick

play07:26

Z well the function is always above the

play07:28

blue line and if you can find such a g

play07:30

that satisfies this for all Z then

play07:33

that's a sub gradient

play07:35

now what's the use of sub gradient well

play07:37

ah here is the reason why sub gradients

play07:40

are useful ah y sub gradients

play07:50

well if the function

play07:54

f

play07:56

to minimize

play08:00

is a convex function

play08:07

then

play08:09

sub gradient descent is an algorithm

play08:12

that one can use instead of gradient

play08:15

descent

play08:16

and this algorithm also converges

play08:21

what does this mean this means that if

play08:23

your function is nice then

play08:26

instead of well even if it is not

play08:29

differentiable if it is nice in the

play08:30

sense that it's convex even if the

play08:32

function is not differentiable you can

play08:34

still converge to the minimum of this

play08:37

function by moving along the negative of

play08:39

the sub gradient as opposed to the

play08:41

gradient of course there are multiple

play08:42

subgradients possible at a given point

play08:44

where it is not differentiable it

play08:46

doesn't really matter which subgradient

play08:48

you pick you can pick one arbitrary

play08:50

subgradient and move along the negative

play08:52

of that subgradient direction and then

play08:54

if the function is convex then one can

play08:56

argue that a subgradient descent

play08:58

algorithm will also converge to the

play08:59

optimal solution why is this relevant

play09:01

for us well it is relevant for us

play09:03

because the lasso problem is a convex

play09:05

problem it is the loss function is

play09:07

quadratic the norm L one Norm is a

play09:10

convex function so the sum of convex

play09:13

functions is convex and so the lasso

play09:15

problem itself is a convex optimization

play09:17

problem and for finding the minimum of a

play09:19

convex optimization problem you can now

play09:21

use something like a subgradient

play09:23

descent

play09:25

um

play09:26

by no means this is the only way to

play09:28

solve this lasso problem there are other

play09:31

specialized methods that one can use to

play09:34

solve the lasso problem

play09:36

um so we won't look at those methods but

play09:38

I will just point out that there are

play09:41

other methods

play09:46

special purpose methods

play09:49

because this is a special quadratic loss

play09:52

plus plus

play09:54

L one penalty now you can develop

play09:57

special purpose methods to solve this

play10:00

problem right so for lasso

play10:04

uh one example is what is called as IRLS

play10:10

um which is the iterative

play10:14

re-weighted

play10:18

least squares method

play10:23

again we won't really look at this

play10:25

problem this method in this course

play10:29

but

play10:30

essentially the idea is that we can

play10:32

solve least squares easily right so least

play10:34

squares is the original linear regression

play10:36

problem we know how to solve in closed

play10:37

form now you can use the use that as

play10:40

some kind of a black box to solve the

play10:42

lasso problem and now essentially this

play10:45

is you know taking advantage of the

play10:47

structure in the lasso problem that it

play10:49

is a linear regression problem plus a

play10:51

penalty and you can do special purpose

play10:54

things like irls which can be used to

play10:56

solve this problem that's one thing or

play10:58

you can use a general purpose of

play10:59

gradient descent algorithm by noting

play11:01

that our function loss function plus

play11:04

penalty the lasso objective is in fact

play11:06

convex it may not be differentiable but

play11:09

it's convex and so you can use subgradient

play11:11

descent methods so all these are ok

play11:13

just that we have to be aware that there

play11:15

is no closed form solution to the lasso

play11:17

problem and that's the main summary of

play11:19

this this part of the you know

play11:22

discussion there is no closed form

play11:24

solution to lasso so we cannot you know

play11:27

hope that given the data and labels we

play11:30

can simply really get a closed form value

play11:32

for w we have to do some work and that

play11:35

is important to keep in mind though you

play11:37

will get a sparse solution and all that

play11:39

it is not as straightforward as solving

play11:42

a ridge regression problem on the other

play11:44

hand gives you a nice closed form

play11:46

solution

play11:47

you can do a stochastic gradient descent

play11:49

also if you wish if you have large data

play11:52

size n large D and so on but for me for

play11:55

you know small values you can just use a

play11:58

closed form solution small values of N

play12:00

and D you can use the closed form

play12:01

solution whereas in lasso there is no

play12:03

closed form solution and so you have to

play12:05

rely on you know either optimization

play12:08

techniques like subgradient descent or

play12:10

special purpose methods like IRLS

play12:13

okay so so with this uh we we kind of

play12:17

conclude our discussion about uh the

play12:19

regression problem

play12:21

I just wanted to summarize at a high

play12:22

level whatever we have seen so far in

play12:24

regression

play12:25

um and then we'll move on to the next

play12:27

part of the course in regression we

play12:30

noted that uh you know you can solve a

play12:33

least squares objective which is

play12:36

squared loss and that had a closed form

play12:38

solution we looked at its geometric

play12:40

interpretation we looked at its

play12:41

computational

play12:42

discussion we had a discussion and we

play12:45

came up with the stochastic gradient

play12:46

descent algorithm then we noted that

play12:49

well if you give it a probabilistic

play12:51

Twist then the linear regression with

play12:53

squared loss is same as the maximum

play12:55

likelihood estimator now because it's a

play12:58

maximum likelihood estimator then we can

play12:59

ask well is there a Bayesian counterpart

play13:01

to this and that gave us the linear

play13:04

regression plus regularized version

play13:06

which is the Ridge regression problem

play13:09

where you used an L2 regularizer which

play13:11

was equivalent to a gaussian prior with

play13:13

zero mean on the W itself and then we

play13:17

noted that well the while the L2

play13:18

regularizer is good in pushing w W's to

play13:22

close to 0 it may not make it exactly

play13:24

zero now to make it exactly zero then we

play13:27

looked at the geometry of the problem

play13:28

and said that well maybe an L1

play13:30

regularizer is better and that gave us

play13:33

the lasso problem the least absolute

play13:35

shrinkage and selection operator algorithm

play13:37

which is the squared loss plus an

play13:39

L1 regularizer and we also had a small

play13:43

discussion about how to solve the lasso

play13:45

problem involving sub gradients and

play13:47

other techniques perhaps like the IRLS

play13:50

now this is the overall summary of

play13:53

linear regression I also wanted to point

play13:56

out that you know people use all sorts

play13:58

of regularizers there are mixed lasso

play14:01

plus Ridge regularizers called the elastic

play14:03

net penalty and so on so there is a

play14:06

whole you know um you know cottage

play14:08

industry of regularizers where if you

play14:10

think your W should satisfy some nice

play14:12

constraint maybe W has some group

play14:14

structure in the sense that you collect

play14:16

features from lot of places but you know

play14:18

that some features should either be

play14:20

positive or sorry either be selected

play14:23

together or not be selected together

play14:25

these These are domain specific

play14:27

constraints that might come up and then

play14:29

you can convert them into you know

play14:32

regularizers and then try to solve the

play14:34

problem in a regularized fashion all

play14:37

these are things that people have

play14:38

studied um what you what we have looked

play14:40

at in this course is some basic type of

play14:42

regularizers which are very popular which

play14:44

are most commonly used but then you also

play14:47

should as machine learning practitioners

play14:49

you also should know that in case

play14:51

if your domain knowledge tomorrow in

play14:53

your problem that you try to solve you

play14:55

have some domain knowledge which says

play14:57

something more about the structure of

play14:59

the solution that you expect to see then

play15:02

you can perhaps convert that into a more

play15:05

meaningful regularizer than simply using

play15:07

a ridge or a lasso penalty with this we

play15:11

will complete our discussion about

play15:12

regression in this course and we will

play15:15

move on to the next part of the course

play15:17

where we will talk about supervised

play15:19

learning but from a classification point

play15:21

of view thank you
