Week 3 Lecture 14 Partial Least Squares
Summary
TL;DR: This video script delves into regression methods, focusing on linear regression techniques such as subset selection and shrinkage methods, including ridge regression and lasso. It introduces derived directions with principal component regression and partial least squares (PLS), emphasizing PLS's unique approach of considering both the input and the output data. The script explains how the PLS directions are constructed and why their orthogonality reduces the fit to univariate regressions without multicollinearity issues. It concludes with the implications of using PLS for prediction, how it relates to the original least squares fit, and what happens when the inputs are orthogonal.
Takeaways
- The lecture continues the discussion on linear regression methods, focusing on subset selection, shrinkage methods, and derived directions.
- Subset selection methods include forward selection, backward selection, and stepwise selection, which involve choosing subsets of explanatory variables.
- Shrinkage methods such as ridge regression and lasso address overfitting by shrinking the coefficients of less important variables.
- Derived directions encompass principal component regression (PCR) and partial least squares (PLS), methods that find new directions for the regression.
- The motivation for PLS is to address a limitation of PCR, which does not consider the relationship between the input data and the output data.
- Before applying PLS, the input data (X) is assumed standardized and the output data (Y) centered, ensuring no variable dominates due to its scale.
- The first derived direction (z1) in PLS is found by summing the individual contributions of each variable in explaining the output variable Y.
- PLS involves orthogonalization: each new direction (z) is made orthogonal to the previous ones, allowing univariate regressions without multicollinearity issues.
- The derived directions (Z1, Z2, Z3, ...) balance high variance in the input space with high correlation with the output variable Y.
- Once the derived directions and their coefficients (θ) are determined, predictions can be made without constructing the Z directions for new data.
- The final model in PLS can be derived from the θ coefficients, yielding coefficients for the original variables (X) directly.
- If the original variables (X) are orthogonal, PLS stops after the first step, as there is no additional information to extract from the residuals.
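The takeaways above can be sketched end to end in NumPy. This is an illustrative reconstruction of the procedure described in the lecture (function and variable names are ours, not from the video); it assumes X already has standardized columns and y is centered:

```python
import numpy as np

def pls1(X, y, n_components):
    """Partial least squares for a single output, as described in the
    lecture: build derived directions z_m, fit each by a univariate
    regression, and orthogonalize the columns against each z_m."""
    Xm = X.copy()
    Z, theta = [], []
    for _ in range(n_components):
        phi = Xm.T @ y                 # univariate contribution of each column
        z = Xm @ phi                   # derived direction: sum of contributions
        th = (z @ y) / (z @ z)         # regress y on z (univariate)
        Z.append(z)
        theta.append(th)
        # make every column orthogonal to z before the next pass
        Xm = Xm - np.outer(z, (z @ Xm) / (z @ z))
    return np.column_stack(Z), np.array(theta)

# With all p directions, the PLS fit matches ordinary least squares.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
X = (X - X.mean(0)) / X.std(0)         # standardized inputs
y = rng.standard_normal(50)
y = y - y.mean()                        # centered output
Z, theta = pls1(X, y, 4)
ols_fit = X @ np.linalg.lstsq(X, y, rcond=None)[0]
assert np.allclose(Z @ theta, ols_fit)
```

The deflation step is what makes the Z columns mutually orthogonal, so each θ can be fit one at a time without revisiting earlier directions.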
Q & A
What are the three classes of methods discussed in the script for linear regression?
-The three classes of methods discussed are subset selection, shrinkage methods, and derived directions.
What is the main difference between principal component regression (PCR) and partial least squares (PLS)?
-The main difference is that PCR only considers the input data (X) and its variance, while PLS also takes into account the output data (Y) and the correlation with the input data.
What assumptions are made about the data before applying partial least squares?
-It is assumed that the output variable Y is centered and the input variables are standardized, meaning each column has a 0 mean and unit variance.
How is the first derived direction (z1) in PLS computed?
-The first derived direction (z1) is computed by taking the projection of Y on each Xj (a copy of Xj scaled by its univariate coefficient) and summing all of these projections into a single direction.
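As a concrete sketch of this answer (the variable names are ours), the projection of Y on each standardized Xj is Xj scaled by the inner product of Xj with Y, and z1 is the sum of those scaled columns:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((30, 3))
X = (X - X.mean(0)) / X.std(0)     # standardized inputs
y = rng.standard_normal(30)
y = y - y.mean()                    # centered output

phi = X.T @ y                       # phi_j = <x_j, y>: y's component along x_j
z1 = X @ phi                        # z1 = sum_j phi_j * x_j
theta1 = (z1 @ y) / (z1 @ z1)       # univariate regression of y on z1
```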
What is the purpose of orthogonalization in the context of PLS?
-Orthogonalization is used to produce new columns (xj2) that are orthogonal to the previously derived directions (z1, z2, ...), allowing each subsequent fit to be a univariate regression that need not account for the influence of the previous directions.
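In code, this step regresses each column on z1 and keeps the residual; the residual columns (the xj2 of the answer) are orthogonal to z1. A minimal sketch with our own names:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((30, 3))
X = (X - X.mean(0)) / X.std(0)      # standardized inputs
y = rng.standard_normal(30)
y = y - y.mean()                     # centered output

z1 = X @ (X.T @ y)                   # first derived direction
coef = (z1 @ X) / (z1 @ z1)          # regress each column on z1
X2 = X - np.outer(z1, coef)          # residual columns, orthogonal to z1
```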
How does PLS balance the variance in the input space and the correlation with the output variable?
-PLS finds directions in X that have high variance and also high correlation with Y, effectively balancing both through an objective function.
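For the first direction this balance can be made concrete: among unit weight vectors w, the combination z = Xw whose squared covariance with Y is largest is exactly w proportional to X'y, which is the weight vector PLS uses. A small numerical check (our illustration, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 4))
X = (X - X.mean(0)) / X.std(0)       # standardized inputs
y = rng.standard_normal(60)
y = y - y.mean()                      # centered output

w_pls = X.T @ y
w_pls = w_pls / np.linalg.norm(w_pls)   # PLS weights for the first direction

def sq_cov(w):
    # squared covariance of Xw with y (up to a factor of n)
    return ((X @ w) @ y) ** 2

# no random unit vector beats the PLS weights
trials = rng.standard_normal((1000, 4))
trials = trials / np.linalg.norm(trials, axis=1, keepdims=True)
assert all(sq_cov(w) <= sq_cov(w_pls) + 1e-9 for w in trials)
```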
What happens when you perform PLS on data where the original variables (X) are already orthogonal?
-If the original variables are orthogonal, PLS will stop after one step: the first direction already yields the least squares fit, and the deflated columns contain no remaining information about Y.
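This claim is easy to verify numerically. Below we build X with orthogonal columns of equal norm (mimicking standardized, mutually orthogonal inputs; the construction is ours for illustration): the first PLS direction already reproduces the least squares fit, and the deflated columns carry nothing further about Y, so the process stops.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20
Q, _ = np.linalg.qr(rng.standard_normal((n, 3)))
X = np.sqrt(n) * Q                   # orthogonal columns, each of norm sqrt(n)
y = rng.standard_normal(n)
y = y - y.mean()                      # centered output

z1 = X @ (X.T @ y)                    # first PLS direction
theta1 = (z1 @ y) / (z1 @ z1)
ols_fit = X @ np.linalg.lstsq(X, y, rcond=None)[0]
assert np.allclose(theta1 * z1, ols_fit)        # one step = least squares fit

X2 = X - np.outer(z1, (z1 @ X) / (z1 @ z1))     # deflated columns
assert np.allclose(X2.T @ y, 0)                  # nothing left: PLS stops
```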
How many derived directions (Z) can be obtained from PLS, and what does this imply for the fit of the data?
-You can obtain up to p derived directions (Z), where p is the number of input variables. If you use all p PLS directions, you recover exactly the fit of the original least squares regression; using fewer gives a different (regularized) fit.
How can the coefficients for the original variables (X) be derived from the coefficients of the derived directions (θ) in PLS?
-Since each derived direction is a linear combination of the original variables, the coefficients for X can be obtained by linear computations that account for how the θs are stacked and for each original variable's contribution to each derived direction.
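One way to carry out that bookkeeping (an illustrative sketch; the recursion and the names are ours): each deflated matrix is a linear map of the original X, so we track that map and push every θ back onto the original columns.

```python
import numpy as np

def pls1_coefficients(X, y, n_components):
    """Accumulate beta for the original columns from the per-direction
    theta coefficients. Invariant: Xm == X @ R at every iteration, so
    z = Xm @ phi = X @ (R @ phi) and each step adds theta * (R @ phi)."""
    p = X.shape[1]
    Xm = X.copy()
    R = np.eye(p)
    beta = np.zeros(p)
    for _ in range(n_components):
        phi = Xm.T @ y
        z = Xm @ phi
        th = (z @ y) / (z @ z)
        beta += th * (R @ phi)             # theta pushed onto original X
        shrink = (z @ Xm) / (z @ z)        # deflation coefficients
        R = R @ (np.eye(p) - np.outer(phi, shrink))
        Xm = Xm - np.outer(z, shrink)
    return beta

# Sanity check: with all p directions the recovered beta gives the OLS fit.
rng = np.random.default_rng(4)
X = rng.standard_normal((40, 3))
X = (X - X.mean(0)) / X.std(0)
y = rng.standard_normal(40)
y = y - y.mean()
beta = pls1_coefficients(X, y, 3)
assert np.allclose(X @ beta, X @ np.linalg.lstsq(X, y, rcond=None)[0])
```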
What is the process of constructing derived directions in PLS, and how does it differ from PCR?
-In PLS, derived directions are constructed by summing the projections of Y on each Xj, creating directions that balance variance in the input space with correlation to the output. This differs from PCR, which only maximizes variance in the input space without considering the output.
Outlines
Introduction to Linear Regression Techniques
The speaker continues discussing linear regression, focusing on different methods such as subset selection, shrinkage methods, and derived directions. The discussion revisits subset selection, including forward, backward, and stage-wise selection, then moves to shrinkage methods like ridge regression and lasso. The speaker then introduces derived directions, specifically principal component regression (PCR), and explains the motivation behind partial least squares (PLS) as it considers both input and output data, unlike PCR.
Projection and Derived Directions in 3D
The speaker explains the projection of the output variable Y on multiple input variables (X1, X2) to derive directions in the context of partial least squares (PLS). The challenge of visualizing this in a 3D space is acknowledged. The speaker contrasts PLS with principal component regression (PCR), noting that while PCR finds directions in X with the highest variance, PLS finds directions in X that are more aligned with the output variable Y. PLS balances variance in the input space with correlation to the output variable.
Orthogonalization and Prediction with PLS
The process of orthogonalizing directions in partial least squares (PLS) is discussed, where each derived direction is orthogonal to the previous ones, simplifying univariate regression. The speaker explains how coefficients for the original variables X can be derived from the PLS directions for prediction. If all directions are derived, PLS achieves a fit equivalent to least squares. A thought experiment is presented: if the input variables X are orthogonal initially, PLS would immediately yield the least squares fit after the first direction.
Keywords
Linear Regression
Subset Selection
Shrinkage Methods
Principal Component Regression (PCR)
Partial Least Squares (PLS)
Orthogonalization
Centering and Standardizing
Projection
Univariate Regression
Coefficients
Overfitting
Highlights
Continuation of the discussion on linear regression methods.
Introduction to subset selection methods including forward, backward, and stepwise selection.
Exploration of shrinkage methods like ridge regression and lasso.
Introduction to derived directions starting with principal component regression.
The limitation of principal component regression in not considering the output data.
Assumption of centered Y and standardized inputs for both PCA and partial least squares.
Process of creating derived directions by projecting Y on Xj and summing the projections.
Explanation of how to find the first derived direction z1 by summing univariate contributions.
The concept of using Y in the regression to find derived directions.
Distinguishing partial least squares from PCR by considering the output variable Y.
Demonstration of how to orthogonalize by regressing xj on z1.
Iterative process of finding new directions xj2 and their corresponding z directions.
Orthogonality of derived directions ensuring univariate regression can be performed.
Derivation of coefficients for the original variables X from the derived directions Z.
The equivalence of the fit obtained by using p PLS directions to the original least squares fit.
Implication of orthogonal X variables on the PLS method, potentially stopping after one step.
Concluding the discussion on regression methods with insights into partial least squares.
Transcripts
Okay, so we will continue from where we left off, as I promised. We are looking at linear regression: we looked at subset selection, then we looked at the shrinkage methods, and then finally we came to derived directions. I said there are three classes of methods, so we are looking at a couple of examples of each of those classes of methods. The first one we looked at was subset selection, where we looked at forward selection, backward selection, stage-wise selection, stepwise selection and all that. Then we looked at shrinkage methods, where we looked at ridge regression and lasso, and then we started looking at derived directions, where we looked at principal component regression. I said the next one we look at is partial least squares, and I gave you the motivation for looking at partial least squares: it is mainly because principal component regression only looks at the input data and does not pay attention to the output, and therefore you might sometimes come up with really counterintuitive directions, like the example I showed you with the +1 and -1 outputs. So the basic idea here is that we are going to use the Y also.

Just like the usual case, I am going to assume that Y is centered, and I am also going to assume that the inputs are standardized. This is something you have to do for both PCA and partial least squares: essentially assume that each column of the data that is given to you has 0 mean and unit variance, or make it 0 mean, unit variance, so that you are not having any magnitude-related effects on the output. So what I am going to do is the following. If you remember how we did orthogonalization earlier, this is something very similar. I am going to look at the projection of Y on Xj, and then I am going to create a derived direction which essentially sums up all of these projections. Basically I am computing the projection of Y on each Xj, so each term is a scaled version of that Xj, and then I am going to sum all of this up. So essentially what I am doing here is looking at each variable in turn: I take each Xj in turn and I see what the effect on Y is, how much of Y I am able to explain just by taking Xj alone, and I combine all of that and make it my single direction. Individually taking each direction by itself, how much of Y can I explain: the sum of those contributions becomes my first derived direction, and that is my z1.
So that is the coefficient for z1 in my eventual regression fit. You can see what it is like: I have taken Y and regressed it on z1, and that essentially gives me the coefficient for z1. So how do I go on to find it? I am looking at how much of Y is along each direction Xj, so in some sense you can think of it as: if I have one variable Xj, how much of Y can be explained with that one variable alone? I am looking at that, and then my first direction z1 is essentially summing those univariate contributions over all my input directions. Suppose I have two input directions; unfortunately I have to do this in 3D. What I am going to do is take my Y and project it on x1 alone first, and then on x2 alone. This is tricky to do in 3D, but anyway, it is going to be hard to do it on the board pictorially for you, so I am not going to do this. I would actually have to plot a function Y; I cannot just do it with single data points, that does not make sense. I would have to draw a surface of Y over x1 and x2 and then talk about the projection, so that is going to be hard. But the basic idea is: I take Y, I find the projection of Y along x1, then I find the projection of Y along x2, and now I take the sum of these two, and whatever the resulting direction is, I use that as my first direction.

Yes, so in PCR what we did was we first found directions in X which had the highest variance. Here we are not finding directions in X with the highest variance; we are finding directions in X, in some sense components of X, which have more in the direction of the output variable Y. Eventually you can show, which we are not going to do, that the directions you pick, that Z1, Z2, Z3, are those which have high variance in the input space but also have a high correlation with Y. It is actually an objective function which tries to balance correlation with Y and variance in the input space, while PCR does only variance in the input space and does not worry about the correlation. But partial least squares, you can show, actually worries about the correlation as well. We found the first coordinate; now what do you do? You orthogonalize. So what should I be doing now? I should regress each xj on z1. This is how we did the orthogonalization earlier: you find one direction, then you regress everything else on that direction and subtract, and that gives you the orthogonal directions. Essentially that is what you are doing here. The expressions look big, but if you have been following the material from the previous classes, then we are essentially just reusing the univariate regression construction we had earlier.
So now I have a new set of directions which I call xj2; xj1 was the original xj I started off with, and now I have a new set of directions which we will call xj2, and then I can keep repeating the whole process. I can take Y projected along each xj2, combine that to get Z2, and then regress Y on Z2 to get θ2. I can keep doing this until I get as many directions as I want. So what is the nice thing about Z1, Z2 and so on? They themselves will be orthogonal, because each is constructed from vectors which are orthogonal to all the previous Zs that we have. Each one will be orthogonal, and therefore I can essentially do univariate regression; I do not have to worry about accommodating the previous variables. So when I want to fit zk, I can just do a univariate regression of Y on that zk and I will get the coefficient θk. Is that fine? Great. So once I get these θ1 to θk, how do I use them for prediction? Can I just do something like Xβ, can I do Xθ? What should I do? Well, I can do Zθ and predict with that, but then I do not really want to construct these Z directions for every vector that I am going to get, so I do not want to project each new point along those Z directions. Instead, if you think about it, each of those Zs is actually composed of the original variables X. So I can compute the θs and then just go back and derive coefficients for the Xs directly, because all of these are linear computations. All I need to do is essentially figure out how I am going to stack all the θs so that I can derive the coefficients for the Xs. Think about it; you can do it as a short exercise, but I can eventually come up and write it down, where I can derive these coefficients β hat from the θs. So I will derive θ1, θ2, θ3 and so forth, and I can just go back and do this computation. You will have to think about it, it is very easy; you can work it out and figure out what the numbers should be, and what the m is doing there: m is the number of directions I actually derive from the PLS.

So from the first direction I can keep going; suppose I derive p directions. What can you tell me about the fit for the data? If I get p PLS directions, it essentially means that I will get as good a fit as the original least squares fit, so I essentially get the same fit as the least squares fit. Anything less than that is going to give me something different from the least squares fit. Here is a thought question: if my Xs were actually orthogonal to begin with, what will happen with PLS? The Zs will be the same as the Xs, right? And what will happen to Z2, can I do the Z2? No, PLS will stop after one step, because there will be no residuals after that, so I will essentially get my least squares fit in the first attempt itself. That is essentially what will happen, and so we will stop with regression methods.