Week 3 Lecture 14 Partial Least Squares

Machine Learning - Balaraman Ravindran
4 Aug 2021 · 14:34

Summary

TL;DR: This lecture continues the discussion of regression methods, focusing on linear regression techniques such as subset selection and shrinkage methods, including ridge regression and the lasso. It then introduces derived directions with principal component regression and partial least squares (PLS), emphasizing that PLS, unlike PCR, uses both the input and the output data. The lecture explains how the PLS directions are constructed, why their orthogonality reduces the fit to a sequence of univariate regressions, and concludes with how PLS is used for prediction and how it relates to the original least squares fit, especially when the inputs are orthogonal.

Takeaways

  • 📚 The lecture continues the discussion on linear regression methods, focusing on subset selection, shrinkage methods, and derived directions.
  • 🔍 Subset selection methods include forward selection, backward selection, and stepwise selection, which involve choosing subsets of explanatory variables.
  • 🔧 Shrinkage methods such as ridge regression and the lasso address overfitting by shrinking coefficient estimates toward zero (the lasso can set some exactly to zero).
  • 🌐 Derived directions encompass principal component regression (PCR) and partial least squares (PLS), which are methods to find new directions for regression analysis.
  • 🤔 The motivation for PLS is to address the limitation of PCR, which does not consider the relationship between input data and output data.
  • ⚖️ Before applying PLS, it's assumed that the input data (X) is standardized and the output data (Y) is centered, ensuring no variable dominates due to its scale.
  • 📈 The first derived direction (z1) in PLS is found by summing the individual contributions of each variable in explaining the output variable Y.
  • 🔄 The process of PLS involves orthogonalization: each new direction (z) is made orthogonal to the previous ones, so each coefficient can be found by a simple univariate regression (a code sketch of the full procedure follows this list).
  • 🔢 The derived directions (Z1, Z2, Z3, etc.) balance high variance in the input space with high correlation with the output variable Y.
  • 🔮 Once the derived directions and their coefficients (θ) are determined, predictions can be made without directly constructing the Z directions for new data.
  • 🔄 The final model in PLS can be derived from the θ coefficients, allowing for direct coefficients for the original variables (X) to be computed.
  • 📉 If the original variables (X) are orthogonal, PLS will stop after the first step, as there will be no additional information to extract from the residuals.
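
To make the takeaways above concrete, here is a minimal NumPy sketch of the PLS construction described in the lecture: standardize X, center Y, build each derived direction by summing univariate projections, regress Y on it, and orthogonalize the inputs before repeating. The function name, arguments, and stopping check are my own choices for illustration, not code from the lecture.

```python
import numpy as np

def pls_directions(X, y, n_components):
    """Sketch of PLS: returns the derived directions Z, their
    coefficients theta, and the fitted values y_hat."""
    X = (X - X.mean(axis=0)) / X.std(axis=0)       # standardize inputs: zero mean, unit variance
    y = y - y.mean()                               # center the output
    Xm = X.copy()                                  # current, progressively orthogonalized inputs
    y_hat = np.zeros_like(y)
    Z, thetas = [], []
    for _ in range(n_components):
        phi = Xm.T @ y                             # phi_j = <x_j, y>: univariate contribution of each input
        z = Xm @ phi                               # z_m = sum_j phi_j * x_j: next derived direction
        if np.allclose(z, 0.0):                    # nothing left to extract (e.g. orthogonal inputs)
            break
        theta = (z @ y) / (z @ z)                  # univariate regression of y on z_m
        y_hat = y_hat + theta * z
        Xm = Xm - np.outer(z, (z @ Xm) / (z @ z))  # orthogonalize every x_j with respect to z_m
        Z.append(z)
        thetas.append(theta)
    return np.column_stack(Z), np.array(thetas), y_hat
```

If n_components equals p, the number of inputs, y_hat coincides with the ordinary least squares fit on the standardized inputs, which is the point made near the end of the lecture; fewer components give a different fit.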

Q & A

  • What are the three classes of methods discussed in the script for linear regression?

    -The three classes of methods discussed are subset selection, shrinkage methods, and derived directions.

  • What is the main difference between principal component regression (PCR) and partial least squares (PLS)?

    -The main difference is that PCR only considers the input data (X) and its variance, while PLS also takes into account the output data (Y) and the correlation with the input data.

  • What assumptions are made about the data before applying partial least squares?

    -It is assumed that the output variable Y is centered and the input variables are standardized, meaning each column has a 0 mean and unit variance.

  • How is the first derived direction (z1) in PLS computed?

    -The first derived direction (z1) is computed by projecting Y onto each input Xj separately (giving a component along that Xj direction) and then summing all of these projections into a single direction.

  • What is the purpose of orthogonalization in the context of PLS?

    -Orthogonalization regresses each current input xj on the latest derived direction and subtracts the fit, producing new inputs (xj2) that are orthogonal to z1, z2, ..., so that each subsequent coefficient can be found by a simple univariate regression, without adjusting for previously used variables.

  • How does PLS balance the variance in the input space and the correlation with the output variable?

    -PLS finds directions in X that have high variance and also high correlation with Y, effectively balancing both through an objective function.

  • What happens when you perform PLS on data where the original variables (X) are already orthogonal?

    -If the original variables are orthogonal, PLS stops after one step: once each Xj is orthogonalized against z1, no residual correlation with Y remains, so the first step already recovers the least squares fit. (A small numerical check of this appears right after this Q&A section.)

  • How many derived directions (Z) can be obtained from PLS, and what does this imply for the fit of the data?

    -You can keep deriving directions up to p, the number of inputs. If all p PLS directions are used, the fit is exactly as good as the original least squares fit; using fewer directions gives a different fit.

  • How can the coefficients for the original variables (X) be derived from the coefficients of the derived directions (θ) in PLS?

    -Each derived direction Z is a linear combination of the original variables, so the fitted model (a weighted sum of the Zs with weights θ) is itself linear in the Xs. Unwinding these linear combinations yields coefficients for the original variables directly, which means new data can be predicted without constructing the Z directions.

  • What is the process of constructing derived directions in PLS, and how does it differ from PCR?

    -In PLS, derived directions are constructed by summing the projections of Y on each Xj, which yields directions that balance high variance in the input space with high correlation with the output. PCR, by contrast, maximizes variance in the input space alone, without considering the output.
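
As a small numerical check of the last two answers (my own illustration, not taken from the lecture), we can build an X whose columns are exactly orthonormal via a QR decomposition. After the first PLS step the orthogonalized inputs retain no projection of Y, so the attempted second direction collapses to zero and the procedure stops; standardization is skipped here because only the orthogonality of the columns matters for the check.

```python
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(100, 4)))   # inputs with orthonormal (hence orthogonal) columns
y = Q @ np.array([2.0, -1.0, 0.5, 3.0]) + 0.1 * rng.normal(size=100)
y = y - y.mean()                                 # centered output

z1 = Q @ (Q.T @ y)                               # first derived direction: sum of projections of y on each column
Q2 = Q - np.outer(z1, (z1 @ Q) / (z1 @ z1))      # orthogonalize each column with respect to z1
z2 = Q2 @ (Q2.T @ y)                             # attempted second direction
print(np.allclose(z2, 0.0))                      # True: PLS stops after one step when the inputs are orthogonal
```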

Outlines

00:00

📊 Introduction to Linear Regression Techniques

The speaker continues discussing linear regression, focusing on different methods such as subset selection, shrinkage methods, and derived directions. The discussion revisits subset selection, including forward, backward, and stage-wise selection, then moves to shrinkage methods like ridge regression and lasso. The speaker then introduces derived directions, specifically principal component regression (PCR), and explains the motivation behind partial least squares (PLS) as it considers both input and output data, unlike PCR.

05:06

🔄 Projection and Derived Directions in 3D

The speaker explains the projection of the output variable Y on multiple input variables (X1, X2) to derive directions in the context of partial least squares (PLS). The challenge of visualizing this in a 3D space is acknowledged. The speaker contrasts PLS with principal component regression (PCR), noting that while PCR finds directions in X with the highest variance, PLS finds directions in X that are more aligned with the output variable Y. PLS balances variance in the input space with correlation to the output variable.

10:09

🔍 Orthogonalization and Prediction with PLS

The process of orthogonalizing directions in partial least squares (PLS) is discussed, where each derived direction is orthogonal to the previous ones, simplifying univariate regression. The speaker explains how coefficients for the original variables X can be derived from the PLS directions for prediction. If all directions are derived, PLS achieves a fit equivalent to least squares. A thought experiment is presented: if the input variables X are orthogonal initially, PLS would immediately yield the least squares fit after the first direction.

Keywords

💡Linear Regression

Linear regression is a statistical method for modeling the relationship between a dependent variable and one or more independent variables by fitting a linear equation to the observed data. In the video, linear regression serves as the foundational concept for more advanced methods like subset selection and shrinkage methods, which are discussed as extensions of this basic technique.

💡Subset Selection

Subset selection in the context of regression analysis refers to the process of choosing a subset of available variables for a regression model. The video mentions methods such as forward selection, backward selection, and stepwise selection, which are all strategies for determining which variables to include in the model to best predict the dependent variable.

💡Shrinkage Methods

Shrinkage methods are techniques used in regression analysis to reduce the variance of parameter estimates by 'shrinking' them towards a predetermined value. The video discusses ridge regression and lasso as examples of shrinkage methods, which aim to prevent overfitting by penalizing large coefficients.

💡Principal Component Regression (PCR)

Principal Component Regression is a technique that combines principal component analysis with linear regression. It involves projecting the original variables into a new set of orthogonal variables (principal components) that capture the most variance in the data. The video points out that PCR only considers the input data (X) and not the output (Y), which can sometimes lead to counterintuitive results.

💡Partial Least Squares (PLS)

Partial Least Squares is a regression technique that finds relationships between the predictors (X) and the response (Y) by extracting components that explain the maximum covariation between X and Y. Unlike PCR, PLS takes into account both the input and output data, making it a more robust method when the relationship between X and Y is important.

💡Orthogonalization

Orthogonalization is the process of making vectors orthogonal to each other, which means they are at right angles and have no correlation. In the video, orthogonalization is used in the context of deriving new directions for regression analysis, ensuring that each new component (like Z1, Z2, etc.) is uncorrelated with the previous ones.
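
In code this step is a single regress-and-subtract. The following NumPy sketch is illustrative (the function name is mine, not from the video); it regresses each column of X on a direction z and removes the fitted part:

```python
import numpy as np

def orthogonalize(X, z):
    """Regress each column of X on the direction z and subtract the fit,
    so every returned column is orthogonal to z."""
    return X - np.outer(z, (z @ X) / (z @ z))
```

Applying this after each derived direction is what lets the later coefficients be found by simple univariate regressions.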

💡Centering and Standardizing

Centering involves adjusting the data so that the mean of each variable is zero, while standardizing adjusts the data so that each variable has a mean of zero and a standard deviation of one. The video emphasizes the importance of centering Y and standardizing the inputs for both PCA and PLS to ensure unbiased and scale-invariant results.
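
A quick sketch of this preprocessing on toy data (the array names and random data are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(loc=5.0, scale=3.0, size=(50, 3))   # toy inputs on an arbitrary scale
y = rng.normal(size=50)                            # toy response

X_std = (X - X.mean(axis=0)) / X.std(axis=0)       # standardize: each column gets zero mean, unit variance
y_ctr = y - y.mean()                               # center the response
```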

💡Projection

In the context of the video, projection refers to the process of finding how much of the response variable (Y) can be explained by each predictor variable (Xj) individually. This is done by projecting Y onto each Xj and then summing these projections to create a derived direction, which is a key step in the PLS method.
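
Assuming X is already standardized and Y centered, the first derived direction is just the sum of these per-variable projections; a minimal sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))
X = (X - X.mean(axis=0)) / X.std(axis=0)           # standardized inputs
y = X @ np.array([1.0, 0.0, -2.0]) + 0.1 * rng.normal(size=50)
y = y - y.mean()                                   # centered output

phi = X.T @ y    # phi_j = <x_j, y>: how much of y each input explains on its own
z1 = X @ phi     # z1 = sum_j phi_j * x_j: the first derived direction
```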

💡Univariate Regression

Univariate regression is a type of regression analysis that involves only one predictor variable. The video discusses how, after orthogonalizing the predictors, univariate regression can be used to find the relationship between Y and each derived direction (Z) individually, simplifying the analysis.
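
Because each derived direction is orthogonal to the earlier ones, its coefficient reduces to a single inner-product ratio; a tiny helper (the name is mine) makes this explicit:

```python
import numpy as np

def univariate_coef(z, y):
    """Coefficient theta obtained by regressing y on a single direction z."""
    return (z @ y) / (z @ z)
```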

💡Coefficients

In regression analysis, coefficients are the numerical values that multiply the predictor variables to determine the predicted value of the response variable. The video explains how coefficients for the derived directions (Z1, Z2, etc.) are calculated and how they can be used to derive coefficients for the original variables (X) in PLS.
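
One way to carry out the "linear computations" the lecture alludes to is to track, for each working input, its expression in the original (standardized) X coordinates; the accumulated coefficients then apply to X directly. This is a hedged sketch under the assumption that X is already standardized and y centered, not the lecturer's own derivation:

```python
import numpy as np

def pls_beta(X, y, n_components):
    """Recover coefficients beta for the original standardized inputs
    from the PLS directions and their theta coefficients."""
    p = X.shape[1]
    Xm = X.copy()                    # current, progressively orthogonalized inputs
    C = np.eye(p)                    # column j expresses the current x_j in original X coordinates
    beta = np.zeros(p)
    for _ in range(n_components):
        phi = Xm.T @ y
        z = Xm @ phi                 # derived direction z_m
        if np.allclose(z, 0.0):
            break
        r = C @ phi                  # z_m = X @ r in the original coordinates
        theta = (z @ y) / (z @ z)    # univariate regression of y on z_m
        beta += theta * r            # accumulate beta_hat = sum_m theta_m * r_m
        proj = (z @ Xm) / (z @ z)    # regression of each current x_j on z_m
        Xm -= np.outer(z, proj)      # orthogonalize the inputs ...
        C -= np.outer(r, proj)       # ... and keep their original-coordinate expressions in sync
    return beta
```

New (standardized) data can then be predicted as X_new @ beta without constructing the Z directions again; with n_components equal to p this beta matches the ordinary least squares coefficients.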

💡Overfitting

Overfitting occurs when a statistical model is too complex and captures noise in the training data, leading to poor generalization to new data. The video mentions shrinkage methods like ridge regression and lasso as ways to prevent overfitting by penalizing large coefficients and simplifying the model.

Highlights

Continuation of the discussion on linear regression methods.

Introduction to subset selection methods including forward, backward, and stepwise selection.

Exploration of shrinkage methods like ridge regression and lasso.

Introduction to derived directions starting with principal component regression.

The limitation of principal component regression in not considering the output data.

Assumption of centered Y and standardized inputs for both PCA and partial least squares.

Process of creating derived directions by projecting Y on Xj and summing the projections.

Explanation of how to find the first derived direction z1 by summing univariate contributions.

The concept of using Y in the regression to find derived directions.

Distinguishing partial least squares from PCR by considering the output variable Y.

Demonstration of how to orthogonalize by regressing xj on z1.

Iterative process of finding new directions xj2 and their corresponding z directions.

Orthogonality of derived directions ensuring univariate regression can be performed.

Derivation of coefficients for the original variables X from the derived directions Z.

The equivalence of the fit obtained by using p PLS directions to the original least squares fit.

Implication of orthogonal X variables on the PLS method, potentially stopping after one step.

Concluding the discussion on regression methods with insights into partial least squares.

Transcripts

play00:00

 Okay so we will continue from where we left off  

play00:19

as I promised right so we are looking at linear  regression and we looked at subset selection and  

play00:34

then we looked at the shrinkage methods and then,  finally we came to derived directions all right I  

play00:42

said there are three classes of methods so we  are looking at a couple of examples of each of  

play00:47

those classes of methods the first one we looked  at was subset selection so we looked at forward,  

play00:51

backward selection and stagewise selection and stepwise selection and all that and then we looked

play00:55

at shrinkage methods where we looked at ridge  regression and lasso and then we started looking  

play01:00

at derived directions right where we looked at  principal component regression I said the next one  

play01:08

we look at is partial least squares and I gave you the motivation for looking at partial least

play01:12

squares it is mainly because principal component  regression only looks at the input data okay,  

play01:18

does not pay attention to the output right and  therefore you might sometimes come up with really  

play01:25

counterintuitive directions like an example  I showed you with the +1 and -1 outputs okay,  

play01:30

so the basic idea here is that we are  going to use the Y also right.  

play01:36

Just like the usual case I am going to assume  that Y is centered right. And I am also going  

play01:47

to assume that the inputs are standardized. This  is something which you have to do for both PCA and  

play01:57

partial least squares essentially assume that each column right it is going to have 0 mean unit

play02:04

variance right on the data that is given to you  make it 0 mean unit variance, so that you are not  

play02:10

having any magnitude related effects on the output  okay, so what I am going to do is the following if  

play02:21

you remember how we did orthogonalization earlier  something very similar so I am going to look at  

play02:40

so I am going to look at the projection of Y  on Xj right then I am going to create a derived  

play02:56

direction which essentially sums up all of these  projections right I have computed basically I am  

play03:17

computing the projection of Y on xj right,  so this is essentially the direction is a  

play03:22

vectorized version of it then I am going to sum  all of this up so essentially what I am doing  

play03:26

here is I am looking at each variable in turn  I take each Xj in turn okay I am seeing what is  

play03:34

the effect on Y right, so how much of Y I am able  to explain just by taking Xj alone and I am using  

play03:44

all of that I am combining that and making  that as my single direction so individually  

play03:49

taking each one of these all by itself okay. Individually taking each direction by itself how

play03:54

much of Y can I explain and that becomes my  first derived direction that is my z1 okay.  

play04:20

So that is the coefficient for z1 in my  regression fit eventual regression fit  

play04:25

okay that is the coefficient for that one you  can see what it is like so I have taken Y and  

play04:29

regressed it on z1 and that essentially gives me the coefficient for z1 right so how do I go

play04:37

on to find it okay, so I am looking at how much of Y is along each direction Xj right so in some sense

play04:50

you can think of it as if I have one variable  Xj right how much of Y can be explained with  

play04:56

that one variable xj okay I am looking at that and  then my first direction z1 is essentially summing  

play05:05

those univariate contributions over all my input directions. Suppose I have two input directions

play05:23

Unfortunately I have to do this in 3d, suppose I have two input directions so what I am going

play05:31

to do is I am going to take my Y right, so  project it on x1 alone first right project  

play05:45

it on x1 alone and on x2 alone right we redo  that this is tricky to do this in 3d but any  

play06:20

way right. No it is going to be hard to do it on the  

play06:27

board pictorially for you okay I am not going to do this, so I really need to, actually I have to

play06:31

plot a function Y right, I cannot just do it with single data points of Y, that does not make sense,

play06:36

so I actually have to get to a surface Y on x1,  x2 and then talk about the projection so that  

play06:41

is going to be hard right, but the basic idea is  I take Y right I find the projection of Y along  

play06:48

x1 okay then I find the projection of Y along x2  okay now I am going to take the sum of these two  

play06:54

okay and whatever is the resulting direction and  I am going to use that as my first direction.  

play06:59

Yes, you see, in PCR what we did was we first found directions in X which had the highest variance; here

play07:17

we are not finding directions in X with the  highest variance but we are finding directions  

play07:20

in X right, in some sense components of X which have more in the direction of the output variable

play07:27

Y right, so eventually you can show that which  you are not going to do but you can show that the  

play07:33

directions you pick, that Z1, Z2, Z3 that you pick, are those which have high variance in the

play07:41

input space. But also have a high correlation with Y right it is actually an objective

play07:48

function which tries to balance correlation with Y and variance in the input space, while PCA, that

play07:57

is only variance; PCR does only variance in the input space, does not worry about the correlation

play08:02

but partial least squares you can show that it  actually worries about the correlation as well  

play08:08

right. We find the first coordinate, now what do you do, you orthogonalize, so what should I do now

play08:40

I should regress x1, so what should I be doing now, I should regress x1, like each xj, on z1 right.

play08:57

This is how we did the orthogonalization earlier  right, so you find one direction okay then you  

play09:01

regress everything else on that direction then  subtract from it that gives you the orthogonal  

play09:07

direction right, so essentially that is what  you are doing here the expressions look big  

play09:33

but then if you have been following the material  from the previous classes then it is essentially  

play09:39

that we are just reusing the univariate regression construction we had earlier right.

play09:43

So now I have a new set of directions which I call xj2 right, xj1 was the original xj

play09:51

I started off with, now I have a new set of directions which we will call xj2 and then

play09:58

I can keep repeating the whole process, I can take Y projected along xj2 right and then

play10:08

combine that and get Z2 and then regress Y  on Z2 to get θ2 right, so I can keep doing  

play10:17

this until I get as many directions as I want  all right so what is the nice thing about Z1,  

play10:25

Z2 and the other Zs, they themselves will be orthogonal because they are being constructed by individual

play10:32

vectors which are orthogonal with respect to all the previous Zs that we have right.

play10:38

Each one will be orthogonal and therefore I can  essentially do univariate regression so I do not  

play10:42

have to worry about accommodating the previous variable, so when I want to fit the ZK

play10:47

I can just do a univariate regression of Y on that ZK and I will get the coefficient θK okay is

play10:53

it fine, great. So once I get these θ1 to θK how do I use it for prediction, can I just do

play11:08

like Xβ, like in Xβ, can I do Xθ? And no, what should I do, well so I can do Zθ, I mean

play11:30

I am sorry I can do θZ and predict it but then I do not really want to construct these Z directions

play11:38

for every vector that I am going to get so I do not want to project it along those Z directions,

play11:43

so instead of that what I can do if you think  about it each of those Zs is actually composed  

play11:48

of the original variables X right. So I can  compute the θ and then I can just go back  

play11:58

and derive coefficients for the Xs directly  because all of these are linear computations  

play12:04

all I need to do is essentially figure out how I am going to stack all the thetas so that I can

play12:11

derive the coefficients for the Xs okay think  about it you can do it as a short exercise but  

play12:16

I can eventually come up and write it, right. So here I can derive these coefficients β hat

play12:35

from these θs right so I will derive θ1, θ2, θ3  so on so forth I can just go back and do this  

play12:43

computation so you will have to think about it  very easy you can work it out and figure out  

play12:47

what the number should be right and what is the ‘m’ there, that is the number of directions I

play12:57

actually derive the number of directions I derive  from the PLS right so here the first direction I  

play13:04

can keep going suppose I derive p directions what  can you tell me about the fit for the data if I  

play13:25

get p PLS directions it essentially means that  I will get as good a fit as the original least  

play13:31

squares fit right so I essentially get the same  fit as least squares fit okay so anything lesser  

play13:37

than that is going to give me something different  from the least squares fit okay here is a thought  

play13:42

question: if my X are originally orthogonal to begin with, if the X were actually orthogonal to begin

play13:49

with what will happen with PLS Z will be the  same as Xs right and what will happen to Z2 can  

play14:11

I do the Z2 no right PLS will stop after one  step because there will be no residuals after  

play14:19

that right so I will essentially get my least  squares fit in the first attempt itself okay  

play14:24

so that is essentially what will happen right  so we will stop with regression methods.


Related Tags
Linear Regression, Subset Selection, Shrinkage Methods, Principal Component, Partial Least Squares, Regression Analysis, Data Science, Machine Learning, Statistical Techniques, Predictive Modeling