RSM318H1 Chapter Notes - Chapter 6: Stepwise Regression, Overfitting, Simple Linear Regression
Document Summary
If the true relationship is approximately linear, least squares has low bias. If n is much larger than p, least squares also tends to have low variance. If n is not much larger than p, the fit can be highly variable and overfit. If p > n, there is no unique least squares coefficient estimate and the variance is infinite. Irrelevant variables in a multiple regression add unnecessary complexity to the model.

Best subset selection: fit a least squares regression for each of the 2^p possible models, then identify which one is best. To select the best model, first select the best model for each subset size based on RSS or R^2, then choose among those per-size winners with a criterion that estimates test error (such as cross-validation), since RSS never increases as predictors are added.

Forward stepwise selection: start with no predictors and add predictors to the model one at a time until all predictors are in the model. At each step, add the predictor that gives the greatest additional improvement to the fit. It is not guaranteed to find the best model out of all 2^p possible models. It is the only viable subset method when p is very large.

Backward stepwise selection: begin with all predictors in the model and remove the least useful variable, one at a time.
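
The three selection procedures summarized above can be sketched in a few lines of Python. This is a minimal illustration and not code from the course: the function names (rss, best_subset, forward_stepwise, backward_stepwise) and the synthetic data are assumptions made for the example, and RSS is used as the only fit measure. It shows that best subset searches all 2^p models, while the stepwise methods change one predictor per step.

```python
import itertools
import numpy as np


def rss(X, y, cols):
    """RSS of a least squares fit on the given columns plus an intercept."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)


def best_subset(X, y):
    """For each size k, keep the lowest-RSS model among all size-k subsets."""
    p = X.shape[1]
    return {
        k: min(itertools.combinations(range(p), k), key=lambda c: rss(X, y, c))
        for k in range(p + 1)
    }


def forward_stepwise(X, y):
    """Start empty; at each step add the predictor that lowers RSS the most."""
    chosen, remaining = [], set(range(X.shape[1]))
    path = [tuple(chosen)]
    while remaining:
        j = min(sorted(remaining), key=lambda j: rss(X, y, chosen + [j]))
        chosen.append(j)
        remaining.remove(j)
        path.append(tuple(chosen))
    return path


def backward_stepwise(X, y):
    """Start full; at each step drop the predictor whose removal raises RSS least."""
    chosen = list(range(X.shape[1]))
    path = [tuple(chosen)]
    while chosen:
        j = min(chosen, key=lambda j: rss(X, y, [c for c in chosen if c != j]))
        chosen.remove(j)
        path.append(tuple(chosen))
    return path


# Synthetic data (an assumption for illustration): only columns 0 and 2 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=100)

print("best subset by size:", best_subset(X, y))
print("forward path:", forward_stepwise(X, y))
print("backward path:", backward_stepwise(X, y))
```

Each function returns one candidate model per subset size. Choosing among those candidates still needs a criterion other than training RSS, because RSS can only fall as predictors are added. Note also that backward_stepwise begins with the full least squares fit, so this sketch assumes n > p for it.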