STAT1008 Lecture Notes - Lecture 36: Stepwise Regression, Dependent And Independent Variables
STAT1008 Week 12 Lecture C
● Variable Selection
○ (Some) ways of deciding whether a variable should be included in the model or not:
■ Does it improve adjusted R²?
■ Does it have a low p-value?
■ Is it associated with the response by itself?
■ Is it strongly associated with another explanatory variable? (If yes, including both may be redundant: this is multicollinearity.)
■ Does common sense say it should contribute to the model?
○ One method: Backward Elimination
■ Start with a model with all predictors
■ Drop the worst predictor (the one with the highest p-value)
■ Refit and continue until
● All predictors are “significant” OR
● Adjusted R² no longer improves
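The backward-elimination loop above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the course's software: the helper names (`invert`, `fit`, `backward_eliminate`) and the synthetic data are hypothetical. Instead of computing p-values (which needs a t-distribution function outside the standard library), it ranks predictors by |t|; within one fitted model every coefficient's t-test has the same degrees of freedom, so the smallest |t| is the largest p-value. It uses the second stopping rule: stop when adjusted R² no longer improves.

```python
import math
import random


def invert(matrix):
    """Invert a square matrix with Gauss-Jordan elimination and partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest entry in this column for stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit(y, columns):
    """Least-squares fit with an intercept.

    Returns (coefficients, t-statistics, adjusted R^2); index 0 is the intercept.
    """
    n, k = len(y), len(columns)
    p = k + 1
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
           for a in range(p)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    resid = [y[i] - sum(beta[a] * X[i][a] for a in range(p)) for i in range(n)]
    s_eps2 = sum(e * e for e in resid) / (n - k - 1)  # residual variance, s_epsilon^2
    y_bar = sum(y) / n
    s_y2 = sum((v - y_bar) ** 2 for v in y) / (n - 1)  # sample variance of y
    adj_r2 = 1 - s_eps2 / s_y2
    t_stats = [beta[a] / math.sqrt(s_eps2 * xtx_inv[a][a]) for a in range(p)]
    return beta, t_stats, adj_r2


def backward_eliminate(y, predictors):
    """Drop the weakest predictor one at a time while adjusted R^2 improves.

    predictors: dict mapping a predictor name to its list of values.
    """
    current = dict(predictors)
    _, t, best_adj = fit(y, list(current.values()))
    history = [(list(current), best_adj)]
    while len(current) > 1:
        names = list(current)
        # Smallest |t| = largest p-value (skip index 0, the intercept).
        worst = names[min(range(len(names)), key=lambda j: abs(t[j + 1]))]
        trial = {name: v for name, v in current.items() if name != worst}
        _, trial_t, trial_adj = fit(y, list(trial.values()))
        if trial_adj <= best_adj:
            break  # adjusted R^2 no longer improves, so keep the current model
        current, t, best_adj = trial, trial_t, trial_adj
        history.append((list(current), best_adj))
    return list(current), best_adj, history


# Hypothetical data: y depends on x1 and x2; x3 is pure noise.
random.seed(1)
n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
x3 = [random.gauss(0, 1) for _ in range(n)]
y = [2 + 3 * a - 1.5 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

selected, adj, history = backward_eliminate(y, {"x1": x1, "x2": x2, "x3": x3})
for names, value in history:
    print(names, f"adjusted R^2 = {value:.3f}")
```

Using the adjusted-R² stopping rule rather than a p-value cutoff means a predictor is removed only when its |t| is below 1, so this procedure can keep a predictor whose p-value is above 5% or 10%. That is the same tension between "significant" t-tests and the best adjusted R² that shows up when choosing a final model.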
● Example: Final Exam model with all predictors:
○ The model has 6 predictors and not all are significant, so backward elimination is used, dropping one predictor at a time
○ Don’t drop them all at the same time
○ QuizAvg has the highest p-value, so it is the weakest predictor and is dropped first
○ Note the adjusted R² of the full model: 67.5%
○ When QuizAvg is dropped, adjusted R² increases to 68.1%
○ Overall, the p-values for the other predictors increase
○ Drop the next weakest predictor, and adjusted R² increases to 68.4%
○ Exam 1 now has the highest p-value, so drop it from the model; adjusted R² decreases to 67.9%
○ All remaining p-values are now below 10%, which is good, but adjusted R² has gotten worse. Exam 1 explained some of the variability in the response, so dropping it made adjusted R² worse
○ Three options: drop Projects, since its p-value is above 5%; stop, since this model is fine with all p-values below 10%; or put Exam 1 back in the model, since adjusted R² went down when it was removed
○ Pro of adding Exam 1 back: the best adjusted R² and sε (residual standard error), but the weakest t-tests
○ If you keep the current model, the pro is that all t-tests have p-values below 10%, but adjusted R² and sε are worse
○ If you remove Projects, the pro is that all t-tests have p-values below 1%, but adjusted R² and sε are worse (adjusted R² = 66.7%)
○ There is no single right answer; just justify your choice
○ If the goal is prediction, pick the first option (Exam 1 added back), since it explains the most variability in the response
○ Don’t use the third option: even though all its t-tests have p-values below 1%, removing Projects throws away information and gives the lowest adjusted R²
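The trade-offs above compare adjusted R² and sε together, and the QuizAvg and Exam 1 steps move adjusted R² in opposite directions. A short derivation makes both points explicit. It is a sketch assuming an ordinary least-squares model with n observations and k predictors, where SSE_k is the residual sum of squares of that model, SST is the total sum of squares, and t is the t-statistic of the single predictor being dropped:

```latex
\begin{aligned}
s_\varepsilon^2 &= \frac{SSE_k}{n-k-1}, \qquad s_y^2 = \frac{SST}{n-1},\\
R^2_{\text{adj}} &= 1 - \frac{SSE_k/(n-k-1)}{SST/(n-1)} = 1 - \frac{s_\varepsilon^2}{s_y^2},\\
SSE_{k-1} &= SSE_k + t^2 s_\varepsilon^2
\;\Longrightarrow\;
s_{\varepsilon,k-1}^2 = \frac{SSE_{k-1}}{n-k} = \frac{n-k-1+t^2}{n-k}\, s_\varepsilon^2 .
\end{aligned}
```

Since s_y² is the same for every candidate model, a higher adjusted R² always goes with a smaller sε, which is why the two are judged together. The last line shows that dropping one predictor raises adjusted R² exactly when t² < 1. So the rise from 67.5% to 68.1% implies QuizAvg had |t| < 1, while the fall from 68.4% to 67.9% implies Exam 1 had |t| > 1, even though it had the highest p-value at that step.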
find more resources at oneclass.com