STAT1008 Lecture Notes - Lecture 36: Stepwise Regression, Dependent And Independent Variables

30 May 2018
STAT1008 Week 12 Lecture C
Variable Selection
Some ways of deciding whether a variable should be included in the model or
not:
Does it improve adjusted R²?
Does it have a low p-value?
Is it associated with the response by itself?
Is it strongly associated with another explanatory variable? (If yes, then
including both may be redundant. This is multicollinearity.)
Does common sense say it should contribute to the model?
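The first question depends on how adjusted R² is computed. As a minimal sketch, assuming the standard formula adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1) for n observations and k predictors, the function below shows why an extra predictor can raise R² slightly and still lower adjusted R². The function name and the numbers in the example are hypothetical and are not taken from the lecture.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a model with k predictors fitted to n observations."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than estimated coefficients")
    # The factor (n - 1) / (n - k - 1) grows with k, penalising extra predictors.
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical numbers: a sixth predictor lifts R-squared from 0.700 to 0.701.
with_five = adjusted_r2(0.700, n=50, k=5)
with_six = adjusted_r2(0.701, n=50, k=6)

# The tiny gain in R-squared does not cover the penalty for the extra predictor.
print(with_six < with_five)  # True
```

Here the sixth predictor would fail the "does it improve adjusted R²?" check even though plain R² went up.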
One method: Backward Elimination
Start with a model with all predictors
Drop the worst predictor (the one with the highest p-value) and refit the model
Continue until
All predictors are “significant” OR
Adjusted R² no longer improves
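The steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the software used in the lecture: the helper names (`invert`, `fit_ols`, `backward_eliminate`) and the toy data are hypothetical. Within one fitted model every t-test shares the same degrees of freedom, so the predictor with the highest p-value is the one with the smallest |t|; the sketch ranks by |t| to avoid needing a t-distribution. It also uses only the "adjusted R² no longer improves" stopping rule, not a significance threshold.

```python
import math


def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: predictors are perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit_ols(columns, y):
    """Least squares with an intercept; returns (adjusted R2, |t| for each predictor)."""
    n, p = len(y), len(columns) + 1
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
           for a in range(p)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    sse = sum((y[i] - sum(X[i][a] * beta[a] for a in range(p))) ** 2
              for i in range(n))
    ybar = sum(y) / n
    sst = sum((v - ybar) ** 2 for v in y)
    s2 = sse / (n - p)  # estimated error variance
    adj_r2 = 1 - s2 / (sst / (n - 1))
    t_abs = [abs(beta[a]) / math.sqrt(s2 * inv[a][a]) for a in range(1, p)]
    return adj_r2, t_abs


def backward_eliminate(data, y):
    """Drop the weakest predictor one at a time while adjusted R2 keeps improving."""
    names = list(data)
    best_adj, t_abs = fit_ols([data[k] for k in names], y)
    while len(names) > 1:
        worst = min(range(len(names)), key=lambda j: t_abs[j])  # highest p-value
        trial = names[:worst] + names[worst + 1:]
        adj, t_trial = fit_ols([data[k] for k in trial], y)
        if adj <= best_adj:
            break  # adjusted R2 no longer improves, so keep the current model
        names, best_adj, t_abs = trial, adj, t_trial
    return names, best_adj


# Toy data (hypothetical): "junk" is unrelated to the response by construction.
data = {
    "useful": [1, 2, 3, 4, 5, 6, 7, 8],
    "junk": [1, -1, -1, 1, -1, 1, 1, -1],
}
y = [3, 3, 5, 9, 11, 11, 13, 17]
kept, adj = backward_eliminate(data, y)
print(kept)  # ['useful']
```

The model is refitted after every drop because removing one predictor changes the t-statistics of the others, which is why predictors are removed one at a time.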
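Final Exam example, starting with all predictors:
The model has 6 predictors and not all are significant, so backward elimination
is used, dropping one predictor at a time
Don’t drop them all at the same time, because the p-values change each time the
model is refitted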
QuizAvg has the highest p-value, so it is the weakest predictor and is dropped
first
Remember: adjusted R² = 67.5% for the model with all predictors
When you drop QuizAvg, adjusted R² increases to 68.1%
Overall, the p-values of the other predictors increase
So drop another predictor, and adjusted R² increases to 68.4%
Then drop Exam 1 from the model, since its p-value is now the highest; adjusted
R² decreases to 67.9%
All p-values have dropped below 10%, which is good, but adjusted R² has gotten
worse. Exam 1 explains some of the variability in the response, so dropping it
made adjusted R² worse
At this point you could drop Projects, since its p-value is > 5%; or stop, since
this model is fine with all p-values below 10%; or put Exam 1 back in the model,
since adjusted R² went down
If you put Exam 1 back: best adjusted R² and s_ε, but the weakest t-tests
If you leave the model as it is: all t-test p-values < 10%, but adjusted R² and
s_ε are worse
If you remove Projects: all t-test p-values < 1%, but adjusted R² and s_ε are
worse again, with adjusted R² at 66.7%
There is no right answer; just justify your choice
If you want to explain predictions, pick the first model, since it explains
more of the variability
Don’t use the third model: even though its t-test p-values are all less than 1%,
it leaves out useful information
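The adjusted R² values recorded in this example can be traced step by step. The short sketch below only applies the "adjusted R² no longer improves" rule to those recorded values; the step labels and the function name `stop_point` are illustrative, and the notes do not name the second predictor that was dropped.

```python
# Adjusted R-squared (in percent) after each step, as recorded in the notes.
trace = [
    ("all 6 predictors", 67.5),
    ("QuizAvg dropped", 68.1),
    ("second predictor dropped", 68.4),
    ("Exam 1 dropped", 67.9),
    ("Projects dropped", 66.7),
]


def stop_point(trace):
    """Return the last step reached before adjusted R-squared stops improving."""
    for current, following in zip(trace, trace[1:]):
        if following[1] <= current[1]:
            return current
    return trace[-1]


label, value = stop_point(trace)
print(label, value)  # second predictor dropped 68.4
```

By this rule alone the search stops at the model that still contains Exam 1, which matches the "put Exam 1 back" option. The other two options trade some adjusted R² for stronger t-tests.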