STAT1008 Lecture Notes - Lecture 36: Stepwise Regression, Dependent And Independent Variables


whitegoat396

30 May 2018


STAT1008 Week 12 Lecture C

● Variable Selection

○ (Some) ways of deciding whether a variable should be included in the model or not:

■ Does it improve adjusted R²?

■ Does it have a low p-value?

■ Is it associated with the response by itself?

■ Is it strongly associated with another explanatory variable? (If yes, including both may be redundant; this is multicollinearity.)

■ Does common sense say it should contribute to the model?

○ One method: Backward Elimination

■ Start with a model with all predictors

■ Drop the worst predictor (by p-value)

■ Continue until

● All predictors are “significant” OR

● Adjusted R² no longer improves
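The selection criteria and backward-elimination loop above can be sketched in Python. This is a minimal illustration under stated assumptions, not the course's software: `adjusted_r2` implements the standard formula adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1) for n cases and k predictors, while `fit` is a hypothetical callable standing in for whatever regression tool you use, assumed to return the model's adjusted R² and a dictionary of predictor p-values. The 5% cutoff `alpha` is also an assumption; the notes only say "significant".

```python
from typing import Callable, Dict, List, Tuple


def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for a model with k predictors fitted to n cases."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical interface: given a list of predictor names, a fit function
# returns (adjusted R^2, {predictor name: p-value}) for that model.
FitFn = Callable[[List[str]], Tuple[float, Dict[str, float]]]


def backward_elimination(predictors: List[str], fit: FitFn,
                         alpha: float = 0.05) -> List[str]:
    """Drop the highest-p-value predictor one at a time (never several at once)."""
    current = list(predictors)
    adj, pvals = fit(current)
    while len(current) > 1:
        worst = max(current, key=lambda name: pvals[name])
        if pvals[worst] < alpha:
            break  # every remaining predictor is "significant"
        reduced = [name for name in current if name != worst]
        new_adj, new_pvals = fit(reduced)
        if new_adj <= adj:
            break  # dropping the weakest predictor would not improve adjusted R^2
        current, adj, pvals = reduced, new_adj, new_pvals
    return current
```

Design note: the loop compares adjusted R² before committing a drop, so a step that lowers adjusted R² is never kept. This corresponds to the option of putting a dropped predictor back into the model.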

● Example: predicting the Final Exam score using all predictors

○ The full model has 6 predictors and not all of them are significant, so backward elimination is used, removing one predictor at a time

○ Don’t drop them all at the same time

○ QuizAvg has the highest p-value of all the predictors, so it is the weakest predictor and is dropped first

○ For reference, the full model has adjusted R² = 67.5%

○ When QuizAvg is dropped, adjusted R² increases to 68.1%

○ Overall, the p-values of the remaining predictors increase

○ Dropping the next-weakest predictor increases adjusted R² further, to 68.4%

○ Exam1 now has the highest p-value, so it is dropped next; this time adjusted R² decreases, to 67.9%

○ All remaining p-values are now below 10%, which is good, but adjusted R² has gotten worse: Exam1 was explaining some of the variability in the response, so removing it lowered adjusted R²

○ Options at this point: drop Projects as well, since its p-value is above 5%; stop, since this model is fine with all p-values below 10%; or put Exam1 back into the model, since adjusted R² went down without it

○ Adding Exam1 back: best adjusted R² and sε (the estimated standard deviation of the errors), but the weakest t-tests

○ Keeping the current model: all t-test p-values are below 10%, but adjusted R² is lower and sε is higher

○ Removing Projects: all t-test p-values are below 1%, but adjusted R² drops to 66.7% and sε is higher

○ There is no single right answer; what matters is justifying your choice

○ If the main goal is prediction, pick the first model (with Exam1 added back), since it explains more of the variability in the response

○ Don’t use the third model (without Projects): even though all of its t-tests are below 1%, it leaves out useful information
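As a check on the walk-through above, the short Python sketch below replays the adjusted R² values recorded in these notes (67.5%, 68.1%, 68.4%, 67.9%), applies the "stop when adjusted R² no longer improves" rule, and then ranks the three candidate final models by adjusted R². The second predictor removed is not named in the notes, so it is labelled generically; no other data are assumed.

```python
# Adjusted R^2 (%) recorded in the notes after each backward-elimination step.
# The second predictor removed is not named in the notes.
history = [
    ("full model (6 predictors)", 67.5),
    ("without QuizAvg", 68.1),
    ("without QuizAvg and a second predictor", 68.4),
    ("also without Exam1", 67.9),
]


def first_non_improving_step(steps):
    """Index of the first step whose adjusted R^2 is not higher than the previous one, or None."""
    for i in range(1, len(steps)):
        if steps[i][1] <= steps[i - 1][1]:
            return i
    return None


stop = first_non_improving_step(history)
if stop is not None:
    print("Adjusted R^2 stops improving at:", history[stop][0])
    print("Model kept by the adjusted R^2 rule:", history[stop - 1][0])

# Adjusted R^2 (%) of the three candidate final models compared in the notes.
candidates = {
    "with Exam1 added back": 68.4,
    "current model (no Exam1)": 67.9,
    "also without Projects": 66.7,
}
best = max(candidates, key=candidates.get)
print("Highest adjusted R^2:", best)
```

Running it reports that adjusted R² first fails to improve at the Exam1 step, so the adjusted R² rule alone keeps the model with Exam1, which also has the highest adjusted R² of the three candidates. As the notes stress, the final choice still depends on the goal and how it is justified.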
