ECON 6306 Lecture Notes - Lecture 8: F-Test, Business Analytics, Analysis Of Variance

Consequences of a misspecified regression model
Including irrelevant variables inflates the standard errors of the coefficient estimates; the p-values therefore rise, and even genuinely relevant variables can appear insignificant.
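A minimal simulation sketch of this effect, using made-up data (the variables x1, x2, and y below are hypothetical, not from the lecture): x2 plays no role in generating y but is correlated with x1, so including it inflates the standard error on x1.
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)   # irrelevant, but correlated with x1
y  <- 2 + 3 * x1 + rnorm(n)     # x2 plays no role in generating y
summary(lm(y ~ x1))             # correct model: smaller standard error on x1
summary(lm(y ~ x1 + x2))        # irrelevant x2 included: x1's standard error grows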
Some of us in the business analytics program work with titanic amounts of data. When we run a regression on a very large sample, the standard errors, and hence the p-values, shrink. This can make all of the variables appear significant even if they would not be in a smaller dataset.
If we omit relevant variables, we get omitted variable bias: the estimated coefficients may no longer be representative of the true coefficients.
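A simulation sketch of omitted variable bias, again with hypothetical data: here x2 is relevant and correlated with x1, so dropping it biases x1's estimated coefficient away from its true value of 2.
set.seed(2)
n  <- 500
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n)             # relevant and correlated with x1
y  <- 1 + 2 * x1 + 3 * x2 + rnorm(n)  # true coefficient on x1 is 2
coef(lm(y ~ x1 + x2))  # x1 estimate near 2
coef(lm(y ~ x1))       # x1 estimate near 2 + 3*0.8 = 4.4 (biased upward)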
Standardized regression coefficients
If we want to see which regression variables are the most important, we cannot simply look at the coefficients and choose the largest, because the coefficients depend on the scale of the original variables. We standardize the variables so the coefficients are comparable; the one with the largest absolute value has the greatest impact.
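A sketch in R using the built-in mtcars dataset as a stand-in example: scale() centers each variable and divides by its standard deviation, so the standardized coefficients are each the effect of a one-standard-deviation change and can be compared directly.
fit_raw <- lm(mpg ~ wt + hp, data = mtcars)
fit_std <- lm(scale(mpg) ~ scale(wt) + scale(hp), data = mtcars)
coef(fit_raw)  # not comparable: wt is in 1000s of lbs, hp in horsepower
coef(fit_std)  # comparable: the largest |coefficient| has the biggest impact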
Partial F test
Suppose that you have two models that you want to compare. The first model regresses y on x1, x2, and x3, while the second regresses y on x1 and x2 only. Notice that the second model is a subset of (nested within) the first.
In such a scenario, where one model is a subset of the other, we can use a partial F test to see whether the additional variable is significant. The R code to run this is given below, with the reduced model listed first:
anova(model2, model1)
If the p-value is less than 5%, we conclude that the additional variable (x3 in this case) is significant and we should keep it in the model.
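A fuller runnable sketch of the same test, using mtcars variables in place of the generic x1, x2, x3 (the model names mirror the anova() call above):
model1 <- lm(mpg ~ wt + hp + qsec, data = mtcars)  # full model (three predictors)
model2 <- lm(mpg ~ wt + hp, data = mtcars)         # reduced model (subset)
anova(model2, model1)  # small p-value => the extra variable adds explanatory power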
Interaction effects
So far, we have only considered the independent impact of each x variable on the y variable. It is also possible that two or more x variables affect y jointly. This is called an interaction between the x variables: put simply, the value of x1 determines what the effect of x2 on y will be.
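A sketch of fitting an interaction in R, again using mtcars as hypothetical data: the formula x1 * x2 expands to x1 + x2 + x1:x2, and the x1:x2 coefficient measures how one variable changes the effect of the other on y.
int_model <- lm(mpg ~ wt * hp, data = mtcars)  # expands to wt + hp + wt:hp
summary(int_model)  # the wt:hp row tests whether the interaction is significant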