MGMT 1050 Chapter Notes - Chapter 16: Explained Variation, Total Variation, Bias Of An Estimator


Department: Management
Course Code: MGMT 1050
Professor: Olga Kraminer
Chapter: 16

So when SSE is small, the fit is excellent and the model should be used. But how do we
know whether SSE is small enough? We judge it through the standard error of estimate,
s_e = sqrt(SSE / (n - 2)), by comparing it to the mean of the dependent variable (the
sample mean of y).
If s_e is small in comparison, then the linear regression model can be used. However, s_e
has no upper limit, so on its own it is not a great measure of whether a model is a good fit.
Method 2: ANOVA (F-test)
- The deviation between y and y-bar (the sample mean of y) is the result of two
sources of variation: 1) the difference between y-hat and y-bar (the difference between the
predicted y value and the average y value) and 2) the difference between y and y-hat
(the difference between the actual value of y and the predicted value of y)
The difference between the predicted value of y and the average value of y can be
explained by changes in x (since x is not constant, it makes sense that the predicted
value of y differs from the average value of y as x changes) - SSR
The other part, the difference between the actual value of y and the predicted value of y,
is the residual; we cannot explain it with the model. This part of the difference is
unexplained by the variation in x - SSE
Thus, the total variation in y = the explained variation + the unexplained variation in y
The sum of squares for regression (SSR) is a measure of the explained variation in y
The sum of squares for error (SSE) is a measure of the unexplained variation in y
Sum of squares total (SS(Total)) is a measure of the total variation in y
SS(Total) = SSE + SSR
If there were no link between the two variables x and y, then SSR would be zero and all of
the variation would be in SSE. A good regression model minimizes the value of SSE and
maximizes the value of SSR.
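The decomposition above can be checked numerically. A minimal sketch in Python, assuming a small made-up data set and an ordinary least squares fit; the data and variable names are illustrative, not from the notes:

```python
# Minimal sketch (toy data): fit a line by least squares,
# then verify SS(Total) = SSR + SSE.

def least_squares(x, y):
    """Fit y = b0 + b1*x by ordinary least squares."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = least_squares(x, y)
y_hat = [b0 + b1 * xi for xi in x]
y_bar = sum(y) / len(y)

ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # unexplained variation
ss_total = sum((yi - y_bar) ** 2 for yi in y)           # total variation

# The identity holds up to floating-point rounding.
assert abs(ss_total - (ssr + sse)) < 1e-9
```

On this toy data SSR is far larger than SSE, which is what a good model looks like.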
We can use an F-test to determine whether SSR is significantly greater than SSE, and
hence whether the model is a good one. Our hypotheses are:
H0: β1 = 0
H1: β1 ≠ 0
since there is a linear relationship (not a horizontal line) when β1 ≠ 0. The test statistic is
F = MSR/MSE, i.e. SSR and SSE each divided by its degrees of freedom (for simple
regression, MSR = SSR/1 and MSE = SSE/(n - 2)). A large F-statistic or a low p-value
indicates a good model.
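As a sketch of the arithmetic, using hypothetical sums of squares from an assumed simple regression (k = 1 independent variable) on n = 5 observations; the numbers are made up for illustration:

```python
# Hypothetical values (assumed, for illustration only).
n, k = 5, 1
ssr, sse = 39.601, 0.107

msr = ssr / k              # mean square for regression: SSR / k
mse = sse / (n - k - 1)    # mean square for error: SSE / (n - k - 1)
f_stat = msr / mse         # a large F (low p-value) indicates a good model
```

For simple regression, the rejection region is the upper tail of the F distribution with 1 and n - 2 degrees of freedom.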

We then compare this F-statistic to our rejection region, which is always one-tailed in the
upper tail. The test is one-tailed because only when SSR is significantly greater than SSE
can we reject and infer that a significant amount of the variation in the dependent variable
can be explained by changes in the independent variable.
Remember, explained variation is variation in the dependent variable that can be
explained by changes in the independent variable.
Method 3: t-Test of the Slope (student-t distributed)
While we can use ANOVA, a t-test is usually quicker. If the line of best fit between two
variables in the sample is horizontal (has a slope of zero), the same value of y-hat is
estimated for every value of x (since y-hat is on the vertical axis and the x-value is on the
horizontal axis). We can make inferences about the population slope β1 from the sample
slope b1 to determine whether there is a relationship between the variables, i.e. whether
β1 ≠ 0.
Since b1 is an unbiased estimator of β1, we can use b1 for hypothesis testing. The sampling
distribution is Student-t distributed with n - 2 degrees of freedom. Refer to pg. 59 for more
detail.
The two-tailed t-test of the slope is equivalent to the ANOVA F-test. However, with t-tests
we can do more than with ANOVA: unlike ANOVA, we can also test whether the linear
relationship is positive or negative.
When conducting a one-tailed test, the value of the test statistic is the same as in a
two-tailed test. However, the rejection region is different (because the entire significance
level is placed in one tail) and the p-value of the one-tailed test is half that of the
two-tailed test.
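A minimal sketch of the slope t-test on toy data; all names and numbers here are illustrative assumptions, not from the notes:

```python
import math

# Toy data (assumed, for illustration).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx                  # sample slope, unbiased estimator of beta1
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_e = math.sqrt(sse / (n - 2))  # standard error of estimate
s_b1 = s_e / math.sqrt(sxx)     # standard error of the slope b1
t_stat = (b1 - 0) / s_b1        # tests H0: beta1 = 0, with df = n - 2
```

For a two-tailed test, t_stat is compared against the critical values of the Student-t distribution with n - 2 degrees of freedom; squaring it recovers the ANOVA F-statistic for the same data.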
Method 4: Coefficient of Determination
- The t-test of the slope and ANOVA only address the question of whether there is
enough evidence to infer that a linear relationship exists
- However to measure the strength of the linear relationship (how close the actual
values are to the predicted ones), we use the coefficient of determination (denoted
R^2)
- The coefficient of determination (R^2) is a measure of the amount of variation in the
dependent variable that is explained by the variation in the independent variable
- Remember, if there is a linear relationship then y values will differ (if they were
constant, then the line would be horizontal, which means that x does not affect y,
and the y-value remains constant)
- In other words the coefficient of determination is a measure of how much of the
changes in y we can explain from changes in x
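The computation is just a ratio of the sums of squares already defined; a sketch with assumed toy values:

```python
# Hypothetical sums of squares (assumed, for illustration).
ssr, sse = 39.601, 0.107
ss_total = ssr + sse

r_squared = ssr / ss_total    # explained variation / total variation
# Equivalently: 1 - (unexplained variation / total variation).
assert abs(r_squared - (1 - sse / ss_total)) < 1e-12
```

R^2 always lies between 0 and 1; in this made-up example nearly all of the variation in y is explained by x.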