
# MGMT 1050 Chapter Notes - Chapter 16: Explained Variation, Total Variation, Bias Of An Estimator

by OC295907

So when SSE is small, the fit is excellent and the model should be used. But how do we know if SSE is small enough? We judge the value of the standard error of estimate, SEE = sqrt(SSE / (n - 2)), by comparing it to the mean of the dependent variable (the sample mean of y).

If the SEE is small in comparison to the mean of y, then the linear regression model can be used. However, the SEE has no upper limit, so on its own it is not a great measure of whether a model is a good fit or not.
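As a quick sketch of the idea (using a small made-up dataset, not one from the chapter), the standard error of estimate can be computed and compared to the mean of y:

```python
import math

# Hypothetical sample data (not from the textbook)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

# Least-squares slope (b1) and intercept (b0)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# SSE and the standard error of estimate
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
see = math.sqrt(sse / (n - 2))

print(sse)  # 2.4
print(see)  # ~0.894, which is small relative to y_bar = 4
```

Here the SEE (about 0.89) is well under a quarter of the mean of y (4), so by this rough yardstick the fit looks usable.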

Method 2: ANOVA (F-test)

- The deviation between y and y-bar is a result of two sources of variation: 1) the

difference between y-hat and the sample mean of y (the difference between the

predicted y value and the average y value) and 2) the difference between y and y-hat

(the difference between the actual value of y and the predicted value of y)

The difference between the predicted value of y and the average value of y can be

explained by changes in x (since x is not constant, it makes sense that the predicted

value of y differs from the average value of y as x changes) - SSR

The other part, the difference between the actual value of y and the predicted value of y,

is the residual – we cannot explain it with the model. This part of the difference is unexplained

by the variation in x - SSE

Thus, the total variation in y = the explained variation + the unexplained variation in y

The sum of squares for regression (SSR) is a measure of the explained variation in y

The sum of squares for error (SSE) is a measure of the unexplained variation in y

Sum of squares total (SS(Total)) is a measure of the total variation in y

SS(Total) = SSE + SSR

If there were no link between the two variables x and y, then SSR would be zero and all of the

variation would fall into SSE. A good regression model minimizes the value of SSE and

maximizes the value of SSR.
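The decomposition SS(Total) = SSR + SSE can be checked numerically; a minimal sketch with a small made-up dataset:

```python
# Hypothetical sample data (not from the textbook)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Least-squares line and fitted values y-hat
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
     sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

ss_total = sum((yi - y_bar) ** 2 for yi in y)          # total variation in y
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation

print(ss_total, ssr, sse)  # 6.0 3.6 2.4, and 3.6 + 2.4 = 6.0
```

Note how each sum of squares matches its verbal definition: SSR measures deviations of the predictions from the mean, SSE measures deviations of the actual values from the predictions.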

We can use an F-test to determine whether SSR is significantly greater than SSE and therefore whether

the model is a good one. Our hypotheses are:

Ho: B1 = 0

H1: B1 ≠ 0

since there is a linear relationship (not a horizontal line) when B1 ≠ 0. The test statistic is F = MSR/MSE,

where MSR and MSE are SSR and SSE each divided by their degrees of freedom (1 and n - 2 in simple

regression). A large F-statistic or a low p-value indicates a good model.
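A minimal sketch of the F-statistic calculation (again with a small made-up dataset; in practice the result is compared against a critical value from an F table with 1 and n - 2 degrees of freedom):

```python
# Hypothetical sample data (not from the textbook)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
     sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

msr = ssr / 1         # regression degrees of freedom = 1 (one x variable)
mse = sse / (n - 2)   # error degrees of freedom = n - 2
f_stat = msr / mse

print(f_stat)  # 4.5 -- compare to the F critical value with (1, n - 2) df
```

With this toy data, F = 4.5 falls short of the 5% critical value for (1, 3) degrees of freedom (roughly 10.1 from standard F tables), so we would not reject Ho here.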


We then compare this F-statistic to our rejection region, which is always one-tailed in the

upper tail. The test is one-tailed because only when SSR is significantly greater than SSE can we

reject Ho and infer that a significant amount of the variation in the dependent variable can

be explained by changes in the independent variable.

Remember, explained variation is variation in the dependent variable that can be

accounted for by changes in the independent variable.

Method 3: t-Test of the Slope (Student t distributed)

While we can use ANOVA, a t-test is usually quicker. If the line of best fit between two

variables in the sample is horizontal (has a slope of zero), the same value of y-hat is

estimated for every value of x (since y-hat is on the vertical axis and the x-value is on the

horizontal axis). We can make inferences about the population slope B1 from the sample

slope b1 to determine whether there is a relationship between the variables, i.e. whether B1 ≠ 0.

Since b1 is an unbiased estimator of B1, we can use b1 for hypothesis testing. The sampling

distribution of the test statistic is Student t with n – 2 degrees of freedom. Refer to pg. 59 for more

detail.

The two-tailed t-test is equivalent to the ANOVA F-test. However, with t-tests we can do

more than we can with ANOVA: unlike ANOVA, we can also test whether the linear relationship is

positive or negative (a one-tailed test).

When conducting a one-tailed test, the value of the test statistic is the same as in a

two-tailed test. However, the rejection region is different (the entire significance

level is placed in one tail), and the p-value of the one-tailed test is half that of the two-tailed test.
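The t-test of the slope can be sketched the same way (small made-up dataset again); note that in simple regression the two-tailed t-statistic squared equals the ANOVA F-statistic:

```python
import math

# Hypothetical sample data (not from the textbook)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
see = math.sqrt(sse / (n - 2))   # standard error of estimate
s_b1 = see / math.sqrt(s_xx)     # standard error of the slope b1

t_stat = (b1 - 0) / s_b1         # test statistic under Ho: B1 = 0
print(t_stat)       # ~2.121, with n - 2 = 3 degrees of freedom
print(t_stat ** 2)  # 4.5 -- same as the F-statistic from the ANOVA approach
```

With 3 degrees of freedom, the two-tailed 5% critical value from a t table is about 3.18, so as with the F-test we would not reject Ho for this toy data.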

Method 4: Coefficient of Determination

- The t-test of the slope and ANOVA only address the question of whether there is

enough evidence to infer that a linear relationship exists

- However, to measure the strength of the linear relationship (how close the actual

values are to the predicted ones), we use the coefficient of determination (denoted

R^2)

- The coefficient of determination (R^2 = SSR/SS(Total)) is a measure of the amount of variation in the

dependent variable that is explained by the variation in the independent variable

- Remember, if there is a linear relationship then y values will differ (if they were

constant, then the line would be horizontal, which means that x does not affect y,

and the y-value remains constant)

- In other words, the coefficient of determination is a measure of how much of the

change in y we can explain from changes in x
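R^2 for a small made-up dataset, computed both as SSR/SS(Total) and in the equivalent form 1 - SSE/SS(Total):

```python
# Hypothetical sample data (not from the textbook)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
     sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

ss_total = sum((yi - y_bar) ** 2 for yi in y)
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

r_sq = ssr / ss_total
print(r_sq)                # 0.6 -- 60% of the variation in y is explained by x
print(1 - sse / ss_total)  # 0.6 -- the equivalent form
```

An R^2 of 0.6 here would mean 60% of the variation in y is explained by the variation in x, with the remaining 40% left in the residuals.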
