Tuesday January 11, 2011
Regression equation: y = a + bx
y = the predicted value, or the dependent variable
and x = the independent variable
a = intercept (y when x = 0 )
b = the slope of the line (how much y changes for each one-unit change in x)
we're going to discuss this more next week
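A quick sketch of how a and b could be found for y = a + bx. The notes don't give the formulas yet, so this assumes the standard least-squares ones (slope = co-variation of x and y divided by the variation of x; the line passes through the two means). The data and the name `fit_line` are made up for illustration.

```python
# Hypothetical data (not from the lecture): x = years of education, y = some outcome score.
x = [8, 10, 12, 14, 16]
y = [20, 26, 30, 37, 42]


def fit_line(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: how x and y vary together, divided by how much x varies.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    # Intercept: the least-squares line passes through (mean_x, mean_y).
    a = mean_y - b * mean_x
    return a, b


a, b = fit_line(x, y)
# Predicted y for each case, using the regression equation.
predicted = [a + b * xi for xi in x]
print("a =", a, "b =", b)
```

For this made-up data the line works out to y = -2 + 2.75x, so each extra year of education predicts 2.75 more points on y.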
regression and correlation fit together, since the correlation coefficient can be defined by how well a regression line fits the data
people think interval data is more powerful than nominal or ordinal data....
if we want to predict y from x, the predictions made from x must be at least as good as
the predictions we'd get from using the mean of y itself.
So we have two axes, and the idea is that there's one independent and one dependent
variable. These variables are ranked (some cases score higher than others), and
there's a constant interval between each of the units. Each is measured in some set of units, e.g. x
= years of education. Now the idea here is that we have a number of individual cases
that are ranked on x and ranked on y. The best predictor of y (just y, alone) would be whatever
the mean score of y is. The reason it's the best predictor is that the mean of y gives us the
smallest average squared deviation of any single value we could guess. So the mean of the sample
becomes the standard against which we compare any other predictor.
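so the squared deviations from the mean of y will let us decide if x is useful: x is useful only if
predicting y from x leaves smaller squared deviations than predicting y from its mean.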
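A small check of the claim that the mean is the best single guess for y, in the sense of the smallest sum of squared deviations. The scores and the helper name `sum_sq_dev` are made up for illustration.

```python
# Hypothetical y scores (not from the lecture).
y = [20, 26, 30, 37, 42]


def sum_sq_dev(values, guess):
    """Sum of squared deviations of the values from a single guessed value."""
    return sum((v - guess) ** 2 for v in values)


mean_y = sum(y) / len(y)
at_mean = sum_sq_dev(y, mean_y)

# Try guesses a little below and a little above the mean.
for guess in (mean_y - 2, mean_y - 1, mean_y, mean_y + 1, mean_y + 2):
    print(guess, sum_sq_dev(y, guess))
```

Any guess other than the mean gives a larger sum, which is why the mean is the baseline that x has to beat.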
pg. 309 in Brians (only just touched upon on that page).
there's more than one kind of variance or variation. in fact there are 3 we need to worry about...
Types of Variation
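1) Total Variance
- Total variation = the sum of the squared differences from the mean of y
- The average squared deviation (total variation divided by the number of cases) = the variance of y
- "The least squares line" = the line that makes the sum of squared deviations around it as small as possible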
2) Unexplained Variance
- Everything that's left over: the squared deviations of the actual y values from the
regression line (aka the "y = a + bx")
3) Explained Variance
- The part of the total variation of y that can be attributed to the influence of x
- Pearson's correlation = the square root of the proportion of variance explained.
- So r = the square root of (explained variance / total variance), with the same sign as the slope b.
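To see how the three kinds of variation fit together, here is a rough sketch that splits the total variation of y into explained and unexplained parts and then gets r from them. It assumes the standard least-squares formulas for a and b, and the data are made up for illustration.

```python
import math

# Hypothetical data (not from the lecture): x = years of education, y = some outcome score.
x = [8, 10, 12, 14, 16]
y = [20, 26, 30, 37, 42]

n = len(y)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope and intercept for y = a + b*x.
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
a = mean_y - b * mean_x
predicted = [a + b * xi for xi in x]

# 1) Total variation: squared deviations of y from its own mean.
total = sum((yi - mean_y) ** 2 for yi in y)
# 2) Unexplained variation: squared deviations of y from the regression line.
unexplained = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
# 3) Explained variation: squared deviations of the predictions from the mean of y.
explained = sum((pi - mean_y) ** 2 for pi in predicted)

# Proportion of the total variation explained by x, and Pearson's r.
r_squared = explained / total
r = math.copysign(math.sqrt(r_squared), b)  # r takes the sign of the slope

print("total:", total, "explained:", explained, "unexplained:", unexplained)
print("r squared:", r_squared, "r:", r)
```

With this data, explained (302.5) plus unexplained (1.5) adds back up to the total (304), so nearly all the variation in y is attributed to x and r comes out close to 1.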