Chapter 5 – Regression
Regression Lines
• Regression line – a straight line that describes the relationship b/w an explanatory variable and a response variable
◦ we use it to predict the value of y for a given value of x
• a review of straight lines:
◦ y = a + bx
▪ a = intercept
• a = ybar - b(xbar)
• the predicted value of y when x = 0 (where the line crosses the y-axis)
▪ b = slope – the amount by which y changes when x increases by one unit; a change of one SD in x
corresponds to a change of r SDs in y
• b = r(sy/sx)
• also known as “rate of change”
• when the units of measurement change, the slope changes too
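The slope and intercept formulas above can be sketched directly in Python. The summary statistics here (r, the SDs, and the means) are made-up values just for illustration:

```python
# Slope and intercept of a regression line from summary statistics
# (all values below are made up for illustration)
r = 0.8                    # correlation between x and y
s_x, s_y = 2.0, 5.0        # standard deviations of x and y
x_bar, y_bar = 10.0, 50.0  # means of x and y

b = r * (s_y / s_x)        # slope: b = r(sy/sx)
a = y_bar - b * x_bar      # intercept: a = ybar - b(xbar)

print(b)  # 2.0
print(a)  # 30.0
```

Note that b carries units (units of y per unit of x), which is why changing the measurement units changes the slope.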
Example 5.2) Using a Regression line
Refer to figure 5.1. x is non-exercise activity (NEA) change and y is fat gain.
“b” = -0.00344
tells us that, on average, fat gain goes down by 0.00344 kg for each added calorie of NEA change
a = 3.505
it's the predicted fat gain when x is 0
Fat gain = a + (b x “non-exercise” activity change)
Fat gain = 3.505 – 0.00344 x NEAchange
= 3.505 – 0.00344 x 400
= 2.13 kg
• the size of the slope depends on the units in which we measure the two variables
◦ also, the slope is just a numerical description of the relationship b/w the two variables; a small slope does not
by itself mean that a change in NEA has little effect on fat gain
◦ you can't determine how important a relationship is by looking at the size of the slope of the regression line
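The Example 5.2 prediction can be reproduced with a few lines of Python, using the intercept and slope from the notes:

```python
# Example 5.2 prediction: fat gain from NEA change (values from the notes)
a = 3.505        # intercept: predicted fat gain (kg) when NEA change is 0
b = -0.00344     # slope: kg of fat gained per added calorie of NEA change

nea_change = 400                 # calories
fat_gain = a + b * nea_change    # fat gain = a + b * NEA change
print(round(fat_gain, 2))        # 2.13 (kg)
```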
The Least-Squares Regression Line
• prediction errors – the errors we make are errors in y (the vertical direction)
◦ good regression line makes vertical distances of points from the line as small as possible
◦ error = observed response – predicted response
• least-squares regression line – of y on x is the line that makes the sum of the squares of the vertical distances
of the data points from the line as small as possible
◦ ŷ = a + bx
▪ the line gives a predicted response, ŷ, for any x
◦ (ŷ - ybar) / sy = r ( (x - xbar) / sx )
◦ the least-squares regression line always passes through the point (xbar, ybar)
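A minimal sketch of fitting a least-squares line from raw data (the data set here is made up). The slope formula used below, b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², is algebraically equivalent to b = r(sy/sx):

```python
# Least-squares slope and intercept from raw data (made-up data set)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 2.9, 4.1, 4.9, 6.2]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# b = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2), equivalent to r(sy/sx)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
    sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar

# The fitted line always passes through (xbar, ybar):
assert abs((a + b * x_bar) - y_bar) < 1e-12
```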
Facts about Least-Squares Regression
1. Distinction b/w explanatory and response variable is essential in regression.
2. Slope and correlation always have same sign. If scatterplot has positive association, b and r are both positive.
If r=1 or -1, change in predicted response, ŷ, is same (in standard deviation units) as change in x. If correlation
is less strong, ŷ changes less.
3. The least-squares regression line always passes through the point (x bar, y bar).
4. The square of the correlation, r^2, is the fraction of the variation in the values of y that is explained by the least-
squares regression of y on x.
• r^2 = (variation in ŷ along the regression line as x varies) / (total variation in observed values of y)
◦ what it means: if r^2 = 0.85, for example, about 85% of the variation in y is explained by the linear relationship/regression
• the main idea is that, as x changes, y changes
Residuals
• residual – the difference b/w an observed value of y and the value predicted by the regression line. It's the prediction error
that remains after we have chosen the regression line:
◦ residual = observed y – predicted y
= y – ŷ
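Residuals tie back to fact 4 above: for a least-squares fit, r² can also be computed as 1 minus the sum of squared residuals over the total variation in y. A sketch using a made-up data set (a and b below are the least-squares intercept and slope for this data):

```python
# Residuals and the fraction of variation explained (made-up data)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 2.9, 4.1, 4.9, 6.2]
a, b = 0.9, 1.04   # least-squares intercept and slope for this data

y_hat = [a + b * x for x in xs]                   # predicted responses
residuals = [y - yh for y, yh in zip(ys, y_hat)]  # observed - predicted

# For a least-squares fit, r^2 = 1 - SSE/SST, where SSE is the sum of
# squared residuals and SST is the total variation of y about its mean
y_bar = sum(ys) / len(ys)
sse = sum(e ** 2 for e in residuals)
sst = sum((y - y_bar) ** 2 for y in ys)
r_squared = 1 - sse / sst
```

A useful side fact: the residuals of a least-squares fit always sum to (essentially) zero, since the line passes through (x̄, ȳ).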
Example 5.5 – I feel your pain)
r = 0.515.
ŷ = -0.0578 + 0.00761x
For an empathy score of 38 (empathy is the explanatory variable, so it is x)..
ŷ = -0.0578 + 0.00761(38) = 0.231
This subject's actual brain activity level was -0.120. The residual is
= y – ŷ = -0.120 – 0.231 = -0.351
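The Example 5.5 calculation in Python, using the regression coefficients given in the notes:

```python
# Example 5.5: residual for one subject (numbers from the notes)
a, b = -0.0578, 0.00761   # intercept and slope of the regression line
x = 38                    # this subject's empathy score (explanatory)
y = -0.120                # this subject's observed brain activity (response)

y_hat = a + b * x          # predicted brain activity
residual = y - y_hat       # residual = observed y - predicted y
print(round(y_hat, 3))     # 0.231
print(round(residual, 3))  # -0.351
```

The negative residual means this subject's observed brain activity fell below what the line predicts for an empathy score of 38.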
