STA215H5 Lecture Notes - Lecture 8
STA215: Chapter 8
Recall: Residuals
● A residual is the difference between an observed value of the response variable and the
value predicted by the regression line. That is, a residual is the prediction error that
remains after we have chosen the regression line.
● Residual = observed y − predicted y. We will write this as: residual = y − ŷ
● By hand:
○ To find the residuals by hand, first find the prediction, ŷ, for each
observation, y. Then subtract the two: y − ŷ
○ NOTE: The correlation between the residuals and x is 0 (up to roundoff error).
This is a special property of least-squares regression.
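The residual recipe above can be sketched in Python. This is a minimal sketch under stated assumptions: the x and y lists are hypothetical numbers made up for illustration (not data from the lecture), and least_squares_line, residuals and correlation are illustrative helper names, not a library API. It uses the standard least-squares formulas b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄, forms ŷ = a + b·x, subtracts to get each residual, and then checks that the residuals are uncorrelated with x.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept: the line passes through (x-bar, y-bar)
    return a, b

def residuals(xs, ys):
    """Residual = observed y - predicted y-hat, one per observation."""
    a, b = least_squares_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def correlation(xs, ys):
    """Pearson correlation r, from sums of squares and cross-products."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

res = residuals(x, y)
print("residuals:", [round(r, 2) for r in res])
print("corr(x, residuals):", correlation(x, res))  # 0 up to round-off
```

Writing the sums out by hand mirrors the by-hand procedure; in practice a statistics package would compute the same line.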
Influential Observations
● An observation is influential for a statistical calculation if removing it would significantly
change the result of the calculation.
● The result of a statistical calculation may be of little practical use if it depends strongly on
a few influential observations.
● Points that are outliers in either the x or the y direction of a scatterplot are often
influential for the correlation. Points that are outliers in the x direction are often influential
for the least-squares regression line.
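One way to check influence directly is to refit the line with each observation left out in turn and see how much the result moves. The sketch below does this for the slope only. It assumes hypothetical, made-up data in which the last point (x = 20) is an outlier in the x direction; slope and slope_change_without_each are illustrative names, not a library API.

```python
from statistics import mean

def slope(xs, ys):
    """Least-squares slope b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def slope_change_without_each(xs, ys):
    """For each observation i, the change in slope when i is removed."""
    full = slope(xs, ys)
    changes = []
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]  # all observations except i
        rest_y = ys[:i] + ys[i + 1:]
        changes.append(slope(rest_x, rest_y) - full)
    return changes

# Hypothetical data: the last point is an outlier in the x direction.
x = [1, 2, 3, 4, 5, 20]
y = [2, 4, 6, 8, 10, 5]

for xi, d in zip(x, slope_change_without_each(x, y)):
    print(f"drop x = {xi}: slope changes by {d:+.3f}")
```

With these made-up numbers, dropping the x = 20 point changes the slope far more than dropping any other point, consistent with the remark that x-direction outliers are often influential for the least-squares line.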
● Example;
○ a) Make a scatterplot of the data that is suitable for predicting metabolic rate from
body mass, with two new points added. Point A: mass 42 kilograms, metabolic
rate 1500 calories. Point B: mass 70 kilograms, metabolic rate 1400 calories.
■
○ b) In which direction is each of these points an outlier?
■ Point A lies above the other points, so it is an outlier in the y (metabolic rate)
direction: its metabolic rate is higher than we expect for its body mass. Point B lies
to the right of the other points; that is, it is an outlier in the x (mass) direction,
and its metabolic rate is lower than we would expect for that mass.
○ c) Add three least-squares regression lines to your plot: one for the original 12
women, one for the original women plus Point A, and one for the original women
plus Point B.
○ d) Which new point is more influential for the regression line? Explain why.
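Part (c) can be sketched as a function that fits all three lines at once. This is a minimal sketch under stated assumptions: the original 12 women's data are not reproduced in these notes, so the function takes them as arguments (mass, rate), and three_fits and least_squares_line are hypothetical helper names. Points A and B are the ones given in part (a).

```python
from statistics import mean

# Points A and B from part (a): (mass in kg, metabolic rate in calories).
POINT_A = (42, 1500)
POINT_B = (70, 1400)

def least_squares_line(xs, ys):
    """Return (intercept a, slope b) for y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

def three_fits(mass, rate):
    """Fit the line to the original data, the data plus A, and the data plus B."""
    fits = {"original": least_squares_line(mass, rate)}
    for name, (px, py) in (("plus A", POINT_A), ("plus B", POINT_B)):
        # Append one new point and refit on the enlarged data set.
        fits[name] = least_squares_line(list(mass) + [px], list(rate) + [py])
    return fits
```

Calling three_fits(mass, rate) with the original data returns an (intercept, slope) pair for each of the three lines; comparing how far the "plus A" and "plus B" lines move away from the "original" line is one way to approach part (d).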