Chapter 19: Understanding Regression Residuals
19.1 Examining Residuals for Groups
• Looking at the residuals plotted against the explanatory (x) variable can help
uncover new information, such as patterns the linear model missed.
• Look to see whether groups whose residuals are consistently high, low, or near 0 share
common characteristics (e.g., the cereal example, with cereals separated by their shelf level).
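A minimal Python sketch of this check, assuming made-up data and hypothetical group labels ("top", "bottom") standing in for something like shelf level: it fits one least-squares line to all cases, computes each residual (observed minus predicted), and averages the residuals within each group. The function names are illustrative, not from any library.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def mean_residual_by_group(xs, ys, groups):
    """Fit one line to all cases, then average the residuals within each group."""
    b0, b1 = fit_line(xs, ys)
    totals = defaultdict(float)
    counts = defaultdict(int)
    for x, y, g in zip(xs, ys, groups):
        residual = y - (b0 + b1 * x)  # observed minus predicted
        totals[g] += residual
        counts[g] += 1
    return {g: totals[g] / counts[g] for g in totals}


# Made-up data; "top" and "bottom" are hypothetical group labels.
xs = [1, 2, 3, 4, 5, 6]
ys = [3, 2, 5, 4, 7, 6]
groups = ["top", "bottom", "top", "bottom", "top", "bottom"]
print(mean_residual_by_group(xs, ys, groups))
```

In this toy data every "top" case sits above the line and every "bottom" case below it, so the group means come out positive and negative; that is the kind of pattern worth investigating for a shared characteristic.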
19.2 Extrapolation and Prediction
• Extrapolation: although linear models provide an easy way to predict values of y
for a given value of x, it’s unsafe to predict for values of x far from the ones used
to find the linear model equation.
o This is called extrapolating.
o Predicting far from the data requires assuming that the relationship
between x and y does not change outside the observed range of x (often not the case).
o When the x variable is time, we should be very wary of extrapolation.
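One simple way to make extrapolation visible is to flag any prediction whose x lies outside the range of x used to fit the line. The sketch below assumes that cutoff (outside [min x, max x]); "far from" has no single official threshold, and `predict` is a hypothetical helper, not a library function. The data are made up.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def predict(xs, ys, x_new):
    """Return the predicted y at x_new and whether x_new lies outside the data's x range."""
    b0, b1 = fit_line(xs, ys)
    y_hat = b0 + b1 * x_new
    extrapolating = not (min(xs) <= x_new <= max(xs))
    return y_hat, extrapolating


# Made-up data.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
print(predict(xs, ys, 3))   # inside the observed x range
print(predict(xs, ys, 50))  # far outside: an extrapolation
```

The flag only marks the risky cases; the model still returns a number at x = 50, which is exactly why extrapolation is easy to do by accident.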
19.3 Unusual and Extraordinary Observations
• Two ways a point can stand out:
o Outlier: a data point that stands away from the regression line, i.e., one
with a large residual.
o Leverage: data points whose x-values are far from the mean of x are said
to exert leverage on a linear model.
They can pull the line close to them, sometimes completely
determining the slope and intercept.
Points with high enough leverage can have deceptively small residuals.
They may lie on the line that best fits the other data points; such a point
does not change the line, but it increases the r and R² values.
• When a high-leverage point exists, fit two regression lines to the data,
one with the point and one without it, and compare them.
o Influence: if omitting a point from the data changes the regression model
substantially, the point is considered influential.
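A minimal sketch of both checks on made-up data: leverage for simple regression computed with the standard formula h_i = 1/n + (x_i − x̄)² / Σ(x − x̄)², then the "fit with and without" comparison. The point at x = 20 is a hypothetical high-leverage case; the helper names are illustrative.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def leverages(xs):
    """Leverage of each point in simple regression: 1/n + (x - x_bar)^2 / Sxx."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1 / n + (x - x_bar) ** 2 / sxx for x in xs]


# Made-up data: five points exactly on y = 2x, plus one point far out at x = 20.
xs = [1, 2, 3, 4, 5, 20]
ys = [2, 4, 6, 8, 10, 0]

h = leverages(xs)
print([round(v, 3) for v in h])  # the last point has by far the largest leverage

# Fit once with and once without the high-leverage point, then compare slopes.
b0_with, b1_with = fit_line(xs, ys)
b0_without, b1_without = fit_line(xs[:-1], ys[:-1])
print(f"slope with point:    {b1_with:.3f}")
print(f"slope without point: {b1_without:.3f}")
```

Dropping the single point moves the slope from about −0.26 to 2, so by the definition above this point is influential, not merely high-leverage.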
19.4 Working with Summary Values
• Scatterplots of statistics summarized over groups tend to show less variability
than we’d see if we measured the same variables on individuals.
o e.g., group means vs. individual data points
o They exhibit less scatter than the individuals they are based on.
o This results in higher correlations for the summary statistics than for the individuals.
• Lesson: do not assume that because a line fits the summary statistics very well, it will
fit the individual points just as well.
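A small made-up illustration of this effect: four groups with four individuals each, where y = 2x plus within-group scatter that averages to 0 in every group (the offsets are hypothetical). Averaging removes the scatter, so the correlation of the group means is much higher than the correlation of the individuals. `pearson_r` is written out by hand to keep the sketch dependency-free.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


levels = [1, 2, 3, 4]          # x value shared by everyone in a group
offsets = [-3, 3, -1, 1]       # hypothetical scatter; sums to 0 within each group

# Individual-level data: 16 (x, y) pairs.
x_ind = [x for x in levels for _ in offsets]
y_ind = [2 * x + d for x in levels for d in offsets]

# Summary data: one (mean x, mean y) pair per group.
x_means = levels
y_means = [sum(2 * x + d for d in offsets) / len(offsets) for x in levels]

r_ind = pearson_r(x_ind, y_ind)
r_means = pearson_r(x_means, y_means)
print(f"r for individuals: {r_ind:.3f}")    # 0.707
print(f"r for group means: {r_means:.3f}")  # 1.000
```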
19.5 Autocorrelation
• Data points collected close together in time are often related.
• When values at time t are correlated with values at time t – 1, they are said to be
autocorrelated in the first order (first-order autocorrelation).
o When values are correlated with values two periods back, it is said that
second-order autocorrelation is present.
• Autocorrelation violates the independence assumption for regression (the errors are
assumed to be independent of one another).
• Durbin-Watson statistic: allows us to test for autocorrelation in the residuals. It is
calculated by summing the squares of the differences between consecutive residuals and
dividing by the sum of the squared residuals:
$$D = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$$
where e_t is the residual at time t. The statistic always falls in the interval from
0 to 4.
o When the null hypothesis of no autocorrelation is true, D should be close to 2.
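A minimal Python sketch of these quantities, assuming made-up residual series: `durbin_watson` implements the formula for D directly, `autocorrelation` is the usual sample autocorrelation at a chosen lag (lag 1 and lag 2 correspond to first- and second-order autocorrelation), and `dw_positive_test` applies the standard decision rule for positive autocorrelation. All three names are hypothetical, and the critical values 1.0 and 1.5 used below are placeholders; real values of d_L and d_U must be looked up in a Durbin-Watson table for the given n and number of predictors.

```python
def autocorrelation(e, lag):
    """Sample autocorrelation of the residuals e at the given lag."""
    n = len(e)
    e_bar = sum(e) / n
    denom = sum((v - e_bar) ** 2 for v in e)
    num = sum((e[t] - e_bar) * (e[t - lag] - e_bar) for t in range(lag, n))
    return num / denom


def durbin_watson(e):
    """D = sum over t=2..n of (e_t - e_{t-1})^2, divided by sum over t=1..n of e_t^2."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(v ** 2 for v in e)
    return num / den


def dw_positive_test(d, d_lower, d_upper):
    """Decision rule for positive autocorrelation; d_lower/d_upper come from a DW table."""
    if d < d_lower:
        return "evidence of positive autocorrelation"
    if d <= d_upper:
        return "inconclusive"
    return "no evidence of positive autocorrelation"


# Made-up residual series.
smooth = [2, 1.5, 1, 0.5, -0.5, -1, -1.5, -2]  # neighbors are similar
alternating = [1, -1, 1, -1, 1, -1, 1, -1]     # sign flips every period

print(durbin_watson(smooth))       # 2.5 / 15, about 0.167: near 0
print(durbin_watson(alternating))  # 28 / 8 = 3.5: near 4
print(autocorrelation(alternating, 1), autocorrelation(alternating, 2))
print(dw_positive_test(durbin_watson(smooth), 1.0, 1.5))  # placeholder d_L, d_U
```

Residuals that drift smoothly push D toward 0 (positive autocorrelation), while residuals that flip sign every period push it toward 4 (negative autocorrelation).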
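• Outcomes when testing for positive autocorrelation (d_L and d_U are the lower and upper
critical values from a Durbin-Watson table):
o If D < d_L, then there is evidence of positive autocorrelation
o If d_L ≤ D ≤ d_U, then the test is inconclusive
o If D > d_U, then there is no evidence of positive autocorrelation
• Outcomes when testing for negative autocorrelation:
o If D > 4 – d_L, then there is evidence of negative autocorrelation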
o If 4 – d L