Chapter 7: Introduction to Linear Regression
7.1 The Linear Model
Linear model: an equation of a straight line through the data.
o Does not have to touch any data point (it is the line that comes closest to all the points).
o Written in the form ŷ = b0 + b1x, where b0 and b1 are numbers
estimated from the data and ŷ is the predicted value.
The hat on ŷ is used to distinguish the predicted value from the
observed value, y.
Predicted value: the prediction for y found for each x-value in the data. A
predicted value, ŷ, is found by substituting the x-value into the regression
equation.
o The predicted values are the values on the fitted line; the point (x, ŷ) lies
exactly on the fitted line.
Residual: the difference between the actual data value and the corresponding
value predicted by the regression model (or any model).
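The residual calculation above can be sketched in a couple of lines of Python; the line coefficients and data point below are made-up illustrative values, not from any real data set.

```python
# Sketch: computing one residual. The fitted line y_hat = 2.0 + 0.5*x
# uses hypothetical coefficients chosen only for illustration.
def predict(x):
    return 2.0 + 0.5 * x

x_obs, y_obs = 10.0, 6.0      # one observed data point (made up)
y_hat = predict(x_obs)        # predicted value on the line: 7.0
residual = y_obs - y_hat      # residual = observed - predicted: -1.0
print(residual)
```

A negative residual means the model overestimates the data value; a positive residual means it underestimates.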
Line of best fit: the line for which the sum of the squared residuals is smallest
– often called the least squares line.
o It has a special property that the variation of data around the model
(as seen in the residuals) is the smallest it can be for any straight line
model for these data.
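The "smallest possible sum of squared residuals" property can be checked directly. A minimal sketch with made-up data: fit the least squares line via the standard formulas, then confirm that nudging either coefficient only increases the sum of squared residuals.

```python
import statistics as st

# Toy data, invented for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

def ssr(b0, b1):
    """Sum of squared residuals for the line y_hat = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Least squares estimates via the standard formulas.
xbar, ybar = st.mean(xs), st.mean(ys)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
     sum((x - xbar) ** 2 for x in xs)
b0 = ybar - b1 * xbar

# Any other straight line has a larger sum of squared residuals.
best = ssr(b0, b1)
assert best < ssr(b0 + 0.1, b1)
assert best < ssr(b0, b1 + 0.1)
```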
7.2 Correlation and the Line
Slope: tells us how the response variable changes for a one-unit step in the
explanatory variable.
o The slope of the least squares line is b1 = r(sy/sx).
o r is the correlation; sx and sy are the s.d.’s of x and y, respectively.
o The slope gets its units from the ratio of the s.d.’s and is expressed as
the units of y per unit of x.
Intercept: gives us a starting value in y-units (the predicted y when x = 0).
o It often has no practical meaning, since x = 0 may lie outside the
range of the data.
o By substituting in the average values of y and x, we can find the
intercept: b0 = ȳ − b1x̄.
Regression line: the particular linear equation that satisfies the least squares
criterion, often called the line of best fit.
The same conditions have to be checked for regression as were checked for
correlation:
o Quantitative Variables Condition
o Linearity Condition
o Outlier Condition
For z-scored data, the regression equation is ẑy = r·zx.
In general, moving any number of s.d.’s in x moves our prediction r times that
number of s.d.’s in y.
7.3 Regression to the Mean
Regression to the mean: because the correlation is always less than 1.0 in
magnitude, each predicted y tends to be fewer s.d.’s from its mean than its
corresponding x is from its mean.
Two regression lines can be made:
o One where x is the explanatory variable.
Minimizes the vertical distances between the points and the line.
o One where y is the explanatory variable.
Minimizes the horizontal distances between the points and the line.
o If the correlation is ±1, the two lines are identical and all the data
points lie exactly on that one line.
7.4 Checking the Model
Linear models make sense only for quantitative data.
Check the scatterplot to determine if the relationship between the two
variables is reasonably straight.
Outlying points can dramatically change a regression model.
o They may have large residuals that, when squared, have a large
effect on the sum of squared residuals.
So large, in fact, that they may change the sign of the slope.
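A single influential point really can flip the slope's sign. A minimal sketch with made-up data: four points with a perfect upward trend, then one extreme outlier added far to the right and far below.

```python
import statistics as st

def slope(xs, ys):
    """Least squares slope for the line y_hat = b0 + b1*x."""
    xbar, ybar = st.mean(xs), st.mean(ys)
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
           sum((x - xbar) ** 2 for x in xs)

# Perfect upward trend: slope is +1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 2.0, 3.0, 4.0]
assert slope(xs, ys) > 0

# One influential outlier drags the fitted line downward.
xs2, ys2 = xs + [20.0], ys + [-30.0]
assert slope(xs2, ys2) < 0
```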
7.5 Learning More from the Residuals
Plotting the residuals (y-axis) against the x-values (x-axis) can reveal
patterns the linear model missed; a model that fits well leaves residuals that
look like random scatter with no pattern.
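Even without a plot, two algebraic facts about least squares residuals are worth checking: they sum to zero, and they are uncorrelated with x (so any visible pattern in a residual plot signals a problem with the model, not with the fit). A sketch with made-up data:

```python
import statistics as st

# Toy data, invented for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.3, 3.9, 6.0, 8.2, 9.6]

# Least squares fit via the standard formulas.
xbar, ybar = st.mean(xs), st.mean(ys)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
     sum((x - xbar) ** 2 for x in xs)
b0 = ybar - b1 * xbar

residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# The residuals sum to zero and have zero cross-product with x.
assert abs(sum(residuals)) < 1e-9
assert abs(sum((x - xbar) * e for x, e in zip(xs, residuals))) < 1e-9
```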