Chapter 20: Multiple Regression
20.1 The Linear Multiple Regression Model
• Multiple regression: a linear regression with two or more predictors whose
coefficients are found by least squares.
o When the distinction is needed, a least squares linear regression with a
single predictor is called a simple regression.
y = β0 + β1x1 + … + βkxk + ε
y = b0 + b1x1 + … + bkxk + e
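A minimal sketch of fitting this model by least squares, using made-up data (the variable names and simulated values are illustrative, not from the text). A column of ones in the design matrix supplies the intercept b0:

```python
import numpy as np

# Hypothetical data: predict y from two predictors x1 and x2.
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 3.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(scale=0.1, size=n)

# Design matrix: a column of ones for b0 plus one column per predictor.
X = np.column_stack([np.ones(n), x1, x2])

# Least squares estimates b = (b0, b1, b2).
b, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ b    # predicted values
e = y - y_hat    # residuals
```

With the small noise used here, b should land close to the true values (3, 2, −1.5).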
• The equation for the residuals remains the same: e = y − ŷ.
• The degrees of freedom will now depend on the number of predictors, k (plus
1 for the intercept): df = n − k − 1.
• The standard deviation of the residuals is:
s_e = √( Σ(y − ŷ)² / (n − k − 1) )
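A quick numeric check of this formula on simulated data (the setup is hypothetical; with true error SD of 2, s_e should come out near 2):

```python
import numpy as np

# s_e = sqrt( sum(e^2) / (n - k - 1) ), illustrated with k = 2 predictors.
rng = np.random.default_rng(1)
n, k = 40, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta = np.array([1.0, 0.5, -0.5])
y = X @ beta + rng.normal(scale=2.0, size=n)   # true error SD = 2

b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b

# Divide by n - k - 1, not n: one df is lost per slope plus one for b0.
s_e = np.sqrt(np.sum(e**2) / (n - k - 1))
```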
• t-ratio for coefficients: the t-ratios for the coefficients can be used to test the null
hypothesis that the true value of each coefficient is zero against the alternative
that it is not.
o The t-distribution is also used in the construction of CIs for each slope.
o A P-value can be obtained to test the hypothesis that a true
coefficient is 0.
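A sketch of computing the t-ratios by hand with numpy, on hypothetical data where one predictor truly has a zero coefficient. The standard errors come from the diagonal of s_e²(XᵀX)⁻¹; each t-ratio would then be compared to a t-distribution with n − k − 1 df to get a P-value (not computed here):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 60, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
# True model uses x1 but not x2 (x2's true coefficient is 0).
y = 2.0 + 1.0 * X[:, 1] + rng.normal(size=n)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b
s_e = np.sqrt(np.sum(e**2) / (n - k - 1))

# SE(b_j) is the square root of the j-th diagonal of s_e^2 * (X'X)^-1.
cov = s_e**2 * np.linalg.inv(X.T @ X)
se = np.sqrt(np.diag(cov))

# t-ratio for H0: true coefficient is 0.
t_ratios = b / se
```

Here the t-ratio for x1 should be large (clearly nonzero coefficient), while the one for x2 should be small.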
20.2 Interpreting Multiple Regression Coefficients
• Multiple regression coefficients must always be interpreted in terms of the other
predictors in the model.
o E.g., house size and # of bedrooms: the coefficient on bedrooms was
negative because, as the number of bedrooms increases (with the size of
the house held constant), the price of the house drops (there is less space
for all the other rooms).
• Be careful not to assume causation between the predictor variables and the
response.
20.3 Assumptions and Conditions for the Multiple Regression Model
• Linearity Assumption
o Check the Linearity Condition for each of the predictors.
• Linearity Condition
o Are the scatterplots of y against each of the predictors reasonably straight?
o Plot the residuals against the predicted values to ensure no patterns exist.
• Independence Assumption
o The errors in the true underlying regression model must be independent of
each other.
o Check the Randomization Condition.
• Randomization Condition
o Randomization assures us that the data are representative of some
underlying population.
o Plot the residuals vs. the predicted values for evidence of patterns, trends,
or clumping.
Can also plot the residuals against each of the explanatory
variables in the model.
• This will help determine if a re-expression is needed for
any of the variables.
o With time-series data, check for autocorrelation.
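One common numeric check for autocorrelation in time-series residuals is the Durbin-Watson statistic, which is near 2 when successive residuals are uncorrelated (the residuals below are simulated, not from any model in the text):

```python
import numpy as np

# Residuals from a hypothetical fit, in time order.
rng = np.random.default_rng(5)
e = rng.normal(size=200)

# Durbin-Watson: sum of squared successive differences over the
# residual sum of squares; values near 2 suggest no autocorrelation,
# values near 0 or 4 suggest positive or negative autocorrelation.
dw = np.sum(np.diff(e)**2) / np.sum(e**2)
```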
• Equal Variance Assumption
o Variability of the errors should be about the same for all values of each
predictor.
o Check the Equal Spread Condition.
• Equal Spread Condition
o Plot the residuals vs. the predicted values and look for clues of a violation
of equal spread.
o Check for homoscedasticity in the scatterplot of predicted values vs. the x
variables.
• Normality Assumption
o We assume the errors around the idealized regression follow a Normal
model.
o If the sample size is large, we have little to worry about.
o Check the Nearly Normal Condition
• Nearly Normal Condition
o Make a histogram or Normal probability plot of the residuals and check
that the distribution is unimodal and roughly symmetric.
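Rough numeric stand-ins for these residual checks can be computed directly (the data are simulated; a histogram or Normal probability plot remains the standard graphical tool). Least squares forces the residuals to be uncorrelated with the fitted values, and sample skewness near 0 is consistent with the Nearly Normal Condition:

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b
fitted = X @ b

# By construction, residuals are uncorrelated with fitted values,
# so any visible pattern in the residual plot signals a violation.
corr = np.corrcoef(e, fitted)[0, 1]

# Sample skewness of the residuals; near 0 suggests symmetry.
skew = np.mean(((e - e.mean()) / e.std())**3)
```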