STA302H1 Study Guide - Final Guide: Simple Linear Regression, Statistical Inference, Type I And Type Ii Errors
Chapter 5: Multiple Linear Regression –
Estimation and Inference in Multiple Linear
Regression
1. Model: $Y_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_p x_{pi} + e_i$, $e_i \overset{iid}{\sim} N(0, \sigma^2)$, $i = 1, 2, \dots, n$
• $e_i$: random fluctuation (error) in $Y_i$ such that $E(e_i \mid x) = 0$
• $p+2$ parameters: $\beta_0, \beta_1, \dots, \beta_p, \sigma^2$
• $p$ coefficients: $\beta_1, \beta_2, \dots, \beta_p$
• $E(Y_i \mid x) = \beta_0 + \beta_1 x_{1i} + \dots + \beta_p x_{pi}$
2. Matrix Formulation of Least Squares Estimates
$\mathbf{Y} = (y_1, y_2, \dots, y_n)'$,
$\mathbf{X}$: the $n \times (p+1)$ design matrix whose $i$th row is $(1, x_{1i}, x_{2i}, \dots, x_{pi})$,
$\boldsymbol{\beta} = (\beta_0, \beta_1, \dots, \beta_p)'$, $\mathbf{e} = (e_1, e_2, \dots, e_n)'$
$\Rightarrow \mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e}$, where $\mathrm{var}(\mathbf{e}) = \sigma^2 I_n$
i. $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$,
$E(\hat{\boldsymbol{\beta}} \mid \mathbf{X}) = \boldsymbol{\beta}$, $\mathrm{var}(\hat{\boldsymbol{\beta}} \mid \mathbf{X}) = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}$
ii. $\hat{\mathbf{Y}} = \mathbf{X}\hat{\boldsymbol{\beta}}$, $\hat{\mathbf{e}} = \mathbf{Y} - \hat{\mathbf{Y}} = \mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}}$,
$s^2 = \dfrac{RSS}{n-p-1} = \dfrac{1}{n-p-1}\sum_{i=1}^{n}\hat{e}_i^2$ (unbiased estimate of $\sigma^2$)
3. Tests of Linearity
i. Test whether there is a linear association between $Y$ and all $x_i$:
$H_0: \beta_1 = \beta_2 = \dots = \beta_p = 0$ vs. $H_1$: at least one $\beta_i \neq 0$ $(i = 1, 2, \dots, p)$
ii. T-test: $T_i = \hat{\beta}_i / se(\hat{\beta}_i) \sim t_{n-p-1}$
if $H_0: \beta_i = 0$ is true (test each $\beta_i$ one at a time)
iii. F-test: $F = \dfrac{SSreg/p}{RSS/(n-p-1)} \sim F_{p,\,n-p-1}$
if $H_0$ is true (test all $\beta_i$ at once)
• Total sample variability: $SST = \sum_{i=1}^{n}(y_i - \bar{y})^2$
• Variability explained by the model: $SSreg = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2$
• Residual sum of squares: $RSS = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$
• $SST = SSreg + RSS$; a large $SSreg$ relative to $RSS$ is evidence of a linear relationship between $Y$ and the $x_i$
• $R^2 = \dfrac{SSreg}{SST} = 1 - \dfrac{RSS}{SST}$: how much variation in $y$ can be explained by the model
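The ANOVA decomposition and the F-statistic can be computed directly from the definitions. This is a numpy sketch on simulated data (all values are made up for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_fitted = X @ beta_hat

SST = np.sum((y - y.mean()) ** 2)        # total sample variability
SSreg = np.sum((y_fitted - y.mean()) ** 2)  # variability explained by the model
RSS = np.sum((y - y_fitted) ** 2)        # residual sum of squares

R2 = SSreg / SST                          # coefficient of determination
F = (SSreg / p) / (RSS / (n - p - 1))     # overall F-statistic, df = (p, n-p-1)
```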
Chapter 6: Diagnostics and Transformations
for Multiple Linear Regression
I. Regression Diagnostics for Multiple Regression
1. Regression Diagnostics: (i) The validity of the model: standardized residual versus fitted value ($\hat{y}$) plot, standardized residual versus predictor variable ($x_j$) plots, marginal model plots; (ii)
Determine whether there are leverage points;
(iii) Determine whether there are outliers; (iv)
The effect of each predictor variable on the
response variable: added-variable plots; (v)
The extent of collinearity among the predictor
variables: variance inflation factors; (vi)
Determine whether the error variance is
constant; (vii) If the data are collected over time,
examine whether the data are correlated over
time.
2. Leverage Points in Multiple Regression
i. Hat Matrix: $H = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$ ($\hat{\mathbf{Y}} = H\mathbf{Y}$)
• $h_{ii}$: the leverage of the $i$th point
ii. If $h_{ii} > \dfrac{2(p+1)}{n}$, i.e. twice the average leverage
(in multiple regression with
p predictors), the ith point is a leverage point
• $\hat{y}_i = \sum_{j=1}^{n} h_{ij} y_j$
, where
* $h_{ij}$: the (i, j)th element of H,
$h_{ii}$: the ith diagonal element of H
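The hat matrix and the $2(p+1)/n$ leverage cutoff can be sketched as follows; the data and seed are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix H = X (X'X)^{-1} X'
h = np.diag(H)                          # leverages h_ii
cutoff = 2 * (p + 1) / n                # rule of thumb: h_ii > 2(p+1)/n
leverage_points = np.where(h > cutoff)[0]
```

Two useful sanity checks: $H\mathbf{Y}$ reproduces the fitted values, and $\mathrm{trace}(H) = p+1$, so the average leverage is $(p+1)/n$.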
3. Properties of Residuals in Multiple Regression
i. $\hat{\mathbf{e}} = (I - H)\mathbf{Y}$,
$\mathrm{var}(\hat{\mathbf{e}}) = \sigma^2(I - H)$, $\mathrm{var}(\hat{e}_i) = \sigma^2(1 - h_{ii})$
ii. Standardized Residual: $r_i = \dfrac{\hat{e}_i}{s\sqrt{1 - h_{ii}}}$,
where $s = \sqrt{RSS/(n-p-1)}$
iii. Using Residuals and Standardized
Residuals for Model Checking
(a) When a valid model has been fit, a plot of
standardized residuals against any predictor $x_j$ (or any linear combination of the predictors, such as $\hat{y}$) will have the
following features:
• A random scatter of points around
the horizontal axis ($r = 0$)
• Constant variability as we look along the
horizontal axis
(b) Any non-random (deterministic) pattern in
plots of the standardized residuals indicates an invalid model has
been fit to the data
(c) In multiple regression, plots of the
residuals provide direct information on
how the model is misspecified when the
following two conditions hold:
• $E(Y \mid X = x)$ is a function of a linear combination of the predictors
(check the $Y$ versus $\hat{y}$ plot)
• The predictors are related to one another
approximately linearly (check the $x_i$ versus $x_j$ scatter plots)
*If these conditions do not both hold, then a
pattern in a residual plot indicates that an
incorrect model has been fit, but the pattern
itself does not provide direct information on
how the model is misspecified.
*Premise: we already know that the model is
invalid; we then use these conditions to check
whether it is possible to improve the model
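Standardized residuals are computed from the formulas above; the numpy sketch below uses simulated data (seed and coefficients are made up). With a correctly specified model, the $r_i$ should scatter randomly around 0 with roughly constant spread.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat                        # ordinary residuals e_hat_i
h = np.diag(X @ np.linalg.solve(X.T @ X, X.T))  # leverages h_ii
s = np.sqrt(np.sum(resid ** 2) / (n - p - 1))   # s = sqrt(RSS/(n-p-1))
r = resid / (s * np.sqrt(1 - h))                # standardized residuals r_i
```

In practice one would then plot `r` against each predictor and against the fitted values and look for the features listed in (a).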
II. Using Transformation to Overcome
Nonlinearity – Transforming Only the
Response Variable Using Inverse Regression
1. Suppose that the true regression model
between $Y$ and $x_1, \dots, x_p$ is given by:
$Y = g^{-1}(\beta_0 + \beta_1 x_1 + \dots + \beta_p x_p + e)$
Turn the model into a multiple regression
model by transforming $Y$ by $g$:
$g(Y) = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p + e$
e.g. $Y = \exp(\beta_0 + \beta_1 x_1 + e) \Rightarrow g(Y) = \log(Y)$
2. If we want to estimate $g$, we plot $\hat{y}$ against $y$
(the inverse response plot, with $\hat{y}$ on the vertical axis);
if we want to estimate $g^{-1}$, we plot $y$ against $\hat{y}$.
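The effect of transforming only the response can be sketched numerically. In the made-up example below the true model is $Y = \exp(\beta_0 + \beta_1 x + e)$, so $g = \log$; a linear fit to the untransformed $Y$ shows lack of fit, while a linear fit to $g(Y) = \log Y$ fits well.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(0, 2, size=n)
# True model: Y = exp(beta0 + beta1*x + e), so the correct transformation is g = log
y = np.exp(0.5 + 1.5 * x + rng.normal(scale=0.1, size=n))

X = np.column_stack([np.ones(n), x])

def r_squared(X, y):
    # R^2 of an OLS fit of y on the columns of X
    b = np.linalg.solve(X.T @ X, X.T @ y)
    fit = X @ b
    return 1.0 - np.sum((y - fit) ** 2) / np.sum((y - y.mean()) ** 2)

r2_raw = r_squared(X, y)          # linear model for untransformed Y: curvature left over
r2_log = r_squared(X, np.log(y))  # after transforming Y by g = log: nearly exact fit
```

In practice $g$ is unknown and is estimated graphically from the inverse response plot rather than assumed, as described above.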
III. Multicollinearity and Variance Inflation
Factors
1. Multicollinearity: A number of important
issues arise when strong correlations exist
among the predictor variables
2. Variance Inflation Factors
i. First, consider a multiple regression model
with two predictors:
$\mathrm{var}(\hat{\beta}_1) = \dfrac{\sigma^2}{(n-1)\,s_{x_1}^2} \cdot \dfrac{1}{1 - r_{12}^2}$
* $r_{12}$: Pearson correlation coefficient
between $x_1$ and $x_2$
* $s_{x_1}$: the standard deviation of $x_1$
($s_{x_1}^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_{1i} - \bar{x}_1)^2$)
$\dfrac{1}{1 - r_{12}^2}$: variance inflation factor
Correlation amongst the predictors ($r_{12} \neq 0$)
increases the variance of the estimated
regression coefficients
ii. Next, consider the general regression model:
$Y = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p + e$
* $R_j^2$: the value of $R^2$ obtained from the
regression of $x_j$ on the other predictors
(variability in $x_j$ explained by the other predictors)
$\mathrm{var}(\hat{\beta}_j) = \dfrac{\sigma^2}{(n-1)\,s_{x_j}^2} \cdot \dfrac{1}{1 - R_j^2}$,
$\dfrac{1}{1 - R_j^2}$: the jth variance inflation factor
If the predictor variables are correlated, $R_j^2$
will be close to 1. Then $\mathrm{var}(\hat{\beta}_j)$, and hence $se(\hat{\beta}_j)$, will
be very large, the p-value will be very large
(statistically insignificant), and the
confidence interval will be wide.
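The definition $VIF_j = 1/(1 - R_j^2)$ can be implemented directly by regressing each predictor on the others. The data below are simulated so that the first two predictors are strongly correlated; everything is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.3 * rng.normal(size=n)  # x2 strongly correlated with x1
x3 = rng.normal(size=n)                    # independent predictor
Xp = np.column_stack([x1, x2, x3])

def vif(Xp, j):
    # R_j^2 from regressing predictor j on the other predictors (with intercept),
    # then VIF_j = 1/(1 - R_j^2)
    target = Xp[:, j]
    others = np.delete(Xp, j, axis=1)
    Z = np.column_stack([np.ones(len(target)), others])
    b = np.linalg.solve(Z.T @ Z, Z.T @ target)
    res = target - Z @ b
    R2j = 1.0 - np.sum(res ** 2) / np.sum((target - target.mean()) ** 2)
    return 1.0 / (1.0 - R2j)

vifs = [vif(Xp, j) for j in range(Xp.shape[1])]
```

As expected, the correlated pair gets large VIFs while the independent predictor stays near the minimum value of 1.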
Chapter 7: Variable Selection
I. Evaluating Potential Subsets of Predictor
Variables–AIC (Akaike’s Information
Criterion)
1. Definition: AIC is an estimator of the relative
quality of statistical models for a given set of
data
2. Derivation:
• Suppose $y_1, y_2, \dots, y_n$ are the observed
values of independent normal random variables $Y_1, \dots, Y_n$
• $Y_i \sim N(\mu_i, \sigma^2)$
• $\mu_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_p x_{pi}$
• Likelihood: $L = \prod_{i=1}^{n} \dfrac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\dfrac{(y_i - \mu_i)^2}{2\sigma^2}\right)$
Let $\hat{\sigma}^2 = \dfrac{RSS}{n}$ (the maximum likelihood estimate of $\sigma^2$),
• Log-Likelihood: $\log L = -\dfrac{n}{2}\log(2\pi) - \dfrac{n}{2}\log\left(\dfrac{RSS}{n}\right) - \dfrac{n}{2}$,
where $RSS = \sum_{i=1}^{n}(y_i - \hat{\mu}_i)^2$
• $AIC = 2(\text{number of parameters}) - 2\log L = n\log\left(\dfrac{RSS}{n}\right) + 2(p+2) + \text{constant}$
(R output: $n\log\left(\dfrac{RSS}{n}\right) + 2(p+1)$, the scale used by extractAIC() and step(), which drops additive constants)
*The smaller the AIC, the better the model.
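A minimal numpy sketch of AIC on the R output scale, comparing nested models on made-up data (one relevant and one irrelevant predictor; seed and coefficients are invented):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 80
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)  # x2 is irrelevant to y

def aic(X, y):
    # AIC on the scale reported by R's extractAIC()/step():
    # n*log(RSS/n) + 2*(number of mean parameters); additive constants dropped
    n_obs, k = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ y)
    RSS = np.sum((y - X @ b) ** 2)
    return n_obs * np.log(RSS / n_obs) + 2 * k

X0 = np.ones((n, 1))                         # intercept only
X1 = np.column_stack([np.ones(n), x1])       # intercept + x1
X12 = np.column_stack([np.ones(n), x1, x2])  # intercept + x1 + x2
aic0, aic1, aic12 = aic(X0, y), aic(X1, y), aic(X12, y)
```

Adding the relevant predictor $x_1$ lowers AIC sharply; adding the irrelevant $x_2$ can lower AIC by at most a small amount, since RSS can only decrease while the penalty grows by 2.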
3. Use: model selection. Given a collection of
models for the data, AIC estimates the quality
of each model relative to each of the other
models.
4. AIC says nothing about testing a null
hypothesis or about the absolute quality of a model;
it only measures quality relative to other
models.
II. Deciding on the Collection of Potential Subsets
of Predictor Variables
1. “Best”: for a given number of predictors, the best choice is the set of predictors
with the smallest value of RSS
* max $R^2$ (or max log-likelihood) $\Leftrightarrow$ min $RSS$
$\Leftrightarrow$ min $AIC$, for a fixed number of predictors
2. Forward Stepwise Regression
i. Definition: Forward stepwise regression starts with no
potential predictor variables in the regression
equation. Then at each step, it adds the
predictor such that the resulting model has the
lowest value of an information criterion. This
process is continued until no remaining predictor
lowers the criterion, or until all variables have been added.
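The forward stepwise procedure described above can be sketched with the AIC criterion from the previous chapter. The four candidate predictors and their coefficients below are made up; only the first and third actually enter the true model.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 120
X_all = rng.normal(size=(n, 4))  # four candidate predictors
y = 1.0 + 2.0 * X_all[:, 0] - 1.5 * X_all[:, 2] + rng.normal(size=n)

def aic(cols):
    # n*log(RSS/n) + 2*(number of coefficients), as in R's step()
    Z = np.column_stack([np.ones(n)] + [X_all[:, j] for j in cols])
    b = np.linalg.solve(Z.T @ Z, Z.T @ y)
    RSS = np.sum((y - Z @ b) ** 2)
    return n * np.log(RSS / n) + 2 * Z.shape[1]

selected, remaining = [], [0, 1, 2, 3]
current_aic = aic(selected)  # start from the intercept-only model
while remaining:
    # try adding each remaining predictor; find the one giving the lowest AIC
    best_aic, best_j = min((aic(selected + [j]), j) for j in remaining)
    if best_aic >= current_aic:
        break  # no addition lowers AIC: stop
    selected.append(best_j)
    remaining.remove(best_j)
    current_aic = best_aic
```

Because the predictor with the larger true coefficient reduces RSS the most, it is selected first; the procedure stops once no remaining variable improves AIC.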