STA302H1 Study Guide - Final Guide: Simple Linear Regression, Statistical Inference, Type I And Type Ii Errors

Chapter 5: Multiple Linear Regression
Estimation and Inference in Multiple Linear
Regression
1. Model: $Y_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi} + e_i$, $e_i \sim \text{iid } N(0, \sigma^2)$, $i = 1, 2, \ldots, n$
$e_i$: random fluctuation (error) in $Y_i$ such that $E(e_i \mid X) = 0$
p + 2 parameters: $\beta_0, \beta_1, \ldots, \beta_p, \sigma^2$
p coefficients: $\beta_1, \beta_2, \ldots, \beta_p$
$E(Y_i \mid X) = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi}$
2. Matrix Formulation of Least Squares Estimates
$\mathbf{Y} = (y_1, y_2, \ldots, y_n)'$, $\mathbf{X}$: the $n \times (p+1)$ matrix whose ith row is $(1, x_{1i}, \ldots, x_{pi})$, $\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_p)'$, $\mathbf{e} = (e_1, e_2, \ldots, e_n)'$
$\Rightarrow \mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e}$, where $\mathrm{var}(\mathbf{e}) = \sigma^2 I_n$
i. $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$
$\Rightarrow E(\hat{\boldsymbol{\beta}} \mid X) = \boldsymbol{\beta}$, $\mathrm{var}(\hat{\boldsymbol{\beta}} \mid X) = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}$
ii. $\hat{\mathbf{Y}} = \mathbf{X}\hat{\boldsymbol{\beta}}$, $\hat{\mathbf{e}} = \mathbf{Y} - \hat{\mathbf{Y}} = \mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}}$
$s^2 = \frac{RSS}{n - p - 1} = \frac{1}{n - p - 1}\sum_{i=1}^{n} \hat{e}_i^2$ (unbiased estimator of $\sigma^2$)
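As a check on these formulas, here is a minimal R sketch on simulated data (all variable names are made up for illustration), computing $(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$ by hand and comparing it with lm():

# Hand-computed least squares vs. lm(), hypothetical simulated data
set.seed(1)
n <- 50
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + 2*x1 - 0.5*x2 + rnorm(n)            # true coefficients known only because we simulated
X  <- cbind(1, x1, x2)                        # design matrix with a column of 1s
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y  # (X'X)^{-1} X'Y
fit <- lm(y ~ x1 + x2)
cbind(beta_hat, coef(fit))                    # the two columns should agree
s2 <- sum(resid(fit)^2) / (n - 2 - 1)         # s^2 = RSS/(n - p - 1), here p = 2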
3. Tests of Linearity
i. Test whether there is a linear association between Y and all $x_i$:
$H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$
$H_1$: at least one of the $\beta_i \neq 0$ $(i = 1, 2, \ldots, p)$
ii. T-test: $T_i = \frac{\hat{\beta}_i}{se(\hat{\beta}_i)} \sim t_{n-p-1}$ if $H_0: \beta_i = 0$ is true (test each $\beta_i$ one at a time)
iii. F-test: $F = \frac{SSreg / p}{RSS / (n - p - 1)} \sim F_{p,\, n-p-1}$ if $H_0$ is true (test all $\beta_i$ at once)
• Total sample variability: $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$
• Variability explained by the model: $SSreg = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$
• Residual sum of squares: $RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
$SST = SSreg + RSS$; $SSreg$ is large relative to $RSS$ if there is a linear relationship between Y and all the $x_i$
$R^2 = \frac{SSreg}{SST} = 1 - \frac{RSS}{SST}$: how much variation in y can be explained by the model
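The t-tests, the overall F-test, and $R^2$ can all be read off R's standard output; a short sketch on the same kind of simulated data (hypothetical example, not course data):

# Refit the hypothetical simulated example from the previous sketch
set.seed(1)
n <- 50; x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + 2*x1 - 0.5*x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)
summary(fit)             # t-statistic and p-value per coefficient; F-statistic and R^2 at the bottom
anova(lm(y ~ 1), fit)    # overall F-test: intercept-only model vs. full model
summary(fit)$r.squared   # R^2 = 1 - RSS/SST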
Chapter 6: Diagnostics and Transformations
for Multiple Linear Regression
I. Regression Diagnostics for Multiple Regression
1. Regression Diagnostics: (i) The validity of the model: standardized residual vs. fitted value ($\hat{y}$) plot, standardized residual vs. predictor variable ($x_j$) plots, marginal model plots; (ii) Determine whether there are leverage points; (iii) Determine whether there are outliers; (iv) The effect of each predictor variable on the response variable: added-variable plots; (v) The extent of collinearity among the predictor variables: variance inflation factors; (vi) Determine whether the error variance is constant; (vii) If the data are collected over time, examine whether the data are correlated over time.
2. Leverage Points in Multiple Regression
i. Hat Matrix: $H = X(X'X)^{-1}X'$ ($\hat{Y} = HY$)
ii. If $h_{ii} > 2 \cdot \frac{p+1}{n}$ (in multiple regression with p predictors), the ith point is a leverage point
$\hat{y}_i = \sum_{j=1}^{n} h_{ij} y_j = h_{ii} y_i + \sum_{j \neq i} h_{ij} y_j$, where
* $h_{ij}$: the (i, j)th element of H, $h_{ii}$: the ith diagonal element of H
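A small R sketch of the $2(p+1)/n$ rule of thumb, assuming fit is any fitted lm object (for example the one from the sketch above):

h <- hatvalues(fit)                # diagonal elements h_ii of the hat matrix H
p <- length(coef(fit)) - 1         # number of predictors
cutoff <- 2 * (p + 1) / length(h)  # twice the average leverage (p+1)/n
which(h > cutoff)                  # indices of points flagged as leverage points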
3. Properties of Residuals in Multiple Regression
i. $\hat{\mathbf{e}} = \mathbf{Y} - \hat{\mathbf{Y}} = (I - H)\mathbf{Y}$, $\mathrm{var}(\hat{e}_i \mid X) = \sigma^2(1 - h_{ii})$
ii. Standardized Residual: $r_i = \frac{\hat{e}_i}{s\sqrt{1 - h_{ii}}}$,
where $s = \sqrt{\frac{RSS}{n - p - 1}}$
iii. Using Residuals and Standardized
Residuals for Model Checking
(a) When a valid model has been fit, a plot of the standardized residuals against the fitted values (or against any linear combination of the predictors) will have the following features:
• A random scatter of points around the horizontal axis ($r_i = 0$)
• Constant variability as we look along the horizontal axis
(b) Any non-random (deterministic) pattern in plots of the residuals indicates that an invalid model has been fit to the data
(c) In multiple regression, plots of the residuals provide direct information on how the model is misspecified when the following two conditions hold:
• Y vs. $\hat{y}$ plot: the mean of Y depends on the predictors only through a single linear combination, $E(Y \mid X = x) = g(\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p)$ for some function $g$
• $x_i$ vs. $x_j$ plots: the predictors show (approximately) linear relationships with one another
*If these conditions do not both hold, then a pattern in a residual plot indicates that an incorrect model has been fit, but the pattern itself does not provide direct information on how the model is misspecified.
*Premise: we already know that the model is invalid; we then use the conditions to check whether the residual plots can tell us how to improve the model.
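A brief R sketch of these residual checks, again assuming fit is an existing lm object (hypothetical example):

r <- rstandard(fit)                        # standardized residuals e_i / (s * sqrt(1 - h_ii))
plot(fitted(fit), r,
     xlab = "Fitted values", ylab = "Standardized residuals")
abline(h = 0, lty = 2)                     # valid model: random scatter about 0, constant spread
plot(model.matrix(fit)[, 2], r,            # repeat against each predictor column
     xlab = "First predictor", ylab = "Standardized residuals")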
II. Using Transformation to Overcome Nonlinearity: Transforming Only the Response Variable Using Inverse Regression
1. Suppose that the true regression model between Y and $x_1, \ldots, x_p$ is given by:
$Y = g^{-1}(\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + e)$
Turn the model into a multiple regression model by transforming Y by $g$:
$g(Y) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + e$
e.g. if $Y = \exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + e)$, then $g(Y) = \log(Y)$ is a multiple regression model
2. If we want to estimate $g$, we plot $\hat{y}$ against $y$ (the inverse response plot); if we want to estimate $g^{-1}$, we plot $y$ against $\hat{y}$, where $\hat{y}$ are the fitted values from regressing the untransformed Y on the predictors.
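A minimal R sketch of this idea on simulated data where the true transformation g is known to be the log (all names are made up); a plain scatter plot is enough to show the shape of g, and the car package offers an automated inverse response plot as well:

set.seed(2)
x  <- runif(100, 1, 5)
y  <- exp(0.5 + 0.4 * x + rnorm(100, sd = 0.1))  # Y = g^{-1}(linear + e) with g = log
fit_y <- lm(y ~ x)                               # regress the untransformed response
plot(y, fitted(fit_y),
     xlab = "y", ylab = "Fitted values")         # shape of this curve estimates g (here, log)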
III. Multicollinearity and Variance Inflation
Factors
1. Multicollinearity: A number of important
issues arise when strong correlations exist
among the predictor variables
2. Variance Inflation Factors
i. First, consider a multiple regression model with two predictors: $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + e$
* $r_{12}$: Pearson correlation coefficient between $x_1$ and $x_2$
* $s_{x_j}$: the standard deviation of $x_j$
$\mathrm{var}(\hat{\beta}_j \mid X) = \frac{\sigma^2}{(n-1)\, s_{x_j}^2} \cdot \frac{1}{1 - r_{12}^2}$ $(j = 1, 2)$
$\frac{1}{1 - r_{12}^2}$: variance inflation factor
Correlation amongst the predictors ($r_{12} \neq 0$) increases the variance of the estimated regression coefficients
ii. Next consider the general regression model: $Y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + e$
* $R_j^2$: the value of $R^2$ obtained from the regression of $x_j$ on the other predictors (variability in $x_j$ explained by that model)
$\mathrm{var}(\hat{\beta}_j \mid X) = \frac{\sigma^2}{(n-1)\, s_{x_j}^2} \cdot \frac{1}{1 - R_j^2}$ $(j = 1, \ldots, p)$
$\frac{1}{1 - R_j^2}$: the jth variance inflation factor
If the predictor variables are correlated, $R_j^2$ will be close to 1. Then $\mathrm{var}(\hat{\beta}_j \mid X)$ will be very large, the p-value of the t-test for $\beta_j$ will be very large (statistically insignificant), and the confidence interval for $\beta_j$ will be wide.
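A short R sketch of the VIF definition, computing $1/(1 - R_j^2)$ directly on deliberately collinear simulated data (hypothetical example; car::vif() returns the same values for all predictors):

set.seed(3)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.3)              # deliberately correlated with x1
x3 <- rnorm(100)
y  <- 1 + x1 + x2 + x3 + rnorm(100)
fit <- lm(y ~ x1 + x2 + x3)
R2_1 <- summary(lm(x1 ~ x2 + x3))$r.squared  # R_j^2 from regressing x1 on the other predictors
1 / (1 - R2_1)                               # VIF for x1
# car::vif(fit)                              # same idea for every predictor, if car is installed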
Chapter 7: Variable Selection
I. Evaluating Potential Subsets of Predictor
Variables–AIC (Akaike’s Information
Criterion)
1. Definition: AIC is an estimator of the relative
quality of statistical models for a given set of
data
2. Derivation:
Suppose $y_1, y_2, \ldots, y_n$ are the observed values of independent normal random variables $Y_i \sim N(\mu_i, \sigma^2)$, where $\mu_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi}$.
Likelihood: $L(\boldsymbol{\beta}, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y_i - \mu_i)^2}{2\sigma^2}\right)$
Let $\hat{\mu}_i$ be the least squares fitted values and $\hat{\sigma}^2 = \frac{RSS}{n}$ (the maximum likelihood estimate of $\sigma^2$).
Maximized log-likelihood: $\log L = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\!\left(\frac{RSS}{n}\right) - \frac{n}{2}$
$AIC = -2\log L + 2K$, where $K$ is the number of estimated parameters
$\Rightarrow AIC = n\log\!\left(\frac{RSS}{n}\right) + 2K + \text{constant}$
(R-output: for a linear model, extractAIC() reports $n\log(RSS/n) + 2(p+1)$, dropping the constant)
*The smaller the AIC, the better the model.
3. Use: model selection. Given a collection of models for the data, AIC estimates the quality of each model relative to each of the other models.
4. AIC says nothing about testing a null hypothesis or about the absolute quality of a model; it only measures quality relative to the other models considered.
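A hedged R sketch comparing two candidate models by AIC on simulated data (all names hypothetical); extractAIC() is the quantity step() uses, while AIC() includes the constants but gives the same ranking:

set.seed(4)
dat <- data.frame(x1 = rnorm(80), x2 = rnorm(80))
dat$y <- 1 + 2 * dat$x1 + rnorm(80)          # x2 is irrelevant in the true model
fit_small <- lm(y ~ x1,      data = dat)
fit_big   <- lm(y ~ x1 + x2, data = dat)
extractAIC(fit_small)                        # c(edf, n*log(RSS/n) + 2*edf); smaller is better
extractAIC(fit_big)
AIC(fit_small, fit_big)                      # full-likelihood version; same ranking of models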
II. Deciding on the Collection of Potential Subsets
of Predictor Variables
1. "Best": for a given number of predictors, the best choice is the set of predictors with the smallest value of RSS
* for a fixed number of predictors: max $R^2$ (or maximized log-likelihood) $\Leftrightarrow$ min RSS $\Leftrightarrow$ min AIC
2. Forward Stepwise Regression
i. Definition: Forward stepwise regression starts with no potential predictor variables in the regression equation. At each step, it adds the predictor such that the resulting model has the lowest value of an information criterion. This process continues until all variables have been added, or until adding any remaining variable no longer lowers the criterion.
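A brief R sketch of forward stepwise selection with AIC using base R's step(), on a hypothetical simulated data frame:

set.seed(5)
dat <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
dat$y <- 1 + 2 * dat$x1 - dat$x2 + rnorm(100)            # x3 is pure noise
null_fit <- lm(y ~ 1, data = dat)                        # start with no predictors
full_fit <- lm(y ~ x1 + x2 + x3, data = dat)             # largest model considered
step(null_fit, scope = formula(full_fit), direction = "forward")  # adds predictors while AIC drops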