STA302H1 Midterm Cheat Sheet: STA302, University of Toronto St George, Alfred Benn
I. Simple Linear Regression Model
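1. Model: $Y_i = \beta_0 + \beta_1 x_i + e_i$, $i = 1, \dots, n$, equivalently $E(Y \mid X = x) = \beta_0 + \beta_1 x$
• Parameters: $\beta_0$ (intercept), $\beta_1$ (slope), $\sigma^2$ (error variance)
• Variables: X (known explanatory/predictor variable)
Y (random response/dependent variable)
$e_i$ (random error in $Y_i$)
• $E(e_i \mid X) = 0$, $\mathrm{Var}(e_i \mid X) = \sigma^2$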
2. Model Assumptions:
i. Y is related to x by the simple linear regression model,
i.e., $E(Y \mid X = x_i) = \beta_0 + \beta_1 x_i$
ii. The errors $e_1, \dots, e_n$ are independent of each other
iii. The errors have a common variance $\sigma^2$
iv. The errors are normally distributed with a mean of 0 and
variance $\sigma^2$, that is, $e_i \sim N(0, \sigma^2)$
v. Values of the predictor variable are known, fixed constants
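3. Residual
i. $\hat{e}_i = y_i - \hat{y}_i$, where $\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i$ is the fitted value
ii. Residual sum of squares: $RSS = \sum_{i=1}^{n} \hat{e}_i^2 = \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2$
iii. Usually, $RSS > 0$. The least squares estimates are the values of $\beta_0, \beta_1$ that minimize RSS; $RSS = 0$ only if every point lies exactly on the least squares line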
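4. Least Squares Estimates
i. $\hat\beta_1 = \frac{SXY}{SXX}$, where $SXY = \sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}$ and $SXX = \sum_{i=1}^{n}(x_i - \bar{x})^2$
ii. $\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$
* The fitted line $\hat{y} = \hat\beta_0 + \hat\beta_1 x$ passes through $(\bar{x}, \bar{y})$
iii. Estimate of $\sigma^2$: $S^2 = \frac{RSS}{n-2} = \frac{1}{n-2}\sum_{i=1}^{n} \hat{e}_i^2$ (2 parameters: $\beta_0$, $\beta_1$)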
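A minimal Python sketch of the residual and least squares formulas above, using only the standard library. The function name `least_squares` and the five-point data set are hypothetical, chosen so every number can be checked by hand.

```python
from math import fsum


def least_squares(x, y):
    """Fit y = b0 + b1 * x by least squares; return (b0, b1, residuals, s2)."""
    n = len(x)
    x_bar = fsum(x) / n
    y_bar = fsum(y) / n
    sxx = fsum((xi - x_bar) ** 2 for xi in x)
    sxy = fsum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                     # slope: SXY / SXX
    b0 = y_bar - b1 * x_bar            # intercept: y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    rss = fsum(e ** 2 for e in residuals)
    s2 = rss / (n - 2)                 # divide by n - 2: two estimated parameters
    return b0, b1, residuals, s2


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1, residuals, s2 = least_squares(x, y)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, s2 = {s2:.3f}")  # b0 = 2.200, b1 = 0.600, s2 = 0.800
```

Here $SXX = 10$ and $SXY = 6$, so $\hat\beta_1 = 0.6$, $\hat\beta_0 = 4 - 0.6 \cdot 3 = 2.2$, RSS $= 2.4$ and $S^2 = 2.4/3 = 0.8$. The residuals also sum to zero, as least squares residuals always do when an intercept is fitted.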
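5. Inference about $\beta_0$ and $\beta_1$
i. $\hat\beta_1 = \sum_{i=1}^{n} c_i y_i$, $c_i = \frac{x_i - \bar{x}}{SXX}$
($\hat\beta_1$ is a linear combination of the $y_i$)
• $E(\hat\beta_1 \mid X) = \beta_1$, $\mathrm{Var}(\hat\beta_1 \mid X) = \frac{\sigma^2}{SXX}$
($\hat\beta_1$ is an unbiased estimator of $\beta_1$)
$\hat\beta_1 \mid X \sim N\left(\beta_1, \frac{\sigma^2}{SXX}\right)$, $se(\hat\beta_1) = \frac{S}{\sqrt{SXX}}$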
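• Hypothesis test: whether x and Y have a linear relationship
$H_0: \beta_1 = 0$ (no linear relationship), $H_A: \beta_1 \neq 0$
If $H_0$ is true, $T = \frac{\hat\beta_1 - 0}{se(\hat\beta_1)} \sim t_{n-2}$
Reject $H_0$ at level $\alpha$ when $|t_{obs}| > t(\alpha/2, n-2)$, or when the p-value $< \alpha$
• $100(1-\alpha)\%$ confidence interval for $\beta_1$: $\hat\beta_1 \pm t(\alpha/2, n-2)\, se(\hat\beta_1)$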
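A sketch of the slope test and confidence interval, assuming the critical value $t(\alpha/2, n-2)$ is looked up by the caller (for example from a t table), since the standard library has no t quantile function. The name `slope_inference` and its `t_crit` argument are hypothetical.

```python
from math import fsum, sqrt


def slope_inference(x, y, t_crit):
    """Test H0: beta1 = 0 and build a CI for beta1.

    t_crit is t(alpha/2, n - 2), supplied by the caller (e.g. from a t table).
    """
    n = len(x)
    x_bar, y_bar = fsum(x) / n, fsum(y) / n
    sxx = fsum((xi - x_bar) ** 2 for xi in x)
    sxy = fsum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    rss = fsum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s = sqrt(rss / (n - 2))            # S, the estimate of sigma
    se_b1 = s / sqrt(sxx)              # se(b1) = S / sqrt(SXX)
    t_obs = b1 / se_b1                 # T statistic under H0: beta1 = 0
    ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
    return t_obs, ci, abs(t_obs) > t_crit


# 3.182 is t(0.025, 3) rounded: the 95% critical value for n - 2 = 3 df
t_obs, ci, reject = slope_inference([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], t_crit=3.182)
print(f"t = {t_obs:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f}), reject H0: {reject}")
```

With these five points, $t_{obs} \approx 2.121 < 3.182$, so $H_0: \beta_1 = 0$ is not rejected at the 5% level. The interval (about $-0.300$ to $1.500$) contains 0, which it must whenever the two-sided test at the same level does not reject.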
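ii. $\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$
• $E(\hat\beta_0 \mid X) = \beta_0$, $\mathrm{Var}(\hat\beta_0 \mid X) = \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{SXX}\right)$
($\hat\beta_0$ is an unbiased estimator of $\beta_0$)
$\hat\beta_0 \mid X \sim N\left(\beta_0, \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{SXX}\right)\right)$, $se(\hat\beta_0) = S\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{SXX}}$
• Hypothesis test: $H_0: \beta_0 = \beta_0^{0}$, $H_A: \beta_0 \neq \beta_0^{0}$
*If $H_0$ is true, $T = \frac{\hat\beta_0 - \beta_0^{0}}{se(\hat\beta_0)} \sim t_{n-2}$
• $100(1-\alpha)\%$ confidence interval for $\beta_0$: $\hat\beta_0 \pm t(\alpha/2, n-2)\, se(\hat\beta_0)$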
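The same idea for the intercept, as a sketch: `intercept_inference` is a hypothetical name, `beta0_null` stands for the hypothesized value $\beta_0^{0}$ (default 0), and `t_crit` is again assumed to come from a t table.

```python
from math import fsum, sqrt


def intercept_inference(x, y, t_crit, beta0_null=0.0):
    """Return se(b0), T for H0: beta0 = beta0_null, and a CI for beta0."""
    n = len(x)
    x_bar, y_bar = fsum(x) / n, fsum(y) / n
    sxx = fsum((xi - x_bar) ** 2 for xi in x)
    sxy = fsum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    rss = fsum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s = sqrt(rss / (n - 2))
    se_b0 = s * sqrt(1 / n + x_bar ** 2 / sxx)   # S * sqrt(1/n + xbar^2 / SXX)
    t_obs = (b0 - beta0_null) / se_b0
    ci = (b0 - t_crit * se_b0, b0 + t_crit * se_b0)
    return se_b0, t_obs, ci


se_b0, t_obs, ci = intercept_inference([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], t_crit=3.182)
print(f"se(b0) = {se_b0:.3f}, t = {t_obs:.3f}")
```

For the toy data, $se(\hat\beta_0) = \sqrt{0.8\,(0.2 + 0.9)} = \sqrt{0.88} \approx 0.938$, noticeably larger than $se(\hat\beta_1)$ because $\bar{x} = 3$ is far from $x = 0$.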
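6. Confidence Interval for the Population Regression Line (at a
given value $x^*$)
$\hat{y}^* = \hat\beta_0 + \hat\beta_1 x^*$, $\mathrm{Var}(\hat{y}^* \mid X) = \sigma^2\left(\frac{1}{n} + \frac{(x^* - \bar{x})^2}{SXX}\right)$
$100(1-\alpha)\%$ confidence interval for $E(Y \mid X = x^*) = \beta_0 + \beta_1 x^*$: $\hat{y}^* \pm t(\alpha/2, n-2)\, S\sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{SXX}}$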
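7. Prediction Interval for the Actual Value of Y
i. Confidence interval: reported for a parameter ($\beta_0$, $\beta_1$, $E(Y \mid X = x^*)$)
Prediction interval: reported for the value of a random variable
(the range of plausible values of $Y^*$)
*The prediction interval is wider than the confidence interval because it also accounts for the variance $\sigma^2$ of the new error $e^*$
ii. $Y^* = \beta_0 + \beta_1 x^* + e^*$, $\hat{y}^* = \hat\beta_0 + \hat\beta_1 x^*$, $\mathrm{Var}(Y^* - \hat{y}^*) = \sigma^2\left(1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{SXX}\right)$
$100(1-\alpha)\%$ prediction interval for $Y^*$: $\hat{y}^* \pm t(\alpha/2, n-2)\, S\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{SXX}}$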
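A sketch computing both intervals at one $x^*$, to show that the only difference is the extra $1 +$ under the square root. The function `intervals_at`, the choice $x^* = 4$ and the toy data are hypothetical; `t_crit` is assumed to come from a t table.

```python
from math import fsum, sqrt


def intervals_at(x, y, x_star, t_crit):
    """Return (y_hat_star, CI for E(Y | X = x_star), PI for Y*)."""
    n = len(x)
    x_bar, y_bar = fsum(x) / n, fsum(y) / n
    sxx = fsum((xi - x_bar) ** 2 for xi in x)
    sxy = fsum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    s = sqrt(fsum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    y_hat = b0 + b1 * x_star
    core = 1 / n + (x_star - x_bar) ** 2 / sxx
    half_ci = t_crit * s * sqrt(core)         # uncertainty in the fitted line only
    half_pi = t_crit * s * sqrt(1 + core)     # "1 +" adds the new error e*
    return (y_hat,
            (y_hat - half_ci, y_hat + half_ci),
            (y_hat - half_pi, y_hat + half_pi))


y_hat, ci, pi = intervals_at([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], x_star=4, t_crit=3.182)
print(f"y_hat = {y_hat:.2f}, CI = {ci}, PI = {pi}")
```

At $x^* = 4$, $\hat{y}^* = 4.6$; the half-widths are $3.182\sqrt{0.24} \approx 1.56$ for the confidence interval and $3.182\sqrt{1.04} \approx 3.25$ for the prediction interval.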
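8. Analysis of Variance: to test whether there is a linear association
between Y and x (ANOVA, using the F-test)
i. The F-test is designed for multiple linear regression, but it also applies to
the simple linear regression model
ii. Hypothesis test: $H_0: \beta_1 = 0$; $H_A: \beta_1 \neq 0$
*If $H_0$ is true,
$F = \frac{SSreg/1}{RSS/(n-2)} \sim F_{1,\,n-2}$
*Reject $H_0$ at level $\alpha$ if $F > F(\alpha; 1, n-2)$
iii. Total sample variability:
$SST = SYY = \sum_{i=1}^{n}(y_i - \bar{y})^2$, df $= n-1$
Variability explained by the model:
$SSreg = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 = \hat\beta_1^2\, SXX$, df $= 1$
Unexplained (or error) variability:
$RSS = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$, df $= n-2$
* $SST = SSreg + RSS$, $n - 1 = 1 + (n - 2)$
iv. $R^2 = \frac{SSreg}{SST} = 1 - \frac{RSS}{SST}$, $0 \le R^2 \le 1$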
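A sketch of the ANOVA decomposition for simple linear regression; `anova_slr` and the toy data are hypothetical. It computes each sum of squares directly from its definition, so the identity $SST = SSreg + RSS$ can be checked numerically rather than assumed.

```python
from math import fsum


def anova_slr(x, y):
    """Return (SST, SSreg, RSS, F, R2) for a simple linear regression."""
    n = len(x)
    x_bar, y_bar = fsum(x) / n, fsum(y) / n
    sxx = fsum((xi - x_bar) ** 2 for xi in x)
    sxy = fsum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    sst = fsum((yi - y_bar) ** 2 for yi in y)                 # df = n - 1
    ssreg = fsum((fi - y_bar) ** 2 for fi in fitted)          # df = 1
    rss = fsum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # df = n - 2
    f_stat = (ssreg / 1) / (rss / (n - 2))
    r2 = ssreg / sst
    return sst, ssreg, rss, f_stat, r2


sst, ssreg, rss, f_stat, r2 = anova_slr([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"SST = {sst:.2f}, SSreg = {ssreg:.2f}, RSS = {rss:.2f}, F = {f_stat:.2f}, R2 = {r2:.2f}")
```

For the toy data, $SST = 6 = 3.6 + 2.4$, $F = 3.6 / 0.8 = 4.5$ and $R^2 = 0.6$. In simple linear regression $F = t^2$ for the slope test ($2.121^2 = 4.5$), and $4.5 < F(0.05; 1, 3) \approx 10.13$, so both tests reach the same conclusion.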
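9. Pearson (Sample) Correlation Coefficient: a symmetric measure of the
linear association between x and y
i. $r = \frac{SXY}{\sqrt{SXX \cdot SYY}}$, $-1 \le r \le 1$, where $SYY = \sum_{i=1}^{n}(y_i - \bar{y})^2$
ii. $|r| = 1$: all points fall exactly on a straight line
$r = 0$: no linear relationship
$r > 0$: positive relationship between x and y
$r < 0$: negative relationship between x and y
* Relation to the
slope: $\hat\beta_1 = r\sqrt{\frac{SYY}{SXX}}$,
and $r$ and $\hat\beta_1$ always have the same sign
iii. $r^2 = R^2$: occurs in simple linear regression only
* $R^2 = r^2 = \frac{SXY^2}{SXX \cdot SYY}$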
iv. Given $r$ and $n$,
the t-statistic for $H_0: \beta_1 = 0$ is $t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$,
and $F = t^2$
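A sketch checking the correlation facts numerically on hypothetical toy data: $r$ from its definition, the slope recovered as $r\sqrt{SYY/SXX}$, $r^2 = R^2$, and the identity $t = r\sqrt{n-2}/\sqrt{1-r^2}$ for the slope test. The function name `correlation_summary` is an assumption for illustration.

```python
from math import fsum, sqrt


def correlation_summary(x, y):
    """Return (r, slope recovered from r, t statistic recovered from r)."""
    n = len(x)
    x_bar, y_bar = fsum(x) / n, fsum(y) / n
    sxx = fsum((xi - x_bar) ** 2 for xi in x)
    syy = fsum((yi - y_bar) ** 2 for yi in y)
    sxy = fsum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    r = sxy / sqrt(sxx * syy)
    b1_from_r = r * sqrt(syy / sxx)                 # equals SXY / SXX
    # t for H0: beta1 = 0; undefined (division by zero) when |r| = 1
    t_from_r = r * sqrt(n - 2) / sqrt(1 - r ** 2)
    return r, b1_from_r, t_from_r


r, b1_from_r, t_from_r = correlation_summary([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"r = {r:.4f}, r^2 = {r ** 2:.4f}, b1 = {b1_from_r:.4f}, t = {t_from_r:.4f}")
```

For the toy data, $r = 6/\sqrt{60} \approx 0.775$, so $r^2 = 0.6$ (the same $R^2$ the ANOVA gives), $\hat\beta_1 = 0.6$, and $t \approx 2.121$, matching the slope test computed from $se(\hat\beta_1)$.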
II. Diagnostics and Transformation for Simple Linear Regression
1. Regression Diagnostics: Tools for Checking the Validity of a
Model: i) Standardized residual plots: model’s validity; ii)
Whether there are leverage points and outliers; iii) If leverage
points exist, determine whether each is a bad leverage point
(assess its influence on the line); iv) Whether the assumption of