# Chapter 8.docx

by
OneClass999

University of Toronto Scarborough

Statistics

STAB22H3

Mahinda Samarakoon

Winter

Stats: Data and Models – Canadian Edition
Chapter 8 – Linear Regression
- A relationship can be modelled with a line and the equation of that line, which will allow us to
predict the value of a variable, given a value of the related variable
- Linear model – an equation of a straight line through the data
- A straight line can summarize the general pattern with only a couple of parameters
Residuals
- Predicted value/fitted value/fit (y-hat) – the estimate made from a model; distinguished from the
true value of y (y)
- Residual – the difference between the observed value and its associated predicted value; tells us
how far off the model’s prediction is at that point (y – y-hat)
o A negative residual means that the predicted value is an overestimate
o A positive residual means that the predicted value is an underestimate
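The sign convention above can be sketched in a few lines of Python. The numbers are made up for illustration, and the helper names `residual` and `describe` are hypothetical, not from the textbook.

```python
# Hypothetical observed values and model predictions (made-up numbers).
observed = [3.0, 5.0, 4.0]
predicted = [2.5, 5.5, 4.0]


def residual(y, y_hat):
    """Residual = observed value minus predicted value (y - y-hat)."""
    return y - y_hat


def describe(e):
    """Say what the sign of a residual tells us about the prediction."""
    if e > 0:
        return "underestimate"  # the model predicted too low
    if e < 0:
        return "overestimate"   # the model predicted too high
    return "exact"


residuals = [residual(y, y_hat) for y, y_hat in zip(observed, predicted)]
labels = [describe(e) for e in residuals]

for y, y_hat, e, label in zip(observed, predicted, residuals, labels):
    print(f"y = {y}, y-hat = {y_hat}, residual = {e}: {label}")
```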
“Best Fit” Means Least Squares
- To assess a line of best fit, square all of the residuals so that the values are all positive
- Squaring emphasizes large residuals (as we are more concerned with points far from the line, than
those close to the line)
- The smaller the sum of the squared residuals, the better the fit
- The line of best fit is the line for which the sum of the squared residuals is smallest: the least-squares line
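A minimal sketch of why the residuals are squared before being added up, using made-up data and a handful of hand-picked candidate lines (all names here are illustrative assumptions). A flat line through the mean of y has raw residuals that cancel to zero even though it fits badly, while the sum of squared residuals separates the candidates clearly.

```python
# Made-up data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]


def residuals(b0, b1):
    """Residuals y - y-hat for the line y-hat = b0 + b1*x."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def sum_of_squares(b0, b1):
    """Sum of squared residuals for the line y-hat = b0 + b1*x."""
    return sum(e ** 2 for e in residuals(b0, b1))


# Candidate lines as (intercept, slope) pairs.
candidates = {
    "flat line at the mean of y": (4.75, 0.0),
    "y-hat = 2.0x": (0.0, 2.0),
    "y-hat = 0.5 + 1.8x": (0.5, 1.8),
    "y-hat = 1.9x (least squares for this data)": (0.0, 1.9),
}

for name, (b0, b1) in candidates.items():
    raw = sum(residuals(b0, b1))
    squared = sum_of_squares(b0, b1)
    print(f"{name}: sum of residuals = {raw:.2f}, sum of squares = {squared:.2f}")

# The best candidate is the one with the smallest sum of squared residuals.
best = min(candidates, key=lambda name: sum_of_squares(*candidates[name]))
print("Best fit among the candidates:", best)
```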
The Linear Model
- y-hat = b0 + b1x
o the predictions from our model follow a straight line
o if the model is good, the values will scatter closely around it
o b1 = slope, b0 = y-intercept
- Slope is always expressed in y-units per x-unit
- The y-intercept is the value the line takes when x = 0 (but sometimes 0 is not a plausible value for
x, in which case we use the y-intercept as a starting value for our predictions)
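As a small illustration of how the slope and intercept are read, here is a sketch with assumed coefficients. The values of `b0` and `b1` are hypothetical and not taken from any data set in the notes.

```python
# Hypothetical coefficients for a line y-hat = b0 + b1*x.
b0 = 10.0  # y-intercept: predicted y when x = 0
b1 = 0.2   # slope: change in predicted y per one-unit change in x


def predict(x):
    """Predicted value y-hat for a given x."""
    return b0 + b1 * x


# The intercept is the prediction at x = 0.
print("prediction at x = 0:", predict(0))

# Increasing x by one unit changes the prediction by the slope.
change = predict(51) - predict(50)
print("change in y-hat per unit of x:", round(change, 6))
```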
The Least-Squares Line
- b1 = r(sy/sx), where r is the correlation between x and y, and sx and sy are the standard deviations of x and y
- If correlation is positive, the scatterplot runs from lower left to upper right, and the slope of the
line is positive
- Slope has units: changing the units of the variables changes their standard deviations, and so changes the slope
o Units of slope are always the units of y per unit of x
- b0 = ȳ - b1(x-bar); knowing the slope and the fact that the line goes through the point (x-bar, ȳ) tells us how to find the intercept
- Least-squares lines are commonly called regression lines
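The slope and intercept formulas in this section can be sketched in Python as follows. The data are made up, and the function name `least_squares_line` is a hypothetical helper. The correlation is computed here as the sum of products of z-scores divided by n - 1, which is one standard way to define r.

```python
import statistics


def least_squares_line(xs, ys):
    """Return (b0, b1, r) using b1 = r * sy / sx and b0 = y-bar - b1 * x-bar."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)  # sample std devs

    # Correlation: average product of z-scores (dividing by n - 1).
    r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
            for x, y in zip(xs, ys)) / (n - 1)

    b1 = r * s_y / s_x       # slope, in y-units per x-unit
    b0 = y_bar - b1 * x_bar  # intercept, so the line passes through the means
    return b0, b1, r


# Made-up data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]

b0, b1, r = least_squares_line(xs, ys)
print(f"r = {r:.4f}, slope b1 = {b1:.4f}, intercept b0 = {b0:.4f}")

# The fitted line goes through the point (x-bar, y-bar).
x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
print("prediction at x-bar:", round(b0 + b1 * x_bar, 6), "y-bar:", y_bar)
```

Because the correlation is positive for this data, the slope comes out positive as well, matching the lower-left to upper-right pattern described above.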
Correlation and the Line
- For standardized values: z-hat_y = r z_x
o Moving one standard deviation from the mean in x, we can expect to move r standard deviations from the mean in y
- If r = 0, there is no linear relationship
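A sketch showing that predicting in standardized units and predicting with the slope and intercept give the same answer. All summary statistics below are assumed values chosen for illustration, and both function names are hypothetical.

```python
# Hypothetical summary statistics (made up for illustration).
x_bar, s_x = 50.0, 10.0  # mean and standard deviation of x
y_bar, s_y = 20.0, 4.0   # mean and standard deviation of y
r = 0.5                  # correlation between x and y


def predict_standardized(x):
    """Predict y by standardizing x, multiplying by r, and converting back."""
    z_x = (x - x_bar) / s_x
    z_hat_y = r * z_x
    return y_bar + z_hat_y * s_y


def predict_raw(x):
    """Predict y with the slope and intercept of the least-squares line."""
    b1 = r * s_y / s_x
    b0 = y_bar - b1 * x_bar
    return b0 + b1 * x


# x = 60 is one standard deviation above the mean of x, so the prediction
# is r = 0.5 standard deviations above the mean of y.
for x in (40.0, 50.0, 60.0, 75.0):
    print(x, round(predict_standardized(x), 6), round(predict_raw(x), 6))
```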
How Big Can Predicted Values Get?
- Regression to the mean: each predicted y tends to be closer to its mean (in standard deviations) than its corresponding x was
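This follows from the standardized form of the line together with the fact that a correlation always lies between -1 and 1. A short worked step, writing the standardized x as z_x and the predicted standardized y as z-hat_y:

```latex
\hat{z}_y = r\,z_x
\quad\Longrightarrow\quad
|\hat{z}_y| = |r|\,|z_x| \le |z_x|
\qquad\text{because } -1 \le r \le 1 .
```

For example, with an assumed correlation of r = 0.5, an x-value 2 standard deviations above its mean gives a predicted y only 1 standard deviation above its mean.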
Residuals Revisited
- Residuals are the part of the data that hasn’t been modelled: residual = data – model
o Or, e = y – y-hat
- When a regression model is appropriate, it should model the underlying relationship
- A scatterplot of residuals versus x-values should not have shape or direction; it should stretch horizontally and have the same amount of scatter throughout
- Statistics software often plots residuals against the predicted values (y-hat) rather than against x
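A sketch of what a residual check can reveal, using made-up data that actually follow a curve (y = x squared) rather than a line. The helper `fit_line` is hypothetical; it computes the slope as the sum of cross-deviations over the sum of squared x-deviations, which is algebraically the same as r(sy/sx).

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y-hat = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Made-up curved data: y = x squared, so a straight line is the wrong model.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 4.0, 9.0, 16.0]

b0, b1 = fit_line(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Residuals from a least-squares line always sum to (about) zero,
# so the sum alone cannot reveal a poor model.
print("sum of residuals:", round(sum(residuals), 6))

# Listing the residuals in order of x shows the leftover pattern:
# positive at both ends and negative in the middle, a sign of curvature.
for x, e in zip(xs, residuals):
    print(f"x = {x}: residual = {e:+.2f}")
```

A residual plot with a bend like this one suggests the straight-line model has missed part of the underlying relationship.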
