# Chapter 8.docx

School: University of Toronto Scarborough
Department: Statistics
Course: STAB22H3
Professor: Mahinda Samarakoon
Semester: Winter

Description
Stats: Data and Models – Canadian Edition
Chapter 8 – Linear Regression

- A relationship can be modelled with a line and the equation of that line, which allows us to predict the value of one variable given a value of the related variable
- Linear model: an equation of a straight line through the data
- A straight line can summarize the general pattern with only a couple of parameters

## Residuals

- Predicted value/fitted value/fit ($\hat{y}$): the estimate made from a model; distinguished from the observed value, $y$
- Residual: the difference between the observed value and its associated predicted value, $y - \hat{y}$; tells us how far off the model's prediction is at that point
  o A negative residual means that the predicted value is an overestimate
  o A positive residual means that the predicted value is an underestimate

## "Best Fit" Means Least Squares

- To assess how well a line fits, square all of the residuals so that the values are all positive, then add them up
- Squaring emphasizes large residuals (we are more concerned with points far from the line than with those close to it)
- The smaller the sum of the squared residuals, the better the fit
- The line of best fit is the line for which the sum of the squared residuals is smallest: the least-squares line

## The Linear Model

- $\hat{y} = b_0 + b_1 x$
  o The predictions from our model follow a straight line
  o If the model is good, the data values will scatter closely around it
  o $b_1$ = slope, $b_0$ = y-intercept
- Slope is always expressed in y-units per x-unit
- The y-intercept is the value the line takes when $x = 0$ (sometimes 0 is not a plausible value for x, in which case we use the y-intercept only as a starting value for our predictions)

## The Least-Squares Line

- $b_1 = r \frac{s_y}{s_x}$, where $r$ is the correlation of the association, and $s_y$ and $s_x$ are the standard deviations of y and x
- If the correlation is positive, the scatterplot runs from lower left to upper right, and the slope of the line is positive
- The slope has units: changing the units of the variables changes their standard deviations directly, and therefore changes the slope
  o Units of slope are always the units of y per unit of x
- $b_0 = \bar{y} - b_1 \bar{x}$: knowing the slope and the fact that the line goes through the point $(\bar{x}, \bar{y})$ tells us how to find the intercept
- Least-squares lines are commonly called regression lines

## Correlation and the Line

- For standardized values: $\hat{z}_y = r z_x$
  o Moving one standard deviation from the mean in x, we can expect to move $r$ standard deviations from the mean in y
- If $r = 0$, there is no linear relationship

## How Big Can Predicted Values Get?

- Regression to the mean: since $|r| \le 1$, each predicted y tends to be closer to its mean (in standard deviations) than its corresponding x was

## Residuals Revisited

- Residuals are the part of the data that hasn't been modelled: residual = data - model
  o Or, $e = y - \hat{y}$
- When a regression model is appropriate, it should model the underlying relationship
- A scatterplot of residuals versus x-values should have no shape or direction; it should stretch horizontally and have the same amount of scatter throughout
- Often computers plot residuals
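
The least-squares formulas in these notes can be tried out on a small data set. The following is a minimal Python sketch, not a reference implementation: the data are made up for illustration, the function names (`least_squares_line`, `residuals`, `sum_squared_residuals`, and the helpers) are hypothetical, and it assumes the sample standard deviation (dividing by n - 1) when computing the correlation and the slope.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1; an assumption here)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """Correlation r: sum of products of z-scores, divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    sx, sy = sample_sd(xs), sample_sd(ys)
    z_products = (((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))
    return sum(z_products) / (len(xs) - 1)


def least_squares_line(xs, ys):
    """Return (b0, b1, r) using b1 = r * s_y / s_x and b0 = y-bar - b1 * x-bar."""
    r = correlation(xs, ys)
    b1 = r * sample_sd(ys) / sample_sd(xs)
    b0 = mean(ys) - b1 * mean(xs)
    return b0, b1, r


def residuals(xs, ys, b0, b1):
    """Residual e = y - y-hat for each point, where y-hat = b0 + b1 * x."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def sum_squared_residuals(xs, ys, b0, b1):
    """The quantity that the least-squares line makes as small as possible."""
    return sum(e ** 2 for e in residuals(xs, ys, b0, b1))


# Made-up illustrative data, not taken from the textbook.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

b0, b1, r = least_squares_line(xs, ys)
res = residuals(xs, ys, b0, b1)

print("slope b1:", b1)
print("intercept b0:", b0)
print("correlation r:", r)
print("residuals:", res)

# Standardized prediction: one SD above the mean in x predicts r SDs in y.
z_x = 1.0
z_y_hat = r * z_x
print("predicted z_y for z_x = 1:", z_y_hat)
```

Nudging `b0` or `b1` away from the fitted values and calling `sum_squared_residuals` again gives a larger total, which is what "least squares" means. The residuals of the fitted line also sum to zero (up to rounding), because the line passes through the point of means.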