Chapter 2 (After Midterm)

Department: Statistics
Course Code: STAB22H3
Professor: Moras

Chapter 2.3- 2.6
2.3 Regression
- describes the relationship between two variables when one variable helps explain or
predict the other
Regression line: a straight line that describes how a response variable y changes as an
explanatory variable x changes
-Used to predict the value of y for a given value of x
2.12
Increase in energy use is an explanatory variable and fat gain is a response variable
Fitting a line = line of best fit
y = a + bx
-where b is the slope and a is the intercept (the value of y when x=0)
-b is the rate of change; as x increases by one unit, y changes by b units
-ŷ = predicted value, whereas y = actual observation
Extrapolation: use of the regression line for prediction far outside the range of x values
used to fit the line
-often inaccurate because we cannot tell whether the relationship remains linear, or even
keeps increasing or decreasing, at extreme values
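A minimal sketch of prediction with an extrapolation check, assuming a line that has already been fitted. The coefficients, the x-range, and the helper name `predict` are hypothetical and chosen only for illustration.

```python
def predict(a, b, x, x_min, x_max):
    """Return (y_hat, extrapolated) for the line y_hat = a + b*x.

    extrapolated is True when x lies outside [x_min, x_max], the range of
    x values that was used to fit the line.
    """
    y_hat = a + b * x
    extrapolated = x < x_min or x > x_max
    return y_hat, extrapolated

# Hypothetical fitted line and x-range (not from the textbook).
a, b = 2.2, 0.6
x_min, x_max = 1.0, 5.0

inside = predict(a, b, 3.0, x_min, x_max)    # x within the fitted range
outside = predict(a, b, 50.0, x_min, x_max)  # x far outside the fitted range
print(inside)
print(outside)
```

The line still returns a number for x = 50, but the flag marks it as an extrapolation, so the prediction should not be trusted.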
Least-squares Regression
-y = a + bx
-method of fitting a line to a scatter plot
-it is not resistant to outliers
-a line that is as close as possible to all the points in the vertical direction
-error = observed – predicted
-e > 0 when observed is greater than prediction and vice versa
-makes prediction errors as small as possible
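-least-squares regression line of y on x is the line that makes the sum of the squared
errors as small as possible
-sum of squared errors: e1^2 + e2^2 + ... + en^2 = smallest possible value
-for this line the errors also sum to zero: e1 + e2 + ... + en = 0 (this alone does not
define the line, since every line through (x mean, y mean) has errors that sum to zero)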
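The properties of the least-squares line can be checked numerically. The sketch below uses made-up data and hypothetical helper names (`fit`, `errors`, `sse`). It compares the least-squares line with another line through (x mean, y mean): both have errors that sum to zero, but only the least-squares line has the smallest sum of squared errors.

```python
from statistics import mean

def fit(xs, ys):
    """Least-squares intercept a and slope b for y_hat = a + b*x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

def errors(xs, ys, a, b):
    """error = observed - predicted, one value per data point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def sse(xs, ys, a, b):
    """Sum of squared errors for the line y_hat = a + b*x."""
    return sum(e ** 2 for e in errors(xs, ys, a, b))

# Made-up illustrative data (not from the textbook).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

a, b = fit(xs, ys)

# A different line that also passes through (x mean, y mean), with slope 1.
b_other = 1.0
a_other = mean(ys) - b_other * mean(xs)

print(sum(errors(xs, ys, a, b)), sse(xs, ys, a, b))
print(sum(errors(xs, ys, a_other, b_other)), sse(xs, ys, a_other, b_other))
```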
To calculate (do not round values that are further used in calculations):
b = r (sy / sx), where r is the correlation and sx, sy are the standard deviations of x and y
a = mean y - b(mean x)
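The two formulas can be applied directly, as in this sketch. The data are made up, the function names are hypothetical, and the correlation is computed from standardized values with an n - 1 divisor (an assumption consistent with the sample standard deviation).

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r: sum of products of standardized values / (n - 1)."""
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    n = len(xs)
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / (n - 1)

def least_squares_line(xs, ys):
    """Return (a, b) using b = r * sy / sx and a = mean y - b * mean x."""
    r = correlation(xs, ys)
    b = r * stdev(ys) / stdev(xs)
    a = mean(ys) - b * mean(xs)
    return a, b

# Made-up illustrative data (not from the textbook).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

a, b = least_squares_line(xs, ys)
print(a, b)  # approximately 2.2 and 0.6
```

Intermediate values (r, sx, sy) are kept at full precision, matching the advice not to round values that feed later calculations.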
Interpreting the regression
-when changing units, correlation does not change; however, the least-squares line does
change
-least-squares regression line always passes through the point (x mean, y mean)
-when both variables are standardized (mean = 0 and s = 1), the regression line passes through the origin and has slope = r
www.notesolution.com
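-the correlation r is the slope of the least-squares regression line when both variables are in standardized units
-b and r always have the same sign: if b > 0 then r > 0
-r^2 = (r)^2, so r = +/- square root of r^2, taking the same sign as the slope b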
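A short sketch of the interpretation points above, using made-up data and hypothetical helper names. It rescales x by a factor of 100 (as in a change of units) and then standardizes both variables.

```python
import math
from statistics import mean, stdev

def fit(xs, ys):
    """Least-squares intercept a and slope b for y_hat = a + b*x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

def correlation(xs, ys):
    """Correlation r computed from sums of squares and cross-products."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def standardize(vs):
    """Rescale values to mean 0 and standard deviation 1."""
    m, s = mean(vs), stdev(vs)
    return [(v - m) / s for v in vs]

# Made-up illustrative data (not from the textbook).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

r = correlation(xs, ys)
a, b = fit(xs, ys)

# Change the units of x: r is unchanged, the slope is divided by 100.
xs_scaled = [100 * x for x in xs]
r_scaled = correlation(xs_scaled, ys)
a_scaled, b_scaled = fit(xs_scaled, ys)

# Standardized variables: the line passes through the origin with slope r.
a_std, b_std = fit(standardize(xs), standardize(ys))

print(r, r_scaled)
print(b, b_scaled)
print(a_std, b_std)
```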
Correlation and regression
-correlation measures the direction and strength of the linear relationship between two quantitative variables and treats them symmetrically, whereas regression requires choosing an explanatory and a response variable
Square of correlation (r^2): the proportion of the variation in y that is explained by the least-squares regression of y on x
-r^2 measures how successfully the regression explains the response
ex: if r = +/-0.7, r^2 = 0.49, so about half (49%) of the variation is accounted for by the linear relationship
if r = +/-1, r^2 = 1, which indicates that all the variation in one variable is accounted for by the linear relationship with the other variable
Another way to calculate r^2
r^2 = variance of predicted values ŷ / variance of observed values y
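The variance ratio can be compared against the squared correlation, as in this sketch with made-up data and hypothetical helper names.

```python
import math
from statistics import mean, variance

def fit(xs, ys):
    """Least-squares intercept a and slope b for y_hat = a + b*x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

def correlation(xs, ys):
    """Correlation r computed from sums of squares and cross-products."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative data (not from the textbook).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

a, b = fit(xs, ys)
y_hat = [a + b * x for x in xs]  # predicted values on the fitted line

r = correlation(xs, ys)
r_squared = variance(y_hat) / variance(ys)

print(r ** 2, r_squared)  # both approximately 0.6
```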
Transforming relationships
2.18
- if the relationship is not linear, transforming a variable (for example, taking its logarithm) may produce a linear relationship
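As an illustration of a log transformation, the sketch below uses made-up data that grow exponentially (y = 3 * 2^x, an assumption chosen for the example). Plotting log y against x gives an exactly linear relationship, so a line fitted to (x, log y) recovers the growth rate.

```python
import math
from statistics import mean

def fit(xs, ys):
    """Least-squares intercept a and slope b for y_hat = a + b*x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

def correlation(xs, ys):
    """Correlation r computed from sums of squares and cross-products."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up exponential data: y = 3 * 2**x (not from the textbook).
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0 * 2.0 ** x for x in xs]

log_ys = [math.log(y) for y in ys]  # natural logarithm of the response

r_raw = correlation(xs, ys)      # below 1: the raw relationship is curved
r_log = correlation(xs, log_ys)  # essentially 1: log y is linear in x

a, b = fit(xs, log_ys)
print(r_raw, r_log)
print(math.exp(a), math.exp(b))  # recovers roughly 3 and 2
```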
2.4 Cautions about correlation and regression
Residuals
-show how far the data fall from the regression line
-residual = observed y - predicted y (from the regression line)
-mean of the least-squares residuals is always ZERO
Residual Plot
-is a scatter plot of the regression residuals vs. the explanatory variable
-if the regression line captures the overall pattern of the scatter plot, then the residual plot should show NO
pattern in the residuals.
-If the scatter plot is curved rather than linear, the residuals will follow a curved pattern
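A small numerical sketch of residuals from a line fitted to curved data. The data (y = x^2) are made up for illustration and the helper names are hypothetical. The residuals average to zero but are positive at both ends and negative in the middle, which is the curved pattern a residual plot would reveal.

```python
from statistics import mean

def fit(xs, ys):
    """Least-squares intercept a and slope b for y_hat = a + b*x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

def residuals(xs, ys, a, b):
    """residual = observed y - predicted y for each data point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Made-up curved data: y = x**2 (not from the textbook).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [x ** 2 for x in xs]

a, b = fit(xs, ys)
res = residuals(xs, ys, a, b)

print(a, b)       # fitted line: y_hat = -2 + 4x
print(res)        # residuals in order of x
print(mean(res))  # mean residual is zero (up to rounding)
```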
2.20
- regression line does not capture the important fact that the variability of field measurements
increases as defect depth increases
Outliers and Influential Observations
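-Outlier: a point that lies outside the overall pattern of the other points; a point within the x-range of the scatter plot that is an outlier in the y direction has a large residual
-Influential: a point, often one outside the x-range of the other points, that makes a huge difference to the
regression line when removed
-If a point that is extreme in x lies close to the line fitted to the other points, removing it changes the regression line only slightly
-If it lies far from that line, removing it changes the regression line a great deal
-An influential point may still have a small residual, because it pulls the regression line toward itself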
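The effect of an influential observation can be seen by fitting the line with and without it. This is a sketch with made-up data and hypothetical helper names: one point is added with an x value far beyond the others, and it does not follow their trend.

```python
from statistics import mean

def fit(xs, ys):
    """Least-squares intercept a and slope b for y_hat = a + b*x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

# Made-up illustrative data (not from the textbook).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

# One extra point, extreme in x and off the trend of the others.
x_new, y_new = 15.0, 2.0

a_without, b_without = fit(xs, ys)
a_with, b_with = fit(xs + [x_new], ys + [y_new])

# Residuals from the line fitted to all six points.
res_new = y_new - (a_with + b_with * x_new)
res_others = [y - (a_with + b_with * x) for x, y in zip(xs, ys)]

print(b_without, b_with)  # the slope changes sign when the point is included
print(res_new, max(abs(e) for e in res_others))
```

In this example the added point reverses the sign of the slope, yet its own residual is smaller in size than the largest residual among the other points, because it has pulled the line toward itself.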