# Ch3.pdf

School: University of Alberta
Department: Statistics
Course: STAT151
Professor: Paul Cartledge
Semester: Fall

Description
Ch. 3 – Intro to Correlation and Regression

Ch. 2 deals with univariate data. This chapter, however, considers bivariate data and how two numerical variables are related. Methods of description are introduced here and formalized in Ch. 11.

Terminology:

| x | y |
| --- | --- |
| Explanatory variable | Response variable |
| Independent variable | Dependent variable |
| Predictor variable | Predicted variable |

Notation:
- bivariate sample of size n: $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$
- sample means: $\bar{x}$, $\bar{y}$
- sample std. dev.: $s_x$, $s_y$

Displaying relationships:

Def’n: An association exists between two variables if a particular value for one variable is more likely to occur with certain values of the other variable.

A scatterplot is a graphical display of two quantitative variables.
- x-variable goes on the x-axis, y-variable on the y-axis
- origin (0,0) may be included

Look for:
- form of relationship (i.e. any obvious pattern)
- strength of relationship (i.e. how closely the points fit a line)
- direction of relationship (i.e. positive or negative association)
- any unusual observations or outliers

| x | y |
| --- | --- |
| 1 | 1 |
| 2 | 2 |
| 4 | 1 |
| 3 | 2 |

(graph of above data used to discuss scatterplot traits further)

Correlation:

Def’n: Pearson’s Sample Correlation Coefficient r is given by

$$
r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) = \frac{1}{n-1} \sum_{i=1}^{n} z_{x_i} z_{y_i}
$$

where $z_{x_i}$ is the “standardized” observation for $x_i$ and $z_{y_i}$ is the “standardized” observation for $y_i$, for i = 1, …, n

(example graphs of correlation drawn in class: 1. strong positive linear; 2. weak positive linear; 3. strong negative linear; 4. no pattern; 5. parabola; 6. exponential)

Properties of r:
• Can only be calculated for numerical data.
• A measure of the LINEAR relationship between two variables.
• -1 ≤ r ≤ 1
• The magnitude of r (or absolute value) measures the strength of the relationship:
  o If r = ± 1, then the points follow a straight line.
  o If r = 0, then the pattern of scatter suggests no linear relationship.
• The sign of r indicates the nature of the relationship:
  o Positive association if r > 0,
  o Negative association if r < 0.
• The two variables x and y play symmetric roles.
• Location and scale invariance (unitless).
• We can have r = 0, even when the data reveal a strong nonlinear relationship.
  o e.g. $y = x^2$ with x values symmetric about 0
• Correlation does not imply causation (or vice versa).
• Since r depends on the mean and std. dev., it is sensitive to outliers.

3.3 Intro to Simple Linear Regression

Ex3.1) Suppose you had 4 variables for the Oilers roster: height, weight, jersey, age
- which relationships might be valid?
- how can we describe the relationship between any pair?
- how do we use the description to make predictions?
- how do we quantify errors in estimates and predictions?

Def’n: The regression line predicts the value for the response variable y as a straight-line function of the value x of the explanatory variable.

Equation for the regression line: ŷ = a + bx
- a is the intercept: the height of the line at x = 0.
- b is the slope: the amount by which ŷ changes when x increases by 1 unit.
- ŷ (“y-hat”) denotes the predicted value of y (or mean y for a given value of x).

What about a new student who gets a mark of 80.1%? There is no observation at that value, so can we estimate the final mark based on the pattern of the other observations? Try to fit a line through the data and use it as a model for final percentage given midterm percentage; then, use the line to estimate (or, interpolate) the final percentage for a student that gets 8
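
As a numerical check on the definition of r given under Correlation, the Python sketch below evaluates the standardized-product formula directly. The function name `pearson_r` and the three example data sets are illustrative assumptions, not part of the notes: the first reads the small x-y table row by row (itself an assumption about the flattened table), and the other two are made-up points on a straight line and on y = x² with x symmetric about 0.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Sample correlation: sum of z_x * z_y over all pairs, divided by (n - 1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length n >= 2")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample std. dev. (divides by n - 1)
    # Product of the standardized observations for each pair (x_i, y_i).
    z_products = [
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    ]
    return sum(z_products) / (n - 1)


# Scatterplot table, read row by row (assumption).
table_r = pearson_r([1, 2, 4, 3], [1, 2, 1, 2])

# Hypothetical points lying exactly on an increasing straight line.
line_r = pearson_r([1, 2, 3, 4], [3, 5, 7, 9])

# Hypothetical parabola y = x^2 with x symmetric about 0.
xs = [-2, -1, 0, 1, 2]
parabola_r = pearson_r(xs, [x ** 2 for x in xs])

print(table_r, line_r, parabola_r)
```

Up to floating-point rounding, the straight-line data give r = 1, while both the table data and the parabola give r = 0. In the parabola case, y is completely determined by x, yet r reports no linear relationship, which is the point of the r = 0 property listed above.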