
# UASTAT141Ch7-10.pdf


School: University of Alberta
Department: Statistics
Course: STAT141
Professor: Paul Cartledge
Semester: Winter

Ch. 7 – Scatterplots, Association, and Correlation

So far, we’ve seen univariate data. This section, however, considers bivariate data and how two numerical variables are related. Methods of description are introduced here and formalized in Ch. 27.

Terminology:
- x: explanatory variable, independent variable, predictor variable
- y: response variable, dependent variable, predicted variable

Notation:
- bivariate sample of size n: $\{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\}$
- sample means: $\bar{x}$, $\bar{y}$
- sample std. dev.: $s_x$, $s_y$

Displaying relationships:
Def’n: An association exists between two variables if a particular value for one variable is more likely to occur with certain values of the other variable.
A scatterplot is a graphical display of two quantitative variables.
- x-variable goes on the x-axis, y-variable on the y-axis
- origin (0, 0) may be included

Look for:
- form of relationship (i.e. any obvious pattern)
- strength of relationship (i.e. how closely the points fit a line)
- direction of relationship (i.e. positive or negative association)
- any unusual observations or outliers

Ex7.1)
x  y
1  1
2  2
4  1
3  2
(graph of above data used to discuss scatterplot traits further)

Correlation:
Def’n: Pearson’s Sample Correlation Coefficient r is given by
$$
r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right) = \frac{1}{n-1}\sum_{i=1}^{n} z_{x_i}\, z_{y_i}
$$
where $z_{x_i} = (x_i-\bar{x})/s_x$ is the “standardized” observation for x and $z_{y_i} = (y_i-\bar{y})/s_y$ is the “standardized” observation for y, for i = 1, …, n.
(example graphs of correlation drawn in class: 1. strong positive linear; 2. weak positive linear; 3. strong negative linear; 4. no pattern; 5. parabola; 6. exponential)

Properties of r:
• A measure of the LINEAR relationship between two variables.
• −1 ≤ r ≤ 1
• The magnitude of r (its absolute value) measures the strength of the relationship:
  o If r = ±1, then the points fall exactly on a straight line.
  o If r = 0, then the pattern of scatter suggests no linear relationship.
• The sign of r indicates the direction of the relationship:
  o Positive association if r > 0.
  o Negative association if r < 0.
• Correlation treats x and y symmetrically: swapping them does not change r.
• Center and scale invariance (unitless): shifting x or y, or rescaling either by a positive constant (e.g. changing units), does not change r.
• We can have r = 0 even when the data reveal a strong nonlinear relationship.
  o e.g. y = x², with x values symmetric about 0
• Correlation does not imply causation (or vice versa).
• Since r depends on the means and std. devs., it is sensitive to outliers.

Ch. 8/9 - Intro to Simple Linear Regression

Ex8.1) Suppose you had 4 variables for the Oilers roster: height, weight, jersey number, age.
- Which relationships might be valid?
- How can we describe the relationship between any pair?
- How do we use the description to make predictions?
- How do we quantify errors in estimates and predictions?

Def’n: The regression line predicts the value of the response variable y as a straight-line function of the value of x, the explanatory variable.
Equation for the regression line: $\hat{y} = b_0 + b_1 x$
- $b_0$ is the intercept: the height of the line at x = 0.
- $b_1$ is the slope: the amount by which ŷ changes when x increases by 1 unit.
- ŷ (“y-hat”) denotes the predicted value of y (or the mean of y for a given value of x).

What about a new student who gets a mark of 80.1% on the midterm? There is no observation at that mark, so can we estimate the final mark based on the pattern of the other observations? Try to fit a line through the data and use it as a model for final percentage given midterm percentage; then use the line to estimate (or interpolate) the final percentage for a student who gets 80.1% on the midterm.

Def’n: Regression analysis tells us how to fit a line to the overall pattern. This equation, or “model”, can be used to estimate or predict other values of y given values of x.
Simple linear regression refers specifically to fitting a straight line (“linear”) using only ONE explanatory variable (“simple”).

Least squares estimation of $b_0$ and $b_1$:
Def’n: A residual is the difference between an observed value and its estimated value. Since ŷ denote

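A minimal Python sketch of Pearson’s sample correlation coefficient from Ch. 7, written directly from the z-score form of the formula and applied to the Ex7.1 data. It assumes the sample standard deviations use the n − 1 divisor (which is what `statistics.stdev` computes), matching the 1/(n − 1) factor in the formula; the function name `pearson_r` is illustrative, not from the notes.

```python
import statistics


def pearson_r(xs, ys):
    """Pearson's sample correlation: r = (1 / (n - 1)) * sum of z_x * z_y."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length n >= 2")
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    # Sample standard deviations with the n - 1 divisor, as in s_x and s_y.
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    # Standardize each observation, then sum the products of the z-scores.
    total = 0.0
    for x, y in zip(xs, ys):
        z_x = (x - x_bar) / s_x
        z_y = (y - y_bar) / s_y
        total += z_x * z_y
    return total / (n - 1)


# Ex7.1 data from the notes.
xs = [1, 2, 4, 3]
ys = [1, 2, 1, 2]
print(pearson_r(xs, ys))
```

By hand: x̄ = 2.5 and ȳ = 1.5, and the products of deviations are 0.75, −0.25, −0.75 and 0.25, which sum to 0. So r = 0 for the Ex7.1 points; the printed value should be 0 up to floating-point rounding.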



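The notes break off just as least squares estimation begins. For reference, the standard least-squares estimates for simple linear regression are $b_1 = r\,s_y/s_x$ (equivalently, the sum of cross-products of deviations divided by the sum of squared x-deviations) and $b_0 = \bar{y} - b_1\bar{x}$. The sketch below computes them from scratch and uses the fitted line to predict ŷ at a new x. The function names `least_squares_fit` and `predict` are hypothetical, and the check uses points lying exactly on y = 1 + 2x, since the notes do not include the midterm/final data.

```python
import statistics


def least_squares_fit(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    # b1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2), which equals r * s_y / s_x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar  # the fitted line passes through (x_bar, y_bar)
    return b0, b1


def predict(b0, b1, x_new):
    """Predicted (mean) response y-hat at x_new."""
    return b0 + b1 * x_new


# Points lying exactly on y = 1 + 2x, so the fit should recover b0 = 1, b1 = 2.
xs = [0, 1, 2, 3, 4]
ys = [1 + 2 * x for x in xs]
b0, b1 = least_squares_fit(xs, ys)
print(b0, b1)               # prints 1.0 2.0
print(predict(b0, b1, 2.5))  # 2.5 is inside the observed x range (interpolation); prints 6.0
```

Predicting at an x inside the range of the observed x values is interpolation, as in the 80.1% midterm question; predicting far outside that range relies on the straight-line pattern continuing where there is no data.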




