
# 7-10.pdf


Department: Statistics
Course: STAT151
Professor: Susan Kamp
Semester: Fall

Ch 7 Scatterplots, Association, and Correlation

We will be investigating the relationship and association between two quantitative variables (bivariate data), such as height and weight, the concentration of an injected drug and heart rate, or the consumption level of some nutrient and weight gain.

Sometimes the purpose of a study is to show that one variable can explain the outcome of another variable.

Definition:
- Response (or dependent) variable (symbol: y): measures an outcome of a study.
- Explanatory (or independent) variable (symbol: x): explains or causes changes in the response variable.

Example 1: Distinguish the x and y variables.
a) What is the effect of rainfall on crop yield?
- x:
- y:
b) What is the effect of the midterm score on the final grade?
- x:
- y:

Data:
- We measure x and y for each individual.
- Observations are recorded in the form (x, y).
- Our sample of n bivariate observations is $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.

Scatterplot
- is the best way to start observing the relationship and the ideal way to picture associations between two quantitative variables.
- is a plot of pairs of observed values of two different quantitative variables. It helps to evaluate the quality of the relationship.
- The x-axis is the horizontal axis and the y-axis is the vertical axis.
- Each observation is then plotted according to its value on the x variable and its value on the y variable.

Example: Does the number of years invested in schooling pay off in the job market?

Thought: the better educated you are, the more money you will earn. The data in the following table give the median annual income of full-time workers age 25 or older by the number of years of schooling completed.

| x = Years of Schooling | y = Salary (dollars) |
|---|---|
| 8 | 18,000 |
| 10 | 20,500 |
| 12 | 25,000 |
| 14 | 28,100 |
| 16 | 34,500 |
| 19 | 39,700 |

Create a scatterplot for x and y.

[Figure: Scatterplot for salary vs. years (x-axis: years, y-axis: salary)]

NOTE: If you want to make a scatterplot with more than 1 group, then use different symbols for each group.

NOTE: Axes need not intersect at (0, 0).

Examining a Scatterplot:
In any graph of data, look for the overall pattern and for striking deviations (e.g., outliers) from this pattern. You can describe the overall pattern of a scatterplot by the form, direction, and strength of the relationship.

1) Form of the relationship
- linear: the points roughly follow a straight line
- curved relationships and clusters

2) Strength of the relationship
- determined by how close the points in the scatterplot lie to a simple form such as a line
- the closer the observations appear to fit a line, the stronger the relationship.

3) Direction (positive and negative associations)
- 2 variables are positively associated when, as x increases, y also increases.
- 2 variables are negatively associated when, as x increases, y decreases.

4) Outliers or unusual observations
- look for any striking deviations from the overall pattern

Example: Describe the pattern of the scatterplot above.

Correlation

[Figures: two example scatterplots]

If the scatterplot shows a reasonably linear relationship, calculate the correlation coefficient to evaluate the direction and strength of the linear relationship between two numerical variables.

Correlation coefficient r:
- a numerical measurement of the strength of the linear relationship between the explanatory and response variables

$$r = \frac{\sum z_x z_y}{n - 1} = \frac{\sum \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)}{n - 1}$$

- This is the sum of the products of the standardized values for each paired observation, all divided by n – 1.

Example: Calculate the correlation coefficient between years of schooling and salary. What does this number imply?
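The calculation asked for in this example can be sketched in a few lines of Python. This is only an illustration of the z-score formula for r applied to the schooling data; the function names (`mean`, `sample_sd`, `correlation`) are made up for this sketch and are not part of the notes.

```python
from math import sqrt

# Data from the schooling example in the notes.
years = [8, 10, 12, 14, 16, 19]
salary = [18000, 20500, 25000, 28100, 34500, 39700]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """Sum of the products of paired z-scores, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    z_products = [
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    ]
    return sum(z_products) / (n - 1)


r = correlation(years, salary)
print(f"r = {r:.3f}")
```

For these six observations r comes out at about 0.99, which points to a strong positive linear association between years of schooling and salary. Swapping the arguments, `correlation(salary, years)`, gives the same value.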
Recall:

| x = Years of Schooling | y = Salary (dollars) |
|---|---|
| 8 | 18,000 |
| 10 | 20,500 |
| 12 | 25,000 |
| 14 | 28,100 |
| 16 | 34,500 |
| 19 | 39,700 |

NOTE: Summary statistics:

| Column | n | Mean | Variance | Std. Dev. | Sum |
|---|---|---|---|---|---|
| x | 6 | 13.166667 | 16.166666 | 4.020779 | 79 |
| y | 6 | 27633.334 | 6.8718664E7 | 8289.672 | 165800 |

Facts about Pearson's correlation coefficient (r):

1) Correlation measures the strength of a linear relationship between two quantitative variables. Check a scatterplot first.
   a. Correlation requires both variables to be numerical; it cannot be applied to categorical data.
   b. It does NOT apply to nonlinear relations.
   c. Outliers can distort the correlation dramatically.

2) Correlation makes no distinction between explanatory and response variables, i.e., the correlation of x with y is the same as the correlation of y with x.

3) Correlation has no units.

4) Correlation is a number between –1 and 1.

5) The absolute value of the coefficient measures how closely the variables are related.
   - The closer it is to 1, the closer the relationship.
   - |r| > 0.8 ⇒ a strong correlation between the variables.
   - r ≈ 0 ⇒ a weak linear association.

6) Like the mean and standard deviation, the correlation is strongly affected by outliers.

7) Correlation is not affected by changes in the center or scale of either variable.
   - Correlation depends only on the z-scores, and they are unaffected by changes in center or scale.

8) The sign of the correlation coefficient tells you the trend in the relationship.
   - r > 0 indicates a positive relation between the variables.
   - r < 0 indicates a negative relation between the variables.

Straightening Scatterplot (Ch10)
- Correlation is a measure of the strength for straight relationships only. When a scatterplot shows a bent form that consistently increases or decreases, we can often straighten the form of the plot by re-expressing one or both variables.
- We can often find transformations that straighten the scatterplot’s form.
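As a rough illustration of straightening, the sketch below re-expresses y as ln(y) and compares the correlation before and after. The data are hypothetical (a noise-free exponential curve chosen for this example, not taken from the notes), and the helper `correlation` is an assumed name.

```python
from math import exp, log, sqrt


def correlation(xs, ys):
    """Pearson's r: sum of paired z-score products divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return sum(
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    ) / (n - 1)


# Hypothetical curved data: y grows exponentially with x.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [exp(0.6 * x) for x in xs]

r_raw = correlation(xs, ys)                    # bent form: r is below 1
r_log = correlation(xs, [log(y) for y in ys])  # re-expressed: ln(y) vs x

print(f"r for y vs x:     {r_raw:.3f}")
print(f"r for ln(y) vs x: {r_log:.3f}")
```

Because the made-up data follow an exact exponential curve, ln(y) is exactly linear in x and the second correlation is essentially 1. With real data the improvement would be less clean, and a different re-expression might suit the pattern better.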
[Figure: two panels, "y vs x" and "ln(y) vs x"]

Correlation Tables
It is common in some fields to compute the correlations between every pair of variables in a collection of variables and arrange these correlations in a table.

Ch 8 Linear Regression & Ch 9 Regression Wisdom

Idea: To fit a straight line through the data so that we can predict values of the response at specified values of x.

Linear Regression
When we have one dependent variable and one independent variable and the relationship between the two variables follows a linear pattern, it is possible to describe the relationship by a straight line and by an equation of the form:

$$y = b_0 + b_1 x$$

where $b_0$ is called the y-intercept and $b_1$ the slope of the equation. The b’s are called the coefficients of the linear model.

The slope is the amount by which y increases when x increases by 1 unit.

[Figure: Salary vs Years of Schooling (x-axis: Years of Schooling, y-axis: Salary)]

How do we find the line that best describes the linear relationship?

Estimate: $\hat{y} = b_0 + b_1 x$
- $\hat{y}$ gives an estimate (predicted response) for y for a given value of x.
- $\hat{y} = b_0 + b_1 x$ is called the line of best fit or the least squares regression line.

Note 1: In general, $\hat{y} \neq y$. The vertical distance from a data point (x, y) to the line is called the error of prediction, the deviation, or the residual.

The deviation of the i-th data point $(x_i, y_i)$ is:

$$y_i - \hat{y}_i = y_i - (b_0 + b_1 x_i)$$

- A negative residual means the predicted value is too big (an overestimate).
- A positive residual means the predicted value is too small (an underestimate).

Note 2: For the least squares line, the sum of the residuals is always 0. Thus, we can’t assess how well the line fits by adding up all the residuals.

Note 3: Similar to what we did with deviations, we square the residuals and add the squares.

Note 4: The smaller the sum, the better the fit.
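To make Notes 3 and 4 concrete, here is a small sketch that computes the residuals and the sum of squared residuals for two candidate lines through the schooling data. Both candidate lines (intercepts 0 and 650, slopes 2100 and 2050) are hypothetical choices for illustration, and the function names are assumptions of this sketch.

```python
# Data from the schooling example in the notes.
years = [8, 10, 12, 14, 16, 19]
salary = [18000, 20500, 25000, 28100, 34500, 39700]


def residuals(b0, b1, xs, ys):
    """Observed y minus predicted y for the line y-hat = b0 + b1 * x."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def sum_squared_residuals(b0, b1, xs, ys):
    """Square each residual and add the squares."""
    return sum(e ** 2 for e in residuals(b0, b1, xs, ys))


# Two hypothetical candidate lines (not claimed to be the best fit).
sse_a = sum_squared_residuals(0, 2100, years, salary)
sse_b = sum_squared_residuals(650, 2050, years, salary)

print("Residuals, line A:", residuals(0, 2100, years, salary))
print("Sum of squares, line A:", sse_a)
print("Sum of squares, line B:", sse_b)
```

Line B has the smaller sum of squared residuals, so by Note 4 it fits these data better than line A. A positive entry in the residual list marks a point the line underestimates, a negative entry one it overestimates.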
Conclusion: The best fitted line is the one that minimizes the sum of the squared differences between the data points and the line itself:

$$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$$
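A minimal sketch of finding that line for the schooling data follows. It uses the standard closed-form least squares solution, slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope · x̄. These formulas are not shown in this excerpt of the notes, so treat them as an assumption brought in for illustration; the function name `least_squares_line` is also made up.

```python
# Data from the schooling example in the notes.
years = [8, 10, 12, 14, 16, 19]
salary = [18000, 20500, 25000, 28100, 34500, 39700]


def least_squares_line(xs, ys):
    """Return (b0, b1) minimizing the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx           # slope
    b0 = y_bar - b1 * x_bar    # intercept: line passes through (x_bar, y_bar)
    return b0, b1


b0, b1 = least_squares_line(years, salary)

# Residuals of the fitted line; their sum should be 0 up to rounding error.
fitted_residuals = [y - (b0 + b1 * x) for x, y in zip(years, salary)]
sse = sum(e ** 2 for e in fitted_residuals)

print(f"y-hat = {b0:.2f} + {b1:.2f} x")
print(f"sum of residuals = {sum(fitted_residuals):.6f}")
print(f"sum of squared residuals = {sse:.1f}")
```

For these data the slope works out to roughly 2,049 dollars per additional year of schooling, with an intercept of roughly 648 dollars. The residuals of the fitted line sum to zero apart from floating-point rounding, which is why the squared residuals, not the raw ones, are used to judge the fit.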