STAT151 (Statistics), University of Alberta
Susan Kamp, Fall
Lecture

Ch 7 Scatterplots, Association, and Correlation
We will be investigating the relationship and association between
two quantitative variables (bivariate data), such as height and
weight, the concentration of an injected drug and heart rate, or the
consumption level of some nutrient and weight gain.
Sometimes the purpose of a study is to show that one variable can
explain the outcome of another variable.
Definition:
- Response (or dependent) variable (symbol: y) - measures
an outcome of a study
- Explanatory (or independent) variable (symbol: x) - explains
or causes changes in the response variable.
Example 1: Distinguish the x and y variables
a) What is the effect of rainfall on crop yield?
- x:
- y:
b) What is the effect of the midterm score on the final grade?
- x:
- y:
Data:
- we measure x and y for each individual
- observations are recorded in the form (x, y)
- our sample of n bivariate observations is
$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
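A minimal Python sketch of how such a bivariate sample can be stored: one (x, y) tuple per individual. It uses the years-of-schooling and salary data from the example below; the variable names are illustrative choices, not part of the notes.

```python
# Each individual contributes one (x, y) observation:
# x = years of schooling, y = median salary in dollars.
observations = [(8, 18000), (10, 20500), (12, 25000),
                (14, 28100), (16, 34500), (19, 39700)]

n = len(observations)                # sample size n
xs = [x for x, _ in observations]    # x_1, ..., x_n
ys = [y for _, y in observations]    # y_1, ..., y_n

print(n)              # 6
print(xs[0], ys[0])   # first observation (x_1, y_1): 8 18000
```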
Scatterplot
- is the best way to start observing the relationship and the
ideal way to picture associations between two quantitative
variables
- is a plot of pairs of observed values of two different
quantitative variables. It helps to evaluate the form, direction,
and strength of the relationship.
- The x-axis is the horizontal axis and the y-axis is the vertical
axis.
- Each observation is then plotted according to its value from
the x variable and its value from the y variable.
Example:
Does the number of years invested in schooling pay off in the job
market?
Thought: the better educated you are, the more money you will
earn.
The data in the following table give the median annual income of
full-time workers age 25 or older by the number of years of
schooling completed.
x = Years of Schooling y = Salary (dollars)
8 18,000
10 20,500
12 25,000
14 28,100
16 34,500
19 39,700
Create a scatterplot for x and y.
[Figure: Scatterplot for salary vs. years; x-axis: years, y-axis: salary]
NOTE: If a scatterplot shows more than one group, use a different
symbol for each group.
NOTE: The axes need not intersect at (0, 0).
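Before reaching for software, the scatterplot can be sketched as a character grid. The following is a minimal, dependency-free Python sketch; the function name `text_scatterplot` and the grid size are illustrative assumptions, and in practice a plotting library (for example matplotlib) would normally be used. Note that the axes span only the observed range, so they do not start at (0, 0).

```python
def text_scatterplot(points, width=40, height=12):
    """Return rows of a character grid with '*' at each (x, y) point.

    Assumes at least two distinct x values and two distinct y values.
    The axes cover only the observed range, not (0, 0).
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        # Map x to a column (left to right) and y to a row (bottom to top).
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"
    # '|' marks the vertical (y) axis, the last row is the horizontal (x) axis.
    return ["|" + "".join(r) for r in grid] + ["+" + "-" * width]

schooling = [(8, 18000), (10, 20500), (12, 25000),
             (14, 28100), (16, 34500), (19, 39700)]
for line in text_scatterplot(schooling):
    print(line)
```

The printed stars rise from lower left to upper right, which is the pattern the scatterplot of salary vs. years shows.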
Examining a Scatterplot:
In any graph of data, look for the overall pattern and for striking
deviations (ex. outliers) from this pattern. You can describe the
overall pattern of a scatterplot by the form, direction, and strength
of the relationship.
1) Form of relationship
- linear – where the points roughly follow a straight line
- curved relationship and clusters
2) Strength of the Relationship
- determined by how close the points in the scatterplot lie to a
simple form such as a line
- the closer the observations appear to fit a line, the stronger the
relationship.
3) Direction (positive and negative associations)
- Two variables are positively associated when y tends to increase
as x increases.
- Two variables are negatively associated when y tends to decrease
as x increases.
4) Outliers or unusual observations
- look for any striking deviations from the overall pattern
Example:
Describe the pattern of the scatterplot above.
Correlation
If the scatterplot shows a reasonable linear relationship, calculate
the correlation coefficient to evaluate the direction and strength of
the linear relationship between two numerical variables.
Correlation coefficient r:
- a numerical measurement of the strength of the linear
relationship between the explanatory and response variables
$$r = \frac{\sum z_x z_y}{n-1} = \frac{\sum \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)}{n-1}$$
- This is the sum of the products of the standardized values
for each paired observation, all divided by n – 1.
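The formula translates directly into code: standardize each variable, multiply the z-scores pair by pair, add, and divide by n - 1. This is a minimal Python sketch; the function name `correlation` is illustrative, and it assumes n ≥ 2 and that neither variable is constant (otherwise a standard deviation is 0 and the division fails).

```python
import math

def correlation(xs, ys):
    """Pearson's r: sum of the products of z-scores, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divisor n - 1).
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    z_x = [(x - x_bar) / s_x for x in xs]
    z_y = [(y - y_bar) / s_y for y in ys]
    return sum(zx * zy for zx, zy in zip(z_x, z_y)) / (n - 1)

# Points exactly on an increasing line give r = 1 (up to rounding);
# points exactly on a decreasing line give r = -1.
print(correlation([1, 2, 3, 4], [3, 5, 7, 9]))
print(correlation([1, 2, 3, 4], [9, 7, 5, 3]))
```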
Example: Calculate the correlation coefficient between years of
schooling and salary. What does this number imply?
Recall:
x = Years of Schooling y = Salary (dollars)
8 18,000
10 20,500
12 25,000
14 28,100
16 34,500
19 39,700
NOTE: Summary statistics:
Variable   n   Mean        Variance     Std. Dev.   Sum
x          6   13.166667   16.166666    4.020779    79
y          6   27633.334   68,718,664   8289.672    165800
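The example can be worked in Python with the standard `statistics` module, which also reproduces the means and standard deviations in the summary table. This is a sketch of the z-score formula applied to the schooling data; it gives r ≈ 0.994, which is close to 1, so by the guidelines below it indicates a very strong positive linear association between years of schooling and salary.

```python
from statistics import mean, stdev

years = [8, 10, 12, 14, 16, 19]
salary = [18000, 20500, 25000, 28100, 34500, 39700]
n = len(years)

x_bar, y_bar = mean(years), mean(salary)
s_x, s_y = stdev(years), stdev(salary)  # sample SDs, divisor n - 1

# r = (sum of z_x * z_y over all pairs) / (n - 1)
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(years, salary)) / (n - 1)

print(round(x_bar, 4), round(s_x, 4))  # compare with the summary table
print(round(y_bar, 2), round(s_y, 2))
print(round(r, 3))  # 0.994
```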
Facts about Pearson's correlation coefficient (r):
1) Correlation measures the strength of a linear relationship
between two quantitative variables. Check a scatterplot
first.
a. Correlation requires both variables to be numerical; it
cannot be applied to categorical data.
b. It does NOT apply to nonlinear relationships.
c. Outliers can distort the correlation dramatically.
2) Correlation makes no distinction between explanatory and
response variables, i.e., the correlation of x with y is the
same as the correlation of y with x.
3) Correlation has no units
4) Correlation is a number between –1 and 1
5) The absolute value of the coefficient measures how closely
the points follow a straight line.
The closer |r| is to 1, the stronger the linear relationship.
|r| > 0.8: a strong correlation between the variables
r ≈ 0: a weak linear association
6) Like the mean and standard deviation, the correlation is
strongly affected by outliers.
7) Correlation is not affected by changes in the center or scale
of either variable (adding a constant, or multiplying by a
positive constant).
- Correlation depends only on the z-scores, and they are
unaffected by changes in center or scale.
8) The sign of the correlation coefficient tells you the direction
of the relationship.
r > 0 indicates a positive relation between the variables
r < 0 indicates a negative relation between the variables
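Several of these facts can be checked numerically. This Python sketch reuses the z-score formula on the schooling data and shows symmetry (fact 2), invariance to shifts and positive rescaling (fact 7), a sign flip under a negative multiplier (fact 8), r = 0 for a perfect but curved relationship (fact 1b), and the effect of one outlier (facts 1c and 6). The outlier (20, 15000) is a hypothetical point made up for illustration, not part of the course data.

```python
import math

def correlation(xs, ys):
    """Pearson's r computed from z-scores (sample SDs, divisor n - 1)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return sum((x - x_bar) / s_x * (y - y_bar) / s_y
               for x, y in zip(xs, ys)) / (n - 1)

years = [8, 10, 12, 14, 16, 19]
salary = [18000, 20500, 25000, 28100, 34500, 39700]
r = correlation(years, salary)

# Fact 2: swapping the roles of x and y gives the same r.
r_swapped = correlation(salary, years)

# Fact 7: schooling in months, salary in thousands of dollars above a
# $10,000 baseline (shift plus positive rescaling) leaves r unchanged.
r_rescaled = correlation([12 * x for x in years],
                         [(y - 10000) / 1000 for y in salary])

# Fact 8: a negative multiplier flips the sign of r but not its size.
r_flipped = correlation([-x for x in years], salary)

# Fact 1b: a perfect U-shaped relationship (y = x^2) gives r = 0.
r_curved = correlation([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

# Facts 1c and 6: one hypothetical outlier pulls r far from 0.994.
r_outlier = correlation(years + [20], salary + [15000])

for name, value in [("r", r), ("swapped", r_swapped),
                    ("rescaled", r_rescaled), ("flipped", r_flipped),
                    ("curved", r_curved), ("outlier", r_outlier)]:
    print(f"{name:>9}: {value:.3f}")
```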
Straightening Scatterplots (Ch 10)
- Correlation measures the strength of straight-line
relationships only. When a scatterplot shows a bent form that
consistently increases or decreases, we can often straighten the
form of the plot by re-expressing one or both variables.
- We can often find transformations that straighten the
scatterplot’s form.
[Figure: two panels, y vs x (left) and ln(y) vs x (right)]
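Re-expression can be tried numerically by comparing r before and after the transformation. In this Python sketch the data are hypothetical, generated as y = e^(0.6x), so y vs x is bent while ln(y) = 0.6x is exactly a straight line; the choice of ln is an assumption suited to this exponential-looking shape, and other shapes may call for other re-expressions.

```python
import math

def correlation(xs, ys):
    """Pearson's r computed from z-scores (sample SDs, divisor n - 1)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return sum((x - x_bar) / s_x * (y - y_bar) / s_y
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical, exponentially growing data: y = e^(0.6 x).
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [math.exp(0.6 * x) for x in xs]

r_raw = correlation(xs, ys)                          # bent form
r_log = correlation(xs, [math.log(y) for y in ys])   # straight after ln(y)
print(f"r(x, y)    = {r_raw:.3f}")
print(f"r(x, ln y) = {r_log:.3f}")
```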
Correlation Tables
It is common in some fields to compute the correlations between
every pair of variables in a collection of variables and arrange
these correlations in a table.
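A correlation table is just r computed for every pair of variables. This Python sketch uses three hypothetical variables measured on the same five individuals (the names and values are made up for illustration). The diagonal is always 1, and the table is symmetric because r(x, y) = r(y, x).

```python
import math

def correlation(xs, ys):
    """Pearson's r computed from z-scores (sample SDs, divisor n - 1)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return sum((x - x_bar) / s_x * (y - y_bar) / s_y
               for x, y in zip(xs, ys)) / (n - 1)

def correlation_table(data):
    """Return {(name_a, name_b): r} for every ordered pair of variables."""
    names = list(data)
    return {(a, b): correlation(data[a], data[b]) for a in names for b in names}

# Hypothetical variables measured on the same five individuals.
data = {
    "height": [160, 165, 170, 175, 180],
    "weight": [55, 62, 66, 74, 78],
    "age": [30, 22, 41, 35, 27],
}
table = correlation_table(data)
names = list(data)

print(" " * 8 + "".join(f"{v:>8}" for v in names))
for a in names:
    print(f"{a:>8}" + "".join(f"{table[(a, b)]:8.2f}" for b in names))
```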
Ch 8 Linear Regression & Ch 9 Regression Wisdom
Idea: To fit a straight line through the data so that we can predict
values of the response at specified values of x.
Linear Regression
When we have one dependent variable and one independent
variable and the relationship between two variables follows a
linear pattern, it is possible to describe the relationship by a
straight line and by an equation of the form:
$y = b_0 + b_1 x$
where $b_0$ is called the y-intercept and $b_1$ the slope of the equation.
The b’s are called the coefficients of the linear model.
The slope $b_1$ is the amount by which y changes when x increases by
1 unit (an increase if $b_1 > 0$, a decrease if $b_1 < 0$).
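The meaning of the slope can be seen by evaluating a line at consecutive x values. In this Python sketch the coefficients b0 = 650 and b1 = 2050 are hypothetical, rounded values chosen for illustration; the function name `predict` is also illustrative.

```python
def predict(b0, b1, x):
    """Value of the straight line b0 + b1 * x at a given x."""
    return b0 + b1 * x

# Hypothetical coefficients, for illustration only.
b0, b1 = 650, 2050

# Each one-unit increase in x changes the line's value by exactly b1.
for x in [12, 13, 14]:
    print(x, predict(b0, b1, x))
print(predict(b0, b1, 13) - predict(b0, b1, 12))  # 2050
```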
[Figure: Salary vs Years of Schooling scatterplot; x-axis: Years of Schooling, y-axis: Salary]
How do we find the line that best describes the linear
relationship?
Estimate: $\hat{y} = b_0 + b_1 x$
- $\hat{y}$ gives an estimate (predicted response) for y for a given
value of x
- $\hat{y} = b_0 + b_1 x$ is called the line of best fit or the least squares
regression line.
Note 1: In general, $\hat{y} \neq y$. The vertical distance $y - \hat{y}$ from a data point (x, y) to
the line is called the error of prediction, the deviation, or the
residual.
Deviation of the i-th data point $(x_i, y_i)$ is:
$$y_i - \hat{y}_i = y_i - b_0 - b_1 x_i$$
- A negative residual means the predicted value is too big
(an overestimate).
- A positive residual means the predicted value is too
small (an underestimate).
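Residuals and their signs can be computed directly from the definition. This Python sketch applies a hypothetical line, ŷ = 650 + 2050x (rounded values chosen for illustration, not the exact least squares fit), to the schooling data and labels each prediction as an over- or underestimate.

```python
# Hypothetical line for illustration: y-hat = 650 + 2050 x
b0, b1 = 650, 2050

years = [8, 10, 12, 14, 16, 19]
salary = [18000, 20500, 25000, 28100, 34500, 39700]

# Residual = observed y minus predicted y-hat.
residuals = [y - (b0 + b1 * x) for x, y in zip(years, salary)]

for x, y, e in zip(years, salary, residuals):
    if e > 0:
        verdict = "underestimate"   # prediction too small
    elif e < 0:
        verdict = "overestimate"    # prediction too big
    else:
        verdict = "exact"
    print(f"x={x:2d}  y={y}  y_hat={b0 + b1 * x}  residual={e:+d}  ({verdict})")
```

Because these rounded coefficients are not the exact least squares line, their residuals sum to -50 rather than 0.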
Note 2: For the least squares line, the sum of the residuals is always 0.
Thus, we can't assess how well the line fits by adding up all the residuals.
Note 3: Similar to what we did with deviations, we square the
residuals and add the squares.
Note 4: The smaller the sum of squared residuals, the better the fit.
Conclusion: The best fitted line is the one that minimizes the
sum of the squared differences between the data points and the
line itself.
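That is, $b_0$ and $b_1$ are chosen to minimize
$$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$$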
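The least squares line can be computed and its properties checked numerically. This Python sketch uses the standard closed-form least squares solution, $b_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and $b_0 = \bar{y} - b_1 \bar{x}$, which is not derived in this part of the notes; the helper names `least_squares` and `sse` are illustrative. For the schooling data it gives roughly ŷ = 648.45 + 2049.48x, residuals that sum to 0 (up to rounding), and a sum of squared residuals that grows whenever either coefficient is nudged.

```python
years = [8, 10, 12, 14, 16, 19]
salary = [18000, 20500, 25000, 28100, 34500, 39700]

def least_squares(xs, ys):
    """Closed-form least squares intercept b0 and slope b1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def sse(b0, b1, xs, ys):
    """Sum of squared residuals for the line y-hat = b0 + b1 x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

b0, b1 = least_squares(years, salary)
residual_sum = sum(y - (b0 + b1 * x) for x, y in zip(years, salary))
best = sse(b0, b1, years, salary)

# Moving either coefficient away from the least squares values raises the SSE.
nudged = [sse(b0 + d0, b1 + d1, years, salary)
          for d0, d1 in [(100, 0), (-100, 0), (0, 10), (0, -10)]]

print(f"y-hat = {b0:.2f} + {b1:.2f} x")
print(f"sum of residuals = {residual_sum:.6f}")
print(f"SSE = {best:.1f}; nudged SSEs = {[round(v, 1) for v in nudged]}")
```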
