-to display a relationship between a categorical explanatory variable and a quantitative
response variable, make a side-by-side comparison of the distributions of the response for
each category.
-data analysis: supplement the graph with a numerical measure, since our eyes are not
good judges of how strong a relationship is.
-correlation r: r is positive when there is a positive association between the
variables. Ex. height and weight have a positive correlation.
Correlation: measures the direction and strength of the linear relationship between two
quantitative variables. It is usually written as r.
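Ex. suppose we have data on variables x and y for n individuals. The means and standard
deviations of the two variables are $\bar{x}$ and $s_x$ for the x values and $\bar{y}$ and
$s_y$ for the y values. The correlation r between x and y is
$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$
$\sum$ means: add these terms for all the individuals.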
This formula helps us see what correlation is but is not convenient for actually calculating r.
The formula starts by standardizing the observations.
Ex. if x is height in centimeters, $(x_i - \bar{x})/s_x$ (the ith height minus the mean
height, divided by the standard deviation of the heights) is the standardized height of the
ith person. The standardized height says how many SDs above or below the mean a person’s
height lies. Standardized values have no units; they are no longer measured in centimeters.
The correlation r is an average of the products of the standardized height and the
standardized weight for the n people.
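The standardize-then-average steps described above can be sketched in a few lines of Python. This is only an illustration, not how r is usually computed in practice: the function name `correlation` and the height and weight values are made up for the example, and the divisor n - 1 follows the usual textbook definition of r.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r: sum of products of standardized values, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 divisor)
    # Standardize: how many SDs above or below the mean each observation lies.
    zx = [(x - x_bar) / s_x for x in xs]
    zy = [(y - y_bar) / s_y for y in ys]
    # Average the products of the standardized values.
    return sum(a * b for a, b in zip(zx, zy)) / (n - 1)

# Hypothetical heights (cm) and weights (kg), invented for illustration only.
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 69, 74, 88]
r = correlation(heights, weights)
print(round(r, 3))  # positive, since taller people here tend to be heavier
```

Statistical software or a calculator would normally do this arithmetic; the sketch only makes the steps visible.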
-properties of correlation:
•Doesn’t make a difference which variable you call x and which you call y when
calculating the correlation.
•Requires that both variables be quantitative, so that it makes sense to do the
arithmetic indicated by the formula for r. Ex. a correlation can’t be calculated for
city because it’s a categorical variable.
•Because r uses the standardized values of the observations, r does not change when
we change the units of measurement of x, y, or both. Ex. using weight and height:
converting cm -> inches or kg -> lbs doesn’t change the correlation between weight
and height. Correlation r has no unit of measurement.
•Positive r indicates positive association between the variables and negative r
indicates negative association
•Correlation r is always a number between -1 and 1. Values of r near 0 mean a very
weak linear relationship. Strength of the relationship increases as r moves away from 0
toward either -1 or 1. Values of r close to -1 or 1 mean that the points lie close to a
straight line. The extreme values r = -1 and r = 1 occur only when the points in a
scatterplot lie exactly along a straight line.
•Measures the strength of only the linear relationship between two variables.
Correlation does not describe curved relationships between variables, no matter how
strong they are.
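Several of the properties listed above can be checked numerically. The following is a minimal sketch, assuming the usual textbook formula for r; the helper `correlation` and all data values are hypothetical, chosen only to illustrate each property.

```python
import math
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r from standardized values (n - 1 divisor)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data, invented for illustration only.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [52, 58, 69, 74, 88]
r = correlation(heights_cm, weights_kg)

# Swapping the roles of x and y leaves r unchanged.
r_swapped = correlation(weights_kg, heights_cm)

# Changing units (cm -> inches, kg -> lbs) leaves r unchanged.
heights_in = [h / 2.54 for h in heights_cm]
weights_lb = [w * 2.2046 for w in weights_kg]
r_units = correlation(heights_in, weights_lb)

# Points exactly on a straight line give r = 1 (rising) or r = -1 (falling).
xs = [1, 2, 3, 4, 5]
r_up = correlation(xs, [2 * x + 1 for x in xs])
r_down = correlation(xs, [10 - 3 * x for x in xs])

# A perfect but curved relationship (y = x^2) can still have r of about 0.
sym = [-2, -1, 0, 1, 2]
r_curve = correlation(sym, [x ** 2 for x in sym])

print(math.isclose(r, r_swapped), math.isclose(r, r_units))
print(math.isclose(r_up, 1.0), math.isclose(r_down, -1.0))
print(abs(r_curve) < 1e-9)
```

The last check is the reason a scatterplot should always accompany r: the y = x^2 points follow an exact pattern, yet the correlation reports essentially no linear relationship.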