Chapter 6: Scatterplots, Association, and Correlation
Scatterplot: a graph that shows the relationship between two quantitative
variables measured on the same cases.
o It is easy to see patterns, trends, relationships, and even the
occasional unusual values standing apart from the others by simply
looking at one.
o Direction: a positive direction or association means that, in general, as
one variable increases, so does the other.
When increases in one variable generally correspond to
decreases in the other, the association is negative.
o Form: the form we care about most is straight, but other patterns in
the scatterplots should be described as well.
o Strength: a scatterplot is said to show a strong association if there is
little scatter around the underlying relationship.
6.1 Looking at Scatterplots
Outliers can lead us to probe further to understand our data more clearly.
There may be entire clusters or subgroups that stand away or show a trend
in a different direction than the rest of the plot.
o Try to understand why they are different and possibly split the data
into subgroups to obtain a more relevant conclusion.
6.2 Assigning Roles to Variables in Scatterplots
Bivariate analysis: statistical analysis of two variables at the same time, as in
our calculation of the correlation coefficient and plotting of scatter diagrams.
Explanatory (predictor/independent) variable: The variable that accounts
for, explains, predicts, or is otherwise responsible for the y-variable.
o It is the x-axis variable.
Response (dependent) variable: The variable that the scatterplot is meant to
explain or predict.
o It is the y-axis variable.
The roles that are chosen for each variable have more to do with how we
think about them than with the variables themselves.
o What are we trying to look for?
6.3 Understanding Correlation
Changing the units of either axis will not change the direction, form, or
strength of the association shown in the scatterplot.
The variables can be standardized for simplicity’s sake.
By standardizing the values, the scales on both the x-axis and y-axis will be
equal.
o Equal scaling gives a neutral way of drawing the scatterplot and a
fairer impression of the strength of the association.
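As a minimal sketch of what standardizing does, the Python below converts a variable to z-scores by subtracting the mean and dividing by the sample standard deviation. The height data and the helper name standardize are hypothetical, chosen only for illustration. The same heights recorded in centimetres and in inches give the same z-scores, which is why standardized axes do not depend on the original units.

```python
from statistics import mean, stdev


def standardize(values):
    """Return z-scores: (value - mean) / sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample SD, which divides by n - 1
    return [(v - m) / s for v in values]


# Hypothetical data: the same five heights in two different units.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
heights_in = [h / 2.54 for h in heights_cm]

z_cm = standardize(heights_cm)
z_in = standardize(heights_in)

# The two lists of z-scores agree up to floating-point rounding.
print([round(z, 3) for z in z_cm])
print([round(z, 3) for z in z_in])
```

After standardizing, each variable has mean 0 and standard deviation 1, so both axes share one scale.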
By taking the product z_x * z_y for each standardized point (z_x, z_y) and
summing these products, we can get a measure of the strength of the association.
o Points in the upper right and lower left sections of the plot have
coordinates with the same sign, so their products are positive (positive
association).
o Points in the upper left and lower right sections of the plot have
coordinates with opposite signs, so their products are negative (negative
association).
To adjust for the size of the sum, which grows as more data points are added,
divide the sum by n - 1, where n is the number of data points:
r = sum(z_x * z_y) / (n - 1).
o This ratio is called the correlation coefficient, or just the correlation.
o Correlation coefficient: a numerical measure of the direction and
strength of a linear association.
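The calculation described above can be sketched in Python: standardize both variables, multiply the z-scores pair by pair, sum the products, and divide by n - 1. The function name correlation and the small data sets are hypothetical and serve only as an illustration.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation as the sum of z-score products divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)  # sample SDs (divide by n - 1)
    zx = [(x - mx) / sx for x in xs]
    zy = [(y - my) / sy for y in ys]
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)


# Hypothetical data sets sharing the same x-values.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_up = [2.0, 4.0, 6.0, 8.0, 10.0]    # perfectly straight, increasing
y_down = [10.0, 8.0, 6.0, 4.0, 2.0]  # perfectly straight, decreasing
y_noisy = [2.0, 1.0, 4.0, 3.0, 5.0]  # increasing with some scatter

r_up = correlation(x, y_up)
r_down = correlation(x, y_down)
r_noisy = correlation(x, y_noisy)

# Rescaling x (a change of units) leaves the correlation unchanged.
r_rescaled = correlation([100.0 * v for v in x], y_noisy)

print(round(r_up, 3), round(r_down, 3), round(r_noisy, 3), round(r_rescaled, 3))
```

The sign of the result gives the direction of the association, and values closer to 1 or -1 indicate less scatter around a straight line.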
Other equivalent variations of the formula: