Chapter 14
Describing Relationships: Scatterplots and Correlation
Principles that guide our work:
1) First plot the data, then add numerical summaries
2) Look for overall patterns and deviations from those patterns
3) When the overall pattern is quite regular, there is sometimes a way to describe it very briefly
The most common way to display the relation between two quantitative variables is a scatterplot
A scatterplot shows the relationship between two quantitative variables measured on the same individuals
o The values of one variable appear on the horizontal axis, and the values of the other
variable appear on the vertical axis
o Each individual in the data appears as the point in the plot fixed by the values of both
variables for that individual
Always plot the explanatory variable, if there is one, on the horizontal axis of a scatterplot
If there is no explanatory-response distinction, either variable can go on the horizontal axis
The explanatory variable is usually called x and the response variable y
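As a rough sketch of how a scatterplot is built, the Python below places one * per individual on a small text grid, with the explanatory variable on the horizontal axis and the response variable on the vertical axis. The function name ascii_scatter, the grid size, and the hours-studied and exam-score numbers are made up for illustration and do not come from the chapter.

```python
def ascii_scatter(x, y, width=30, height=10):
    """Return a text scatterplot with x on the horizontal axis and y on the vertical axis."""
    if len(x) != len(y):
        raise ValueError("each individual needs one x value and one y value")
    x_min, y_min = min(x), min(y)
    # Guard against a zero range when every value of a variable is the same.
    x_range = (max(x) - x_min) or 1
    y_range = (max(y) - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for xi, yi in zip(x, y):
        # Each individual becomes one point fixed by its (x, y) pair.
        col = round((xi - x_min) / x_range * (width - 1))
        row = round((yi - y_min) / y_range * (height - 1))
        # The first grid row is printed at the top, so flip rows to put large y on top.
        grid[height - 1 - row][col] = "*"
    return "\n".join("".join(line) for line in grid)


# Hypothetical data: hours studied (explanatory, x) and exam score (response, y).
hours = [1, 2, 3, 4, 5, 6]
score = [52, 58, 61, 70, 74, 83]
print(ascii_scatter(hours, score))
```

In practice a plotting library would draw the axes and labels as well. The text grid is only meant to show which variable goes on which axis and how each individual becomes a single point.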
Interpreting scatterplots
Look for the overall pattern and for striking deviations from that pattern
You can describe the overall pattern of a scatterplot by the direction, form, and strength of the relationship
An important kind of deviation is an outlier, an individual value that falls outside the overall
pattern of the relationship
Two variables are positively associated when above-average values of one tend to accompany
above-average values of the other and below-average values also tend to occur together
The scatterplot slopes upward as we move from left to right
Two variables are negatively associated when above-average values of one tend to accompany
below-average values of the other, and vice versa
The scatterplot slopes downward from left to right
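One way to check the direction of an association numerically is sketched below. It assumes that "tend to accompany" can be summarized by adding up the products of each individual's deviations from the two means. That rule, the function name association_direction, and the data are illustrative assumptions, not something the chapter states.

```python
from statistics import mean


def association_direction(x, y):
    """Classify the direction of association from products of deviations from the means."""
    x_bar, y_bar = mean(x), mean(y)
    # A product is positive when both values fall on the same side of their means
    # and negative when they fall on opposite sides.
    total = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "no clear direction"


# Hypothetical data, made up for illustration.
print(association_direction([1, 2, 3, 4], [2, 4, 5, 8]))  # positive
print(association_direction([1, 2, 3, 4], [8, 5, 4, 2]))  # negative
```

This only reports direction. It says nothing about the form or strength of the relationship, which still have to be judged from the plot.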
The strength of a relationship in a scatterplot is determined by how closely the points follow a
clear form
Straight-line relations are important because a straight line is a simple pattern that is quite common
Correlation, written as r, describes the direction and strength of a straight-line relationship
between two quantitative variables
Positive r indicates positive association between the variables, and negative r indicates negative association
The correlation r always falls between -1 and 1
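These notes do not give a formula for r. The sketch below assumes the common textbook definition: standardize each variable, multiply the standardized values for each individual, add the products, and divide by n - 1. The function name correlation and the data are made up for illustration.

```python
from statistics import mean, stdev


def correlation(x, y):
    """Correlation r: sum of products of standardized values, divided by n - 1."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two individuals, each with an x and a y value")
    x_bar, y_bar = mean(x), mean(y)
    # Sample standard deviations (divisor n - 1); a variable with no spread
    # has standard deviation 0 and r is then undefined (ZeroDivisionError here).
    s_x, s_y = stdev(x), stdev(y)
    products = (((xi - x_bar) / s_x) * ((yi - y_bar) / s_y) for xi, yi in zip(x, y))
    return sum(products) / (n - 1)


# Hypothetical data, made up for illustration.
x = [1, 2, 3, 4, 5]
r_up = correlation(x, [2, 4, 6, 8, 10])    # points on a rising straight line
r_down = correlation(x, [10, 8, 6, 4, 2])  # points on a falling straight line
r_mixed = correlation(x, [3, 1, 4, 1, 5])  # a looser, roughly upward pattern
print(r_up, r_down, r_mixed)
```

Points lying exactly on a rising line give r very close to 1, and points on a falling line give r very close to -1 (floating-point rounding may leave a tiny difference). The looser pattern gives a value strictly between those limits.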
