Chapter 5: Displaying and Describing Quantitative Data (5.7 – 5.14)
5.7 Grouped Data
The mean can be calculated from grouped data by multiplying each group's
midpoint by the proportion of the sample in that group and adding the results.
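The variance can be estimated the same way: use the midpoints of the ranges
in the regular formula for variance and weight each squared deviation by the
proportion (p) of the sample in that group: variance ≈ Σ p × (midpoint – mean)².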
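The grouped-data calculations described above can be sketched in a few lines of Python. The ranges and proportions here are made-up illustration values, not data from the chapter, and the function names are hypothetical. The variance is the proportion-weighted form (each squared deviation from the grouped mean multiplied by p), which is one reasonable reading of the note above.

```python
# Hypothetical grouped data: (lower bound, upper bound, proportion of sample).
# The proportions are assumed to add up to 1.
groups = [
    (0, 10, 0.20),
    (10, 20, 0.50),
    (20, 30, 0.30),
]


def grouped_mean(groups):
    """Add up midpoint * proportion over all groups."""
    return sum(((lo + hi) / 2) * p for lo, hi, p in groups)


def grouped_variance(groups):
    """Add up proportion * (midpoint - mean)^2 over all groups."""
    mean = grouped_mean(groups)
    return sum(p * ((lo + hi) / 2 - mean) ** 2 for lo, hi, p in groups)


mean = grouped_mean(groups)
variance = grouped_variance(groups)
print(mean, variance)
```

With midpoints 5, 15, and 25, the grouped mean comes out at about 16 and the grouped variance at about 49. Both are approximations, because every value in a group is treated as if it sat at the midpoint.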
5.8 Five-Number Summary and Boxplots
The five-number summary of a distribution reports its median, quartiles, and
extremes (minimum and maximum).
A five-number summary of a quantitative variable can be displayed in a
boxplot.
o Boxplot: a boxplot displays the five-number summary as a central box
with whiskers that extend to the non-outlying values.
Particularly effective for comparing groups.
Steps to create a boxplot:
o Draw a single vertical axis spanning the extent of the data.
o Draw short horizontal lines at the lower and upper quartiles and at
the median. Then connect them with vertical lines to form a box
(the width is not important unless multiple groups are being shown).
o Put fences (not shown in the final graph) around the main part of the
data, placing the upper fence 1.5 IQRs (IQR = Q3 – Q1) above the upper
quartile and the lower fence 1.5 IQRs below the lower quartile.
I.e. upper fence = Q3 + 1.5*IQR and lower fence = Q1 – 1.5*IQR.
o Grow “whiskers”. Draw lines from each end of the box up and down to
the most extreme data values found within the fences.
o Add any outliers by displaying data values that lie beyond the fences
with special symbols.
Mark outliers that are less than 3 IQRs from the quartiles with one symbol
and outliers that are more than 3 IQRs from the quartiles (far outliers)
with another.
o The box in the centre of the boxplot shows the middle half of the
data, between the quartiles.
o The height of the box = IQR.
o If the median is roughly centered between the quartiles, the middle half
of the data is roughly symmetric (if not, the distribution is skewed).
o If the whiskers are not roughly the same length, the distribution is
skewed.
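The boxplot construction steps in 5.8 (quartiles, fences, whiskers, outliers) can be sketched numerically as follows. This is a minimal illustration, assuming Python's standard `statistics.quantiles` with `method="inclusive"` as the quartile rule. Textbooks and software use slightly different quartile conventions, so the quartiles for a given data set may differ a little from a hand calculation. The function name `boxplot_stats` and the sample values are made up for the example.

```python
import statistics


def boxplot_stats(data):
    """Compute the numbers needed to draw a boxplot by hand."""
    values = sorted(data)
    # Quartile rule is an assumption; other conventions give slightly different values.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1

    # Fences sit 1.5 IQRs beyond the quartiles and are not drawn.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr

    # Whiskers reach the most extreme data values that lie within the fences.
    inside = [x for x in values if lower_fence <= x <= upper_fence]

    # Outliers lie beyond the fences; far outliers are more than 3 IQRs from the quartiles.
    outliers = [x for x in values if x < lower_fence or x > upper_fence]
    far_outliers = [x for x in outliers if x < q1 - 3 * iqr or x > q3 + 3 * iqr]

    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "lower_fence": lower_fence, "upper_fence": upper_fence,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": outliers, "far_outliers": far_outliers,
    }


sample = [2, 4, 5, 6, 7, 8, 9, 10, 30]  # made-up values
stats = boxplot_stats(sample)
for name, value in stats.items():
    print(name, value)
```

For this sample the box runs from 5 to 9 with the median at 7, the fences fall at -1 and 15, the whiskers stop at 2 and 10, and the value 30 is flagged as a far outlier.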
5.9 Percentiles
Q1 can be thought of as the 25th percentile (25% of the data lies below it).
Q3 can be thought of as the 75th percentile.
The median is the 50th percentile.
Percentile: a value below which a given percentage of data lies.
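As a small illustration of the link between quartiles and percentiles, the sketch below computes the 25th, 50th, and 75th percentiles of some made-up scores and checks them against the quartiles. It assumes the standard library's `statistics.quantiles` with the "inclusive" interpolation rule, which is only one of several percentile conventions. The helper `percent_below` is a hypothetical name introduced for the example.

```python
import statistics

scores = [55, 60, 62, 68, 70, 71, 75, 80, 84, 90, 95]  # made-up values

# Cutting the data into 100 parts gives 99 cut points: index k-1 is the kth percentile.
percentiles = statistics.quantiles(scores, n=100, method="inclusive")
p25, p50, p75 = percentiles[24], percentiles[49], percentiles[74]

# Cutting into 4 parts gives the quartiles Q1, median, Q3.
q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")


def percent_below(data, value):
    """Percentage of data values strictly below the given value."""
    return 100 * sum(1 for x in data if x < value) / len(data)


print(p25, p50, p75)
print(q1, median, q3)
print(percent_below(scores, p25))
```

With a sample this small, the percentage of values below the 25th percentile is only approximately 25%; the match gets closer as the data set grows.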
5.10 Comparing Groups
Histograms are best at displaying one or two distributions.
Boxplots usually do a better job at displaying more than two distributions.
o They offer an ideal balance of information and simplicity, hiding the
details while displaying the overall summary information.
You can see:
Which group has the higher median.
Which group has the greater IQR.
Where the central 50% of the data is located in each group.
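The comparisons that side-by-side boxplots make visible can also be read off numerically. The sketch below is only an illustration with made-up group names and values, again assuming the "inclusive" quartile rule from Python's `statistics` module. The function name `summarize` is hypothetical.

```python
import statistics

# Hypothetical groups to compare.
groups = {
    "A": [1, 3, 5, 9, 10, 11, 15, 18, 20],
    "B": [10, 11, 12, 12, 13, 14, 14, 15, 16],
}


def summarize(values):
    """Median, IQR, and the range covered by the central 50% of the data."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"median": median, "iqr": q3 - q1, "middle_half": (q1, q3)}


summaries = {name: summarize(values) for name, values in groups.items()}

# Which group has the higher median, and which has the greater IQR?
higher_median = max(summaries, key=lambda name: summaries[name]["median"])
greater_iqr = max(summaries, key=lambda name: summaries[name]["iqr"])

for name, summary in summaries.items():
    print(name, summary)
print("higher median:", higher_median)
print("greater IQR:", greater_iqr)
```

Here group B has the higher median (13 against 10), while group A has the greater IQR (10 against 2), so B sits higher but A is more spread out.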