Chapter 5: Displaying and Describing Quantitative Data (5.1 – 5.6)
5.1 Displaying Data Distributions
Histogram: can count the number of cases that fall into each bin, represent
the counts as bars, and plot them against bin values.
o Gaps are not in histograms unless gaps exist in the data.
o Frequency – y-axis
o Bin – x-axis
o If we have n data points, we use about bins.
When a value falls right on a bin boundary, it can be put into either bin (left
or right), but that process must be consistent with all other similar cases.
A relative frequency histogram’s vertical axis shows the percentage of the
total count in each bin.
Stem-and-leaf display: similar to histograms but also give the individual
o They use part of each number (called the stem) to name the bins.
o To make the “leaves”, the next digit of the number is used.
I.e. the number 2.13 in stem-and-leaf looks like 2|3
o A -0 and a +0 must be used as stems since -0.3 is different from 0.3.
Quantitative Data Condition: that the data represent values of a quantitative
Three things to pay attention to when describing distribution:
The shape of a distribution is described in terms of its mode(s), its symmetry,
and whether it has any gaps or outlying values.
Mode: defined as the single value that appears most often (categorical
o Also defined as the peak in a histogram (quantitative variables).
Unimodal: a distribution whose histogram has one main hump.
Bimodal: a distribution whose histogram has two main humps.
Multimodal: a distribution whose histogram has three of more main humps.
Approximately uniform distribution: a distribution that doesn’t appear to
have any clear mode.
Approximately symmetric distribution: when a distribution can be divided
into two parts that look (approximately) like mirror images.
Tails: the (usually) thinner ends of a distribution.
Skewed distribution: when one tail stretches out farther than the other
(skewed to the side of the longer tail).
Outliers (stragglers): points that stand off away from the body of the data
distribution. o I.e. studying personal wealth with Bill Gates in your sample (Bill
Gates’ point would be an outlier).
o Can provide exciting/interesting information about the data.
Looking at a histogram at several different bin widths can help you see how
persistent some of the features are.
When a histogram is unimodal and symmetric, most people would point to
the centre of the distribution to describe a typical price change.
A bar over any symbol indicates the mean of that quantity.
Mean of y: the sum of all the values of variable y divided by the number of
Median: the value that splits a histogram into two equal areas.
o A better description of a distribution then the mean.
The mean may be
represented by the
describe the data as
well as the me