STAB22H3 Chapter Notes - Chapter 5: Box Plot, Equalize

Stats: Data and Models Canadian Edition
Chapter 5 Understanding and Comparing Distributions
Boxplot and 5-Number Summaries
- 5-number summary can be displayed in a boxplot
- Making the boxplot:
o Vertical axis spanning the extent of the data
o Indicate Q1, median, and Q3 with short horizontal lines, and connect them with vertical
lines to form a box
o (in class) Indicate the minimum and maximum values with an asterisk (*), and connect each *
to the box with a straight line
o Erect “fences” around the main part of the data to identify outliers, shown with a dotted
line: place the upper fence 1.5 IQRs above Q3 (Q3 + 1.5IQR) and the lower fence 1.5
IQRs below Q1 (Q1 - 1.5IQR)
o If a data value falls outside the fences, it does not get connected with a whisker (the line
connecting the box to the min. or max. value)
If the min. or max. value falls outside of the fence, the most extreme value within
the fences is also marked (because it is the highest/lowest non-outlier)
- The height of the box made by the Q1, median, and Q3 is equal to the IQR
- If the median is roughly at the centre of the box, the middle half of the data is roughly symmetric;
if it is not centred, the distribution is skewed
- The lengths of the whiskers also indicate the symmetry/skewness of the distribution
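A minimal Python sketch of the boxplot quantities described above: quartiles, IQR, fences, outliers, and whisker ends. The function name boxplot_summary and the sample data are made up for illustration. The quartile rule (method="inclusive" in the standard statistics module) is an assumption; the textbook's quartile rule may give slightly different values on some data sets.

```python
import statistics

def boxplot_summary(values):
    """Return the pieces needed to draw a boxplot with fences (hypothetical helper)."""
    data = sorted(values)
    # Assumption: "inclusive" quartiles; other quartile rules can differ slightly.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr   # Q1 - 1.5IQR
    upper_fence = q3 + 1.5 * iqr   # Q3 + 1.5IQR
    # Values beyond the fences are outliers and get no whisker.
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1],
        "iqr": iqr, "lower_fence": lower_fence, "upper_fence": upper_fence,
        "outliers": outliers,
        # Whiskers stop at the most extreme values still inside the fences.
        "whisker_low": inside[0], "whisker_high": inside[-1],
    }

# Made-up data with one unusually large value.
summary = boxplot_summary([2, 4, 5, 6, 7, 8, 9, 10, 30])
print(summary)
```

For this sample, Q1 = 5, median = 7, Q3 = 9, so IQR = 4 and the fences sit at -1 and 15. The value 30 lies above the upper fence, so the upper whisker ends at 10, the highest non-outlier.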
Comparing Groups with Boxplots
- Boxplots offer a balance of information and simplicity; they hide the details while displaying the
overall summary information
- Looking at boxplots side-by-side, we can see which groups have higher medians, greater IQRs,
and greater ranges
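The side-by-side comparison can also be done numerically. The sketch below computes the median, IQR, and range for each group; the group names, the data, and the helper compare_groups are hypothetical, and the "inclusive" quartile rule is an assumption rather than the textbook's stated method.

```python
import statistics

def compare_groups(groups):
    """Map each group name to its median, IQR, and range (hypothetical helper)."""
    result = {}
    for name, values in groups.items():
        # Assumption: "inclusive" quartiles from the standard library.
        q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        result[name] = {
            "median": median,
            "iqr": q3 - q1,
            "range": max(values) - min(values),
        }
    return result

# Made-up groups for illustration.
stats_by_group = compare_groups({
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 10],
})
for name, stats in stats_by_group.items():
    print(name, stats)
```

Here group B has the higher median (6 vs. 3), the greater IQR (4 vs. 2), and the greater range (8 vs. 4), which is what a taller, higher box would show in a side-by-side boxplot.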
Outliers
- Cases that stand out from the rest of the data almost always deserve our attention
- An outlier is a value that doesn’t fit in with the rest of the data
- First, try to understand the outlier(s) in the context of the data
- Look at the gap between the case and the rest of the data when considering whether it is an outlier
- Some outliers are just errors; many are just different
- Report summaries and analyses with and without the outlier to see its influence and then decide
what to think about the data
- Never ignore an outlier or drop it from analysis without comment
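One way to see an outlier's influence is to compute the same summaries with and without it. The sketch below does this for the mean and median; the data, the suspected outlier value, and the helper summarize are made up for illustration.

```python
import statistics

def summarize(values):
    """Mean and median of a list of numbers (hypothetical helper)."""
    return {"mean": statistics.mean(values), "median": statistics.median(values)}

data = [2, 4, 5, 6, 7, 8, 9, 10, 30]   # made-up data
suspect = 30                            # the case flagged as an outlier

with_outlier = summarize(data)
without_outlier = summarize([x for x in data if x != suspect])

print("with outlier:   ", with_outlier)
print("without outlier:", without_outlier)
```

For this sample the mean drops from 9 to 6.375 when the outlier is removed, while the median only moves from 7 to 6.5. Reporting both versions lets the reader judge how much the single case matters.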
- A timeplot is a display of values against time
- Smoothing Timeplots
o Timeplots often show a lot of point-to-point variation that we want to see past so that we
can see underlying smooth trends and how the values around that trend vary (timeplot
o A smooth trace can highlight long-term patterns and help us see them through the local
variation; helps us see the main trend and points that don’t fit in the overall pattern
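A centred moving average is one simple way to produce a smooth trace. These notes do not say which smoother the text uses, so this is only an illustrative stand-in; the function name moving_average, the window size, and the data are assumptions.

```python
def moving_average(values, window=3):
    """Smooth a series by averaging each point with its neighbours.

    Near the ends, where a full window is not available, only the
    existing neighbours are averaged.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Made-up values recorded at equally spaced times.
series = [1, 5, 3, 7, 5]
trace = moving_average(series, window=3)
print(trace)
```

The smoothed trace for this series is [3.0, 3.0, 5.0, 5.0, 6.0]: the point-to-point zigzag is damped and the overall upward trend is easier to see. A wider window smooths more but can hide genuine short-term features.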
Looking Into the Future
