# STAB22H3 Chapter Notes - Chapter 5: Box Plot, Equalize



Stats: Data and Models – Canadian Edition

Chapter 5 – Understanding and Comparing Distributions

Boxplot and 5-Number Summaries

- The 5-number summary (minimum, Q1, median, Q3, maximum) can be displayed in a boxplot

- Making the boxplot:

o Vertical axis spanning the extent of the data

o Indicate Q1, median, and Q3 with short horizontal lines, and connect them with vertical lines to form a box

o (in class) Indicate the minimum and maximum values with an asterisk (*), and connect each * to the box with a straight line

o Erect “fences” around the main part of the data to identify outliers, drawn as dotted lines: place the upper fence 1.5 IQRs above Q3 (Q3 + 1.5 × IQR) and the lower fence 1.5 IQRs below Q1 (Q1 - 1.5 × IQR)

o If a data value falls outside the fences, it is not connected to the box by a whisker (a whisker is the line from the box out to the min. or max. value)

If the min. or max. value falls outside the fences, the most extreme value still within the fences is also marked (because it is the highest/lowest non-outlier), and the whisker ends there
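To make the fence rule concrete, here is a minimal Python sketch that computes the 5-number summary, the fences, the whisker ends, and any outliers. The function names `quartiles` and `boxplot_summary` are hypothetical, and the quartile rule (median of each half, leaving out the overall median when n is odd) is an assumption: the textbook, the instructor, and statistical software may use slightly different rules, which can shift Q1 and Q3.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using a median-of-halves rule.

    Assumption: when n is odd, the overall median is left out of both
    halves. Other quartile rules exist. Needs at least 2 values.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return median(lower), median(xs), median(upper)

def boxplot_summary(data):
    """5-number summary plus the pieces needed to draw the boxplot."""
    xs = sorted(data)
    q1, med, q3 = quartiles(xs)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Values on or inside the fences are non-outliers; whiskers reach the extremes of these
    inside = [x for x in xs if lower_fence <= x <= upper_fence]
    outliers = [x for x in xs if x < lower_fence or x > upper_fence]
    return {
        "min": xs[0], "Q1": q1, "median": med, "Q3": q3, "max": xs[-1],
        "IQR": iqr,
        "fences": (lower_fence, upper_fence),
        "whisker_ends": (inside[0], inside[-1]),
        "outliers": outliers,
    }

summary = boxplot_summary([2, 4, 5, 7, 8, 9, 11, 30])  # made-up data
print(summary["outliers"])      # [30]
print(summary["whisker_ends"])  # (2, 11)
```

For these made-up values, Q1 = 4.5, Q3 = 10 and IQR = 5.5, so the fences sit at -3.75 and 18.25. The value 30 lies beyond the upper fence, so the upper whisker stops at 11, the largest non-outlier.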

- The height of the box, from Q1 up to Q3, is equal to the IQR

- If the median is roughly at the centre of the box, the middle half of the data is roughly symmetric; if it is not centred, the middle half is skewed toward the longer part of the box

- The lengths of the whiskers also indicate whether the distribution is symmetric or skewed
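The two shape checks above can be turned into a rough rule of thumb. This is a minimal sketch that assumes the boxplot pieces are already known. The function `describe_shape` and its `tol` cutoff (how close two lengths must be to count as "about equal") are hypothetical choices, since the notes only say "roughly".

```python
def describe_shape(q1, med, q3, low_end, high_end, tol=0.1):
    """Rough shape reading from boxplot pieces.

    low_end / high_end are where the whiskers stop.
    tol is a hypothetical cutoff: two lengths that differ by less than
    tol * IQR are treated as "about equal".
    """
    iqr = q3 - q1
    lower_box, upper_box = med - q1, q3 - med            # halves of the box
    lower_whisker, upper_whisker = q1 - low_end, high_end - q3

    def compare(low, high):
        if abs(high - low) < tol * iqr:
            return "roughly symmetric"
        # The longer side points toward the tail
        return "skewed right" if high > low else "skewed left"

    return {
        "IQR (box height)": iqr,
        "middle half": compare(lower_box, upper_box),
        "whiskers": compare(lower_whisker, upper_whisker),
    }

shape = describe_shape(q1=2, med=3, q3=7, low_end=1, high_end=15)
print(shape["middle half"], "/", shape["whiskers"])  # skewed right / skewed right
```

Looking at both the box and the whiskers matters. The box describes only the middle half of the data, while the whiskers describe the tails.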

Comparing Groups with Boxplots

- Boxplots offer a balance of information and simplicity: they hide the details while displaying the overall summary information

- Looking at boxplots side-by-side, we can see which groups have higher medians, greater IQRs, and greater ranges
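A side-by-side comparison reads off the same three features for each group. The sketch below uses hypothetical function names and made-up data. It computes each group's median, IQR, and range, then reports which group is largest on each feature. The quartiles come from Python's `statistics.quantiles` with `method="inclusive"`. That is one software convention and may not match quartiles found by hand.

```python
from statistics import median, quantiles

def group_summary(values):
    """Median, IQR, and range for one group (needs at least 2 values)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {"median": median(values), "IQR": q3 - q1, "range": max(values) - min(values)}

def compare_groups(groups):
    """Summarize every group and find which one is largest on each feature."""
    summaries = {name: group_summary(vals) for name, vals in groups.items()}
    leaders = {
        stat: max(summaries, key=lambda name: summaries[name][stat])
        for stat in ("median", "IQR", "range")
    }
    return summaries, leaders

# Made-up data for two groups
summaries, leaders = compare_groups({
    "A": [10, 12, 13, 15, 18],
    "B": [8, 13, 16, 19, 25],
})
print(leaders)  # {'median': 'B', 'IQR': 'B', 'range': 'B'}
```

In a real comparison you would still draw the boxplots. The numbers only summarize what the plots show at a glance.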

Outliers

- Cases that stand out from the rest of the data almost always deserve our attention

- An outlier is a value that doesn’t fit in with the rest of the data

- Firstly, try to understand the outlier(s) in the context of the data

- Look at the gap between the case and the rest of the data when considering whether it is an outlier

- Some outliers are just errors; many are simply different

- Report summaries and analyses with and without the outlier to see its influence, and then decide what to think about the data

- Never ignore an outlier or drop it from analysis without comment
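Reporting results with and without a suspected outlier is easy to automate. Here is a minimal sketch with a hypothetical `with_and_without` helper and made-up data. Which values count as suspects is your decision, ideally made after looking at the context as recommended above. The code only reports the numbers and does not discard anything silently.

```python
from statistics import mean, median, stdev

def with_and_without(data, suspects):
    """Summaries computed with and without the suspected outlier values."""
    kept = [x for x in data if x not in suspects]
    report = {}
    for label, values in (("with", list(data)), ("without", kept)):
        report[label] = {
            "n": len(values),
            "mean": mean(values),
            "median": median(values),
            "SD": stdev(values),  # sample standard deviation, needs n >= 2
        }
    return report

report = with_and_without([2, 4, 5, 7, 8, 9, 11, 30], suspects={30})
for label, stats in report.items():
    print(label, {k: round(v, 2) for k, v in stats.items()})
```

Here the mean drops from 9.5 to about 6.57 when 30 is set aside, while the median moves only from 7.5 to 7. Reporting both versions makes the outlier's influence visible.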

Timeplots: Order, Please!

- A timeplot is a display of values against time

- Smoothing Timeplots

o Timeplots often show a lot of point-to-point variation that we want to see past, so that we can see underlying smooth trends and how the values vary around that trend (the timeplot version of centre and spread)

o A smooth trace can highlight long-term patterns and help us see them through the local variation; it shows the main trend and the points that don’t fit the overall pattern
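The notes do not say which smoother to use, so the sketch below swaps in one simple choice, a centered moving average. The window of 3 and the function names are hypothetical. The residuals (value minus smooth) show how the values vary around the trend, which parallels "centre and spread" for a timeplot.

```python
def moving_average(values, window=3):
    """Centered moving average. window must be odd; positions where the
    full window does not fit (the ends of the series) are left as None."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smooth = [None] * len(values)
    for i in range(half, len(values) - half):
        smooth[i] = sum(values[i - half:i + half + 1]) / window
    return smooth

def residuals(values, smooth):
    """How far each value sits from the smooth trend (None where no trend)."""
    return [None if s is None else v - s for v, s in zip(values, smooth)]

series = [3, 5, 4, 6, 8, 7, 9]  # made-up values in time order
trend = moving_average(series)
print(trend)                     # [None, 4.0, 5.0, 6.0, 7.0, 8.0, None]
print(residuals(series, trend))  # [None, 1.0, -1.0, 0.0, 1.0, -1.0, None]
```

A wider window gives a smoother trace but hides more of the local variation and loses more points at the ends. Statistical software often uses other smoothers (for example, running medians or locally weighted fits) instead.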

Looking Into the Future