Chapter 2 – Descriptive Statistics
In this chapter we will learn about descriptive statistics, which is the science of describing the important
characteristics of a population or sample.
We will be interested in things like:
• The shape of the distribution of the sample data and/or the population 2.1
• What value is our data centred around? 2.2
o Average value
• How spread out is our data? 2.3
• Is there anything unusual about our data, such as outlying observations? 2.4
• Qualitative / Quantitative data
Describing the Shape of a Distribution
Example 2.1
Jim is a family doctor. As a patient of Jim’s, you feel that you have to wait too long after the scheduled
appointment time before you get to see Dr. Jim.
You decide to study the length of time a patient waits before seeing Dr. Jim. You gather the following random
sample of 24 waiting times (in minutes) during a typical month:
8, 15, 23, 9, 33, 8, 26, 15, 15, 9, 28, 38, 45, 17, 42, 26, 10, 13, 19, 31, 41, 23, 16, 20
Analyze this data and draw some conclusions.
We see in Example 2.1 that things start with a question:
How long do patients have to wait after their scheduled appointment time before seeing Dr. Jim?
Variable of Interest
*X_i = time, in minutes, that patient i waits after the scheduled appointment time before seeing Dr. Jim
*A capital X is used to denote what is called a random variable.
We use a variable because we do not know the value that X will take on until we actually select a unit and
record a measurement (collect the data).
In Example 2.1:
X_i = waiting time for patient i, where i = 1, 2, 3, …, 24
n = 24 (sample size)
*x_i = actual observed waiting time for patient i
x_1 = 8, x_2 = 15, x_3 = 23, …, x_24 = 20
*A lowercase letter represents the actual observed value of a random variable.
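As a minimal sketch of this notation, the sample from Example 2.1 can be stored in a list so that each observed value x_i is looked up by patient number. The Python names `waits`, `n`, and `x` are illustrative assumptions and are not part of the notes.

```python
# Waiting times (minutes) from Example 2.1, in the order they were recorded.
waits = [8, 15, 23, 9, 33, 8, 26, 15, 15, 9, 28, 38,
         45, 17, 42, 26, 10, 13, 19, 31, 41, 23, 16, 20]

n = len(waits)  # sample size


def x(i):
    """Return the observed value x_i, using the 1-based index i from the notes."""
    return waits[i - 1]  # Python lists are 0-based, so shift the index by one


print(n)      # 24
print(x(1))   # 8
print(x(2))   # 15
print(x(24))  # 20
```

Before the data are collected, the waiting time of patient i is the unknown random variable X_i; after collection, `x(i)` is simply a recorded number.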
The type of data you have determines how you organize and summarize it.
However, one of the goals is always to determine the distribution of the variable you are interested in (you want
the distribution of the data).
The pattern of the data is the distribution.
In Example 2.1, the numbers vary:
• this is called variation
• the pattern of variation of a variable is called its distribution
The distribution of a variable is best displayed graphically.
2 Main Types of Graphical Displays for Quantitative Data
1. Stemplots (Stem-and-leaf Displays)
• quick/easy way to view the shape of a distribution
• best used for small data sets with observations having at least 2 digits
• one advantage of stem plots is that original data is not lost; it is displayed in the stem plot
• however, this advantage makes stem plots awkward for large data sets or for small data sets with a
wide range of numbers (too many stems, too few leaves)
Stem: consists of one or more leading digits
Leaf: consists of the final digit
• arrange stems vertically in increasing order from top to bottom
• arrange leaves in increasing order from left to right
The stem plot ends up ordering your data from smallest to largest.
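The construction rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `stem_plot` and its `split` argument are hypothetical, the data are assumed to be non-negative integers, the stem is taken as every digit except the last, and the leaf is the final digit. The optional `split=True` setting applies the split-stem variant described later in this section (leaves 0 to 4 on one line, leaves 5 to 9 on the next).

```python
from collections import defaultdict


def stem_plot(data, split=False):
    """Return the lines of a stem plot for non-negative integers.

    Stem = leading digits (value // 10), leaf = final digit (value % 10).
    With split=True each stem appears twice: leaves 0-4 first, then 5-9.
    """
    leaves = defaultdict(list)
    for value in sorted(data):  # sorting puts the leaves in increasing order
        stem, leaf = divmod(value, 10)
        half = 1 if (split and leaf >= 5) else 0
        leaves[(stem, half)].append(leaf)

    # Include every stem between the smallest and largest so gaps stay visible.
    stems = range(min(data) // 10, max(data) // 10 + 1)
    halves = (0, 1) if split else (0,)

    lines = []
    for stem in stems:  # stems increase from top to bottom
        for half in halves:
            row = " ".join(str(leaf) for leaf in leaves[(stem, half)])
            lines.append(f"{stem} | {row}".rstrip())
    return lines


# Waiting times (minutes) from Example 2.1.
waits = [8, 15, 23, 9, 33, 8, 26, 15, 15, 9, 28, 38,
         45, 17, 42, 26, 10, 13, 19, 31, 41, 23, 16, 20]

for line in stem_plot(waits):
    print(line)
```

Reading the leaves row by row returns the data in sorted order, which is why no original values are lost in a stem plot.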
Back to Example 2.1
Characteristics:
- Fairly spread out (the 8-45 range is comparatively big)
- Skewed to the right
- Single peaked
- Centered around 15-20
- No gaps
- *Perhaps 45 minutes may be an outlier
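Some of these characteristics can be checked numerically. The sketch below uses Python's standard `statistics` module; the variable names are illustrative assumptions. It also relies on a common rule of thumb, not a guarantee: a mean larger than the median suggests a right skew, and a mean smaller than the median suggests a left skew.

```python
from statistics import mean, median

# Waiting times (minutes) from Example 2.1.
waits = [8, 15, 23, 9, 33, 8, 26, 15, 15, 9, 28, 38,
         45, 17, 42, 26, 10, 13, 19, 31, 41, 23, 16, 20]

low, high = min(waits), max(waits)
spread = high - low            # range of the sample
centre_mean = mean(waits)      # arithmetic average
centre_median = median(waits)  # middle value of the sorted data

# Rule of thumb only: compare the mean with the median to guess the skew.
if centre_mean > centre_median:
    skew = "right"
elif centre_mean < centre_median:
    skew = "left"
else:
    skew = "none apparent"

print(low, high, spread)      # 8 45 37
print(centre_median)          # 19.5
print(round(centre_mean, 2))  # 22.08
print(skew)                   # right
```

The median of 19.5 minutes agrees with the "centered around 15-20" reading of the stem plot, and the mean being pulled above the median agrees with the right skew.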
• splitting stems is useful for larger data sets or when there are not enough stems (fewer than 5) to see the
distribution clearly
• split each stem into two:
one with leaves 0 to 4
other with leaves 5 to 9
Back to Example 2.1
Separate one stem into two.
      0-4      5-9
0 |            8 8 9 9
1 |   0 3      5 5 5 6 7 9
2 |   0 3 3    6 6 8
3 |   1 3      8
4 |   1 2      5
Observations about a Distribution
After making a graph, you should always ask “What do I see?”
Look for the graph’s important features such as:
1. What is the overall pattern?
2. Are there any deviations from this pattern?
3. What is the shape of the distribution?
i. Bell shaped (normal curve)
   a) Symmetrical
   b) Single peak
ii. Skewed to the left
   a) Median > Mean
iii. Skewed to the right
iv. Bimodal
   a) Two peaks
v. Uniform
   a) A constant number
   b) Horizontal line
   c) No peaks