Chapter 4: Numerical Descriptive Measures
This chapter discussed numerical descriptive measures used to summarize and describe sets of data.
At the completion of this chapter, you are expected to know the following:
1. How to calculate the basic numerical measures of central location and dispersion.
2. How to use the Empirical Rule and Chebyshev's theorem to interpret the standard deviation.
3. How to calculate quartiles, and use them to construct a box plot.
4. How to approximate the mean and standard deviation of a set of grouped data.
5. How to calculate and interpret covariance and the coefficient of correlation.
6. How to calculate the coefficients b0 and b1 for the least squares (regression) line.
4.2 Measures of Central Location
This section discussed three commonly used numerical measures of the central, or average, value of
a data set: the mean, the median, and the mode. You are expected to know how to compute each of these
measures for a given data set. Moreover, you are expected to know the advantages and disadvantages of
each of these measures, as well as the type of data for which each is an appropriate measure.
Question: How do I determine which measure of central location should be used: the
mean, the median, or the mode?
Answer: If the data are qualitative, the only appropriate measure of central location is
the mode. If the data are ranked, the most appropriate measure of central loca-
tion is the median.
For quantitative data, however, it is possible to compute all three measures.
Which measure you should use depends on your objective. The mean is
most popular because it is easy to compute and to interpret. (In particular, the
mean is generally the best measure of central location for purposes of statistical
inference, as you'll see in later chapters.) It has the disadvantage,
however, of being unduly influenced by a few very small or very large
measurements. To avoid this influence, you might choose to use the median. This
could well be the case if the data consisted, for example, of salaries or of house
prices. The mode, representing the value occurring most frequently (or the
midpoint of the class with the largest frequency), should be used when the
objective is to indicate the value (such as shirt size or house price) that is most
popular with consumers.
Example 4.1
Find the mean, mode, and median of the following sample of measurements:
8, 12, 6, 6, 10, 8, 4, 6
The mean value is
\[
\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{\sum_{i=1}^{8} x_i}{8} = \frac{8 + 12 + 6 + 6 + 10 + 8 + 4 + 6}{8} = \frac{60}{8} = 7.5
\]
The mode is 6, because that is the value that occurs most frequently. To find the median, we must first
arrange the measurements in ascending order:
4, 6, 6, 6, 8, 8, 10, 12
Since the number of measurements is even, the median is the midpoint between the two middle values, 6
and 8. Thus, the median is 7.
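The hand computation in Example 4.1 can be checked with a short Python sketch. It uses the standard-library `statistics` module; the variable names are ours and are not part of the text.

```python
import statistics

# Sample of measurements from Example 4.1
sample = [8, 12, 6, 6, 10, 8, 4, 6]

mean = statistics.mean(sample)      # sum of the values divided by n
median = statistics.median(sample)  # midpoint of the two middle values when n is even
mode = statistics.mode(sample)      # most frequently occurring value

print("mean:", mean)
print("median:", median)
print("mode:", mode)
```

The results should agree with the values found above: a mean of 7.5, a median of 7, and a mode of 6.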
Example 4.2
Consider the following sample of measurements, which is obtained from the sample in Example 4.1
by adding one extreme value, 21:
8, 12, 6, 6, 10, 8, 4, 6, 21
Which measure of central location is most affected by the addition of the single value?
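The mean value is now
\[
\bar{x} = \frac{\sum_{i=1}^{9} x_i}{9} = \frac{8 + 12 + 6 + 6 + 10 + 8 + 4 + 6 + 21}{9} = \frac{81}{9} = 9
\]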
The mode is still 6.
We arrange the new sample of measurements in ascending order:
4, 6, 6, 6, 8, 8, 10, 12, 21
The median is now equal to 8, the middle value. The mean has risen from 7.5 to 9, the median has risen
from 7 to 8, and the mode is unchanged. Thus, the mean is the measure that is most affected by
the addition of one extreme value.
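The comparison above can be reproduced with a brief Python sketch that computes each measure before and after the extreme value is added. The names `original`, `augmented`, and `changes` are illustrative only.

```python
import statistics

original = [8, 12, 6, 6, 10, 8, 4, 6]
augmented = original + [21]  # the same sample with one extreme value added

measures = {
    "mean": statistics.mean,
    "median": statistics.median,
    "mode": statistics.mode,
}

# Change in each measure caused by adding the extreme value
changes = {}
for name, measure in measures.items():
    before = measure(original)
    after = measure(augmented)
    changes[name] = after - before
    print(f"{name}: {before} -> {after}")
```

The largest entry in `changes` belongs to the mean, which is consistent with the conclusion that the mean is the most sensitive of the three measures to extreme values.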
Example 4.3
In Example 2.2, we considered the following weights, in pounds, of a group of workers:
173 165 171 175 188
183 177 160 151 169
162 179 145 171 175
168 158 186 182 162
154 180 164 166 157
a) Find the mean of the weights of the sample of 25 workers.
b) Find the median of the weights.
c) Find the modal class of the frequency distribution of weights that was constructed in the solu-
tion to part b) of Example 2.2.
a) The mean of the 25 weights is
\[
\bar{x} = \frac{\sum_{i=1}^{25} x_i}{25} = \frac{173 + 183 + 162 + \cdots + 175 + 162 + 157}{25} = \frac{4221}{25} = 168.84 \approx 168.8 \text{ pounds}
\]
b) The middle value of the 25 weights is most easily found by referring to the stem and leaf dis-
play constructed in the solution to part a) of Example 2.2. We find that the median, or middle
value, is 169 pounds.
c) The modal class is the class with the largest frequency, which is 160 up to 170. The mode
may be taken to be the midpoint of this class, which is 165 pounds.
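The three parts of Example 4.3 can be sketched in Python as follows. The frequency distribution from Example 2.2 is not reproduced here, so the sketch assumes classes of width 10 that include the lower limit and exclude the upper limit (for example, 160 up to 170); that class definition is an assumption, as are the variable names.

```python
import statistics
from collections import Counter

# Weights (in pounds) of the 25 workers
weights = [
    173, 165, 171, 175, 188,
    183, 177, 160, 151, 169,
    162, 179, 145, 171, 175,
    168, 158, 186, 182, 162,
    154, 180, 164, 166, 157,
]

mean_weight = statistics.mean(weights)
median_weight = statistics.median(weights)  # 13th value of the 25 sorted weights

# Assumed classes: width 10, lower limit included, upper limit excluded
class_width = 10
class_counts = Counter((w // class_width) * class_width for w in weights)

# Modal class = class with the largest frequency; mode taken as its midpoint
modal_lower, modal_count = class_counts.most_common(1)[0]
modal_midpoint = modal_lower + class_width / 2

print("mean:", round(mean_weight, 1))
print("median:", median_weight)
print(f"modal class: {modal_lower} up to {modal_lower + class_width}, midpoint {modal_midpoint}")
```

Under these assumed class limits, the sketch gives the same mean, median, and modal class as the solution above.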
Geometric Mean (Optional)
The arithmetic mean is the most popular measure of the central location of the distribution of a
set of observations. But the arithmetic mean is not a good measure of the average rate at which
a quantity grows over time. That quantity, whose growth rate (or rate of change) we wish to
measure, might be the total annual sales of a firm or the market value of an investment. The
geometric mean should be used to measure the average growth rate of the values of a variable
over time.
Geometric Mean
Let $R_i$ denote the rate of growth (expressed in decimal form) of some variable in period $i$
($i = 1, 2, \ldots, n$). The geometric mean $R_g$ of the growth rates $R_1, R_2, \ldots, R_n$ is the
constant return that produces the same terminal value at the end of period $n$ as do the actual
returns for the $n$ periods. That is,
\[
(1 + R_g)^n = (1 + R_1)(1 + R_2) \cdots (1 + R_n)
\]
or
\[
R_g = \sqrt[n]{(1 + R_1)(1 + R_2) \cdots (1 + R_n)} - 1
\]
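To illustrate the definition, the following Python sketch computes the geometric mean growth rate and compares it with the arithmetic mean. The function name `geometric_mean_growth` and the two growth rates are hypothetical, chosen only to make the contrast visible; they do not come from the text.

```python
import math

def geometric_mean_growth(rates):
    """Return R_g such that (1 + R_g)**n equals the product of the (1 + R_i)."""
    n = len(rates)
    growth_factor = math.prod(1 + r for r in rates)  # terminal value per unit invested
    return growth_factor ** (1 / n) - 1

# Hypothetical rates: the variable doubles in period 1 (+100%),
# then halves in period 2 (-50%), ending where it started.
rates = [1.00, -0.50]

r_g = geometric_mean_growth(rates)  # geometric mean growth rate
r_a = sum(rates) / len(rates)       # arithmetic mean growth rate

print("geometric mean growth rate:", r_g)
print("arithmetic mean growth rate:", r_a)
```

In this hypothetical case the variable finishes at its starting value, so the geometric mean growth rate is zero, whereas the arithmetic mean of the two rates is 25%. This is the sense in which the arithmetic mean can misstate average growth over time.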
The total sales of a company over a six-year period are shown in the accompanying table.
Year Sales ($millions)
a) Calculate the five annual growth rates from year 1 to year 6.
b) Find the geometric mean growth rate in sales over this period.
c) Find the arithmetic mean growth rate in sales over this period.
d) What is the best estimate of the growth rate in sales in year 7?