OMIS 2010 Chapter Notes - Chapter 04: Squared Deviations From The Mean, Standard Deviation, Frequency Distribution
Chapter 4: Numerical Descriptive Measures
This chapter discussed numerical descriptive measures used to summarize and describe sets of data.
At the completion of this chapter, you are expected to know the following:
1. How to calculate the basic numerical measures of central location and dispersion.
2. How to use the Empirical Rule and Chebyshev’s theorem to interpret standard deviation.
3. How to calculate quartiles, and use them to construct a box plot.
4. How to approximate the mean and standard deviation of a set of grouped data.
5. How to calculate and interpret covariance and the coefficient of correlation.
6. How to calculate the coefficients b0 and b1 for the least squares (regression) line.
4.2 Measures of Central Location
This section discussed three commonly used numerical measures of the central, or average, value of
a data set: the mean, the median, and the mode. You are expected to know how to compute each of these
measures for a given data set. Moreover, you are expected to know the advantages and disadvantages of
each of these measures, as well as the type of data for which each is an appropriate measure.
Question: How do I determine which measure of central location should be used—the
mean, the median, or the mode?
Answer: If the data are qualitative, the only appropriate measure of central location is
the mode. If the data are ranked, the most appropriate measure of central location
is the median.
For quantitative data, however, it is possible to compute all three measures.
Which measure you should use depends on your objective. The mean is
most popular because it is easy to compute and to interpret. (In particular, the
mean is generally the best measure of central location for purposes of statistical
inference, as you’ll see in later chapters.) It has the disadvantage,
however, of being unduly influenced by a few very small or very large
measurements.
To avoid this influence, you might choose to use the median. This could
well be the case if the data consisted, for example, of salaries or of house
prices. The mode, representing the value occurring most frequently (or the
midpoint of the class with the largest frequency), should be used when the objective
is to indicate the value (such as shirt size or house price) that is most
popular with consumers.
Find the mean, mode, and median of the following sample of measurements:
8, 12, 6, 6, 10, 8, 4, 6
The mean value is
$$\bar{x} = \frac{\sum x_i}{n} = \frac{8 + 12 + 6 + 6 + 10 + 8 + 4 + 6}{8} = \frac{60}{8} = 7.5$$
The mode is 6, because that is the value that occurs most frequently. To find the median, we must first
arrange the measurements in ascending order:
4, 6, 6, 6, 8, 8, 10, 12
Since the number of measurements is even, the median is the midpoint between the two middle values, 6
and 8. Thus, the median is 7.
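The three results above can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and not part of the original notes.

```python
import statistics

# Sample of measurements from the example above
sample = [8, 12, 6, 6, 10, 8, 4, 6]

mean = statistics.mean(sample)      # sum of the values divided by their count
median = statistics.median(sample)  # midpoint of the two middle values when n is even
mode = statistics.mode(sample)      # value that occurs most frequently

print("mean:", mean)      # 7.5
print("median:", median)  # 7.0
print("mode:", mode)      # 6
```

Note that `statistics.median` sorts the data internally, so the manual step of arranging the measurements in ascending order is not needed in code.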
Consider the following sample of measurements, which is obtained from the sample in Example 4.1
by adding one extreme value, 21:
8, 12, 6, 6, 10, 8, 4, 6, 21
Which measure of central location is most affected by the addition of the single value?
The mean value is now
$$\bar{x} = \frac{8 + 12 + 6 + 6 + 10 + 8 + 4 + 6 + 21}{9} = \frac{81}{9} = 9$$
The mode is still 6.
We arrange the new sample of measurements in ascending order:
4, 6, 6, 6, 8, 8, 10, 12, 21
The median is now equal to 8, the middle value. Thus, the mean is the measure that is most affected by
the addition of one extreme value.
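To see the effect of the extreme value numerically, the sketch below computes each measure before and after 21 is added and reports the change. The helper `summarize` and the other names are hypothetical, introduced only for this illustration.

```python
import statistics

original = [8, 12, 6, 6, 10, 8, 4, 6]
extended = original + [21]  # same sample with the extreme value added


def summarize(data):
    """Return the three measures of central location for a list of numbers."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
    }


before = summarize(original)
after = summarize(extended)

# Change in each measure caused by the single added value
shift = {name: after[name] - before[name] for name in before}

for name in shift:
    print(f"{name}: {before[name]} -> {after[name]} (change {shift[name]})")
```

The mean moves by 1.5 (from 7.5 to 9), the median by 1 (from 7 to 8), and the mode does not move at all, which matches the conclusion that the mean is the most sensitive of the three.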
In Example 2.2, we considered the following weights, in pounds, of a group of workers:
173 165 171 175 188
183 177 160 151 169
162 179 145 171 175
168 158 186 182 162
154 180 164 166 157
a) Find the mean of the weights of the sample of 25 workers.
b) Find the median of the weights.
c) Find the modal class of the frequency distribution of weights that was constructed in the
solution to part b) of Example 2.2.
a) The mean of the 25 weights is
$$\bar{x} = \frac{\sum x_i}{25} = \frac{173 + 183 + 162 + \cdots + 175 + 162 + 157}{25} = \frac{4221}{25} = 168.84 \text{ pounds}$$
b) The middle value of the 25 weights is most easily found by referring to the stem and leaf
display constructed in the solution to part a) of Example 2.2. We find that the median, or middle
value, is 169 pounds.
c) The modal class is the class with the largest frequency, which is “160 up to 170.” The mode
may be taken to be the midpoint of this class, which is 165 pounds.
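The three parts can also be worked through in code. The sketch below uses the 25 weights exactly as listed in the table above. The frequency distribution from Example 2.2 is not reproduced in these notes, so the classes are assumed here to be 10 pounds wide with lower limits at multiples of 10 (140 up to 150, 150 up to 160, and so on), which is consistent with the modal class "160 up to 170" named in the solution.

```python
import statistics
from collections import Counter

# Weights (in pounds) of the 25 workers, as listed in the table above
weights = [
    173, 165, 171, 175, 188,
    183, 177, 160, 151, 169,
    162, 179, 145, 171, 175,
    168, 158, 186, 182, 162,
    154, 180, 164, 166, 157,
]

mean_weight = statistics.mean(weights)      # part a)
median_weight = statistics.median(weights)  # part b): 13th of the 25 sorted values

# Part c): count how many weights fall in each class.
# ASSUMPTION: classes are 10 pounds wide and start at multiples of 10.
class_width = 10
class_counts = Counter((w // class_width) * class_width for w in weights)

# The modal class is the one with the largest frequency
modal_lower, modal_frequency = class_counts.most_common(1)[0]
modal_midpoint = modal_lower + class_width / 2

print("mean:", mean_weight)
print("median:", median_weight)
print(f"modal class: {modal_lower} up to {modal_lower + class_width} "
      f"(frequency {modal_frequency}), midpoint {modal_midpoint}")
```

With these data the class "160 up to 170" contains 8 weights, one more than "170 up to 180", so the modal midpoint is 165 pounds.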
Geometric Mean (Optional)
The arithmetic mean is the most popular measure of the central location of the distribution of a
set of observations. But the arithmetic mean is not a good measure of the average rate at which
a quantity grows over time. That quantity, whose growth rate (or rate of change) we wish to
measure, might be the total annual sales of a firm or the market value of an investment. The
geometric mean should be used to measure the average growth rate of the values of a variable