Location of mean and median in a skewed distribution
In class, I tried to explain the locations of measures of central tendency (i.e. mean, median) in a skewed
distribution. My manner of thinking about this was to start with the characteristics of a symmetrical
distribution, and compare it to a skewed distribution that has everything else the same as the
symmetrical distribution, except the symmetry. I tried to do this comparison with the ‘exercise’ of
converting a symmetrical distribution into a skewed distribution, and then evaluating the impact on the
mean and median.
After talking with some students, it appears some (I know—not all!) of the confusion in class occurred
because I didn’t make it clear that we never actually change distributions in this manner. This
‘conversion’ was just my way of explaining the location of mean and median in a skewed distribution.
Rest assured that the distribution of a data set doesn’t change once the data have been sampled.
Needless to say, this manner of explaining mean and median in a skewed distribution didn’t go over so
well, and I apologize! So, this file is an attempt to explain the location of the mean and median in a
skewed distribution in a different, better way.
First, let’s be clear what we’re trying to understand. The figure below (which is just Figure 2-11 from the
textbook) shows the relative locations of the mean and median in skewed distributions and symmetrical
distributions. What we are trying to understand is the location of the mean and median in a skewed
distribution.
[Figure 2-11 from the textbook, with three panels: Left skew, Symmetrical, Right skew]
First, let’s focus on the characteristics of the symmetrical distribution. I’ve created a perfectly
symmetrical distribution of data, shown in the frequency distribution below and associated histogram.
Value (cm)   Frequency (symmetrical)
2.5          1
7.5          4
...
22.5         28
...
42.5         1
[Histogram of the symmetrical distribution: Frequency against Value (cm)]
Recall that we calculate the mean by summing the values of ALL the data points, and dividing by the
number of data points (i.e. $\bar{x} = \frac{\sum x}{n}$). The mean for this symmetrical data set is 22.5 cm.
The median gives us the midpoint of the data array; it separates the bottom 50% of the data from the
top 50% of the data, splitting the data set exactly in half. Recall that to calculate the median, we
organize the data points so they are in increasing order, and then find the middle point—the exact
centre of the data set. In this particular data set, which has 100 data points (an even number), we have
to use the rule where we find the middle two data points and calculate the mean of their values. The
middle two data points for this data set have values of 22.5 cm and 22.5 cm. The mean of these two
middle points is 22.5 cm, which is our median.
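If it helps to see both calculations done step by step, here is a minimal Python sketch. The frequency table in it is a small made-up symmetrical example with only 10 data points (it is NOT the 100-point data set above), and the variable names are just ones I picked for illustration.

```python
from statistics import median

# Hypothetical symmetrical frequency table: value (cm) -> number of data points.
# These counts are made up for illustration; they are not the handout's data.
freq = {2.5: 1, 7.5: 2, 12.5: 4, 17.5: 2, 22.5: 1}

# Expand the table into an ordered list of individual data points.
data = sorted(value for value, count in freq.items() for _ in range(count))

n = len(data)  # 10 data points (an even number)

# Mean: sum the values of ALL the data points, then divide by n.
sym_mean = sum(data) / n

# Median with an even n: take the mean of the middle two data points.
middle_two = (data[n // 2 - 1], data[n // 2])
sym_median = sum(middle_two) / 2

print("mean:", sym_mean)
print("middle two points:", middle_two)
print("median:", sym_median)
print("median (library check):", median(data))
```

In this toy example the mean and the median both come out to 12.5 cm, the centre value of the table, which is the same pattern we saw in the symmetrical data set above.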
Notice that in a perfectly symmetrical data set like this, the mean and median are equal. We probably
never see a perfectly symmetrical distribution, but we should expect the mean and median to be
roughly equal as well (assuming no outliers—more on that below) in a roughly symmetrical distribution.
Look where the median lies on the histogram (above) in a symmetrical distribution: exactly at the
centre, splitting the data points in half exactly. It’s the middle value in the data set/histogram.
Now, we’ll look at a skewed distribution. I’ve created a left-skewed distribution (i.e. has a long left tail),
with the frequency distribution and associated histogram shown below.
Value (cm)   Frequency
...
7.5          3
12.5         6
17.5         8
...
32.5         24
...
[Histogram of the left-skewed distribution: Frequency against Value (cm)]
Once again, we can calculate the mean and median for this data set. Like the symmetrical distribution,
this left-skewed distribution has 100 data points (an even number of points). So, we calculate the
median by finding the mean of the middle two data points in an ordered list of the data. The middle two
data points are 32.5 cm and 32.5 cm. The mean of these two values is 32.5 cm; so the median of the left-
skewed distribution is 32.5 cm.
Find 32.5 cm on the histogram; this point splits the distribution of data points in half, with 50% of the
data points on the left side of the median, and 50% of the data points on the right side of the median.
Again, the median gives us the centre of the data set.
When we think about the effect of skew on the median, it turns out that skew doesn’t really do anything
to it. The median doesn’t take into account how large or small the actual VALUES (e.g. 17.5 cm, 32.5 cm,
etc.) of the data points (point 1, point 2, point 3) are; it only uses their order, and just splits the
data set into two halves with an equal number of data points on either side.
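To see that the median only cares about the order of the data points, here is a small Python sketch that compares two separate, made-up data sets. They have the same number of points and the same middle two points; the only difference is that the lowest values in the second set sit much further out to the left. Both lists are hypothetical and are not taken from the tables above.

```python
from statistics import median

# Two separate hypothetical data sets (cm), each listed in increasing order.
symmetric = [20, 25, 30, 30, 35, 40]

# Same number of points and same middle two points,
# but the two lowest values form a long left tail.
left_tailed = [2, 10, 30, 30, 35, 40]

# The median only looks at the middle of the ordered list,
# so how far out the tail values sit makes no difference to it.
print("median, symmetric set:  ", median(symmetric))
print("median, left-tailed set:", median(left_tailed))
```

Both medians are 30 cm, because in each list three data points sit below the middle and three sit above it, no matter how small the lowest values are.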
But let’s take a look at how skew affects the mean. We’ll do this in a three-step process:
1) compare the location of values in the symmetrical distribution to that of the left-skewed distribution
2) reason through the impact on the mean of the differences in the locat