# Descriptive statistics: Location of mean and median in a skewed distribution

Department: Biology
Course Code: Biology 2244A/B
Professor: Jennifer Waugh

Location of mean and median in a skewed distribution

In class, I tried to explain the locations of measures of central tendency (i.e. mean, median) in a skewed distribution. My approach was to start with the characteristics of a symmetrical distribution and compare it to a skewed distribution that is the same in every respect except its symmetry. I tried to do this comparison with the ‘exercise’ of converting a symmetrical distribution into a skewed distribution, and then evaluating the impact on the mean and median.

After talking with some students, it appears that some (I know, not all!) of the confusion in class occurred because I didn’t make it clear that we never actually change distributions in this manner. This ‘conversion’ was just my way of explaining the location of the mean and median in a skewed distribution. Rest assured that the distribution of a data set isn’t changeable once it has been sampled. Needless to say, this manner of explaining the mean and median in a skewed distribution didn’t go over so well, and I apologize! So, this file is an attempt to explain the location of the mean and median in a skewed distribution in a different, better way.

First, let’s be clear about what we’re trying to understand. The figure below (which is just Figure 2-11 from the textbook) shows the relative locations of the mean and median in skewed distributions and symmetrical distributions. What we are trying to understand is the location of the mean and median in a skewed distribution.

[Figure 2-11 from the textbook: panels labelled Left skew, Symmetrical, and Right skew, with the tails marked.]

First, let’s focus on the characteristics of the symmetrical distribution. I’ve created a perfectly symmetrical distribution of data, shown in the frequency distribution below and associated histogram.
SYMMETRICAL DISTRIBUTION

| Value (cm) | Frequency |
|---|---|
| 2.5 | 1 |
| 7.5 | 4 |
| 12.5 | 13 |
| 17.5 | 18 |
| 22.5 | 28 |
| 27.5 | 18 |
| 32.5 | 13 |
| 37.5 | 4 |
| 42.5 | 1 |

[Histogram of the symmetrical distribution: Frequency (0 to 30) against Value (cm).]

Recall that we calculate the mean by summing the values of ALL the data points and dividing by the number of data points (i.e. $\bar{x} = \frac{\sum x}{n}$). The mean for this symmetrical data set is 22.5 cm.

The median gives us the midpoint of the data array; it separates the bottom 50% of the data from the top 50% of the data, splitting the data set exactly in half. Recall that to calculate the median, we organize the data points in increasing order and then find the middle point, the exact centre of the data set. In this particular data set, which has 100 data points (an even number), we have to use the rule where we find the middle two data points and calculate the mean of their values. The middle two data points for this data set have values of 22.5 cm and 22.5 cm. The mean of these two middle points is 22.5 cm, which is our median.

Notice that in a perfectly symmetrical data set like this, the mean and median are equal. We will probably never see a perfectly symmetrical distribution in practice, but in a roughly symmetrical distribution we should still expect the mean and median to be roughly equal (assuming no outliers; more on that below). Look where the median lies on the histogram (above) in a symmetrical distribution: exactly at the centre, splitting the data points exactly in half. It’s the middle value in the data set/histogram.

Now, we’ll look at a skewed distribution. I’ve created a left-skewed distribution (i.e. one with a long left tail), with the frequency distribution and associated histogram shown below.

LEFT-SKEWED DISTRIBUTION

| Value (cm) | Frequency |
|---|---|
| 2.5 | 1 |
| 7.5 | 3 |
| 12.5 | 6 |
| 17.5 | 8 |
| 22.5 | 12 |
| 27.5 | 17 |
| 32.5 | 24 |
| 37.5 | 17 |
| 42.5 | 12 |

[Histogram of the left-skewed distribution: Frequency (0 to 30) against Value (cm).]

Once again, we can calculate the mean and median for this data set.
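The mean and median calculations described above can be sketched in Python. This is only an illustration, not part of the original handout: the two frequency tables are transcribed from the listings above, and the helper names `expand` and `summarize` are made up for this sketch. The tables are expanded into 100 individual data points each, and the standard library's `statistics.mean` and `statistics.median` do the arithmetic (for an even number of points, `median` averages the middle two, which is the rule used above).

```python
from statistics import mean, median

# Frequency tables as (value in cm, frequency) pairs, transcribed from the
# handout's listings. The names below are assumptions made for this sketch.
SYMMETRICAL = [(2.5, 1), (7.5, 4), (12.5, 13), (17.5, 18), (22.5, 28),
               (27.5, 18), (32.5, 13), (37.5, 4), (42.5, 1)]
LEFT_SKEWED = [(2.5, 1), (7.5, 3), (12.5, 6), (17.5, 8), (22.5, 12),
               (27.5, 17), (32.5, 24), (37.5, 17), (42.5, 12)]


def expand(table):
    """Turn a frequency table into the ordered list of individual data points."""
    points = []
    for value, count in sorted(table):
        points.extend([value] * count)
    return points


def summarize(table):
    """Return (number of points, mean, median) for a frequency table."""
    points = expand(table)
    return len(points), mean(points), median(points)


if __name__ == "__main__":
    for name, table in [("symmetrical", SYMMETRICAL), ("left-skewed", LEFT_SKEWED)]:
        n, m, med = summarize(table)
        print(f"{name}: n={n}, mean={m:.2f} cm, median={med:.1f} cm")
```

For the symmetrical table this reproduces the 22.5 cm mean and median stated above. For the left-skewed table, as transcribed here, the mean works out to 29.05 cm, which sits to the left of the median; that figure is computed from the table rather than quoted from the handout.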
Like the symmetrical distribution, this left-skewed distribution has 100 data points (an even number of points). So, we calculate the median by finding the mean of the middle two data points in an ordered list of the data. The middle two data points are 32.5 cm and 32.5 cm. The mean of these two values is 32.5 cm, so the median of the left-skewed distribution is 32.5 cm. Find 32.5 cm on the histogram; this point splits the distribution of data points in half, with 50% of the data points on the left side of the median and 50% of the data points on the right side of the median. Again, the median gives us the centre of the data set.

When we think about the effect of skew on the median, it doesn’t really do anything. The median doesn’t take into account the actual VALUES (e.g. 17.5 cm, 32.5 cm, etc.) of the data points (point 1, point 2, point 3), only their order; it just splits the data set into two halves with an equal number of data points on either side.

But let’s take a look at how skew affects the mean. We’ll do this in a three-step process: 1) compare the location of values in the symmetrical distribution to that of the left-skewed distribution; 2) reason through the impact on the mean of the differences in the locat
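The point above, that the median ignores the actual values while the mean uses all of them, can be illustrated with a small Python sketch. It is a purely hypothetical what-if, not something we would ever do to real sampled data and not the handout's own method: the single smallest point in the left-skewed table (2.5 cm) is relabelled as 0.5 cm, and the mean and median are recomputed. The value 0.5 cm and the names `original` and `what_if` are assumptions chosen for illustration.

```python
from statistics import mean, median

# Left-skewed frequency table as (value in cm, frequency) pairs,
# transcribed from the handout's listing.
LEFT_SKEWED = [(2.5, 1), (7.5, 3), (12.5, 6), (17.5, 8), (22.5, 12),
               (27.5, 17), (32.5, 24), (37.5, 17), (42.5, 12)]

# Expand the table into the ordered list of 100 individual data points.
original = [value for value, count in sorted(LEFT_SKEWED) for _ in range(count)]

# Hypothetical thought experiment: pretend the smallest point were 0.5 cm
# instead of 2.5 cm. The ordering of the points is unchanged.
what_if = [0.5] + original[1:]

if __name__ == "__main__":
    for label, points in [("original", original), ("what-if", what_if)]:
        print(f"{label}: mean={mean(points):.2f} cm, median={median(points):.1f} cm")
```

In this sketch the median stays at 32.5 cm in both cases, because the middle two points are untouched, while the mean moves slightly further left (from 29.05 cm to 29.03 cm), because it sums every value.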