PSYC 3000 Lecture Notes - Lecture 3: Interquartile Range, Quartile, Box Plot

2 Aug 2016
PSYC 3000
Sept 15
Numeric tools
-for a normal distribution, we try to find the centre of the data using the sample mean.
-the standard deviation is also used. It characterises dispersion.
-dispersion means how the observations are spread around the centre.
-often in research we don’t get a normal distribution. Distributions are often skewed or oddly
shaped.
-the mean doesn’t work well in skewed or oddly shaped distributions, because extreme values pull it
away from the bulk of the data.
-in those cases the median is used.
-another solution is to use quartiles, which split the ordered data into four equal parts. The
quartiles give us landmarks on the distribution and help us notice any outliers.
-the interquartile range (IQR) is the difference between the third and first quartiles (q3-q1).
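
The landmarks above can be computed in a few lines. This is only a sketch: the `scores` values are made up for illustration (they are not from the lecture), and the "inclusive" quartile method is one of several conventions, so other software may report slightly different q1 and q3.

```python
import statistics

# Hypothetical right-skewed sample (made-up values, one large score)
scores = [2, 3, 3, 4, 4, 5, 5, 6, 7, 30]

mean = statistics.mean(scores)      # pulled upward by the 30
median = statistics.median(scores)  # stays with the bulk of the data

# With n=4, quantiles() returns the three cut points q1, q2 (median), q3.
# method="inclusive" is an assumed convention; others differ slightly.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range

print("mean:", mean)      # 6.9
print("median:", median)  # 4.5
print("q1:", q1, "q3:", q3, "IQR:", iqr)  # q1: 3.25 q3: 5.75 IQR: 2.5
```

Note how the single large score moves the mean well above the median, while the quartiles barely notice it.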
Boxplots
-a display of different landmarks on a distribution (quartiles)
-the line in the box is the median. It is not always in the middle of the box, but it always lies
inside the box.
-boxplots can be displayed vertically or horizontally. Most of the time they are shown vertically.
-the length of the box is the interquartile range (q3-q1).
-the width of the box does not matter; it is simply a visual effect.
-in the boxplot there is an invisible line above the box. It’s called the upper fence.
-each whisker (the lines extending from the top and bottom of the box) always ends on a data point.
-a whisker does not go past the fence; it stops at the last data point before the fence. If there
are data points past the fence, they are outliers.
-there is also an invisible line below the box called the lower fence.
-the fences do not mark data points; the whiskers do.
-if the median is close to one of the whiskers, it means the data are tightly clustered at that end
and more scattered at the other end.
-boxplots make outliers easy to detect. Anything beyond the upper fence or the lower fence is
considered an outlier.
-stars = extreme outliers. These are often thrown out as data entry errors.
-outliers usually make up only about 1% of the data.
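
The fence, whisker and outlier rules above can be sketched as follows. The notes do not say where the fences sit, so this assumes the common convention of 1.5 times the IQR beyond the box, and 3 times the IQR for the "extreme" (star) outliers. Both multipliers, the function name `boxplot_landmarks`, and the sample `scores` are assumptions made for illustration, not values from the lecture.

```python
import statistics

def boxplot_landmarks(data, k=1.5, k_extreme=3.0):
    """Return box, fence, whisker and outlier values for a sample.

    k and k_extreme are assumed fence multipliers (a common convention),
    not values given in the lecture.
    """
    values = sorted(data)
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1

    # Fences are invisible cut-off lines, not data points.
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr

    # Whiskers end on the most extreme data points still inside the fences.
    inside = [x for x in values if lower_fence <= x <= upper_fence]
    outliers = [x for x in values if x < lower_fence or x > upper_fence]
    extreme = [x for x in outliers
               if x < q1 - k_extreme * iqr or x > q3 + k_extreme * iqr]

    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "lower_fence": lower_fence, "upper_fence": upper_fence,
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": outliers, "extreme": extreme,
    }

# Hypothetical sample (made-up values) with one very large score
scores = [2, 3, 3, 4, 4, 5, 5, 6, 7, 30]
result = boxplot_landmarks(scores)
for name, value in result.items():
    print(name, "=", value)
```

For this sample the upper fence is 9.5, but the top whisker stops at 7, the last data point before the fence. The score of 30 lies past the fence, so it is flagged as an outlier.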
