Lecture #5 (Chap 1 & 2 Continued)
• The Variance (Average (Squared) Deviation)
-Most common measure of dispersion
-Measures the typical squared deviation about the center of the data using
the arithmetic mean.
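-Calculated by averaging the squares of the individual deviations from the
mean.
Variance: Population: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$   Sample: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
-In R we use var( ), which returns the sample variance (divisor n - 1)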
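The difference between the population and sample divisors can be checked by hand. The sketch below uses a small made-up data vector `x` (an assumption for illustration, not lecture data) and compares the hand calculation with var( ).

```r
# Hypothetical data, chosen so the arithmetic is easy to follow (mean = 5).
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Sum of squared deviations about the arithmetic mean.
ss <- sum((x - mean(x))^2)

# Sample variance divides by n - 1; this is what var() returns.
sample_var <- ss / (n - 1)

# Population variance divides by n.
pop_var <- ss / n

print(c(sample = sample_var, var_fn = var(x), population = pop_var))
```

For this vector the population variance is exactly 4, while var( ) gives the slightly larger sample value 32/7.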
• The Standard Deviation
o Definition: A measure of how spread out or dispersed the data in a set are
relative to the set's mean (from website).
o The positive square root of the variance
o Falls in the same range of magnitude (and appears in the same units) as
the observations themselves.
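Calculation: Population: $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$   Sample: $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$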
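As a quick check that the standard deviation is the positive square root of the variance, the sketch below reuses a made-up vector `x` (an assumption, not lecture data). sd( ) is base R's sample standard deviation.

```r
# Hypothetical data with mean 5.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Sample standard deviation: square root of the sample variance.
sample_sd <- sqrt(var(x))

# Population standard deviation: square root of the mean squared deviation.
pop_sd <- sqrt(mean((x - mean(x))^2))

print(c(sample = sample_sd, sd_fn = sd(x), population = pop_sd))
```

Both results are in the same units as the observations, unlike the variance, which is in squared units.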
• Overall Range
-Range: measures the spread of the data
-Equals the difference between the largest and the smallest observation in a
data set (in R, the range( ) function returns the smallest and largest values,
so the range itself is diff(range( ))).
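A short sketch of the overall range in R, using a made-up vector `x` (an assumption for illustration). Note that range( ) returns two numbers, not their difference.

```r
# Hypothetical data.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# range() returns the smallest and largest observations.
extremes <- range(x)          # 2 9

# The overall range is the difference between them.
overall_range <- diff(extremes)   # 7

print(extremes)
print(overall_range)
```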
• Interfractile Ranges (2 numbers that contain half the data)
-Definition: Measures the difference between 2 values (called fractiles or
percentiles) in the ordered array
o Quartiles: divide the array into 4 quarters
o Interquartile Range: difference between the 3rd and 1st quartiles (contains
the middle 50% of data)
o Deciles: divide the array into 10 parts. The 5th decile is at the center of
the range (ie: the median)
• In R we use the IQR ( ) and quantile ( ) functions
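The quartiles, interquartile range, and deciles can be sketched as follows, again on a made-up vector `x` (an assumption). quantile( ) supports several interpolation rules through its `type` argument; the values here use R's default, so a hand method from class may give slightly different quartiles.

```r
# Hypothetical data.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Quartiles: the 25th, 50th and 75th percentiles (default interpolation rule).
q <- quantile(x, probs = c(0.25, 0.50, 0.75))

# Interquartile range: third quartile minus first quartile.
iqr_by_hand <- unname(q[3] - q[1])

# Deciles: the 10th, 20th, ..., 90th percentiles; the 5th decile is the median.
deciles <- quantile(x, probs = (1:9) / 10)

print(q)
print(c(by_hand = iqr_by_hand, IQR_fn = IQR(x)))
print(deciles)
```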
Shape Measures: Skewness
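• Definition: a measure of asymmetry (how far a distribution departs from symmetry)
• If the data have a long right tail, the data are right (positively) skewed
(ie: seen in income distributions, where a lot of people are centred around
average income and a few very high incomes stretch the right tail)
• A frequency distribution’s degree of distortion from horizontal symmetry
• Pearson’s (first) coefficient of skewness is $Sk_1 = \frac{\bar{x} - \text{mode}}{s}$; its sign gives the direction of the skew (+ for right skew, - for left skew)
• An alternative (more popular; the one returned by skewness( ) in R) moment-based measure is given by $\frac{m_3}{m_2^{3/2}}$, where $m_k = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^k$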
o Skewness=0 for symmetric distributions
o For a right-skewed distribution, the mean is typically greater than the
median, which is greater than the mode. Right Skewed = (mean > median > mode)
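Both skewness measures can be computed in base R without extra packages. In the sketch below, the data vector `x` and the helper `stat_mode` are hypothetical (base R has no built-in function for the statistical mode), and the helper assumes the data have a single mode. The moment-based value should agree with skewness( ) from the moments package if that package is installed.

```r
# Hypothetical right-skewed data: mean 5, median 4.5, mode 4.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Hypothetical helper: the most frequent value (assumes a single mode).
stat_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[which.max(counts)])
}

# Pearson's first coefficient: (mean - mode) / standard deviation.
pearson_skew <- (mean(x) - stat_mode(x)) / sd(x)

# Moment-based skewness: m3 / m2^(3/2), with central moments divided by n.
m2 <- mean((x - mean(x))^2)
m3 <- mean((x - mean(x))^3)
moment_skew <- m3 / m2^(3 / 2)

print(c(mean = mean(x), median = median(x), mode = stat_mode(x)))
print(c(pearson = pearson_skew, moment = moment_skew))
```

Both coefficients come out positive here, matching the ordering mean > median > mode for right-skewed data.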
Shape Measures: Kurtosis
• Definition: How peaked a distribution is
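• If you get a value less than 3, the distribution is less peaked (flatter)
than the normal distribution; if more than 3, it is more peaked
• Coefficient of kurtosis: $\frac{m_4}{m_2^2}$, where $m_k = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^k$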
o Kurtosis = 3 for the normal distribution
o In R we first install the moments package via: install.packages("moments")
o The functions kurtosis( ) and skewness( ) can then be accessed (after loading the package with library(moments)).
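The coefficient of kurtosis can also be computed by hand in base R, which avoids needing the moments package for a quick check. The data vector `x` is made up for illustration (an assumption); the result should match kurtosis( ) from moments if that package is installed.

```r
# Hypothetical data with mean 5.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Central moments, each divided by n.
m2 <- mean((x - mean(x))^2)
m4 <- mean((x - mean(x))^4)

# Coefficient of kurtosis: m4 / m2^2 (equals 3 for the normal distribution).
kurt <- m4 / m2^2

print(kurt)
```

The value for this vector is below 3, so by the rule above it is less peaked than a normal distribution.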
The Five-Number Summary
• The five-number summary of a set of observations consists of the smallest
observation, the first quartile, the median, the third quartile, and the largest
observation, written in order from smallest to largest.
Minimum, Q1, Median, Q3, Maximum
(Q1= a quarter of all observations below and ¾ above; Median=Half of all
observations below, and half above; Q3= a quarter of all observations above