GGR270H1 Lecture Notes - Census Tract, Standard Deviation, Skewness

15 Oct 2013
Measures of Dispersion
Range
Simplest measure of dispersion
Takes difference between smallest and largest value in the dataset, at the
interval/ratio scale
But, influenced by outliers
Range = Xmax - Xmin
Quartiles
Can yield more information and lessen the impact of outliers
Data are divided into quartiles (4 groups)
Quintiles (5 groups), Percentiles (100 groups)
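A minimal sketch of the range and quartiles in Python, to show how one outlier affects each. The data values are invented for illustration (not from the lecture). Note that `statistics.quantiles` uses one particular interpolation method by default, and other textbooks may give slightly different cut points.

```python
import statistics

# Hypothetical values, invented for illustration (not from the lecture).
# The last value (50) is a deliberate outlier.
values = [2, 4, 4, 5, 7, 9, 10, 12, 50]

# Range = Xmax - Xmin; the single outlier dominates it.
data_range = max(values) - min(values)

# Three cut points split the sorted data into 4 groups (quartiles).
q1, q2, q3 = statistics.quantiles(values, n=4)

# Spread of the middle half of the data; the outlier has little effect here.
interquartile_range = q3 - q1

print("Range:", data_range)
print("Quartile cut points:", q1, q2, q3)
print("Interquartile range:", interquartile_range)
```

Passing `n=5` or `n=100` to the same function would give quintile or percentile cut points.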
Standard Deviation and Variance
Two of the most commonly used measures of dispersion
Both compare each value to the mean:
Xi - X̄ (x bar)
Two key properties of the mean/value relationship
o Sum of the differences from the mean always adds up to zero
o Sum of the squared differences from the mean is the minimum sum
possible. Called the “Least Squares” property
o Least Squares property carries over into calculating variance and
standard deviation
o Variance: S^2 = Σ(Xi - X̄)^2 / (n - 1), where X̄ (x bar) = sample mean
and n = sample size
o Standard Deviation: S = √[Σ(Xi - X̄)^2 / (n - 1)], the square root of
the variance
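The two properties of the mean and the variance / standard deviation calculation can be checked with a short Python sketch. The data values are invented for illustration, and the n - 1 denominator is an assumption (the usual sample formula, which is also what `statistics.variance` uses).

```python
import math
import statistics

# Hypothetical sample, invented for illustration (not from the lecture).
values = [4.0, 8.0, 6.0, 5.0, 7.0]
n = len(values)
mean = sum(values) / n

# Property 1: the differences from the mean add up to zero.
sum_of_differences = sum(x - mean for x in values)

# Property 2 (Least Squares): squared differences are smallest around the mean.
def sum_of_squares(centre):
    return sum((x - centre) ** 2 for x in values)

ss_around_mean = sum_of_squares(mean)
ss_around_other = sum_of_squares(mean + 0.5)  # any other centre gives more

# Sample variance and standard deviation (assumes the n - 1 denominator).
variance = ss_around_mean / (n - 1)
std_dev = math.sqrt(variance)

print("Sum of differences:", sum_of_differences)
print("Sum of squares around mean vs. elsewhere:", ss_around_mean, ss_around_other)
print("Variance:", variance, "Standard deviation:", std_dev)
```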

Skewness
o Measures the degree of symmetry in the frequency distribution
o Determines how evenly (or unevenly) the values are distributed on
either side of the mean
o You can only say whether something is positively skewed or negatively
skewed; no numeric value is attached to the difference
Coefficient of Variation
o Allows comparison of variability among spatial samples (samples tied
to a particular location)
o Tests which sample has the greatest variability
o Standard Deviation and Variance are absolute measures, so they are
influenced by the size of the values in the dataset. They can’t be
compared among samples because the scales will be different
o Used when you want to compare two or more locations. Ex. in
climatology, you are looking at rainfall in different areas and want
to find which location has the greatest variability in rainfall. Or,
looking at average household income, census data can be measured at
the census tract level, and you can compare the census tracts to see
which one has the most variability
o To allow a comparison of variation across two or more geographic
samples, can use a relative measure of dispersion called the
Coefficient of Variation
o CV = S/X̄ (standard deviation / mean)
o Example:
o
o Which station has the greatest degree of variation?
Station C, because it has the greatest CV
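A rough Python sketch of the CV comparison follows. The station names and rainfall values are made up for illustration and are not the lecture's example table. Two of the stations are given the same standard deviation on purpose, to show why a relative measure is needed.

```python
import statistics

# Hypothetical rainfall samples, invented for illustration.
# These are NOT the stations from the lecture example.
stations = {
    "station_1": [30.0, 32.0, 31.0, 29.0, 33.0],
    "station_2": [60.0, 70.0, 65.0, 55.0, 75.0],
    "station_3": [10.0, 20.0, 15.0, 5.0, 25.0],
}

def coefficient_of_variation(values):
    """CV = S / x-bar (sample standard deviation divided by the mean)."""
    return statistics.stdev(values) / statistics.mean(values)

cv = {name: coefficient_of_variation(v) for name, v in stations.items()}

# The sample with the largest CV has the greatest relative variability.
most_variable = max(cv, key=cv.get)

for name, value in cv.items():
    print(name, "S =", round(statistics.stdev(stations[name]), 2),
          "CV =", round(value, 3))
print("Greatest relative variability:", most_variable)
```

Here station_2 and station_3 have identical standard deviations, but station_3 has a much smaller mean, so its CV is larger.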
Practical Significance of Standard Deviation
Empirical Rule: say the grey area in the figure is 68% of the data; it
falls within one standard deviation of the mean, i.e. between -1 and
+1 standard deviations
o 95% will fall between -2 and +2 standard deviations
o this holds for any normal distribution
EXAMPLE: Mean = 20 and S = 5
o So 68% will fall between 20 +/- 5
or between 15-25
o 95% fall between 10-30
o 99.7% fall between 5-35 (3
standard deviations)
o Normal Distribution Sample
PUT ON CHEAT SHEET
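The example above (Mean = 20, S = 5) can be reproduced with a short Python sketch. The cross-check with `statistics.NormalDist` assumes the data are normally distributed, which is the case the Empirical Rule describes.

```python
from statistics import NormalDist

mean, s = 20, 5  # values from the example above

# Interval for k standard deviations on either side of the mean.
intervals = {k: (mean - k * s, mean + k * s) for k in (1, 2, 3)}

# Cross-check: share of a normal distribution inside each interval.
dist = NormalDist(mu=mean, sigma=s)
shares = {k: dist.cdf(high) - dist.cdf(low) for k, (low, high) in intervals.items()}

for k in (1, 2, 3):
    low, high = intervals[k]
    print(f"+/- {k} S: {low} to {high}, about {shares[k]:.1%} of values")
```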
Z-Scores
Standard scores are referred to as Z scores
Indicate how many standard deviations separate a particular value from the
mean
Z scores can be + or - depending on whether they are > or < the mean
Z score of the mean is 0, and a value one standard deviation from the mean
has a Z score of +1 or -1.
Table of Normal Values provides probability information on a standardized
scale.
o Statistics is a tool for forecasting: we can look at the probability
of a particular value. If we have the mean of the assignment and the
standard deviation, what is the probability of a student getting a
certain grade?
But we can also calculate Z scores
Formula involves comparing each value to the mean value, and dividing by
the standard deviation
Result is interpreted as the “number of standard deviations an
observation lies above or below the mean”
Z = (Xi - X̄) / S
o Z is the standard score
o S is the standard deviation of the sample
o Xi is each value in the data set
o X̄ (x bar) is the mean of all values in the data set
Ex. Rainfall in Toronto: mean = 39.95 inches of rainfall, S = 7.5 inches,
observed value X = 48 inches
Z = (48 - 39.95) / 7.5 = 1.07. Therefore 48 inches is 1.07 standard
deviations above the mean. (If it were a negative value,
you would say below the mean)
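The Toronto rainfall example can be checked with a small Python sketch. The helper name `z_score` is just for illustration. The probability step stands in for looking up the Table of Normal Values and assumes rainfall is normally distributed.

```python
from statistics import NormalDist

def z_score(x, mean, s):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / s

mean, s = 39.95, 7.5  # values from the example above
z = z_score(48, mean, s)
print("Z =", round(z, 2))  # Z = 1.07

# Stand-in for the Table of Normal Values (assumes a normal distribution):
# probability of observing a value at or below 48 inches.
probability_at_or_below = NormalDist().cdf(z)
print("P(X <= 48) is about", round(probability_at_or_below, 3))
```

A value below the mean gives a negative result from the same function.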