GGR270 – Introductory Analytical Methods
Email – only use for 1-2 sentence replies. Office hours are right after class.
Tutorials start Sept. 23 and are held weekly.
Statistics – Any collection of numerical data: vital data, economic indicators, social statistics. Also a
methodology for collecting, presenting and analyzing data; used to summarize findings, validate
theory, forecast (e.g. before putting an investment into something like a subway system, you need
to estimate what the demand will be once it is built, because you want to make the money back),
evaluate, and select among alternatives.
Descriptive Statistics – Organize and summarize data; replace a large set of numbers with small
summary measures (e.g. analyzing 230 exam marks: average, standard deviation, etc). This
can also be done with graphs. The goal is to minimize information loss.
Inferential Statistics – Links descriptive statistics to probability theory; generalizes results from a
smaller group to a much larger one by drawing a sample and using it to reflect the population.
Population – Total set of elements (objects, people, regions) under examination; i.e. all potential
voters in an urban area. Denoted as N.
Sample – Subset of elements in the population; used to make inferences about certain
characteristics of the population. Try to predict the behavior of the population by looking closely
at the sample. Sample size is denoted as n. If you have different samples from one population,
they can be denoted as n1, n2, n3.
Variables and Data
Most people view data as plural.
Variable: Characteristic of the population that changes or varies over time, i.e temperature,
income, education, etc.
Bivariate statistics – How do two variables influence each other? Is there a correlation? What is
the relationship? i.e. how does education affect income?
Two key categories of variables:
1. Quantitative – numerical, i.e. number of students. Can be discrete (1, 2, 3, 4) or
continuous (1.5, 2.76, 3.413).
2. Qualitative – Non-numerical, i.e. male/female, plant species.
Data – results from measuring variables; a set of measurements. Different categories – univariate,
bivariate, multivariate.
Variables – Scales of Measurement
Scale defines the amount of information a variable contains and what statistical techniques can
be used.
On exam: We will be given a statistical problem and asked which test should be used; the answer
depends on the measurement scale and how many samples there are.
1. Nominal – Lowest scale of measurement, no numerical value attached (least amount of
information). Classifies observations into mutually exclusive groups (each observation can
only be in one group) that are collectively exhaustive (there is a group every observation
fits into). Simply the name or category of the variable. Often called “categorical data”, i.e.
occupation type, gender, etc.
2. Ordinal – Stronger scale as it allows data to be ordered or ranked, i.e. the 12 largest towns
in a region, or income by group (high/low), such as when you are asked which income
group you are in (0-10,000, 10,000-30,000, 30,000-50,000, 50,000+, for example).
3. Interval – Unit distance separating numbers is important, i.e temperature (F or C). Does
not allow for ratios and does not have a true “zero”.
4. Ratio – strongest scale of measurement (most amount of information). Ratios of
distances on a number scale (you can say someone earns twice as much as someone
else). Presence of an absolute “zero”, i.e temperature (kelvin), income for all races.
In practice, we consider interval/ratio scales together.
Describing Data 1 – Graphs:
Pie chart – circular graph where measurements are distributed among categories. Good for seeing
how many people fall into certain groups.
Bar graph – graph where measurements are distributed among categories; also good for seeing
how many people fall into certain groups.
Relative Frequency Histogram:
Graphs quantitative rather than qualitative data; the vertical (Y) axis shows “how often”
measurements fall into a particular class or subinterval (frequency). Classes are plotted on the
horizontal (X) axis.
Rules of thumb: 5 to 12 intervals or categories. To find the number of bars, use
k = 1 + 3.3·log10(n), where n is the number of observations. Intervals must be mutually
exclusive and collectively exhaustive, and should all be the same width.
*** Make sure that when you’re making your histogram you do not put spaces between the bars.
Also make sure to document everything you do and explain your steps.
Observations: 1, 11, 14, 21, 23, 27, 28, 33, 35, 50
# of classes: k = 1 + 3.3·log10(10) = 4.3, rounded (always up) to 5.
Class width: (largest # – smallest #) / # of classes = (50 – 1)/5 = 9.8, rounded (normally) to 10.
Income (000’s)   Frequency   Relative Frequency
0-9.9            1           10%
10-19.9          2           20%
20-29.9          4           40%
30-39.9          2           20%
40-50            1           10%
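The class count, class width, and frequency tally above can be checked with a minimal Python sketch (the bin edges are an assumption chosen to match the table):

```python
import math

observations = [1, 11, 14, 21, 23, 27, 28, 33, 35, 50]

# Number of classes: k = 1 + 3.3 * log10(n), always rounded up
k = math.ceil(1 + 3.3 * math.log10(len(observations)))  # 4.3 -> 5

# Class width: (largest - smallest) / k
width = (max(observations) - min(observations)) / k     # 49 / 5 = 9.8, use 10 in practice

# Tally each class (mutually exclusive, collectively exhaustive)
bins = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50.1)]
freq = [sum(lo <= x < hi for x in observations) for lo, hi in bins]
rel_freq = [f / len(observations) for f in freq]
```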
Observing the Graph – Skewness
Is the distribution symmetrical? If it is, it is called normal. With a large sample, you usually end
up with a roughly normal distribution. The direction of the skew is wherever the tail is; i.e. if it is
positively skewed, the crest will be on the left side of the graph and the tail will be on the right.
A positive skew means there is an “outlier” group with much higher values than the rest of the
data.
Observing the Graph – Mode
How many peaks are there? If there are two peaks/modes, then it is called bimodal.
Observing the Graph – Kurtosis
How distributed are the values? How peaked is the distribution?
A normal distribution is called mesokurtic. If the distribution is very flat, meaning the values are
spread widely rather than concentrated, it is called platykurtic. If the values cluster tightly
around the centre, producing a sharp peak, it is called leptokurtic.
Describing Data 2 – Measures of the centre, measures of variability.
Standard deviation is the square root of the variance.
Statistics and Parameters:
Graphs are limited in what they can tell us.
Difficulty making inferences about a population when looking at a subset or sample.
Therefore, we need to use numerical measures.
Measures associated with the population are called parameters.
Measures associated with samples are called statistics.
Measures of the Centre:
Mean – Most commonly used measure of central tendency. It is the sum of all values or
observations divided by the number of observations.
The denominator of the population mean is N, whereas for the sample mean it is n.
For a sample, the symbol on the LHS of the equation is x with a horizontal line above it (x bar).
For the population, it is μ (mu). In summation (sigma) notation, whatever is underneath the
sigma is the index you start at, and on top of it is the index where the sum ends.
Example 2: Temperature data: 7.3, 10.7, 9.1, 8.4, 13.9, 9.4, 8.2.
The sum (Σ) of all seven values is 67.0; divide by 7.
Therefore, x bar = 67/7 = 9.57 ≈ 9.6.
Rule of thumb for rounding: round to the same number of decimal places as in your observation
data. Less than five rounds down; five or more rounds up.
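The mean calculation above, as a quick Python check:

```python
temps = [7.3, 10.7, 9.1, 8.4, 13.9, 9.4, 8.2]

# x bar = sum of observations / number of observations
xbar = sum(temps) / len(temps)   # 67.0 / 7 = 9.57...

# Round to the same precision as the observations (one decimal place)
xbar_rounded = round(xbar, 1)    # 9.6
```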
Median – Value occupying the “middle position” in an ordered set of observations.
Order the observations, lowest to highest, and locate the middle position.
Expressed as: 0.5(n+1)
Temperature data: 7.3, 10.7, 9.1, 8.4, 13.9, 9.4, 8.2.
7.3, 8.2, 8.4, 9.1, 9.4, 10.7, 13.9
Therefore 9.1 is the median. Using the formula, 0.5(7 + 1) = 4, i.e. the 4th position in the ordered set.
Temperature data: 7.3, 8.4, 9.1, 9.4, 10.7, 13.9
Using the formula, you get 0.5(6 + 1) = 3.5, i.e. the 3.5th position.
So, you add the 3rd and 4th values and divide by two:
(9.1 + 9.4)/2 = 18.5/2 = 9.25 ≈ 9.3.
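Both median cases (odd and even n) can be sketched in Python:

```python
def median(values):
    """Middle value of an ordered set; with an even count, the average
    of the two middle values (position 0.5 * (n + 1))."""
    s = sorted(values)
    n = len(s)
    if n % 2 == 1:
        return s[n // 2]              # single middle value
    return (s[n // 2 - 1] + s[n // 2]) / 2  # average of the two middle values
```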
Measures of the Centre (continued):
Mode – Value that occurs with the highest frequency; allows you to locate the peak of a relative
frequency histogram.
Choosing an Appropriate Measure:
Mean is usually the best measure as it is sensitive to change in a single observation.
The mean is not a good measure when the distribution is bimodal (2 modes), when the
distribution is skewed, or when outliers (extreme values) are present in the data set, because
an outlier will pull the mean in whichever direction it lies.
Measures of Dispersion:
Range – Simplest measure; the difference between the smallest and largest value, at the
interval/ratio scale. Influenced by outliers. Range = max value – min value.
Quartiles – Yield more information and are less affected by outliers. Data are divided into
quartiles (4 groups), with observations arranged in increasing order. (With five groups, they are
called quintiles.)
Standard Deviation and Variance:
Two most commonly used measures of dispersion.
Compares value of each measure to the Mean (Xi – Xbar).
Two key properties of the mean/value relationship:
Sum of differences will always add up to zero.
Sum of squared differences will be the minimum sum possible. Called ‘Least Squares’ property.
The least squares property carries over into the calculation of Standard Deviation and Variance.
S^2 = Σ(xi – xbar)^2 / (n – 1)
Constructing a Worksheet:
Worksheet is a table where each column represents a component of the statistical formula.
1st column = xi. 2nd column = xbar. 3rd column = difference (xi – xbar). 4th column = difference
squared. Then the sum of the 4th column divided by (n – 1) = variance. The square root of that
is the standard deviation.
To check if you are doing it right, add up the 3rd column; it should equal 0.
Standard deviation gives us a standard picture of how far each value sits from the mean.
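The worksheet procedure translates directly into Python, including the sum-to-zero check on the difference column:

```python
def sample_variance(values):
    n = len(values)
    xbar = sum(values) / n
    diffs = [x - xbar for x in values]          # 3rd worksheet column
    assert abs(sum(diffs)) < 1e-9               # check: differences sum to zero
    return sum(d * d for d in diffs) / (n - 1)  # 4th column summed, divided by n - 1

def sample_sd(values):
    return sample_variance(values) ** 0.5       # square root of the variance
```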
Skewness: Measures the degree of symmetry in a frequency distribution. Determines how
evenly (or unevenly) the values are distributed on either side of the mean. Pearson’s coefficient
of skewness = 3(xbar – median)/S.
Coefficient of Variation: Allows for comparison of variability across spatial samples; tests which
sample has the greatest variability.
Standard deviation or variance are absolute measures, so they are influenced by the size of the
values in the dataset.
To allow a comparison of variation across two or more geographic samples, you can use a
relative measure of dispersion called Coefficient of Variation.
CV = S / xbar
Example: Annual Rainfall
        Station A   Station B   Station C
Xbar    92.6        97.3        38.8
S       16.6        12.8        9.1
CV      0.179       0.132       0.235
Station C has the greatest degree of variability.
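The rainfall comparison can be reproduced with a short Python sketch:

```python
# Station data from the rainfall example: (standard deviation, mean)
stations = {"A": (16.6, 92.6), "B": (12.8, 97.3), "C": (9.1, 38.8)}

# CV = S / xbar, a relative (unitless) measure of dispersion
cv = {name: round(s / xbar, 3) for name, (s, xbar) in stations.items()}

most_variable = max(cv, key=cv.get)  # station with the greatest variability
```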
Practical Significance of Standard Deviation:
Empirical rule: For a normal distribution, a predictable percentage of the data lies within a given
number of standard deviations of the mean: about 68% within ±1 SD, about 95% within ±2 SDs,
and about 99.7% within ±3 SDs.
If S is 5, then one standard deviation is ±5 and two standard deviations is ±10.
Why do we need to consider measures of center and measures of dispersion when we’re
describing the shape of a distribution?
We need to consider measures of center because they show us where the most common value
is, whereas the measure of dispersion will show us how far most values stray from the mean.
Variance shows you how honest your mean is.
3 things you have to know about relative frequency histograms:
Placement of the first bar should be right up against the y-axis.
Spacing between bars should not have gaps.
Label your axes.
Z Scores – Standard scores are referred to as z-scores. A z-score indicates how many standard
deviations separate a particular value from the mean; it can be positive or negative, depending
on whether the value is greater or less than the mean. The z-score of the mean itself is 0, and
one standard deviation corresponds to ±1.
Table of “Normal Values” provides probability information on a standardized scale.
The formula for calculating the z score involves comparing values to the mean value and
dividing by the standard deviation. The result is interpreted as the “number of standard
deviations an observation lies above or below the mean”.
Rainfall in Toronto.
Mean = 39.95 inches of rainfall. S = 7.5 inches.
Z score for 48 inches? Z = (48 – 39.95) / 7.5 = 8.05/7.5 = 1.07
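The z-score arithmetic can be checked with a minimal Python sketch:

```python
def z_score(x, mean, sd):
    # Number of standard deviations separating x from the mean
    return (x - mean) / sd

z = z_score(48, 39.95, 7.5)   # 8.05 / 7.5
```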
Therefore, 48 inches is 1.07 standard deviations above the mean.
Describing Bivariate Data
Simple bivariate graphs: Comparative Pie Charts, stacked bar graph.
Correlation: Allows us to observe, statistically, the relationship between two variables. Just
because two values are correlated, it doesn’t mean that one of them causes the other one. All
correlation gives you is the strength and direction of the relationship.
The most common graphic technique is the scatterplot. Each observation has a value for the x
variable and a value for the y variable. Scatterplots tell us the strength and direction of the
relationship.
The more tightly the points cluster around a line in a scatterplot, the stronger the bivariate relationship.
A more rigorous approach to observing and measuring the strength and direction of a bivariate
relationship is the correlation coefficient. Most correlation coefficients have a maximum absolute
value of 1.0 and can be positive or negative; ±1 is a perfect positive/negative relationship.
Most common measure is “Pearson’s Product Moment Correlation” or “Pearson’s R”.
Used for interval/ratio scale data.
Spearman’s rank correlation is the correlation coefficient for ordinal (ranked) data.
Pearson’s R and covariance.
Covariance measures the degree to which 2 variables vary together. Begins with deviations
around means of both variables or: (x-xbar) and (y-ybar).
Cov(x, y) = [Σ(x – xbar)(y – ybar)] / (n – 1)
i.e. the sum of the paired deviations from the means, multiplied by each other and divided by
the number of values minus 1. If this value is negative, there is a negative correlation; if it is
positive, there is a positive correlation.
Anytime you’re dealing with a sample, you have to divide by n-1 to control the bias.
Worksheet columns: Xi | Yi | Xbar | Ybar | Xi – Xbar | Yi – Ybar | (Xi – Xbar)(Yi – Ybar)
Pearson’s R: Expressed as the ratio of the covariance of X and Y to the product of the standard
deviations of x and y.
r = ([Σ(x – xbar)(y – ybar)] / (n – 1)) / (Sx · Sy)
If there is no relationship between the variables, Pearson’s r will be 0; 0.5 would be a moderate
relationship, and values closer to ±1 are stronger.
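Covariance and Pearson's r, as defined above, can be sketched in Python:

```python
def pearson_r(xs, ys):
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    # Covariance: sum of paired deviations from the means, divided by n - 1
    cov = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)
    # Sample standard deviations of x and y
    sx = (sum((x - xbar) ** 2 for x in xs) / (n - 1)) ** 0.5
    sy = (sum((y - ybar) ** 2 for y in ys) / (n - 1)) ** 0.5
    return cov / (sx * sy)   # ratio of covariance to product of the SDs
```

A perfectly linear positive relationship gives r = 1, a perfectly linear negative one gives r = -1.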
Probability and Probability Distributions:
Studying spatial patterns is a key concern of geographers. We try to understand what has led to
those patterns. From that, we try to predict what patterns will come up in the future.
Geography is about describing, explaining, and predicting geographic patterns and processes.
Use probability for situations when patterns have some degree of uncertainty. For example,
weather forecasts – probability of precipitation.
Probability focuses on the occurrence of an event, where one of several possible outcomes
could result. These outcomes are (and must be) mutually exclusive. They can be thought of as
frequency of an event occurring relative to all possible outcomes.
P(A) = F(A) / F(E) where P(A) is probability of outcome A occurring, F(A) is absolute frequency
of A, and F(E) is frequency of all outcomes.
Maximum probability of an outcome is 1. All probabilities will add up to 1.
When we have a skewed distribution, what is a better measure of central tendency: mean or median?
The median is a better measure because the mean will be influenced by the outliers and the
median will be a more accurate representation of the central tendency. The mean is more
sensitive to extreme values.
Chebyshev’s theorem: At least 0% of your values will lie within ±1 standard deviation of the
mean, at least 75% within ±2 standard deviations, and at least 89% within ±3 standard deviations.
General rule: for k standard deviations, at least 1 – (1/k^2) of the values lie within ±k SDs.
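The general rule is a one-liner in Python:

```python
def min_fraction_within(k):
    # At least 1 - 1/k^2 of values lie within k standard deviations
    # of the mean, regardless of the distribution's shape
    return 1 - 1 / k ** 2
```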
Empirical rule: No way to do “point estimates”, since it relates to intervals rather than exact
values. Only works when you have a normal distribution. Using this rule makes it easy to identify
unusual values (outliers).
On midterm – everything up to multiplication rule.
Probability Rules continued
Addition rule – used when finding the probability of one of several mutually exclusive events.
P(A or B) = P(A) + P(B).
Ex. The probability of rolling a 5 or a 6 is .167 each. So, add them up and you get .333 – that’s
your chance of rolling either one.
Multiplication rule – P(A and B) = P(A) x P(B)
Ex. Rolling two sixes in a row: .167 × .167 ≈ .028.
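Both dice examples, checked in Python:

```python
# One fair die: each face has probability 1/6 (about .167)
p_face = 1 / 6

# Addition rule (mutually exclusive outcomes): P(5 or 6)
p_5_or_6 = p_face + p_face        # about .333

# Multiplication rule (independent trials): P(6, then 6 again)
p_two_sixes = p_face * p_face     # 1/36, about .028
```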
We often see consistent or typical patterns of probabilities in certain situations – probability
distributions (similar to frequency distributions; y axis contains probability of outcomes rather
than frequency of outcomes)
They can be both discrete and continuous – three key types: binomial, Poisson, normal.
Binomial: Discrete (whole numbers, not continuous) probability distribution. Used to determine
the probability of multiple events in independent trials. Each independent event has only 2
possible outcomes (i.e. either rain or no rain, flood or no flood).
The probability of the event occurring is p; the probability of it not occurring is q = 1 – p.
Poisson: Discrete probability distribution. Used when looking at events that occur randomly in
space and time; especially used for distributions over space, particularly with quadrat analysis
of point patterns. Also used when the probability of an event occurring is much lower than the
probability of it not occurring (rare events, i.e. tornadoes).
Normal: Most commonly applied distribution; it provides the theoretical basis for sampling
theory and statistical inference. Need to look at the “area under the curve”: the total area under
the curve represents 100% of possible outcomes, with 50% of values lying to the right of the
mean and 50% to the left.
Need a methodology to effectively determine probability of values on the distribution. Could use
integral calculus; easier to use “Table of Normal Values”. Observations must be standardized to
use the table.
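In place of a printed Table of Normal Values, the area under the standard normal curve can be computed with the error function (a sketch, not the course's table method):

```python
import math

def normal_cdf(z):
    # Area under the standard normal curve to the left of z
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

half_left = normal_cdf(0)                        # 0.5: half the area lies left of the mean
within_one_sd = normal_cdf(1) - normal_cdf(-1)   # about 0.68, matching the empirical rule
```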
Sampling and Estimation
Aim of inferential statistics is to generalize about characteristics of larger population. Therefore,
we need a process to obtain a sample. Sampling can be spatial or non-spatial. Census tracts
are not always spatial.
Why Sample?
Samples are necessary in cases of extremely large populations; also efficient and cost-effective.
Highly detailed information can be obtained easily. Allows for follow-up activity or repetition.
If a sample is representative, then it will accurately reflect the characteristics of the population,
without bias. An element of randomness must be introduced to keep the sample representative
(selecting observations through a randomized process). You can never eliminate bias, only
minimize it. Reducing bias means reducing error. Precision (reflects the notion of sample size;
a larger sample is more precise) and accuracy help categorize sources of error.
A number of different sampling designs exist: simple random, systematic, stratified, etc.
Can also have spatial sampling designs using Cartesian coordinates: simple, stratified random,
transect.
Stratified random sampling:
Impose a grid onto a map, and then choose random points within the squares.
Transect sampling: Have two y-axes and two x-axes; use the endpoints to draw lines. Figure
out how long the lines are, find out how much land is used along each line, then add up the
totals.
Sample statistics will change or vary for each random sample selected.
Sampling Distributions – Probability distributions for statistics
A sampling distribution is the distribution of a statistic drawn from all possible samples of
a given size n.
It can be developed for any statistic, not just the mean.
Central Limit Theorem
Sampling distribution will have its own mean and standard deviation.
But… the mean of a sampling distribution has important properties – summarized by the Central
Limit Theorem.
If all samples are randomly drawn and independent, then the mean of the sampling distribution
of sample means will be the population mean, μ.
Central Limit Theorem continued
The frequency distribution of sample means will be normally distributed
What this means for us is that… when the sample size is large, the sample mean is likely to be
quite close to the population mean. Mean of a large sample is more likely to be closer to the true population mean than the mean of
a smaller sample.
Central Limit Theorem – Variability:
The standard deviation of the sampling distribution is equal to the sample standard deviation
divided by the square root of the sample size.
This is called the standard error of the mean.
Indicates how much a typical sample mean is likely to differ from the true population mean
Measures the amount of sampling error
The larger the sample, the smaller the amount of sampling error.
Anything over a sample size of 30 is a large sample.
Standard error of the mean: σxbar = s / √n. Standard error of a proportion: SEp = √(pq/n).
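Both standard error formulas are straightforward to sketch in Python:

```python
def se_mean(s, n):
    # Standard error of the mean: s / sqrt(n)
    return s / n ** 0.5

def se_proportion(p, n):
    # Standard error of a proportion: sqrt(pq / n), where q = 1 - p
    return (p * (1 - p) / n) ** 0.5
```

Note how quadrupling n only halves the standard error, so the gain from a larger sample diminishes.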
Statistical inference is concerned with making decisions or predictions about population
parameters using samples.
Two ways to do this: estimation or hypothesis testing.
Estimators are calculated using information from samples. Usually expressed as a single value
(point) or a range of values (interval).
Two types: point, interval.
Practically, several statistics exist that could be point estimates.
How does the estimate behave in repeated sampling?
Two valuable characteristics of best estimator: unbiased, small variance.
Error of Estimation:
Under the empirical rule, 95% of all point estimates will lie within 2 (or more precisely, 1.96)
standard deviations of the mean.
If estimate is unbiased, the difference between the point estimate and the true parameter value
will be less than 1.96 standard deviations, or standard error.
Can call this the 95% margin of error.
Calculated as 1.96 * standard error.
Margin of Error:
n = 50, Xbar = 980 lbs, S = 105.
95% margin of error = 1.96(S / √n)
= 1.96(105/√50) = 29.1 lbs
Therefore, we can say with 95% confidence that the sample estimate of 980 lbs is within +/- 29
lbs of the population parameter.
As the confidence level decreases, the margin of error falls.
If you were to use the 90% margin of error, then…
1.65(105/√50) = 24.5, or ±24.5 lbs.
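The 95% and 90% margins of error above, checked in Python:

```python
def margin_of_error(z, s, n):
    # z times the standard error of the mean
    return z * s / n ** 0.5

moe_95 = margin_of_error(1.96, 105, 50)  # about 29.1 lbs
moe_90 = margin_of_error(1.65, 105, 50)  # about 24.5 lbs
```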
It works with proportions too, i.e. 1.96·√(pq/n).
Estimation:
15% of Canadians would rather vote in the US election: Survey.
The poll asked 2,001 Canadians over the age of 15 questions about how they perceive their role
and Canada’s role in the world.
The survey has a margin of error of 2.2 percent, 19 times out of 20.
Most often you don’t know how precise the single sample mean is as an estimator (i.e. smaller
samples are less precise).
Place an interval around the sample mean, and calculate the probability of the true population
mean falling within this interval.
General formula: Xbar ± Z(σxbar)
Z values associated with, say, a 90% confidence level are ±1.65.
The 90% confidence interval is Xbar ± 1.65(σxbar).
Therefore, the upper bound of the interval is Xbar + 1.65(σxbar),
and the lower bound is Xbar – 1.65(σxbar).
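The general formula, applied in Python to the cattle-weight numbers from the margin-of-error example:

```python
def confidence_interval(xbar, s, n, z):
    # Xbar +/- z * (s / sqrt(n))
    se = s / n ** 0.5
    return xbar - z * se, xbar + z * se

# 95% interval for the sample with xbar = 980 lbs, S = 105, n = 50
low, high = confidence_interval(980, 105, 50, 1.96)
```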
What does it mean if you are 95% confident?
If you constructed 20 intervals, each with different sample information, 19 out of 20 would
contain the population parameter μ, and 1 would not.
But you can never be sure whether a particular interval contains μ.