# SOC222 Lecture 5

University of Toronto Mississauga

Sociology

SOC222H5

John Kervin

Fall

SOC 222 -- MEASURING the SOCIAL WORLD
Session #5 -- INTRO to INFERENTIAL STATISTICS
Oct 2013
TODAY’S OBJECTIVES
1. Know why samples may not be accurate
2. Know how to estimate the accuracy of a population estimate
3. Know how to estimate the standard error from the sample standard deviation
Terms to Know
population
sample
statistic
simple random sample
representative sample
sampling error
sampling distribution
standard deviation
standard error
SAMPLES & ACCURACY
We want to know how strongly related x is to y, but for a large population
We draw a sample b/c it would cost too much to get info from every case in the pop
Our best answer for the pop is the sample result: a good UNBIASED estimate
But b/c a sample was used, cannot be totally sure if sample results would be
identical to real pop results
If we drew a second sample, it would almost certainly give a different estimate; we'd be astonished if it were the same
What about a third sample? Which one is the best?
Better estimate = closer to real value in pop = more accuracy
Don’t know real value so how can we tell how accurate we are?
EG populations:
1. Pop = all UTM students (case = student)
2. Pop = all employed adults in Canada (case = employed adult)
3. Pop = all Canadian cities (case = city)
4. Pop = all residents of Mississauga (case = person living in Mississauga)
EG samples from each:
1. 100 UTM students
2. 5000 employed adults in Canada
3. 25 Canadian cities
4. 100 residents of Mississauga
• The sample result gives us an estimate of the effect size in the population.
• We can’t be totally sure if the sample result would be identical to the population result
Better estimate = closer to the population value
• When sampling, how accurate is our estimate of the relationship strength
in the population?
Has to do with accuracy, not measure of effect size (we got that from the sample)
Is it an accurate measure of effect size for the pop?
POPULATIONS & SAMPLES
1. Population: a group of cases we’re interested in
- We have some RQ about it
- Canadian cities – do Canadian cities with more pop have higher crime rates? Get
answer from sample
2. Sample: a set of cases selected from the population
- Number of ways to collect a sample
- How big a sample should you select? (how many cases)
3. Simple random sample (SRS): one kind of sample; (1) each case in the sample
has been selected by some procedure involving chance (rolling a die, looking at a table of
random #s, asking a computer to spew out a random sample); (2) every case in the pop has
a known probability of being selected for the sample. This is an assumption that inferential
statistics requires (the easiest version is that every case has the same chance)
Statistic – what we calculate from a sample (a variance, a mean, or a correlation
b/w age and height calculated in a sample are all statistics)
- Computed from sample data and used to infer unknown parameters
What we estimate about the pop is called several things:
population characteristic (preferred term)
population value
parameter
You get a statistic from your sample to estimate a characteristic in the pop
RQ – is public speaking related to GPA? Average GPA in the sample = statistic; the
avg GPA in the pop and the relationship b/w the two in the pop (both of which we estimate) are pop characteristics
THE RICK EXAMPLE
• Rick’s RQ: what is the mean [for] days absent for company employees?
Rick knows:
• The company has six employees
Rick doesn’t know (the population data):
Employee   Days Absent
Ann        1
Bob        3
Cathy      3
Donna      5
Ed         7
Farrah     9
• The mean days absent is
$$\bar{x} = \frac{\sum x_i}{N} = \frac{1+3+3+5+7+9}{6} = \frac{28}{6} = 4.67$$
Estimating a Population Mean
Rick’s [random] sample size: 2
• His sample is Cathy and Farrah
• Cathy was absent 3 days
• Farrah was absent 9 days
• The mean: 6.00 days
• The estimate of the population mean is 6.00
The mean of a random sample is always an unbiased estimate of the pop mean
unbiased estimate
Kranzler, p. 103
Being a good researcher means asking a second question: how accurate is this
answer?
Estimating Accuracy of Population Estimates
• Rick’s estimate is off by 1.33 days
• Rick’s second estimate: an estimate of the accuracy of his population
characteristic estimate
- Inferential stats
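The arithmetic behind Rick's estimate and its error can be sketched in a few lines of Python. This is only an illustration: it uses the full population data from the table, which Rick himself would not have, and the variable names are made up for the example.

```python
from statistics import mean

# Population data from the notes (Rick would not actually have this).
days_absent = {"Ann": 1, "Bob": 3, "Cathy": 3, "Donna": 5, "Ed": 7, "Farrah": 9}

# True population mean: 28 / 6.
population_mean = mean(days_absent.values())

# Rick's random sample of size 2.
sample = ["Cathy", "Farrah"]
sample_mean = mean(days_absent[name] for name in sample)

# How far the sample estimate is from the real population value.
error = sample_mean - population_mean

print(f"population mean:    {population_mean:.2f}")
print(f"sample mean:        {sample_mean:.2f}")
print(f"estimate is off by: {error:.2f} days")
```

Swapping other pairs of names into `sample` shows how much the estimate moves from one sample to the next.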
SAMPLING ERROR
Employee   Days Absent
Ann        1
Bob        3
Cathy      3
Donna      5
Ed         7
Farrah     9
• EG #1:
• Cathy and Ed
• This sample is a representative sample
Representative sample: A sample with a distribution that matches the population
distribution.
This sample gives an accurate estimate – off by 0.33
Cathy = pretty typical of those who are not absent a lot
Ed = pretty typical of those who are absent a lot
• EG #2:
• Ann and Bob
Sampling error: the probability of drawing a sample that gives an inaccurate
population estimate.
A random sample can give an inaccurate estimate due to CHANCE
Mean would be 2 days – off by 2.67 days which is a lot, not a representative sample
Both are typical of people who are not absent a lot
There is a chance of an error or inaccurate estimate whenever we draw a sample
Happens b/c there is always a chance you won’t draw a good sample
Probability of a Bad Sample
Next best thing:
• Estimate the accuracy of his estimate
• IF Rick can figure out the chance of drawing a bad sample
If chance is high, then likely his estimate is wrong
If chance is low = likely right
The key:
• What’s the probability of drawing a bad sample?
• A sample that gives an inaccurate estimate
• Because the sample isn’t representative
sampling error
Rick’s problem becomes:
• How can I estimate the sampling error?
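To make "the probability of drawing a bad sample" concrete, here is a minimal Python sketch for the six-employee example. It rests on two assumptions that are not in the notes: a sample counts as "bad" when its mean is more than 1 day from the population mean (the `TOLERANCE` cutoff is arbitrary), and we can see the whole population, which Rick cannot.

```python
from itertools import combinations
from statistics import mean

# Days absent for Ann, Bob, Cathy, Donna, Ed, Farrah.
days_absent = [1, 3, 3, 5, 7, 9]
population_mean = mean(days_absent)

# Assumption for illustration only: "bad" = off by more than 1 day.
TOLERANCE = 1.0

# Every possible sample of size 2, each equally likely under simple random sampling.
sample_means = [mean(pair) for pair in combinations(days_absent, 2)]

# Samples whose estimate misses the population mean by more than the tolerance.
bad = [m for m in sample_means if abs(m - population_mean) > TOLERANCE]

prob_bad = len(bad) / len(sample_means)
print(f"{len(bad)} of {len(sample_means)} samples are bad: probability = {prob_bad:.2f}")
```

A stricter or looser cutoff changes the answer, which is why the cutoff has to be stated whenever a figure like this is reported.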
SAMPLING DISTRIBUTION
Any sample drawn will be one of the following 15:
Sample   Days Absent   Sample Mean
A, B     1, 3          2.0
A, C     1, 3          2.0
A, D     1, 5          3.0
A, E     1, 7          4.0
A, F     1, 9          5.0
B, C     3, 3          3.0
B, D     3, 5          4.0
B, E     3, 7          5.0
B, F     3, 9          6.0
C, D     3, 5          4.0
C, E     3, 7          5.0
C, F     3, 9          6.0
D, E     5, 7          6.0
D, F     5, 9          7.0
E, F     7, 9          8.0
*Rick cannot do this since he doesn’t have all the info*
sample means range between 2.0 and 8.0
Hard to see which ones are bad samples
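The list of 15 possible samples can be generated rather than written out by hand. The sketch below is one way to do it in Python, using initials as sample labels. Like the table, it relies on population data Rick does not have.

```python
from itertools import combinations

# Population data from the notes.
days_absent = {"Ann": 1, "Bob": 3, "Cathy": 3, "Donna": 5, "Ed": 7, "Farrah": 9}

# Build one row per possible sample of size 2: (label, days, sample mean).
rows = []
for (name1, d1), (name2, d2) in combinations(days_absent.items(), 2):
    label = f"{name1[0]}, {name2[0]}"          # e.g. "A, B"
    rows.append((label, (d1, d2), (d1 + d2) / 2))

for label, days, sample_mean in rows:
    print(f"{label}   {days[0]}, {days[1]}   {sample_mean:.1f}")

# Smallest and largest possible sample means.
means = [sample_mean for _, _, sample_mean in rows]
print(f"{len(rows)} samples, means range from {min(means)} to {max(means)}")
```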
[Figure: stacked boxes over the x-axis "Sample Mean" (1 to 8); each blue box is one possible sample, with one box labelled Donna & Ed]
^This looks a lot like a histogram – use SPSS to get actual histogram:
• The height of the graph at any point is just the number of cases at that point
This frequency distribution has a name: Sampling Distribution
Important for theoretical role in inferential stats, but no one actually calculates it
NOTE #1:
Sampling distribution: the distribution of the means (or other statistic) of all
possible samples of size N.
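For a population this small, the sampling distribution can actually be tabulated. The Python sketch below counts how many of the 15 possible samples give each mean and prints a rough text histogram. It also checks the earlier point that the sample mean is an unbiased estimate, by averaging all 15 sample means. This is an illustration only, since in real research the full population is not available.

```python
from collections import Counter
from itertools import combinations
from statistics import mean

# Days absent for Ann, Bob, Cathy, Donna, Ed, Farrah.
days_absent = [1, 3, 3, 5, 7, 9]

# Means of all possible samples of size 2.
sample_means = [sum(pair) / 2 for pair in combinations(days_absent, 2)]

# Frequency of each sample mean = the sampling distribution.
counts = Counter(sample_means)
for value in sorted(counts):
    print(f"{value:>4.1f} | {'#' * counts[value]}")

# Averaging every possible sample mean recovers the population mean.
mean_of_means = mean(sample_means)
print(f"mean of sample means: {mean_of_means:.2f}")
print(f"population mean:      {mean(days_absent):.2f}")
```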
NOTE #2:
The mea
