SOC 222 -- MEASURING the SOCIAL WORLD
Session #5 -- INTRO to INFERENTIAL STATISTICS
** Issues with Blackboard and marking:
Report the numbers exactly as SPSS gives them; do not round or add zeros.
Where we are
Populations and Samples
Samples & Population Estimates
Accuracy of Estimate
Good and Bad Samples
The Sampling Distribution
The Standard Error
Other Stuff in the Text
SPSS: E Notation
When numbers are very small, SPSS converts them to E notation.
EG: 1.34E-2 = .0134 (.0134 is the number you would enter on the test)
How to read 1.34E-2:
• 1.34 is the starting number
• E means move the decimal point
• minus sign – move to the left (negative direction)
• 2 is number of spaces to move the decimal point
EG: 7.89E-3 = .00789
• plus sign (or no sign) means move the decimal point to the right by that number of spaces
EG: 1.34E+4 = 13400
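A quick way to check an E-notation conversion outside SPSS: the sketch below relies on Python's float() accepting the same notation. The helper name expand_e_notation is made up for illustration and is not part of SPSS or the course materials.

```python
def expand_e_notation(text):
    """Return the plain decimal form of a number written in E notation."""
    value = float(text)  # float() understands strings like "1.34E-2"
    # Print with plenty of decimal places, then trim the padding zeros.
    return f"{value:.10f}".rstrip("0").rstrip(".")


for example in ["1.34E-2", "7.89E-3", "1.34E+4"]:
    print(example, "=", expand_e_notation(example))
```

Python prints a leading zero (0.0134 rather than .0134), but the value is the same.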
1. What is the effect size? The answer to this question tells us about the relationship. We did this for cat-cat, rat-rat and cat-rat relationships.
2. If we use a sample, how good is the estimate for the population? Today we look at this second question. This is the topic of inferential statistics: we infer something about the population based on the sample.
POPULATIONS & SAMPLES
1. Population: the group that we are interested in, i.e. the particular group our research question is about. Ex: Canadian cities.
2. Sample: a set of cases selected from the population. How we select them, and how many, are not things we will cover in this class.
3. Simple random sample:
• Random sample:
1. Each case in the sample has been selected by some procedure.
2. Each case in the population has a known probability of being selected, i.e. we know the chance of any particular case being selected (we need this for inferential statistics).
There are different types of random sample:
• Simple random sample (SRS): each case in the population has the same probability of being chosen.
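To see what "same probability of being chosen" looks like in practice, here is a minimal sketch of drawing an SRS with Python's standard random module. The population labels and the helper name draw_srs are placeholders invented for this illustration.

```python
import random

# Hypothetical population of six cases; the labels are placeholders.
population = ["case 1", "case 2", "case 3", "case 4", "case 5", "case 6"]


def draw_srs(cases, n, seed=None):
    """Draw n cases without replacement; every case is equally likely."""
    rng = random.Random(seed)  # a seed only makes the draw repeatable
    return rng.sample(cases, n)


sample = draw_srs(population, 2, seed=1)
print(sample)
```

With 6 cases and a sample of 2, each case has a 2/6 chance of ending up in the sample.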
• Statistic: what we calculate from the sample. For a sample of students, the mean age would be a statistic. For a sample of cities, the slope of a regression line involving poverty in the city would be another example. A statistic is something you can actually calculate. The mean, correlation and regression slope are all statistics; statistics are what we get from samples.
• Population characteristic: what we estimate for the population, based on sample statistics. It is called several things: population characteristic, population value, or parameter.
SAMPLES & POPULATION ESTIMATES
We don’t have time to study the whole population, so we draw a sample.
• The sample effect size is our estimate of the effect size in the population.
We did not get information from the entire population, so we cannot be certain the sample statistic matches the population value. No sample is likely to be perfectly accurate.
We want to know the effect size of X on Y in a population, e.g. all UTM students or all employed Canadians. From these cases we draw a sample. Example: the effect of time spent studying on marks, with all UTM students as the population. We can't reach all UTM students, so we sample 100. The sample we draw is the source of our data. If we found a correlation of .36 between studying and marks in our sample, then our estimate of the correlation in the population would also be .36.
ACCURACY of ESTIMATES
Because we used a sample, we didn't get information from every person in the population, so we can't be certain the population value is the same as the sample statistic. If we drew another sample from that population we might get a different result. Why would some samples be inaccurate?
RICK EXAMPLE
Here we are not looking at an X-Y relationship, just a simple problem: estimating a mean for a certain population. Rick has a population of interest but can only afford a sample.
• This is his RQ: what is the mean number of days absent for company employees?
• The company has six employees (that’s his population). He can also get information on each employee from the department.
Here’s what Rick doesn’t know (the population data):
Employee   Days Absent
Ann        1
Bob        3
Cathy      3
Donna      5
Ed         7
Farrah     9
x̄ = Σx / N = (1 + 3 + 3 + 5 + 7 + 9) / 6 = 28 / 6 = 4.67
4.67 is the mean (average) number of days absent.
Rick draws a sample of size two out of those six by rolling a die (a simple random sample).
Estimating a Population Mean from a Sample
• Rick’s sample: Cathy and Farrah
• The sample mean is 6.0 days absent, on average
• So Rick’s estimate of the population mean is 6.0 days
• The true population mean is 4.67
• So Rick’s estimate is inaccurate by 1.33 days.
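The arithmetic in these bullets can be checked with a short sketch. The days-absent values come from the example; the pairing of names with values is inferred from the notes (Cathy and Farrah average 6.0, the letters A to F stand for the names in order), so treat that pairing as an assumption.

```python
# Population data from the Rick example (name pairing inferred from the notes).
days_absent = {"Ann": 1, "Bob": 3, "Cathy": 3, "Donna": 5, "Ed": 7, "Farrah": 9}

# True population mean: sum of all values divided by N.
population_mean = sum(days_absent.values()) / len(days_absent)

# Rick's sample of size two and its mean.
sample = ["Cathy", "Farrah"]
sample_mean = sum(days_absent[name] for name in sample) / len(sample)

# How far the sample estimate is from the true population mean.
error = sample_mean - population_mean

print("population mean:", round(population_mean, 2))
print("sample mean:", sample_mean)
print("estimate is off by:", round(error, 2))
```

Rick himself could only run the middle step, because he never sees the full population data.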
We can say this only because we know the real population mean is 4.67. How do we calculate the accuracy of a population estimate in practice? Unfortunately we can't: without the population value, we can never compute the accuracy of a population estimate.
Instead we use inferential stats.
GOOD AND BAD SAMPLES
Rick wants to know how far off his estimate is.
Why does a sample give an inaccurate estimate? Because of the chance involved in a simple random sample: it depends on which sample you draw.
It is a matter of chance whether we get a good or a bad sample.
Employee Days Absent
Farrah 9
If we draw samples of two, there are 15 possible samples of size two that we can draw. Here they are.
• The first column is the sample (letters represent names)
• The second column is the days absent
• The third column is the sample mean
• The fourth column is the sample accuracy:
• High: estimate is off by 1 day or less
• Means of 4 or 5
• Low: estimate is off by 2 days or more
• Means of 2 or 7 or 8
Sample   Days Absent   Sample Mean   Sample Accuracy
A, B     1, 3          2.0           Low
A, C     1, 3          2.0           Low
A, D     1, 5          3.0
A, E     1, 7          4.0           High
A, F     1, 9          5.0           High
B, C     3, 3          3.0
B, D     3, 5          4.0           High
B, E     3, 7          5.0           High
B, F     3, 9          6.0
C, D     3, 5          4.0           High
C, E     3, 7          5.0           High
C, F     3, 9          6.0
D, E     5, 7          6.0
D, F     5, 9          7.0           Low
E, F     7, 9          8.0           Low
High means the estimate is off by one day or less (high accuracy).
Low means you're off by two or more days. A blank means the estimate is somewhere in between.
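The table of 15 samples can be rebuilt with a short sketch that lists every possible pair and applies the High/Low rule. The letters A to F and their values are taken from the table; the function name accuracy is made up for illustration.

```python
from itertools import combinations

# Days absent for the six employees, labelled A to F as in the table.
days = {"A": 1, "B": 3, "C": 3, "D": 5, "E": 7, "F": 9}
TRUE_MEAN = sum(days.values()) / len(days)  # 28 / 6, about 4.67


def accuracy(sample_mean):
    """High: off by 1 day or less. Low: off by 2 days or more. Else blank."""
    off_by = abs(sample_mean - TRUE_MEAN)
    if off_by <= 1:
        return "High"
    if off_by >= 2:
        return "Low"
    return ""


rows = []
for pair in combinations(days, 2):  # every possible sample of size two
    values = [days[label] for label in pair]
    sample_mean = sum(values) / len(values)
    rows.append((pair, sample_mean, accuracy(sample_mean)))

for pair, sample_mean, label in rows:
    print(", ".join(pair), sample_mean, label)
```

Only someone who knows the whole population (and so TRUE_MEAN) can fill in the accuracy column; Rick cannot.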
If you draw a sample of Cathy and Ed, you did well; this is called drawing a representative sample. Cathy is not absent much, whereas Ed is absent many times.
Employee Days Absent
Donna 5
Why is it so good? Because here we have a representative sample: a sample with a distribution that matches the population distribution.
How does this happen? Cathy is pretty typical of people who are not absent much, and Ed is more representative of those who are absent often.
Representative sample: a sample with a distribution that matches the population distribution.
A sample of Ann and Bob would be very inaccurate (sampling error). It is not a representative sample: Ann and Bob are both typical of people who aren't absent a lot, and the sample doesn't have anybody who is absent often.
Of the 15 possible samples:
• 6 samples give good estimates
• 5 are so-so
• 4 are bad
The quality of the estimate depends on whether the sample is representative.
There is a chance of error any time we draw a sample: the sample may not represent the population well. This comes down to chance, and there is always a chance you may pick an unrepresentative sample.
Sampling error: the probability of drawing a sample that gives an inaccurate population estimate.
Rick’s Probability of a Bad Sample
Rick cannot calculate the accuracy of his population estimate. The next best thing would be to calculate the probability of drawing a bad sample. He doesn't know if he has a good, bad or in-between sample. If Rick knew his sampling error, that would be good, because he would know the chance of getting a bad sample. Unfortunately Rick can't even do this.
In the Rick example:
• 15 possible samples
• 4 were bad
• So the probability of drawing a bad sample is 4/15 = 27%
• This is the sampling error for Rick’s sample
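The 4/15 figure can be reproduced with a minimal sketch that counts the bad samples directly. It assumes "bad" means the sample mean is off by two days or more, as in the table of 15 samples, and it uses the full population data, which Rick does not have.

```python
from itertools import combinations

days_absent = [1, 3, 3, 5, 7, 9]  # the full population (unknown to Rick)
true_mean = sum(days_absent) / len(days_absent)

# Mean of every possible sample of size two (15 samples in total).
sample_means = [sum(pair) / 2 for pair in combinations(days_absent, 2)]

# A bad sample is one whose mean is off by two days or more.
bad = [m for m in sample_means if abs(m - true_mean) >= 2]

sampling_error = len(bad) / len(sample_means)
print(f"{len(bad)}/{len(sample_means)} = {sampling_error:.0%}")
```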
Estimating Sampling Error
What if he could estimate his sampling error? If Rick can estimate his sampling error, then he has something. If he knows that the chance of drawing a bad sample is high, he won't put too much faith in the population estimate. If he knows that the chance of drawing a bad sample is low, then he can have confidence in the population estimate. So if we can estimate the sampling error, we know how much we can rely on the population estimate!
Rick doesn't know the means of all the possible samples, which is why he cannot calculate his sampling error. So what can he do? He can estimate it. If Rick can estimate this, then he will know how likely it is that he has drawn a good, moderate or poor sample. ISSUE: how do you estimate the sampling error (the probability of drawing a sample that gives an inaccurate population estimate)? To do this we turn to the sampling distribution.
THE SAMPLING DISTRIBUTION
We do this to see how many samples are bad, i.e. to see the quality of the samples. Eyeballing the table of 15 samples, the sample means range from 2 at the lowest to 8 at the highest. How many of these are bad? It is easier to see in a graphic.
Smallest sample mean: 2.0 days absent
Biggest sample mean: 8.0 days absent
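As a rough stand-in for the graphic, the sketch below tallies how often each sample mean occurs across the 15 possible samples and prints a simple text chart. This is only an illustration of the idea of a sampling distribution, not the chart from the lecture.

```python
from collections import Counter
from itertools import combinations

days_absent = [1, 3, 3, 5, 7, 9]

# Mean of every possible sample of size two.
sample_means = [sum(pair) / 2 for pair in combinations(days_absent, 2)]

# The sampling distribution: how many samples produce each mean.
distribution = Counter(sample_means)

for sample_mean in sorted(distribution):
    print(f"{sample_mean:.1f}  {'X' * distribution[sample_mean]}")
```

Most samples pile up in the middle (means of 4, 5 and 6), with only a few at the extremes of 2, 7 and 8.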
How many of those 15 samples are bad: put each sample in p