SOC 222 -- MEASURING the SOCIAL WORLD
Session #5 -- INTRO to INFERENTIAL STATISTICS
SPSS: E Notation
When numbers are very small, SPSS converts them to E notation
EG: 1.34E-2 = .0134
• 1.34 is the starting number
• E means move the decimal point
• minus sign – move to the left (negative direction)
• 2 is the number of places to move the decimal point
EG: 7.89E-3 = .00789
• If no minus sign, or a plus sign, move decimal point to the right
EG: 1.34E+4 = 13400
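The same rule can be checked outside SPSS. A minimal sketch in Python (not part of the course materials), assuming the values are typed in as text exactly as SPSS prints them. Note that Python pads the exponent to two digits when it writes E notation, so .0134 comes out as 1.34E-02 rather than 1.34E-2.

```python
# Python reads E notation the same way SPSS prints it.
small = float("1.34E-2")    # minus sign: move the decimal point 2 places left
smaller = float("7.89E-3")  # minus sign: move the decimal point 3 places left
large = float("1.34E+4")    # plus sign: move the decimal point 4 places right

print(small, smaller, large)

# Going the other way: write an ordinary number in E notation
# with two digits after the decimal point.
as_e = f"{0.0134:.2E}"
print(as_e)
```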
POPULATIONS & SAMPLES
If we use a sample, how good is the estimate for the population?
1. Population: the group of cases we are interested in; we have a question about that group
2. Sample: set of cases selected from the population
3. Simple random sample
• random sample
1. Each case in the sample has been selected by some procedure involving chance
2. Each case in the population has a known probability of being selected
• simple random sample (SRS) – most important to us; each case has the same probability of being chosen
• statistic – calculated from the sample (mean, correlation, etc.); things you can actually calculate
• population characteristic – estimated for the population based on sample statistics
• population value
• parameter
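To make the vocabulary concrete, here is a minimal sketch in Python (not from the course). The population of 100 numbered cases, the sample size of 10, and the seed are all made up for illustration. It draws a simple random sample and computes one statistic from it.

```python
import random

# Hypothetical population: 100 cases identified by the numbers 1 to 100.
population = list(range(1, 101))

random.seed(222)  # fixed seed so the draw can be repeated; any value works

# random.sample draws without replacement, and every case has the same
# chance of being chosen, which is what makes this a simple random sample.
sample = random.sample(population, k=10)

# A statistic is something we can actually calculate from the sample.
sample_mean = sum(sample) / len(sample)

# The parameter is the matching population value. We can compute it here
# only because this toy population is fully known.
population_mean = sum(population) / len(population)

print(sample, sample_mean, population_mean)
```

In real research the last step is not available: only the sample statistic can be calculated, and it serves as the estimate of the parameter.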
SAMPLES & POPULATION ESTIMATES
• The sample effect size is our estimate of the effect size in the population.
ACCURACY of ESTIMATES
Rick Example
• This is his RQ: what is the mean number of days absent for company employees?
Rick knows:
• The company has six employees (the population)
Here’s what Rick doesn’t know (the population data):
Employee Days Absent
Ann 1
Bob 3
Cathy 3
Donna 5
Ed 7
Farrah 9
x̄ = ∑xi / N = (1 + 3 + 3 + 5 + 7 + 9) / 6 = 28 / 6 = 4.67
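The population mean above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; the variable names are made up for illustration.

```python
# Population data from the table above (the part Rick does not know).
days_absent = {"Ann": 1, "Bob": 3, "Cathy": 3, "Donna": 5, "Ed": 7, "Farrah": 9}

total = sum(days_absent.values())  # sum of all xi
n = len(days_absent)               # N, the number of cases
population_mean = total / n

print(total, n, round(population_mean, 2))
```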
Estimating a Population Mean from a Sample
• Rick’s sample: Cathy and Farrah (3 and 9 days absent)
Summary:
• The sample mean is 6.0 days absent, on average
• So Rick’s estimate of the population mean is 6.0 days
• The true population mean is 4.67
• So Rick’s estimate is inaccurate by 1.33 days.
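• In practice we can never know the accuracy of a population estimate
• We can’t compute it, because that requires the true population value, which is exactly what we don’t know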
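The numbers in this summary can be checked with a short Python sketch. It assumes we can see the whole population, which Rick cannot; the variable names are made up for illustration.

```python
# Full population data (unknown to Rick in the example).
days_absent = {"Ann": 1, "Bob": 3, "Cathy": 3, "Donna": 5, "Ed": 7, "Farrah": 9}

# Rick's sample of size two.
sample = ["Cathy", "Farrah"]

sample_mean = sum(days_absent[name] for name in sample) / len(sample)
population_mean = sum(days_absent.values()) / len(days_absent)

# How far the estimate is from the true value.
error = sample_mean - population_mean

print(sample_mean, round(population_mean, 2), round(error, 2))
```

The last line needs population_mean, so it only works in a made-up example like this one where the population data are known.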
GOOD AND BAD SAMPLES
15 possible samples of size two
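• The fourth column is the sample accuracy
• The cutoffs are arbitrary:
• High: estimate is off by 1 day or less
• Means of 4 or 5
• So-so (left blank in the table): estimate is off by more than 1 day but less than 2 days
• Means of 3 or 6
• Low: estimate is off by 2 days or more
• Means of 2 or 7 or 8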
Sample   Days Absent   Sample Mean   Sample Accuracy
A, B     1, 3          2.0           Low
A, C     1, 3          2.0           Low
A, D     1, 5          3.0
A, E     1, 7          4.0           High
A, F     1, 9          5.0           High
B, C     3, 3          3.0
B, D     3, 5          4.0           High
B, E     3, 7          5.0           High
B, F     3, 9          6.0
C, D     3, 5          4.0           High
C, E     3, 7          5.0           High
C, F     3, 9          6.0
D, E     5, 7          6.0
D, F     5, 9          7.0           Low
E, F     7, 9          8.0           Low
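The table can be rebuilt by listing every possible pair of employees. A minimal Python sketch, assuming the High and Low cutoffs from these notes and using the label "So-so" for the rows in between; the function and variable names are made up for illustration.

```python
from itertools import combinations

days_absent = {"Ann": 1, "Bob": 3, "Cathy": 3, "Donna": 5, "Ed": 7, "Farrah": 9}
population_mean = sum(days_absent.values()) / len(days_absent)


def accuracy(sample_mean):
    """Label a sample mean using the (arbitrary) cutoffs from the notes."""
    error = abs(sample_mean - population_mean)
    if error <= 1:
        return "High"
    if error >= 2:
        return "Low"
    return "So-so"


# Every possible sample of size two, in the same order as the table.
rows = []
for pair in combinations(days_absent, 2):
    values = [days_absent[name] for name in pair]
    mean = sum(values) / len(values)
    rows.append((pair, values, mean, accuracy(mean)))

for pair, values, mean, label in rows:
    print(pair, values, mean, label)

# How many samples fall into each accuracy group.
counts = {
    label: sum(1 for row in rows if row[3] == label)
    for label in ("High", "So-so", "Low")
}
print(counts)
```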
By picking Cathy and Ed, we get a representative sample
Representative sample: A sample with a distribution that matches the population
distribution.
Other samples would be really inaccurate
• 6 samples give good estimates
• 5 are so-so
• 4 are bad
The quality of the estimate depends on whether the sample is representative
Sampling Error
Sampling error: the probability of drawing a sample that gives an inaccurate
population estimate
Rick’s Probability of a Bad Sample
In the Rick example:
• 15 possible samples
• 4 were bad
• So the probability of drawing a bad sample is 4/15
• = 27% (rounded)
• This is the sampling error for Rick’s sample
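The 4/15 figure can be recomputed by counting. A minimal Python sketch, assuming "bad" means the Low group from these notes (estimate off by 2 days or more) and that every sample of size two is equally likely to be drawn; the variable names are made up for illustration.

```python
from fractions import Fraction
from itertools import combinations

days = [1, 3, 3, 5, 7, 9]  # days absent for the six employees
population_mean = sum(days) / len(days)

# Mean of every possible sample of size two.
sample_means = [sum(pair) / 2 for pair in combinations(days, 2)]

# "Bad" samples: the estimate is off by 2 days or more.
bad = [m for m in sample_means if abs(m - population_mean) >= 2]

p_bad = Fraction(len(bad), len(sample_means))
print(p_bad, f"{float(p_bad):.0%}")
```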
Estimating Sampling Error
THE SAMPLING DISTRIBUTION
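To calculate the sampling error, Rick would need to know the means of all the possible samples, which requires the full population data. So in practice you can’t calculate the sampling error
But he can try to estimate the sampling error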

