Lecture

# Lecture 5

School
University of Toronto Mississauga
Sociology
SOC222H5
John Kervin
Winter

SOC 222: Measuring the Social World
Session #5: Intro to Inferential Statistics

## SPSS: E Notation

When numbers are very small, SPSS converts them to E notation.

Example: 1.34E-2 = .0134

- 1.34 is the starting number
- E means move the decimal point
- The minus sign means move to the left (the negative direction)
- 2 is the number of places to move the decimal point

Example: 7.89E-3 = .00789

If there is no minus sign, or there is a plus sign, move the decimal point to the right.

Example: 1.34E+4 = 13400

## Populations & Samples

If we use a sample, how good is the estimate for the population?

1. Population: the group of cases we are interested in; we have a question about that group.
2. Sample: a set of cases selected from the population.
3. Simple random sample
   - Random sample:
     1. Each case in the sample has been selected by some procedure involving chance.
     2. Each case in the population has a known probability of being selected.
   - Simple random sample (SRS): the most important kind for us. Each case has the same probability of being chosen.

Two more terms:

- Statistic: a value calculated from the sample (mean, correlation, etc.), something you can actually calculate.
- Population characteristic: a value for the population, estimated from sample statistics. Also called the population value or the parameter.

## Samples & Population Estimates

- The sample effect size is our estimate of the effect size in the population.

## Accuracy of Estimates

### Rick Example

Rick's research question (RQ): what is the mean days absent for company employees?

Rick knows:

- The company has six employees (the population).

Here’s what Rick doesn’t know (the population data):

| Employee | Days Absent |
|----------|-------------|
| Ann      | 1           |
| Bob      | 3           |
| Cathy    | 3           |
| Donna    | 5           |
| Ed       | 7           |
| Farrah   | 9           |

$$\bar{x} = \frac{\sum x_i}{N} = \frac{1+3+3+5+7+9}{6} = \frac{28}{6} = 4.67$$

### Estimating a Population Mean from a Sample

- Rick's sample: Cathy and Farrah (3 and 9 days absent)

Summary:

- The sample mean is 6.0 days absent, on average.
- So Rick’s estimate of the population mean is 6.0 days.
- The true population mean is 4.67.
- So Rick’s estimate is inaccurate by 1.33 days.
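The arithmetic in the Rick example can be checked with a short Python sketch. It is illustrative only and not part of the lecture: the variable names (`days_absent`, `sample_names`, and so on) are made up here, and it assumes access to the full population data, which Rick does not have in practice.

```python
from statistics import mean

# Population data from the notes (in practice Rick does not have this).
days_absent = {"Ann": 1, "Bob": 3, "Cathy": 3, "Donna": 5, "Ed": 7, "Farrah": 9}

# True population mean: sum of all values divided by N = 6.
population_mean = mean(list(days_absent.values()))

# Rick's sample of size two: Cathy and Farrah.
sample_names = ["Cathy", "Farrah"]
sample_mean = mean([days_absent[name] for name in sample_names])

# How far the sample estimate is from the true population mean.
estimate_error = abs(sample_mean - population_mean)

print(f"Population mean: {population_mean:.2f}")
print(f"Sample mean:     {sample_mean:.2f}")
print(f"Estimate is off by {estimate_error:.2f} days")
```

The last line reproduces the 1.33-day inaccuracy only because the population values are typed in. With a real sample, `population_mean` would be unknown.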
In practice we can never know the accuracy of a population estimate: without the population data, it can't be computed.

## Good and Bad Samples

| Employee | Days Absent |
|----------|-------------|
| Ann      | 1           |
| Bob      | 3           |
| Cathy    | 3           |
| Donna    | 5           |
| Ed       | 7           |
| Farrah   | 9           |

There are 15 possible samples of size two. The fourth column of the table below is the sample accuracy, based on arbitrary cutoffs:

- High: the estimate is off by 1 day or less (means of 4 or 5)
- Low: the estimate is off by 2 days or more (means of 2, 7, or 8)

| Sample | Days Absent | Sample Mean | Sample Accuracy |
|--------|-------------|-------------|-----------------|
| A, B   | 1, 3        | 2.0         | Low             |
| A, C   | 1, 3        | 2.0         | Low             |
| A, D   | 1, 5        | 3.0         |                 |
| A, E   | 1, 7        | 4.0         | High            |
| A, F   | 1, 9        | 5.0         | High            |
| B, C   | 3, 3        | 3.0         |                 |
| B, D   | 3, 5        | 4.0         | High            |
| B, E   | 3, 7        | 5.0         | High            |
| B, F   | 3, 9        | 6.0         |                 |
| C, D   | 3, 5        | 4.0         | High            |
| C, E   | 3, 7        | 5.0         | High            |
| C, F   | 3, 9        | 6.0         |                 |
| D, E   | 5, 7        | 6.0         |                 |
| D, F   | 5, 9        | 7.0         | Low             |
| E, F   | 7, 9        | 8.0         | Low             |

By picking Cathy and Ed we get a representative sample.

Representative sample: a sample with a distribution that matches the population distribution.

Other samples would be really inaccurate. Of the 15 possible samples:

- 6 give good estimates
- 5 are so-so (blank in the accuracy column)
- 4 are bad

The quality of the estimate depends on whether the sample is representative.

## Sampling Error

Sampling error: the probability of drawing a sample that gives an inaccurate population estimate.

### Rick’s Probability of a Bad Sample

In the Rick example:

- There are 15 possible samples
- 4 were bad
- So the probability of drawing a bad sample is 4/15 = 27%
- This is the sampling error for Rick’s sample

## Estimating Sampling Error

### The Sampling Distribution

To calculate the sampling error, Rick would need the means of all possible samples, and that requires the full population data. So he can't calculate the sampling error, but he can try to estimate it.
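The 15-sample table and the 4/15 sampling error can be reproduced by listing every sample of size two. The Python sketch below is illustrative only and not part of the lecture: the function name `accuracy_label` and the "So-so" label for the blank cells are made up here, and the cutoffs are the arbitrary ones from the notes. It works only because the notes give the full population, which Rick would not have.

```python
from collections import Counter
from itertools import combinations
from statistics import mean

# Full population from the notes (unknown to Rick in practice).
days_absent = {"Ann": 1, "Bob": 3, "Cathy": 3, "Donna": 5, "Ed": 7, "Farrah": 9}
population_mean = mean(list(days_absent.values()))


def accuracy_label(sample_mean):
    """Apply the notes' arbitrary cutoffs to one sample mean."""
    error = abs(sample_mean - population_mean)
    if error <= 1:
        return "High"
    if error >= 2:
        return "Low"
    return "So-so"  # hypothetical label for the blank cells in the table


# Every possible sample of size two, in the same order as the table.
samples = list(combinations(days_absent, 2))

labels = []
for pair in samples:
    sample_mean = mean([days_absent[name] for name in pair])
    label = accuracy_label(sample_mean)
    labels.append(label)
    print(pair, f"mean={sample_mean:.1f}", label)

counts = Counter(labels)

# Sampling error as defined in the notes: probability of drawing a bad sample.
bad_sample_probability = counts["Low"] / len(samples)
print(f"Probability of a bad sample: {bad_sample_probability:.0%}")
```

The set of all 15 sample means is the sampling distribution for samples of size two in this tiny population.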

