SOC222 Lecture 5

University of Toronto Mississauga
John Kervin

SOC 222 -- MEASURING the SOCIAL WORLD
Session #5 -- INTRO to INFERENTIAL STATISTICS
Oct 2013

TODAY’S OBJECTIVES
1. Know why samples may not be accurate
2. Know how to estimate the accuracy of a population estimate
3. Know how to estimate the standard error from the sample standard deviation

Terms to Know
population, sample, statistic, simple random sample, representative sample, sampling error, sampling distribution, standard deviation, standard error

SAMPLES & ACCURACY
• We want to know how strongly x is related to y, but for a large population.
• We draw a sample because it would cost too much to get information from every case in the population.
• Our answer for the population is the same as the answer from the sample: a good UNBIASED estimate.
• But because a sample was used, we cannot be totally sure the sample results are identical to the real population results.
• If we drew a second sample, it would almost certainly give a different estimate; we would be astonished if it were the same.
• What about a third sample? Which one is the best?
• Better estimate = closer to the real value in the population → more accuracy.
• We don't know the real value, so how can we tell how accurate we are?

EG (population, then sample):
1. Pop = all UTM students (case = student); sample = 100 UTM students
2. Pop = all employed adults in Canada (case = employed adult); sample = 5000 employed adults in Canada
3. Pop = all Canadian cities (case = city); sample = 25 Canadian cities
4. Pop = all residents of Mississauga (case = person living in Mississauga); sample = 100 residents of Mississauga

• The sample result gives us an estimate of the effect size in the population.
• We can't be totally sure the sample result would be identical to the population result.
• When sampling, how accurate is our estimate of the relationship strength in the population? This question is about accuracy, not about the effect size itself (we already got that from the sample): is the sample result an accurate measure of the effect size in the population?

POPULATIONS & SAMPLES
1. Population: a group of cases we're interested in
   - We have some RQ about it
   - EG: Canadian cities. Do Canadian cities with a larger population have higher crime rates? We get the answer from a sample.
2. Sample: a set of cases selected from the population
   - There are a number of ways to collect a sample
   - How big a sample should you select (how many cases)?
3. Simple random sample (SRS): one kind of sample, in which
   (1) each case in the sample is selected by some procedure involving chance (rolling a die, looking at a table of random numbers, asking a computer to generate a random sample); and
   (2) every case in the population has a known probability of being selected for the sample. Inferential statistics requires this assumption (the simplest version: every case has the same chance).

Statistic: what we calculate from a sample (a variance, a correlation between age and height, or a mean calculated in a sample are all statistics)
- Computed from sample data and used to infer an unknown population value

What we estimate in the population goes by several names:
- population characteristic (preferred term)
- population value
- parameter

You get a statistic from your sample to estimate a characteristic in the population.
EG: RQ: is public speaking related to GPA? → The average GPA in the sample is a statistic; the average GPA in the population and the relationship between the two in the population are population characteristics, which we estimate from the sample.

THE RICK EXAMPLE
• Rick's RQ: what is the mean number of days absent for the company's employees?
Rick knows:
• The company has six employees
Rick doesn't know (the population data):

Employee   Days Absent
Ann        1
Bob        3
Cathy      3
Donna      5
Ed         7
Farrah     9

• The mean days absent is
$$\bar{x} = \frac{\sum x_i}{N} = \frac{1+3+3+5+7+9}{6} = \frac{28}{6} = 4.67$$

Estimating a Population Mean
• Rick's [random] sample size: 2
• His sample is Cathy and Farrah
• Cathy was absent 3 days
• Farrah was absent 9 days
• The mean: 6.00 days
• The estimate of the population mean is 6.00
The mean of a random sample is always an unbiased estimate of the population mean (unbiased estimate: Kranzler, p. 103).

Being a good researcher means asking a second question: how accurate is this answer?

Estimating Accuracy of Population Estimates
• Rick's estimate is off by 1.33 days (6.00 vs. 4.67)
• Rick's second estimate: an estimate of the accuracy of his population characteristic estimate
  - This is the job of inferential stats

SAMPLING ERROR
• EG #1: Cathy (3 days) and Ed (7 days)
  - This sample is a representative sample
  - It gives an accurate estimate: mean 5.00, off by only 0.33
  - Cathy is pretty typical of those who are not absent a lot; Ed is pretty typical of those who are absent a lot
Representative sample: a sample with a distribution that matches the population distribution.

• EG #2: Ann (1 day) and Bob (3 days)
  - The mean would be 2.00 days, off by 2.67 days, which is a lot; this is not a representative sample
  - Both are typical of people who are not absent a lot
Sampling error: the probability of drawing a sample that gives an inaccurate population estimate.
• A random sample can give an inaccurate estimate due to CHANCE
• There is a chance of an error (an inaccurate estimate) whenever we draw a sample
• This happens because there is always a chance we won't draw a good sample

Probability of a Bad Sample
Next best thing:
• Estimate the accuracy of his estimate
• IF Rick can figure out the chance of drawing a bad sample:
  - If the chance is high, his estimate is likely wrong
  - If the chance is low, his estimate is likely right
The key:
• What's the probability of drawing a bad sample?
  - A sample that gives an inaccurate estimate
  - Because the sample isn't representative (= sampling error)
Rick's problem becomes:
• How can I estimate the sampling error?
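Rick himself can't compute the error of his estimate, because he doesn't know the population data; an observer who does know it could. The Python sketch below is only an illustration under that assumption: the dictionary `POPULATION` and the helpers `sample_estimate`, `estimation_error`, and `draw_srs` are hypothetical names, not part of the course material. It recomputes Rick's sample and the two EG samples, then lets a computer draw a few simple random samples (every employee with the same chance) to show the estimate moving around by chance.

```python
import random
from statistics import fmean

# The population data Rick doesn't know (hypothetical name, for illustration)
POPULATION = {"Ann": 1, "Bob": 3, "Cathy": 3, "Donna": 5, "Ed": 7, "Farrah": 9}
POP_MEAN = fmean(POPULATION.values())  # 28 / 6, about 4.67


def sample_estimate(names):
    """Sample mean of days absent (the statistic) for the named employees."""
    return fmean(POPULATION[name] for name in names)


def estimation_error(names):
    """How far the sample estimate lands from the true population mean."""
    return abs(sample_estimate(names) - POP_MEAN)


def draw_srs(rng, n=2):
    """Simple random sample: every employee has the same chance of selection."""
    return rng.sample(sorted(POPULATION), n)


# Rick's sample and the two EG samples from the notes
for names in [("Cathy", "Farrah"), ("Cathy", "Ed"), ("Ann", "Bob")]:
    print(f"{' & '.join(names)}: estimate {sample_estimate(names):.2f}, "
          f"off by {estimation_error(names):.2f}")
# Cathy & Farrah: estimate 6.00, off by 1.33
# Cathy & Ed: estimate 5.00, off by 0.33
# Ann & Bob: estimate 2.00, off by 2.67

# Three computer-drawn SRSs; the fixed seed only makes the run repeatable
rng = random.Random(222)
for _ in range(3):
    names = draw_srs(rng)
    print(names, f"estimate {sample_estimate(names):.2f}")
```

Which employees the seeded draws pick doesn't matter; the point is that each draw can give a different estimate even though the population never changes.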
SAMPLING DISTRIBUTION
All 15 possible samples of size 2 that Rick could draw:

Sample   Days Absent   Sample Mean
A, B     1, 3          2.0
A, C     1, 3          2.0
A, D     1, 5          3.0
A, E     1, 7          4.0
A, F     1, 9          5.0
B, C     3, 3          3.0
B, D     3, 5          4.0
B, E     3, 7          5.0
B, F     3, 9          6.0
C, D     3, 5          4.0
C, E     3, 7          5.0
C, F     3, 9          6.0
D, E     5, 7          6.0
D, F     5, 9          7.0
E, F     7, 9          8.0

*Rick cannot do this since he doesn't have all the info*
• Sample means range between 2.0 and 8.0
• It is hard to see which ones are bad samples

[Figure: dot plot of the 15 sample means; each blue box is one possible sample (e.g., Donna & Ed); x-axis: Sample Mean]

^This looks a lot like a histogram (SPSS can produce the actual histogram):
• The height of the graph at any point is just the number of cases at that point
This frequency distribution has a name: Sampling Distribution
• Important for its theoretical role in inferential stats, but no one actually calculates it

NOTE #1: Sampling distribution: the distribution of the means (or other statistic) of all possible samples of size N.
NOTE #2: The mea
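In practice no one calculates a sampling distribution, but with only six employees this one is small enough to enumerate. The Python sketch below is illustrative only: names such as `sampling_distribution` and `TOLERANCE` are hypothetical, and the cut-off of 1 day for a "bad" sample is an assumption, since the lecture doesn't set one. It lists the mean of every sample of size 2, prints a text version of the histogram, checks that the mean of all 15 sample means equals the population mean (4.67), and reports the share of samples that miss by more than the cut-off. It also prints the standard deviation of the sample means, which is what the term "standard error" refers to.

```python
from collections import Counter
from itertools import combinations
from statistics import fmean, pstdev

# Full population data, known here only because this is an illustration
DAYS_ABSENT = {"Ann": 1, "Bob": 3, "Cathy": 3, "Donna": 5, "Ed": 7, "Farrah": 9}
SAMPLE_SIZE = 2
TOLERANCE = 1.0  # hypothetical: a sample off by more than 1 day counts as bad


def sampling_distribution(data, n):
    """Mean of every possible sample of n different cases (order ignored)."""
    return [fmean(data[name] for name in sample)
            for sample in combinations(sorted(data), n)]


pop_mean = fmean(DAYS_ABSENT.values())
means = sampling_distribution(DAYS_ABSENT, SAMPLE_SIZE)
counts = Counter(means)

# Text histogram: one '#' per possible sample with that mean
for value in sorted(counts):
    print(f"{value:.1f} {'#' * counts[value]}")

bad = sum(abs(m - pop_mean) > TOLERANCE for m in means)
print("possible samples:", len(means))                   # 15
print("mean of sample means:", round(fmean(means), 2))   # 4.67
print("SD of sample means:", round(pstdev(means), 2))    # 1.7
print("P(bad sample):", bad / len(means))                # 0.6
```

Because this enumeration needs the full population, it is exactly what Rick cannot do; it only shows what the sampling distribution looks like when the population is known.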