
222-Feb 8th Lecture#5.docx


Department: Sociology
Course: SOC222H5
Professor: John Kervin
Semester: Spring

SOC 222 -- MEASURING the SOCIAL WORLD
Session #5 -- INTRO to INFERENTIAL STATISTICS

** Issues with Blackboard and marking: just tell us the numbers SPSS gives, with no rounding and no added zeros.

Outline:
• Announcements
• Where we are
• Populations and Samples
• Samples & Population Estimates
• Accuracy of Estimates
• Good and Bad Samples
• The Sampling Distribution
• The Standard Error
• Other Stuff in the Text
• Tutorial Vote

SPSS: E Notation
When numbers are very small, SPSS converts them to E notation.
EG: 1.34E-2 = .0134 (the number you would write on the test)
• 1.34 is the starting number
• E means move the decimal point
• minus sign: move to the left (negative direction)
• 2 is the number of spaces to move the decimal point
EG: 7.89E-3 = .00789
• If there is no minus sign, or there is a plus sign, move the decimal point to the right
EG: 1.34E+4 = 13400

WHERE WE ARE
1. What is the effect size? Answering this question describes the relationship. We did this for cat-cat, rat-rat, and cat-rat relationships.
2. Today we look at the second question: if we use a sample, how good is the estimate for the population? This is the topic of inferential statistics: we infer something about the population based on the sample.

POPULATIONS & SAMPLES
1. Population: a group that we are interested in, e.g., Canadian cities. Our research question is about a particular group.
2. Sample: a set of cases selected from the population. (How we select them, or how many we select, are not things we will cover in this class.)
3. Random sample:
   1. Each case in the sample has been selected by some procedure involving chance.
   2. Each case in the population has a known probability of being selected, so we know the chance of any particular case being selected (we need this for inferential statistics).
Different types of random sample:
• Simple random sample (SRS): each case in the population has the same probability of being chosen.
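The E-notation rule from the SPSS announcement can be checked with a short Python sketch. It relies on the standard-library `decimal` module, which stores the exact digits plus an exponent and, when formatted with `"f"`, moves the decimal point left for a minus exponent and right for a plus (or missing) sign. The helper name `expand_e_notation` is hypothetical, and Python writes a leading zero (0.0134) where the notes write .0134.

```python
from decimal import Decimal


def expand_e_notation(text: str) -> str:
    """Rewrite an SPSS-style E-notation string as an ordinary decimal.

    Decimal keeps the digits exactly (1.34 -> 1, 3, 4) and the exponent
    says how many places to move the decimal point: minus = left,
    plus or no sign = right.
    """
    return format(Decimal(text), "f")


for value in ["1.34E-2", "7.89E-3", "1.34E+4"]:
    print(value, "=", expand_e_notation(value))
```

Using `Decimal` rather than `float` avoids binary rounding noise, so the printed digits are exactly the ones SPSS showed.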
• Statistic: what we calculate from the sample. For a sample of students, the mean age would be a statistic. For a sample of cities, the slope of the regression line relating poverty to another city characteristic would be a statistic. A statistic is something you can actually calculate: means, correlations, and regression slopes are all statistics. Statistics are what we get from samples.
• Population characteristic: what we estimate for the population, based on sample statistics. It goes by several names:
   • population characteristic
   • population value
   • parameter

SAMPLES & POPULATION ESTIMATES
We don't have time to study the whole population, so we draw a sample.
• The sample effect size is our estimate of the effect size in the population.
• Because we did not get information from the entire population, we are not certain the sample statistic matches the population value. No sample is likely to be perfectly accurate.
Suppose we want to know the effect size of X on Y in a population, e.g., all UTM students or all employed Canadians. From these cases we draw a sample, and the sample we draw is the source of our data. Example: the effect of time spent studying on marks, where the population is all UTM students. We can't reach all UTM students, so we sample 100. If we found a correlation of .36 between studying and marks in our sample, then our estimate for the population would also be .36.

ACCURACY of ESTIMATES
Because we used a sample, we didn't get information from every person in the population, so we can't be certain the population value is the same as the sample statistic. If we drew another sample from that population, we might get a different result. Why would some samples be inaccurate?

RICK EXAMPLE
Here we are not looking at an X-Y relationship, just at a simple problem: estimating a mean for a certain population. Rick has a population of interest but can only afford a sample.
• This is his RQ: what is the mean days absent for company employees?
Rick knows:
• The company has six employees (that's his population).
• He can get information on each employee from the department.
Here's what Rick doesn't know (the population data):

Employee   Days Absent
Ann        1
Bob        3
Cathy      3
Donna      5
Ed         7
Farrah     9

$$\bar{x} = \frac{\sum x}{N} = \frac{1+3+3+5+7+9}{6} = \frac{28}{6} = 4.67$$

4.67 is the mean (average) days absent.

Rick draws a sample of two out of the six employees by rolling a die (a simple random sample).

Estimating a Population Mean from a Sample
Rick's sample: Cathy and Farrah
Summary:
• The sample mean is 6.0 days absent, on average
• So Rick's estimate of the population mean is 6.0 days
• The true population mean is 4.67
• So Rick's estimate is inaccurate by 1.33 days
We know the real population mean is 4.67 only because this example gives us the population data. How do we calculate the accuracy of a population estimate? Unfortunately, we can't: in real research we never know the population value, so we can never compute the accuracy of a population estimate. Instead we use inferential statistics.

GOOD AND BAD SAMPLES
Rick wants to know how far off his estimate is. Why does a sample give an inaccurate estimate? Because of the chance involved in a simple random sample: the estimate depends on which sample you draw, and it is a matter of chance whether we get a good or a bad sample.
If we draw samples of two, there are 15 possible samples we can draw. In the table of samples, the first column shows the sample (letters represent names), the second shows days absent, the third shows the sample mean, and the fourth shows how accurate the estimate is. Here are the 15 possible samples of size two.
• The fourth column is the sample accuracy. The categories are arbitrary:
• High: estimate is off by 1 day or less (means of 4 or 5)
• Low: estimate is off by 2 days or more (means of 2, 7, or 8)
• So-so: anything in between (means of 3 or 6)

Sample   Days Absent   Sample Mean   Sample Accuracy
A, B     1, 3          2.0           Low
A, C     1, 3          2.0           Low
A, D     1, 5          3.0           So-so
A, E     1, 7          4.0           High
A, F     1, 9          5.0           High
B, C     3, 3          3.0           So-so
B, D     3, 5          4.0           High
B, E     3, 7          5.0           High
B, F     3, 9          6.0           So-so
C, D     3, 5          4.0           High
C, E     3, 7          5.0           High
C, F     3, 9          6.0           So-so
D, E     5, 7          6.0           So-so
D, F     5, 9          7.0           Low
E, F     7, 9          8.0           Low

High means the estimate is off by one day or less (high accuracy); Low means it is off by two or more days.

If you draw a sample of Cathy and Ed, you did well: this is called drawing a representative sample. Cathy is not absent much, whereas Ed is absent many times.

***important*** Why is it so good? Because here we have a representative sample.
Representative sample: a sample with a distribution that matches the population distribution.
How does this happen? Cathy is pretty typical of people who don't take many days off, and Ed is representative of those who are absent often.

By contrast, a sample of Ann and Bob would be very inaccurate because it is not a representative sample: Ann and Bob are both examples of people who are not absent often (sampling error).

IMPORTANT:
• 6 samples give good estimates
• 5 are so-so
• 4 are bad: they are not representative samples. For example, Ann and Bob are typical of those who aren't absent a lot, and that sample doesn't have anybody who is absent often.

The quality of the estimate depends on whether the sample is representative. Any time we draw a sample, there is a chance it will give an inaccurate estimate.
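Because Rick's population is tiny, the whole table of 15 samples can be regenerated rather than typed. The Python sketch below assumes the lecture's cut-offs (off by 1 day or less is High, 2 days or more is Low, anything in between is So-so); the names `DAYS_ABSENT`, `POP_MEAN`, and `accuracy` are illustrative, not from the course.

```python
from itertools import combinations
from statistics import mean

# Rick's population (from the notes): days absent for all six employees.
DAYS_ABSENT = {"Ann": 1, "Bob": 3, "Cathy": 3, "Donna": 5, "Ed": 7, "Farrah": 9}
POP_MEAN = mean(DAYS_ABSENT.values())  # 28 / 6 = 4.67


def accuracy(sample_mean: float) -> str:
    """Classify an estimate using the lecture's (arbitrary) cut-offs."""
    error = abs(sample_mean - POP_MEAN)
    if error <= 1:
        return "High"
    if error >= 2:
        return "Low"
    return "So-so"


# Rick's actual draw: Cathy and Farrah.
rick_mean = mean([DAYS_ABSENT["Cathy"], DAYS_ABSENT["Farrah"]])
print(f"population mean {POP_MEAN:.2f}, Rick's estimate {rick_mean:.1f}, "
      f"off by {abs(rick_mean - POP_MEAN):.2f} days")

# Every possible sample of size two (order does not matter): 15 of them.
samples = list(combinations(DAYS_ABSENT, 2))

counts = {"High": 0, "So-so": 0, "Low": 0}
for pair in samples:
    m = mean(DAYS_ABSENT[name] for name in pair)
    label = accuracy(m)
    counts[label] += 1
    initials = ", ".join(name[0] for name in pair)  # e.g. "A, B"
    print(f"{initials:6} {m:4.1f}  {label}")

print(len(samples), counts)
```

The final line should report 15 samples split 6 / 5 / 4, the same counts the notes give for good, so-so, and bad samples.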
Sampling Error
Sampling error: the probability of drawing a sample that gives an inaccurate population estimate.
Any time we draw a sample, there is a chance that it does not represent the population well. This comes from chance: there is always a chance you may pick an unrepresentative sample.

Rick's Probability of a Bad Sample
Rick cannot calculate the accuracy of his estimate, because he doesn't know whether he has a good, bad, or in-between sample. The next best thing would be to calculate the probability of drawing a bad sample. If Rick knew his sampling error, that would be good, because he would know the chance of getting a bad sample. Unfortunately, Rick can't even do this.
In the Rick example:
• 15 possible samples
• 4 were bad
• So the probability of drawing a bad sample is 4/15
• = 27%
• This is the sampling error for Rick's sample

Estimating Sampling Error
What if Rick could estimate his sampling error? Then he would have something. If he knows that the probability of drawing a bad sample is high, he won't put too much faith in his estimate. If he knows it is low, he can have confidence in the population estimate. So if we can estimate the sampling error, we know how much confidence we can place in our estimate of the population characteristic!
Rick doesn't know the population data, so he doesn't know the means of all the possible samples; this is why he cannot calculate his sampling error. So what can he do? He can estimate his sampling error. If Rick can estimate it, he will know how likely it is that he has drawn a good, moderate, or poor sample.
ISSUE: how do you estimate the sampling error (the probability of drawing a sample that gives an inaccurate population estimate)? To do this we turn to the sampling distribution.

THE SAMPLING DISTRIBUTION
We use it to see how many of the possible samples are bad, that is, to judge the quality of the estimates we could get.
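Rick cannot do this with real data, but because the notes give the full population, both quantities can be computed directly. The Python sketch below (variable names are hypothetical) tallies the sampling distribution of the 15 sample means as a text bar chart, then computes the probability of a bad sample using the lecture's "off by 2 days or more" definition of bad. It assumes every sample of two is equally likely, which is what a simple random sample guarantees.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from statistics import mean

DAYS_ABSENT = [1, 3, 3, 5, 7, 9]  # Ann, Bob, Cathy, Donna, Ed, Farrah
POP_MEAN = mean(DAYS_ABSENT)       # 4.67

# Sampling distribution: how often each sample mean occurs across all
# 15 equally likely simple random samples of size two.
sample_means = [mean(pair) for pair in combinations(DAYS_ABSENT, 2)]
distribution = Counter(sample_means)

for value in sorted(distribution):
    print(f"mean {value:.1f}: {'*' * distribution[value]}")

# Sampling error: the share of samples whose estimate is off by 2+ days.
bad = sum(1 for m in sample_means if abs(m - POP_MEAN) >= 2)
sampling_error = Fraction(bad, len(sample_means))
print(f"P(bad sample) = {sampling_error} = {float(sampling_error):.0%}")
```

The last line prints the same 4/15 (27%) figure given for Rick above; `Fraction` keeps the probability exact before it is rounded to a percentage.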
By eyeballing this table, we see the sample means range from 2 at the lowest to 8 at the highest. How many of these are bad? This is easier to see in a graphic. Smallest sample mean: 2.0 days absent. Biggest sample mean: 8.0. How many of those 15 samples are bad: put each sample in p