Lecture

11, 12, 13.pdf

20 Pages

Department
Statistics
Course
STAT151
Professor
Susan Kamp
Semester
Fall

Description
Ch 11 – 13 Gathering Data

Does a Census Make Sense?
- A census is a special sample that includes everyone and “samples” the entire population.
- There are problems with taking a census:
  o Cost
  o Undercount
  o Time
  o Complexity

Recall: Populations and Parameters
- A parameter that is part of a model for a population is called a population parameter (often unknown).
- A summary that is found from data in a sample is called a sample statistic (known once the data are observed).

Two Types of Conclusions (Inferences)
a) Population Inference
- Results from the sample can be generalized to an entire population (as estimates).
b) Causal (cause-and-effect) Inference
- When comparing the results from two treatment groups, the difference in the responses is caused by the difference in treatments.

Population Inference
1) We should only make population inferences when we have random sampling (i.e., the individuals in the sample are randomly selected from the population).
  o Randomizing protects us from the influences of all the features of our population, even ones that we may not have thought about.
    ▪ Randomizing makes sure that on average the sample looks like the rest of the population.
  o Non-random sampling leads to biased results (results that tend to over- or under-emphasize some characteristics of the population).
    ▪ There is usually no way to fix a biased sample and no way to salvage useful information from it.
    ▪ The best way to avoid bias is to select individuals for the sample at random.
  o When we do not have random sampling from a population, conclusions should be restricted to the sample. That is, we should not generalize our results in the sample to anyone else.

Random Sampling Methods
1) Simple Random Samples (SRS)
- SRS of size n: each possible sample of size n from the population has the same chance of being selected.
  o Ex: put the names of all the individuals in the population in a hat and draw names to complete the sample
  o Ex: random number tables
- Samples drawn at random generally differ from one another.
  o Each draw of random numbers selects different people for our sample.
  o These differences lead to different values for the variables we measure.
  o We call these sample-to-sample differences sampling variability.

Example: Suppose your local school district decides to randomly test high school students for attention deficit disorder (ADD). There are three high schools in the district, each with grades 9-12. The school board pools all of the students together and randomly samples 250 students. Is this a simple random sample?
A. Yes, because the students were chosen at random.
B. Yes, because each student is equally likely to be chosen.
C. Yes, because they could have chosen any sample of 250 students from throughout the district.
D. No, because we can’t guarantee that there are students from each school in the sample.
E. No, because we can’t guarantee that there are students from each grade in the sample.

Stratified Random Sampling
- The population is first divided into homogeneous groups, called strata. An SRS is then taken within each stratum, and the results are combined.
- Stratified random sampling can reduce bias.
- Stratifying can also reduce the variability of our results.
- Ex: Suppose you want to estimate the proportion of Canadians that support federal party X based on an appropriate representation from each province.
  o You could break up the population by province and select an SRS from each province.

Example: Suppose the state decides to randomly test high school wrestlers for steroid use. There are 16 teams in the league, and each team has 20 wrestlers. State investigators plan to test 32 of these athletes by randomly choosing two wrestlers from each team. Is this a simple random sample?
A. Yes, because the wrestlers were chosen at random.
B. Yes, because each wrestler is equally likely to be chosen.
C. Yes, because stratified samples are a type of simple random sample.
D. No, because not all possible groups of 32 wrestlers could have been the sample.
E. No, because a random sample of teams was not first chosen.

Example: The January 2005 Gallup Youth Survey telephoned a random sample of 1,028 U.S. teens aged 13-17 and asked these teens to name their favorite movie from 2004. Napoleon Dynamite had the highest percentage, with 8% of teens ranking it as their favorite movie. Which is true?
I. The population of interest is U.S. teens aged 13-17.
II. 8% is a statistic and not the actual percentage of all U.S. teens who would rank this movie as their favorite.
III. This sampling design should provide a reasonably accurate estimate of the actual percentage of all U.S. teens who would rank this movie as their favorite.
A. I only
B. II only
C. III only
D. I, II, and III

Systematic Random Samples
- Start the systematic selection from a randomly selected individual, then sample every kth person.
- When there is no reason to believe that the order of the list could be associated in any way with the responses sought, systematic sampling can give a representative sample.
- Systematic sampling can be much less expensive than true random sampling.
- Ex: Suppose you want to estimate the proportion of individuals that support federal party X in your area. You can set up a booth in your area and ask every 50th person your question.

Cluster Random Sampling
- Split the population into similar groups (or clusters), select one or a few clusters at random, and perform a census within each of them.
  o This sampling design is called cluster sampling.
  o If each cluster fairly represents the full population, cluster sampling will give us an unbiased sample.
- Cluster sampling is not the same as stratified sampling.

Example: A statistics teacher wants to know how her students feel about an introductory statistics course. She decides to administer a survey to a random sample of students taking the course. She has several sampling plans to choose from. Name the sampling strategy in each.
a. There are four levels of students taking the class: first year, second year, 3rd year, and 4th year. Randomly select 15 students from each level.
   Stratified Random Sample
b. Randomly select a level (first year, second year, 3rd year, or 4th year) and survey every student in that level.
   Cluster Random Sample
c. Each student has a seven-digit student number. Randomly choose 60 numbers.
   Simple Random Sample
d. Using the class roster, select every fifth student from the list.
   Systematic Random Sample

Example: You want to determine the proportion of university students that have “jobs” while attending school.
1) You have a complete list of students.
   SRS (simple random sample).
2) You have a complete list of students and want to make sure each faculty is appropriately represented.
   Stratified Random Sample. Select an SRS from each faculty.
3) You do not have a complete list of students, but believe that at any given time the group of students in each classroom across campus forms a representative sample of the entire student body.
   Cluster Sample. Select an SRS of classrooms, then go to each selected classroom and sample everyone.
4) You do not have a complete list of students. You believe that students walking in front of the Registrar’s building throughout the day are a good representation of the entire student body. Because of the massive number of students walking past your booth, you can’t sample everyone.
   Systematic Random Sample. Sample every 20th person that walks by.
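
The four sampling designs named in the examples above can be compared side by side in a short Python sketch. Everything concrete here is an assumption made for illustration: the roster of 200 students (4 levels of 50 each), the function names, and the seed are hypothetical and do not come from the course notes.

```python
import random
from collections import defaultdict

def simple_random_sample(population, n, rng):
    """SRS: every possible subset of size n is equally likely."""
    return rng.sample(population, n)

def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Take an SRS of n_per_stratum inside every stratum, then combine."""
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for key in sorted(strata):
        sample.extend(rng.sample(strata[key], n_per_stratum))
    return sample

def systematic_sample(population, k, rng):
    """Random start among the first k units, then every k-th unit after it."""
    start = rng.randrange(k)
    return population[start::k]

def cluster_sample(population, cluster_of, n_clusters, rng):
    """Choose whole clusters at random and take a census of each one."""
    clusters = defaultdict(list)
    for unit in population:
        clusters[cluster_of(unit)].append(unit)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for key in chosen for unit in clusters[key]]

# Hypothetical roster: 4 levels x 50 students, stored as (student_id, level).
roster = [(lvl * 100 + i, lvl) for lvl in range(1, 5) for i in range(50)]
rng = random.Random(151)  # arbitrary seed so the run is repeatable

def level(student):
    return student[1]

srs = simple_random_sample(roster, 60, rng)     # plan (c): 60 students at random
strat = stratified_sample(roster, level, 15, rng)  # plan (a): 15 per level
syst = systematic_sample(roster, 5, rng)        # plan (d): every fifth student
clus = cluster_sample(roster, level, 1, rng)    # plan (b): one whole level
```

In this sketch the stratified sample always contains exactly 15 students from each level, while the SRS of 60 carries no such guarantee. That is also why choosing two wrestlers from each team is not a simple random sample: some groups of 32 can never be selected.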
Recall: Bias is the tendency for a sample to differ from the corresponding population in some systematic way.

Sources of Bias:
1) Selection Bias (Undercoverage)
- Occurs when some portion of the population is not sampled at all or has a smaller representation in the sample than it has in the population.
- Usually the people that are not covered differ from the rest of the population, so bias exists.
  o Ex: a sample survey of households will miss not only homeless people but also prison inmates.
  o Ex: an opinion poll conducted by telephone will miss the households without residential phones.
2) Response Bias: refers to anything in the survey design that influences the responses.
  o Ex: respondents may lie, especially if asked about illegal or unpopular behavior.
  o Ex: a leading question such as “Smoking is very unhealthy, do you like smoking?”
3) Nonresponse Bias: a common and serious potential source of bias for most surveys.
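
A small simulation can make undercoverage and sampling variability concrete. The sketch below is only an illustration under invented assumptions: the population of 10,000 people, the share without a residential phone, and the support rates in each group are all hypothetical numbers chosen so that the two groups differ.

```python
import random

# Hypothetical population: each person is (has_phone, supports_x).
# Phone owners: 2800 of 7000 support X (40%).
# No phone:     2100 of 3000 support X (70%).
population = (
    [(True, True)] * 2800 + [(True, False)] * 4200
    + [(False, True)] * 2100 + [(False, False)] * 900
)

def proportion(sample):
    """Sample statistic: share of the sample that supports X."""
    return sum(1 for _, supports in sample if supports) / len(sample)

true_p = proportion(population)  # population parameter, 0.49 by construction

rng = random.Random(2024)  # arbitrary seed so the run is repeatable
phone_frame = [person for person in population if person[0]]

# Repeat each design 200 times with samples of size 500.
srs_estimates = [proportion(rng.sample(population, 500)) for _ in range(200)]
phone_estimates = [proportion(rng.sample(phone_frame, 500)) for _ in range(200)]

srs_mean = sum(srs_estimates) / len(srs_estimates)
phone_mean = sum(phone_estimates) / len(phone_estimates)

print(f"true proportion:        {true_p:.3f}")
print(f"mean of SRS estimates:  {srs_mean:.3f}")
print(f"mean of phone-only:     {phone_mean:.3f}")
print(f"range of SRS estimates: {min(srs_estimates):.3f} to {max(srs_estimates):.3f}")
```

Under these assumptions the SRS estimates scatter around the true proportion (sampling variability), while the phone-only estimates centre near 0.40 however many samples are drawn. Taking a larger sample from the same incomplete list does not remove the bias.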