Ch 11 – 13 Gathering Data
Does a Census Make Sense?
- A census is a special sample that includes everyone: it
“samples” the entire population.
- There are problems with taking a census:
Recall: Populations and Parameters
- A parameter that is part of a model for a population is called a
population parameter (often unknown)
- A summary that is found from the data in a sample is called a
sample statistic (known once the data are observed)
Two Types of Conclusions (Inferences)
a)Population Inference
- Results from the sample can be generalized to an entire
population (as estimates)
b)Causal (cause-and-effect) Inference
- The difference in the responses is caused by the difference
in treatments when comparing the results from two
1)We should only make population inferences when we have
random sampling (i.e., the individuals in the sample are
randomly selected from the population).
o Randomizing protects us from the influences of all the
features of our population, even ones that we may not
have thought about.
Randomizing makes sure that on the average the
sample looks like the rest of the population.
o Non-random sampling leads to biased results (results that
tend to over- or under-emphasize some characteristics of
the population).
There is usually no way to fix a biased sample and
no way to salvage useful information from it.
The best way to avoid bias is to select individuals
for the sample at random.
o When we do not have random sampling from a
population, conclusions should be restricted to the
sample. That is, we should not generalize our results in
the sample to anyone else.
Random Sampling Methods
1)Simple Random Samples (SRS)
- SRS of size n: each sample of size n in the population has the
same chance of being selected.
o Ex: put all the names of the individuals in the population
in a hat and draw names to complete the sample
o Ex: random number tables
- Samples drawn at random generally differ from one another.
o Each draw of random numbers selects different people
for our sample.
o These differences lead to different values for the
variables we measure.
o We call these sample-to-sample differences sampling
variability.
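The ideas above (a fixed population parameter, an SRS, and statistics that change from sample to sample) can be sketched in a few lines of Python. This is only an illustration: the population values, the sample size of 25, and the helper name srs are made up for the example and do not come from the notes.

```python
import random
import statistics

# Hypothetical population: 1,000 made-up values (not data from the notes).
rng = random.Random(11)
population = [rng.gauss(50, 10) for _ in range(1000)]

# Population parameter: a fixed number, usually unknown in practice.
mu = statistics.mean(population)

def srs(pop, n, rng):
    """Draw a simple random sample of size n without replacement.

    random.sample gives every possible set of n individuals the same
    chance of being selected, which is the defining property of an SRS.
    """
    return rng.sample(pop, n)

# Sample statistics: one mean per sample; they differ from draw to draw.
sample_means = [statistics.mean(srs(population, 25, rng)) for _ in range(5)]

print(f"population mean (parameter): {mu:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean (statistic): {m:.2f}")
```

Each run of srs picks different individuals, so the five sample means differ from one another and from the population mean.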
Suppose your local school district decides to randomly test high
school students for attention deficit disorder (ADD). There are
three high schools in the district, each with grades 9-12. The school
board pools all of the students together and randomly samples 250
students. Is this a simple random sample?
A.Yes, because the students were chosen at random.
B.Yes, because each student is equally likely to be chosen.
C.Yes, because they could have chosen any sample of 250
students from throughout the district.
D.No, because we can’t guarantee that there are students from
each school in the sample.
E.No, because we can’t guarantee that there are students from
each grade in the sample.
Stratified Random Sampling
- The population is first divided into homogeneous groups,
called strata; an SRS is then taken within each stratum, and
the results are combined.
- Stratified random sampling can reduce bias.
- Stratifying can also reduce the variability of our results.
- Ex: Suppose you want to estimate the proportion of
Canadians that support federal party X based on an
appropriate representation from each province.
o You could break up the population by province and select
an SRS from each province
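A minimal Python sketch of stratified random sampling, assuming a made-up sampling frame with three provinces and a proportional allocation of 10% per province. The function name stratified_sample, the province sizes, and the allocation are all hypothetical choices for illustration.

```python
import random

def stratified_sample(units, stratum_of, n_per_stratum, rng):
    """Take an SRS of the requested size within each stratum, then combine."""
    strata = {}
    for u in units:
        strata.setdefault(stratum_of(u), []).append(u)
    sample = []
    for name, members in strata.items():
        sample.extend(rng.sample(members, n_per_stratum[name]))
    return sample

# Hypothetical sampling frame: (person_id, province) pairs with made-up sizes.
rng = random.Random(12)
frame = ([(i, "ON") for i in range(600)]
         + [(i, "QC") for i in range(600, 950)]
         + [(i, "NS") for i in range(950, 1000)])

# Proportional allocation (an assumption): 10% of each province.
allocation = {"ON": 60, "QC": 35, "NS": 5}
sample = stratified_sample(frame, lambda u: u[1], allocation, rng)

counts = {p: sum(1 for _, prov in sample if prov == p) for p in allocation}
print(counts)  # {'ON': 60, 'QC': 35, 'NS': 5}
```

Every province is guaranteed its share of the sample, which an SRS of 100 from the whole frame would not guarantee.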
Example: Suppose the state decides to randomly test high school
wrestlers for steroid use. There are 16 teams in the league, and
each team has 20 wrestlers. State investigators plan to test 32 of
these athletes by randomly choosing two wrestlers from each team.
Is this a simple random sample?
A. Yes, because the wrestlers were chosen at random.
B. Yes, because each wrestler is equally likely to be chosen.
C. Yes, because stratified samples are a type of simple random
sample.
D. No, because not all possible groups of 32 wrestlers could
have been the sample.
E. No, because a random sample of teams was not first chosen.
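The difference between "each wrestler is equally likely to be chosen" and "each group of 32 is equally likely to be chosen" can be checked by counting. The sketch below uses only the numbers stated in the example (16 teams, 20 wrestlers per team, 2 chosen per team); the variable names are illustrative.

```python
from math import comb

teams, per_team, chosen_per_team = 16, 20, 2
n_total = teams * per_team            # 320 wrestlers in the league
n_sample = teams * chosen_per_team    # 32 wrestlers tested

# Chance that any one wrestler is tested: 2 of the 20 on their team.
p_individual = chosen_per_team / per_team

# Samples the plan can produce: one pair from each of the 16 teams.
possible_under_plan = comb(per_team, chosen_per_team) ** teams

# Samples an SRS of 32 from all 320 wrestlers could produce.
possible_under_srs = comb(n_total, n_sample)

print(p_individual)                               # 0.1
print(possible_under_plan < possible_under_srs)   # True
```

Every wrestler has the same 1-in-10 chance of being tested, yet many groups of 32 (for example, any group with three wrestlers from one team) can never be selected under this plan.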
The January 2005 Gallup Youth Survey telephoned a random
sample of 1,028 U.S. teens aged 13-17 and asked these teens to
name their favorite movie from 2004. Napoleon Dynamite had the
highest percentage with 8% of teens ranking it as their favorite
movie. Which is true?
I. The population of interest is U.S. teens aged 13-17.
II. 8% is a statistic and not the actual percentage of all U.S. teens
who would rank this movie as their favorite.
III. This sampling design should provide a reasonably accurate
estimate of the actual percentage of all U.S. teens who
would rank this movie as their favorite.
A. I only
B. II only
C. III only
D. I, II, and III
Systematic Random Samples
- Start the systematic selection from a randomly selected
individual, then sample every kth person.
- When there is no reason to believe that the order of the list
could be associated in any way with the responses sought,
systematic sampling can give a representative sample.
- Systematic sampling can be much less expensive than true
random sampling.
- Ex: Suppose you want to estimate the proportion of
individuals that support federal party X in your area. You
can set up a booth in your area and ask every 50th person your
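A minimal sketch of systematic sampling in Python, assuming a hypothetical list of 1,000 people in arrival order and k = 50. The function name systematic_sample and the list itself are made up for the example.

```python
import random

def systematic_sample(frame, k, rng):
    """Pick a random start among the first k units, then take every kth unit."""
    start = rng.randrange(k)   # random position 0 .. k-1
    return frame[start::k]

rng = random.Random(13)
frame = list(range(1, 1001))   # hypothetical list of 1,000 people, in order
sample = systematic_sample(frame, 50, rng)

print(len(sample))  # 20
print(sample[:3])   # the first three selected positions, 50 apart
```

Only the starting point is random. If the order of the list were related to the responses, every sample drawn this way could share the same distortion, which is why the condition on list order matters.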
Cluster Random Sampling
- Split the population into similar groups (or clusters),
select one or a few clusters at random, and perform a census
within each of them.
o This sampling design is called cluster sampling.
o If each cluster fairly represents the full population, cluster
sampling will give us an unbiased sample.
- Cluster sampling is not the same as stratified sampling.
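A minimal sketch of cluster sampling in Python, assuming a hypothetical population of 8 classrooms with 30 students each. The function name cluster_sample and the classroom labels are illustrative only.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Select whole clusters at random, then take everyone in them (a census)."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = [unit for name in chosen for unit in clusters[name]]
    return chosen, sample

rng = random.Random(14)

# Hypothetical population: 8 classrooms of 30 students each.
clusters = {
    f"room_{r}": [f"room_{r}_student_{s}" for s in range(30)]
    for r in range(8)
}

chosen, sample = cluster_sample(clusters, 2, rng)
print(chosen, len(sample))
```

The contrast with stratified sampling is visible in the code: a stratified sample takes some individuals from every group, while a cluster sample takes every individual from some groups.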
Example: A statistics teacher wants to know how her students feel
about an introductory statistics course. She decides to administer a
survey to a random sample of students taking the course. She has
several sampling plans to choose from. Name the sampling strategy
in each.
a. There are four levels of students taking the class: first year,
second year, third year, and fourth year. Randomly select 15 students
from each level.
Stratified Random Sample
b. Randomly select a level (first year, second year, third year, or
fourth year) and survey every student in that level.
Cluster Random Sample
c. Each student has a seven-digit student number. Randomly choose
Simple Random Sample
d. Using the class roster, select every fifth student from the list.
Systematic Random Sample
Example: You want to determine the proportion of university
students that have “jobs” while attending school.
1) You have a complete list of students.
SRS (Simple random sample)
2) You have a complete list of students and want to make sure each
faculty is appropriately represented.
Stratified Random Sample. Select an SRS from each faculty.
3) You do not have a complete list of students, but believe that at
any given time the students in each classroom across campus
form a representative sample of the entire student body.
Cluster Sample. Select an SRS of classrooms, then go to each
selected classroom and sample everyone.
4) You do not have a complete list of students. You believe that
students walking in front of the Registrar’s building throughout the
day are a good representation of the entire student body. Because of
the massive number of students walking past your booth you can’t
Systematic Random Sample. Sample every 20th person that walks
past.
Recall: Bias is the tendency for a sample to differ from the
corresponding population in some systematic way.
Sources of Bias:
1)Selection Bias (Undercoverage)
- When some portion of the population is not sampled at all or
has a smaller representation in the sample than it has in the
population.
- Usually the people that are not covered differ from the rest of
the population, so bias exists.
o Ex: a sample survey of households will miss not only
homeless people but also prison inmates.
o Ex: an opinion poll conducted by telephone will miss the
households without residential phones.
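A small simulation can show how undercoverage biases an estimate. Everything in this sketch is an assumption made for illustration: the population of 20,000, the 70% of people with a residential phone, and the support rates of 60% and 30% in the two groups are invented, not taken from any survey.

```python
import random

rng = random.Random(15)

# Hypothetical population: each person is (has_landline, supports_policy).
# Assumed for illustration only: 70% have a landline; support is 60% among
# landline households and 30% among everyone else.
population = []
for _ in range(20000):
    has_landline = rng.random() < 0.7
    supports = rng.random() < (0.6 if has_landline else 0.3)
    population.append((has_landline, supports))

def support_rate(people):
    """Proportion of the given people who support the policy."""
    return sum(supports for _, supports in people) / len(people)

true_rate = support_rate(population)

# Telephone poll: an SRS, but only from people with a landline.
phone_frame = [p for p in population if p[0]]
phone_poll = rng.sample(phone_frame, 1000)

# SRS from the full population, for comparison.
full_srs = rng.sample(population, 1000)

print(f"true rate:      {true_rate:.3f}")
print(f"telephone poll: {support_rate(phone_poll):.3f}")
print(f"full SRS:       {support_rate(full_srs):.3f}")
```

The telephone poll is random within its frame, yet it overestimates support because the people it cannot reach differ from those it can. Taking a larger telephone sample would not remove this bias.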
2)Response Bias: refers to anything in the survey design that
influences the responses.
o Ex: respondents may lie, especially if asked about illegal
or unpopular behavior.
o Ex: a leading question such as “Smoking is very unhealthy.
Do you like smoking?”
3)Nonresponse bias: A common and serious potential source of
bias for most surveys.