
# STAB22-LEC11-(12).docx


University of Toronto Scarborough

Statistics

STAB22H3

Ken Butler

Fall


STAB22 - LEC11
SAMPLE SURVEYS: CHAPTER 12
[137]
CHAPTER 12: SAMPLE SURVEYS
What would happen if we tried to survey everybody?
- census = special survey where every single person in popn is surveyed
- disadvantages:
- too long to complete
- costly
(ex) if you are doing a survey by phone, imagine paying your workers for the 10,000+
calls needed to reach every single person in popn
How are sample surveys done?
- they do NOT try to get info from every single unit of their popn; instead they
choose a subset of individuals from it
- goal is to make survey results available quickly, which can't possibly be done
when surveying EVERY SINGLE person in popn
Note
- claimed to be accurate "to within 3 percentage points 19 times out of 20"
- this will be explained later
[138]
EXAMINE A PART OF THE WHOLE
What does population mean in this context?
- POPULATION = every1/eth we want to investigate
- it doesn't even have to be human (ex. could be breakfast cereals, animals, cars)
- whole set of things we are interested in knowing sth about
(ex) highway tolls
- popn is popn of Ontario
What is expected of the sample that we get from the popn?
- that it represents the popn
- ie. it is like the popn in all the impt ways
Sample
- this is a fraction of a whole (ie. popn) that we are examining
(ex) Highway tolls
- have to be wise wrt how we get our sample
- ex. a radio call-in poll about highway tolls asks: should the tolls be
extended to other highways as well?
- the responses will likely come from those who care about this issue and/or have
strong opinions for or against it
- problem: our results will be biased b/c it will under-represent the part of the
popn that do not "care", or did not bother participating.
- results also over-represent ppl who have strong opinions on one side or the other (b/c
these are the type of ppl that are willing to give their opinion)
- but altho we will get a lot of ppl with strong opinions about this issue, they are not likely
to rep. popn as a whole
BIASED - sample that over-represents, or under-represents some portion of popn
- cannot trust conclusions derived from biased samples
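A small simulation makes the call-in problem concrete. It is a sketch under made-up assumptions (none of these numbers come from the lecture): 40% of a hypothetical popn favour extending the tolls, strong opinions are more common among opponents, and only ppl with strong opinions call in. The call-in estimate lands far from the true proportion, while an SRS of 1000 from the same popn lands close to it.

```python
import random

def simulate_call_in_poll(pop_size=100_000, seed=1):
    """Compare a voluntary call-in poll with a simple random sample.

    All rates are hypothetical: 40% of the popn favour the tolls,
    10% of supporters and 30% of opponents have strong opinions,
    and only ppl with strong opinions call in.
    """
    rng = random.Random(seed)
    popn = []
    for _ in range(pop_size):
        in_favour = rng.random() < 0.40
        # hypothetical: strong opinions are more common among opponents
        strong = rng.random() < (0.10 if in_favour else 0.30)
        popn.append((in_favour, strong))

    p = sum(f for f, _ in popn) / pop_size          # true popn proportion
    callers = [f for f, s in popn if s]             # only strong opinions call in
    call_in = sum(callers) / len(callers)           # biased estimate
    srs = rng.sample(popn, 1000)                    # simple random sample of 1000
    srs_est = sum(f for f, _ in srs) / len(srs)
    return p, call_in, srs_est

print(simulate_call_in_poll())
```

The specific rates are arbitrary; any setup where willingness to respond depends on opinion produces the same kind of bias.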
Representative
- sample is representative if it's like the popn in all impt ways
(ex) in the highway toll ex., sample will probably not be rep'ive b/c ppl who don't care to
give their opinions won't call, but those who do will have strong views towards this issue
Way to look at "biased" issue
- we're trying to get a pic of the whole popn, but if sample is biased, then some
portions of popn will be missed out
=> want to select sample in such a manner that it rep. the popn
[139]
HOW MIGHT WE SELECT A REPRESENTATIVE SAMPLE?
Two possible ways we can do it:
1. "Matching"
- pick individuals carefully so that the sample matches the popn in every way that
we can think of
(ex) takes into acc't
- approp. mix of M's and F's
- mix of ages
- # of ppl living in each city/rural area
- mix of political opinions
- trying to think of all possible ways that the sample could differ from the popn, with the intent of
making the sample as similar as possible to popn
- "etc. etc." = the other things that we did not take into acc't
- these might end up making a signif. diff. b/ween popn and sample
- may end up getting biased results
- ex. if we did not match popn w/ mix of males and females, then may get biased
results
2. "Randomization"
- easier to do
- approximately represents popn in every way, even those ways that we did not think of
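A sketch of the "even those ways that we did not think of" point, using a hypothetical popn with made-up trait rates. The trait called `hidden` stands for something nobody thought to match on; the SRS never looks at any trait, yet its sample %'s come out close to the popn %'s for all three.

```python
import random

def sample_mix(n=1000, pop_size=50_000, seed=2):
    """Return {trait: (popn proportion, SRS proportion)} for a hypothetical popn."""
    rng = random.Random(seed)
    # made-up rates; "hidden" is a trait we never thought to match on
    rates = {"female": 0.51, "rural": 0.14, "hidden": 0.30}
    popn = [{t: rng.random() < r for t, r in rates.items()} for _ in range(pop_size)]
    sample = rng.sample(popn, n)   # pure randomization: no trait is used to pick ppl
    result = {}
    for trait in rates:
        pop_prop = sum(person[trait] for person in popn) / pop_size
        samp_prop = sum(person[trait] for person in sample) / n
        result[trait] = (round(pop_prop, 3), round(samp_prop, 3))
    return result

print(sample_mix())
```

Matching would only fix the traits on a list we wrote down; randomizing treats every trait the same way, listed or not.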
[140]
WHY DOES RANDOMIZATION WORK?
Two key aspects of randomization
- short-term unpredictable
- long-term predictable
(ex) coin toss
- if we do it once, we cannot tell whether it will land on heads or tails, but if we
do it numerous times, we can figure out that it lands on heads roughly 50% of
the time (and tails the other 50%)
The randomization aspect is taken into acc't with random sampling:
- cannot predict which individual ppl will end up in sample, but I can tell that there
will be approx. the right mix of m/f, young/old, and eth else, just by virtue of randomness
- given we have a large sample, we will get approx. approp. %'s of individuals
that characterize the popn, even in ways we may not have thought of
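The coin-toss idea (short-term unpredictable, long-term predictable) can be checked with a quick simulation. This is a sketch assuming a fair coin modelled with Python's `random` module: a single toss can go either way, but the share of heads settles near 50% as the number of tosses grows.

```python
import random

def running_heads_share(tosses, seed=3):
    """Share of heads after 1, 10, 100, ... tosses of a simulated fair coin."""
    rng = random.Random(seed)
    heads = 0
    shares = {}
    for i in range(1, tosses + 1):
        heads += rng.random() < 0.5          # True counts as 1 head
        if i in (1, 10, 100, 1000, 10_000, 100_000):
            shares[i] = heads / i
    return shares

for n, share in running_heads_share(100_000).items():
    print(n, share)
```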
[141]
DO YOU HAVE ENOUGH NOODLES IN YOUR SOUP?
- we're using this as an analogy for the randomization process (I highlighted the key
analogies, not all)
"Stir soup, and take a scoop"
- what we are doing here is randomizing, then taking a sample
"Doesn't matter how MUCH soup you are cooking, but what matters is how large the
scoop is"
- ie. popn size does NOT matter, but what matters is how BIG the size of the
SAMPLE is
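A simulation of the soup analogy, under hypothetical assumptions (40% of the popn in favour, 2000 repeated samples). It measures how much p-hat varies from sample to sample. The spread barely changes when the popn grows from 10,000 to 1,000,000 with the sample fixed at 100, but it drops a lot when the sample grows to 1000. This holds as long as the popn is much bigger than the sample.

```python
import random
import statistics

def spread_of_phat(pop_size, n, p=0.4, reps=2000, seed=4):
    """SD of p-hat over many simple random samples of size n.

    Hypothetical popn: ppl with labels 0 .. round(p * pop_size) - 1
    are in favour, everyone else is not.
    """
    rng = random.Random(seed)
    in_favour = round(p * pop_size)
    phats = []
    for _ in range(reps):
        sample = rng.sample(range(pop_size), n)   # "stir and scoop": SRS of labels
        phats.append(sum(label < in_favour for label in sample) / n)
    return statistics.stdev(phats)

print("popn 10,000,    scoop 100: ", spread_of_phat(10_000, 100))
print("popn 1,000,000, scoop 100: ", spread_of_phat(1_000_000, 100))
print("popn 1,000,000, scoop 1000:", spread_of_phat(1_000_000, 1000))
```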
[142]
THREE KEY [ASPECTS] FOR SAMPLING
1. EXAMINE A PORTION OF THE WHOLE
= sample
2. RANDOMIZE
- use some protocol of randomization to retrieve sample
3. SAMPLE SIZE
- this matters, NOT popn size.
- the only limit on sample size is popn size (sample can't be bigger than popn)
[144] - (will return to [143] after this one)
HOW CAN WE DRAW A SAMPLE?
SRS (SIMPLE RANDOM SAMPLING)
- make a list of ALL the individuals in the popn, assign them some "tag" to mark them
with (ex. an assigned #), and pick numbers at random
- this can be done using a randomizing machine, like a lottery device
(ex) SRS
- putting 5 diff. coloured balls in the hat (one ball corresponds to each member of popn),
and randomly picking out 3 (sample)
Aspects of SRS
- equal likelihood of every member of popn being selected
- one's chances of being selected do not dep. on other individuals that are in
sample
- every possible sample that we could retrieve this way is equally likely
- SAMPLING FRAME = list of WHOLE popn
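The hat example (5 coloured balls, pick 3) can be used to check these SRS properties by simulation. The colours are placeholders. There are 10 possible samples of 3 from 5; drawing many times with `random.sample` (which draws without replacement) gives each of them roughly 1/10 of the draws, and each ball ends up in roughly 3/5 of the samples.

```python
import random
from collections import Counter
from itertools import combinations

balls = ["red", "blue", "green", "yellow", "black"]   # hypothetical: one ball per popn member

def draw_counts(reps=100_000, seed=5):
    """How often each possible 3-ball sample comes out of the hat."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(reps):
        # sorted, so the same 3 balls drawn in a different order count as one sample
        counts[tuple(sorted(rng.sample(balls, 3)))] += 1
    return counts

all_samples = [tuple(sorted(c)) for c in combinations(balls, 3)]   # the 10 possible samples
counts = draw_counts()
for s in all_samples:
    print(s, counts[s])
```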
(ex) using this lottery device
- each lottery ball w/in this cage corresponds to one memb. of popn
- if a given member is selected, that doesn't give hint as to whether or not another
member will come out
- ex. if ball 24 comes out, that doesn't tell me ath about whether 25 will be the next
one to come out (it's random)
- won't get duplicate #'s (one per member)
- we are using this device to draw out balls until we get a sample, then we find
out ppl who correspond to those balls
[143]
POPULATIONS AND PARAMETERS, SAMPLES AND STATISTICS
(Ex) Suppose we want to find what prop. of popn of GTA are in favour of highway
tolls
We want to know: out of all ppl in GTA, what prop. of them are in favour
p = population parameter
- val. we get from asking EVERY single person of popn
- popn parameter is what we do want to know, but generally speaking, it is
impossible to work out what it is; only way to figure it out is to do a census
(way too expensive, way too long)
p-hat = sample statistic
- val. we get from sample that we HOPE is rep'ive of popn
- this is relatively more practical to get
- this is not what we want, rather we are trying to get this to be as close to p as
possible
=> we are hoping that sample statistic is closely related to popn parameter
(Ex) retrieving the sample statistic
- take sample of 100 ppl, then work out how many ppl in sample are in favour of
highway tolls, as a prop. of the 100
- this is the sample statistic; can be worked out from sample
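p-hat is the number in favour divided by the sample size, i.e. a proportion rather than a count. A minimal sketch, with a hypothetical sample of 100 answers (37 yes, 63 no) standing in for real data:

```python
def sample_proportion(responses):
    """p-hat = (# in sample who answered "yes") / (sample size)."""
    if not responses:
        raise ValueError("sample is empty")
    return sum(1 for r in responses if r == "yes") / len(responses)

# hypothetical sample of 100 answers: 37 in favour of tolls, 63 not
answers = ["yes"] * 37 + ["no"] * 63
p_hat = sample_proportion(answers)
print(p_hat)
```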
SRS
- if we randomly draw out sample, then we can try to work out how close sample
statistic should be to popn parameter, despite us not knowing exactly what the popn
parameter is
What do we do in SRS, in general?
- take all members of popn, stick them in randomizing device, and then get them out
one by one
Some concerns with SRS
- in order to make it work, have to make a list of every single member of popn
(ex) can be tedious if popn is massive, like for ppl of Ontario
- b/c once we have every single member of popn, we want to assign a label to
every single member, and draw them out randomly to be members of our sample
- hard to do
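The SRS recipe above (list the whole popn, tag each member, draw tags at random, then look up who they are) can be sketched in Python. The frame of names is hypothetical, and `random.sample` plays the role of the randomizing device; it never returns the same tag twice.

```python
import random

def draw_srs(frame, n, seed=None):
    """Draw an SRS of size n from a sampling frame (a list of the WHOLE popn).

    Each member gets a numeric tag (position + 1), the tags are drawn at random,
    and each drawn tag is looked up in the frame to find who is in the sample.
    """
    if n > len(frame):
        raise ValueError("sample can't be bigger than the popn")
    tags = {i + 1: person for i, person in enumerate(frame)}   # tag -> person
    rng = random.Random(seed)
    drawn = rng.sample(sorted(tags), n)                        # no repeated tags
    return [(tag, tags[tag]) for tag in drawn]

# hypothetical frame of 8 ppl (a real frame for Ontario would have millions of entries)
frame = ["Ana", "Ben", "Chen", "Dev", "Eli", "Fay", "Gus", "Hana"]
print(draw_srs(frame, 3, seed=7))
```

The hard part in practice is building `frame` itself, which is exactly the concern noted above.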
[145]
DRAWING A SIMPLE RANDOM SAMPLE (SRS) USING RANDOM DIGIT TABLE
Example
- 80 students
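A sketch of the usual random-digit-table procedure, assuming the 80 students are labelled 01 to 80 and the table is read two digits at a time, skipping 00, anything above 80, and repeats until the sample is full. The digit line below is generated at random for illustration (it is not a real table), and the sample size of 5 is hypothetical.

```python
import random

def srs_from_digits(digits, n, pop_size=80):
    """Pick n distinct labels 01..pop_size by reading digits two at a time.

    Groups like 00 or anything above pop_size match no student and are
    skipped, as are labels that were already chosen.
    """
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        label = int(digits[i:i + 2])
        if 1 <= label <= pop_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == n:
                break
    return chosen   # may be shorter than n if the digits run out

# hypothetical line of 40 random digits
rng = random.Random(8)
line = "".join(str(rng.randrange(10)) for _ in range(40))
print(line, srs_from_digits(line, 5))
```

For example, reading "851922821900453470" for a sample of 4 skips 85, 82, the repeated 19 and 00, giving students 19, 22, 45 and 34.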
