1/21/2013 4:00:00 PM
CHAPTER 6- SURVEYS & SAMPLING
Common in media (see BBC story)
The story: "Motorists turn to public transport as fuel price bites" – Daily Record
"MORE than three in five drivers are turning to public transport due to high fuel
prices, a survey has revealed. The survey by transport firm National Express
found 61 per cent of car users are definitely or probably considering using public
transport due to the rise in prices at the pumps"
o In what context was it asked? Were respondents asked immediately after paying
for their gas?
o Was it representative? How big was the sample, etc.?
o "61% probably considering": what does this really tell us? It doesn't say
they are actually going to use public transport.
Involve the use of self-report measured variables in descriptive research
o (more often than not descriptive research, can be used for any type of
research but most commonly used for descriptive research)
Can be used to collect either qualitative data or quantitative data
Steps in developing surveys
o Types of questions to use
o Type of instrument to use (mail/ internet, phone, person-to-person)
2) Pilot test/ seek opinions from others
3) Work Out
o What demographic info to collect
o Administration procedures/ instructions
Types of questions
A) Fixed-format questions
o Forced alternatives
Respondents must choose between 2 or more alternatives. You are
guaranteed to get a response from everyone, but the tradeoff is that
respondents who are not comfortable with any of the options are forced
into a category they may not belong to. Example:
Many things control me
Little in this world controls me
o Multiple choice
o Likert scales
(i.e. 1-2-3-4-5 OR very little – somewhat – a lot)
B) Free-format questions
Guide: Survey construction
1) Simple and direct
My overall feelings and thoughts about myself are predominantly
favorable most of the time, leading me to feel pretty satisfied
about who I am.
This could be more direct: "On the whole, I am satisfied
with myself." The researcher may write the question the long
way to make sure it covers everything and primes the reader,
but everyone can understand the shorter version. You need to
play a balancing game and be more direct.
Have you ever suffered from auditory hallucinations?
This is a very direct question, but depending on who you are
asking, "auditory hallucinations" may not be simple enough;
it is not in everyone's vocabulary. Better: "Have you ever
heard voices or sounds that might not have been real?"
2) Double-barreled questions
o "Do you believe that airbags are unsafe and expensive?"
It is a yes-or-no question, but it contains two questions, and
respondents may have a different answer to each. These should be
asked as two separate questions.
3) Avoid loaded or leading questions
o Given the failure of welfare in the United States, do you feel welfare
programs should be eliminated?
(scale 1-5, not at all – very much so)
The question explicitly states that welfare is failing, so it is
harder for respondents to say the programs shouldn't be eliminated.
You have created a context around the question, and this is going
to cause a bias.
The researcher may want a loaded question, but you need to
understand its effects.
o Do you agree?
(1) ―A freeze in nuclear weapons should be opposed because it
would do nothing to reduce the danger of thousands of nuclear
weapons already in place and would leave the Soviet Union in a
position of nuclear superiority.‖
(2) ―A freeze in nuclear weapons should be favored because it
would begin a much-needed process to stop everyone in the world
from building nuclear weapons now and reduce the possibility of
nuclear war in the future.‖
58% agreed with (1), 56% agreed with (2), and 27% agreed with both.
They are not agreeing with the actual question; they are
agreeing with the context around it.
4) Avoid negative wording
"I don't dislike eating ice cream every now and then."
Better: "I like to eat ice cream every now and then."
Ask questions in the way people are used to.
5) Manage context
Sequencing questions: To boost response rate...
o put innocuous questions first, personal questions last
To increase accuracy...
o keep similar questions together
Generally this improves the accuracy of the responses. It helps
people think more clearly because they don't have to remember all
the things they were thinking about earlier.
To increase honesty...
o collect the subject's "signature" first
Asking people for their signature first, rather than last,
makes them more honest.
Fixed- vs. free-format
Schuman & Scott (1987)
o Condition 1
Identify the most important problem facing America today.
the energy shortage
quality of public schools 32%
o Condition 2
Identify the most important problem facing America today.
only 1% mentioned education!
Frequency-scale choices can influence which answers are chosen
The response options you offer will affect people's responses (frequency scale)
Q: How much TV do you watch per day?
o Some people don't even look at the numbers; they just think, "I
watch less TV than most people," or "I am about average for the population," so
they choose the lowest option or the middle one.
Effects of question order
The questions surrounding a question will also affect responses
"How satisfied are you with your life overall?"
o Question 1:
How satisfied are you with your relationship?
people are reminded of their great relationship and they
are thinking life is good, or they are reminded of their
negative relationships and are in a negative frame of mind
and are disappointed with their life
frames their response to the next question…
o Question 2:
How satisfied are you with your life overall?
Importance of context
Schwarz & Clore: Contextual factors influence survey responses
o Generally speaking, when you ask people how satisfied they are with their life,
they report higher satisfaction when it is sunny outside than
when it isn't.
o How do you deal with context? Remind people of it: asking
about the weather negates the effect, because it reminds them that maybe
they are just upset because of the weather. Make people aware of the
contextual influence.
Proper interpretation of surveys requires knowledge of the context
o Not only obvious mistakes: leading questions, double-barreled, etc.
o Also: items that surround a specific question, order of questions, anything
else that alters what comes to mind
These are more subtle things that can affect what people say:
any context that alters what comes to people's minds when they
are filling out the survey. You need to be aware of what the contextual
influences are if you want to take the most away from it.
Populations vs. samples
Does advertising cause children to smoke?
o Limitation: you can only generalize to children who have never smoked.
Ultimately we want to know what made the children who do smoke
start smoking, so we are missing the big picture. If children are
the population we want to draw conclusions about, we need a sample
representative of all children, not just one subgroup (those who
have never smoked).
Do art exhibits affect tourism?
o It is not a useless piece of information, but it is limited: we
cannot generalize to all tourists in Chicago or to all tourists everywhere.
Representativeness of the sample
o How accurately can we generalize from a sample to the population
o Sample size (reliability)
Large enough to overcome any kind of noise.
The sample size you need in order to generalize to a
population is largely unrelated to the population size.
o Biased sample
picking people who do not have the attributes of the population
o Whenever samples are used, the researcher will never be able to know
exactly the true characteristics of the population.
(this is the major limitation)
Ultimately you will never know the true characteristics. Our goal is
for the sample to be identical to the population, only smaller, but
this is not possible.
o Approximately the same as the population in every important respect
1) The existence of one or more sampling frames listing the entire
population of interest and
2) All selected individuals must actually be sampled
o Sampling bias
There is the potential that the sample is not representative of the
population.
Occurs when either of these conditions is not met.
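The point that the required sample size is largely unrelated to population size can be checked with a little arithmetic. The sketch below is an illustration only, not part of the lecture: it assumes a simple random sample, a yes/no survey question, and the usual 95% margin-of-error formula with a finite population correction. The function name `margin_of_error` and the population sizes are made up for the example.

```python
import math

def margin_of_error(n, population_size=None, p=0.5, z=1.96):
    """95% margin of error for a sample proportion (assumes simple random sampling)."""
    se = math.sqrt(p * (1 - p) / n)
    if population_size is not None:
        # Finite population correction: matters only when n is a large share of N.
        se *= math.sqrt((population_size - n) / (population_size - 1))
    return z * se

# Same sample size (n = 1000), very different population sizes.
for N in (10_000, 1_000_000, 100_000_000):
    print(f"N = {N:>11,}  margin of error = {margin_of_error(1000, N):.4f}")

# Same (effectively infinite) population, different sample sizes.
for n in (250, 1000, 4000):
    print(f"n = {n:>5}  margin of error = {margin_of_error(n):.4f}")
```

Under these assumptions the margin of error changes only slightly as the population grows, while quadrupling the sample size halves it.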
o Probability sampling
Know about every person in population
Can specify the population
Each person in the population has a known chance of being selected
o Nonprobability sampling
Population is not completely known
A) Probability sampling
o Know about every person in population
can specify the probability that any member of the population will
be included in the sample
Best way to make inferences about a population.
Each person in the population has a known chance of being
selected.
1) Simple random sampling
Each person in the population has an equal chance of
being selected.
Requires a complete list of all of the people in the population.
The importance of random selection!
Random selection is much smarter than you are; it can save you.
2) Systematic random sampling
If the list of names on the sampling frame is known to be
in a random sequence, every nth name can be selected.
Obtaining randomness:
Tables of random numbers
Roll dice/flip coins
Computer-generated random numbers
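Simple random and systematic sampling can both be sketched with computer-generated random numbers. This is a minimal illustration, assuming a complete sampling frame is available as a list; the frame of student IDs and the function names are hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n members so that every member has an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(frame, n)  # sampling without replacement

def systematic_sample(frame, n, seed=None):
    """Pick a random start, then take every k-th name from the frame."""
    k = len(frame) // n              # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)         # random start within the first interval
    return [frame[start + i * k] for i in range(n)]

# Hypothetical sampling frame: a complete list of 1,000 student IDs.
frame = [f"student_{i:04d}" for i in range(1000)]

srs = simple_random_sample(frame, 50, seed=1)
sys_sample = systematic_sample(frame, 50, seed=1)
print(len(srs), "students chosen by simple random sampling")
print("first systematic picks:", sys_sample[:3])
```

Systematic sampling is only as good as the ordering of the frame: if the list has a pattern that lines up with the sampling interval, the sample can be biased.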
3) Stratified sampling
Involves drawing separate samples from a set of known
subgroups called strata rather than sampling from the
population as a whole
E.g., What is the ideal class size for university classes?
may differ by year, course content, etc.
The year people are in may affect their answers so
we may want to randomly sample some students
from first year, some students from second year etc.
so we know we have representation from all the
categories and it is most representative of the
population of the University students.
Disproportionate stratified sampling
Frequently used when the strata differ in size and
the researcher is interested in comparing the
characteristics of the strata
If we know there are more 1st years than 4th years,
we may want the strata in our sample to have the same proportions as
in the population, so that the sample looks the most like the population.
Disproportionate sampling means drawing a sample that includes a larger
proportion of some strata than they actually represent in the
population.
If one stratum is under-represented
and we want it to be equally represented in the
results, we might draw more from that stratum.
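The difference between proportionate and disproportionate stratified sampling can be sketched as follows. This is an illustration under assumed numbers: the enrolment figures per year, the stratum names, and the helper functions are all hypothetical.

```python
import random

def proportionate_sizes(strata, total_n):
    """Sample sizes that mirror each stratum's share of the population.
    Note: rounding may make the sizes miss total_n by one or two in general."""
    population = sum(len(members) for members in strata.values())
    return {name: round(total_n * len(members) / population)
            for name, members in strata.items()}

def stratified_sample(strata, sizes, seed=None):
    """Draw a separate simple random sample from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, sizes[name])
            for name, members in strata.items()}

# Hypothetical university: more 1st years than 4th years.
enrolment = {"year1": 400, "year2": 300, "year3": 200, "year4": 100}
strata = {year: [f"{year}_student_{i}" for i in range(count)]
          for year, count in enrolment.items()}

# Proportionate: the sample looks like the population.
prop_sizes = proportionate_sizes(strata, total_n=100)
proportionate = stratified_sample(strata, prop_sizes, seed=3)

# Disproportionate: equal numbers per year, so small strata are over-sampled
# and the years can be compared with similar precision.
equal_sizes = {year: 25 for year in strata}
disproportionate = stratified_sample(strata, equal_sizes, seed=3)

print(prop_sizes)
print({year: len(s) for year, s in disproportionate.items()})
```

With a disproportionate sample, results for the whole population need to be weighted back to the true stratum sizes.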
4) Cluster sampling
Can be used where a complete sampling frame does not
exist.
Breaks the population into a set of smaller groups (called
clusters) for which there are sampling frames, and
randomly chooses some of the clusters. We are looking at a smaller
group that we know a lot about and that is probably reflective of
the larger population; for that smaller group we may be able to
come up with a complete sampling frame.
E.g., we are interested in all undergraduate students in
Canada.
o How do we get a list of all these students in
Canada? (Ethics clearance from all the
institutions, etc.; very hard to do.) Each
university is really a cluster that is similar to the overall population, so sample the clusters
and generalize to the larger population.
To the extent that every cluster has similar attributes, the
sample will be representative.
If clusters are not all the same, however, then the
sample will be biased.
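Cluster sampling can be sketched in two stages: randomly choose some clusters, then sample within each chosen cluster using its own sampling frame. The universities, student lists, and function name below are hypothetical and only illustrate the idea.

```python
import random

def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Randomly choose clusters, then randomly sample within each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # stage 1: pick clusters
    return {name: rng.sample(clusters[name], n_per_cluster)  # stage 2: pick people
            for name in chosen}

# Hypothetical clusters: each university keeps its own complete student list,
# even though no single list of all students exists.
clusters = {
    f"university_{u}": [f"u{u}_student_{i}" for i in range(500)]
    for u in range(10)
}

sample = cluster_sample(clusters, n_clusters=3, n_per_cluster=20, seed=7)
for name, students in sample.items():
    print(name, len(students))
```

Only the three chosen universities need to supply a sampling frame, which is the practical appeal; the cost is bias if those clusters differ from the rest.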
B) Nonprobability sampling
o Sampling procedure in which one cannot specify the probability that
any member of the population will be included in the sample
(population is not completely known)
Accidental or convenience sample
Introduces biases – big problem when people select
themselves to be part of the survey (return a magazine
survey, for example)
Snowball sampling
Used when the population of interest is rare or difficult to reach
One or more individuals from the population are contacted
These individuals lead the researcher to other population members
Summarizing Survey Data
Histogram of scores
Summarizing the sample data
o Raw data
The data collected must be transformed to be meaningfully
interpreted using such techniques as:
Tables, histograms, grouped frequency distributions,
stem and leaf plots, etc.
Central tendency (mean, median, mode) and
dispersion (range, variance/standard deviation)
Central tendency: Mean
Central tendency: Median
Central tendency: Mode
o Data distributions that are shaped like a bell are known as normal
distributions: the mean, median, and mode all fall at the same point on the distribution.
o Outliers: extreme scores in a distribution
o Skewed distributions are not symmetrical;
they can be positively or negatively skewed
Shapes of distributions
o Positively skewed: the outliers are on the right side of the distribution.
o Negatively skewed: the outliers are on the left side of the distribution.
Measures of dispersion
Extent to which the scores are tightly clustered around or spread
out away from the central tendency
o Summarized using the range, variance, and standard deviation
o Range is calculated based on only two data points
(i.e., the smallest and largest)
o It would be better to use all data points (i.e., every participant's score) to
measure dispersion.
Why not calculate the average difference of each person from the
mean?
Dispersion: Average distance from mean
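The measures of central tendency and dispersion above can be computed in a few lines. The scores below are made up for illustration (with one outlier on the right, so the distribution is positively skewed). The sketch also shows why the plain average difference from the mean is not useful: the positive and negative deviations always cancel to zero, which is why absolute or squared deviations are used instead.

```python
import statistics

# Made-up scores for illustration; the 20 is an outlier on the right.
scores = [2, 3, 3, 4, 5, 5, 5, 6, 7, 20]

mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)
score_range = max(scores) - min(scores)   # uses only two data points

# Deviation of every score from the mean.
deviations = [x - mean for x in scores]
mean_deviation = sum(deviations) / len(scores)                      # cancels out
mean_abs_deviation = sum(abs(d) for d in deviations) / len(scores)  # average distance

variance = statistics.variance(scores)    # sample variance (divides by n - 1)
sd = statistics.stdev(scores)             # square root of the variance

print("mean", mean, "median", median, "mode", mode)
print("range", score_range)
print("average deviation", mean_deviation)
print("average absolute deviation", mean_abs_deviation)
print("variance", round(variance, 2), "sd", round(sd, 2))
```

Because of the outlier, the mean is pulled above the median, which is the signature of positive skew.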
CHAPTER 7-NATURALISTIC RESEARCH
o very useful research but has many limits.
1) Observational research
2) Archival research
Naturalistic research: Designed to describe and measure the thoughts, feelings, and behaviours of
people and animals _______________ . _________________________
Observational research: Jane Goodall
Jane Goodall‘s research with chimps
o Observed & recorded behavior of chimps for many years
o One of first to record tool use in non-humans
Observational research designs
1) Acknowledged vs. unacknowledged
2) Participant vs. observer
Guidelines for systematic observation:
o Which people, times, places
o Behavioural categories
o Frequency of behaviour, timing, accuracy
o Event sampling
o Individual sampling
o Time sampling
Analysis of existing data sources:
o Statistical records
Daily temps, sports records, crime data
o Survey archives
o Written and mass communications
Newspapers, Internet bulletin boards
Individual cases
o Sometimes studied in naturalistic settings.
o Sometimes studied outside of naturalistic settings
Often brought into clinical setting for in-depth assessment
o Brain damage left him with change in personality and deficits in
Key indication that specific parts of the brain are associated with
Effects of sleep deprivation
o Peter Tripp
o Randy Gardner
Multiple memory systems
o In the 1980s, researchers were interested in how self-concepts are stored in the brain
Trait knowledge: "I'm open-minded," "I'm a fast runner"
Autobiographical memory: "I tried eating oysters even though they look disgusting," "I
won a foot race last week"
o Following motorcycle accident, patient KC lost autobiographical memory,
but trait memory was spared.
anecdote vs case study:
what is the difference? What makes a case study scientific?
CHAPTER 8-HYPOTHESIS TESTING
The problem: Why stats are necessary to test hypotheses
Quick review of hypothesis testing (video)
Slow review of hypothesis testing
Samples and populations
Research findings are based on samples drawn from populations
o allows us to take the data from the sample, and make conclusions
about the population.
Two group means
Research question: Are there sex-related differences in alcohol consumption?
o ask samples of males and females about number of drinks consumed
during last week
measures of central tendency
Mean, Median, Mode
o How do the means of our samples compare?
ex. did men or women on average consume more alcohol?
Measure of dispersion
Range (R = max – min)
o how wide is the distribution?
Variance, standard deviation
o how much variance from subject to subject?
o based on this data:
... can we conclude that on average men consume more than
women at the population level?
NO! Not enough information.
We would have to measure the entire population to know for sure.
This is why we need inferential statistics!
The variance in data from samples can obscure our results.
There is variance in our sample, which implies that there is variance in our
population.
o s² ≈ population variance (σ²)
The sample variance is approximately equal to the population variance
(it is a good estimate)
What if we drew two new samples?
o We would likely get two new means
Would the means still be different?
o Do males and females differ?
Results: average number of drinks consumed
males = 2.5 females = 1.3
Because the means are calculated on samples (not populations), the difference
between means may have happened by chance.
The final goal is to say, for example: Men consume different levels of
alcohol than women, p < .05
-meaning that if there were no real difference in the population, there would be
less than a 5% chance of seeing a difference this large purely by random
chance.
VIDEO: hypothesis testing:
o 1) hypothesis:
-Null hypothesis (what you try to disprove)
-Alternative hypothesis (what you're trying to support)
-always about the population parameters
-2-tailed test > can go in either direction (e.g. ≠)
-1-tailed test > can go in one direction (e.g. <)
o 2) significance:
-0.05 level of significance
-Type 1 error: probability you will say the null is wrong when it is correct.
o 3) sample
o 4) p-value
-calculate the appropriate p-value
-if the p-value is less than the level of significance > reject the null hypothesis
(using the p-value to infer about the population means from
the sample)
o 5) decide
Samples and populations
Research findings are based on samples drawn from populations
o allow us to infer what the population is like, based on sample data
Do the differences we see in samples reflect differences in the
population, or just random error?
Hypothesis: Frequent use of steroids is associated with lower than normal IQ
o IQ tests are designed to have a mean of 100 and a standard deviation of
15 in the general population.
o Method: Ask a sample of 20 steroid users to complete an IQ test.
o Results: Mean IQ was 90.
Would you expect this big a difference based on chance alone?
Assessing randomness of sample statistics
How much will the sample mean vary from one sample to the next?
o -IQ doesn't just vary person to person, it also varies sample to sample
Need to determine the sampling distribution
o distribution of a given statistic (e.g. mean) over repeated sampling
from a population.
o collect samples from the population
o calculate the mean for each sample
o plot the means – distribution of sample means
o Thanks to statistics, we know what the sampling distribution will look like
based on the population mean and variance.
o If we know about the population we're studying (mean, SD), we can
use this tool to figure out the sampling distribution.
Sampling distribution of the mean
o we know IQ has a mean of 100, SD of 15.
o single sample:
has mean slightly off from the population mean (random error)
o how much would we expect means to vary?
estimate sampling distribution!
tells us if the single sample is reflective of the population.
(1) The sampling distribution has the same mean as the original distribution.
o (all the random errors cancel out)
(2) The standard deviation of the sampling distribution is smaller than the standard
deviation of the population.
o (standard error of the mean = SD of the sampling distribution)
o Increase the number of people in the sample (sample size) and the distribution will shrink inwards (get narrower), because it
is less affected by random variation (ex. outliers)
(3) As the sample size becomes larger, the shape of the distribution approaches
a normal distribution.
Central limit theorem
Given a population with mean $\mu$ and variance $\sigma^2$, the sampling distribution of the
mean for samples of size $N$ will:
o have a mean equal to the population mean
$\mu_{\bar{x}} = \mu$
o have variance equal to the population variance / sample size
$\sigma^2_{\bar{x}} = \sigma^2 / N$
standard error (std. deviation of the sampling distribution): $\sigma_{\bar{x}} = \sigma / \sqrt{N}$
o approach the normal distribution as
sample size ($N$) increases.
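The properties of the sampling distribution can be checked by simulation: draw many samples, compute each sample's mean, and look at how those means are distributed. This sketch assumes IQ scores are normally distributed with mean 100 and SD 15; the number of repetitions (10,000) and the random seed are arbitrary choices for the example.

```python
import random
import statistics

POP_MEAN, POP_SD = 100, 15     # IQ in the general population
N = 20                         # people per sample
REPETITIONS = 10_000           # number of repeated samples (arbitrary)

rng = random.Random(42)

def sample_mean(n):
    """Draw one sample of n IQ scores and return its mean."""
    return sum(rng.gauss(POP_MEAN, POP_SD) for _ in range(n)) / n

# The sampling distribution of the mean, approximated by repeated sampling.
sample_means = [sample_mean(N) for _ in range(REPETITIONS)]

observed_mean = statistics.mean(sample_means)   # should be close to 100
observed_se = statistics.stdev(sample_means)    # should be close to 15 / sqrt(20)
theoretical_se = POP_SD / N ** 0.5

print("mean of sample means:", round(observed_mean, 2))
print("SD of sample means:  ", round(observed_se, 2))
print("sigma / sqrt(N):     ", round(theoretical_se, 2))
```

The sample means cluster around the population mean, and their spread matches the standard error rather than the population SD of 15.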
Sampled 20 people. What types of sample means could we expect if we were
sampling from the general population?
Difference by chance?
o Very few instances when we would get a mean of 90 if sampling from the
general population
(the difference is unlikely to be due simply to chance)
Frequency: Counts vs. proportions
o Take sample of n=50 people from general population & record IQ
What proportion of sample has an IQ less than 80?
Add up all bars left of 80, and divide by N.
We can convert the histogram to proportions of sample.
Now sum of all of the bars = 1.
(if it were counts, the sum would add up to the sample size, N)
o gives us a normal curve
area under the curve = 1
Ex. within 1 standard deviation of the mean ≈ 2/3rds (about 68%) of the data
Difference by chance?
o The area under the curve to the left of 90 is much less than .05, so the
probability of observing a difference this large (90 vs. 100) by
chance alone is p < .05
This is significant: there is less than a 5% chance we would see this
difference purely from sampling error.
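The area under the curve to the left of 90 can be computed directly. This is a minimal sketch of the steroid/IQ example, assuming the sampling distribution of the mean is normal with mean 100 and standard error 15/√20; it uses `statistics.NormalDist` from the Python standard library, and the variable names are illustrative.

```python
from statistics import NormalDist

pop_mean, pop_sd = 100, 15     # IQ in the general population
n, sample_mean = 20, 90        # the steroid-user sample

# Standard error: SD of the sampling distribution of the mean.
standard_error = pop_sd / n ** 0.5

# Sampling distribution of the mean if steroid users were no different.
sampling_dist = NormalDist(mu=pop_mean, sigma=standard_error)

# Area to the left of 90: chance of a sample mean this low or lower.
p_value = sampling_dist.cdf(sample_mean)

# The same result expressed as a z-score.
z = (sample_mean - pop_mean) / standard_error

print("standard error:", round(standard_error, 3))
print("z:", round(z, 2))
print("one-tailed p:", round(p_value, 4))
```

The p-value is far below .05, so a sample mean of 90 would be very unusual if the 20 people had been drawn from the general population.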
Summary: IQ/steroid example
o Is IQ of steroid users less than IQ of general population?
Collected a sample and found a mean IQ of 90 for steroid users, which is less than the general population mean of 100
o Do steroid users have lower IQ, or does the difference simply reflect sampling
error?
o Assume there is no difference and generate the sampling distribution
o Determine the likelihood of getting the observed difference (or larger) simply by
chance
Likelihood < 5%, so reject the null hypothesis.
o We knew the population variance (15²), and so we...
computed the MEAN of our sample
compared it against the sampling distribution of the MEAN
o If you don't know the population variance, and want to compare two means,
you need a slight adjustment.
compute a T-STATISTIC based on your samples.
compare it against the sampling distribution of the T-STATISTIC
If you are comparing multiple groups simultaneously
o compute an F-STATISTIC based on your samples.
o compare it against the sampling distribution of the F-STATISTIC.
The null hypothesis
Assumes observed data do not differ from what would be expected on the
basis of chance
o Null hypothesis = H0
o Alternate hypothesis = H1
To reject the null hypothesis,
o observed data must deviate more than what would normally be
expected under the sampling distribution.
p and α (alpha)
We assume that all observations come from the same parent population
p = probability that the observed difference could have occurred simply by chance.
o (likelihood we'd see the difference by chance)
α = the arbitrary threshold at which the investigator is willing to discount the role
of chance as an explanation for an observed group difference
o (ex. alpha = .05)
When p ≤ alpha,
o chance is unlikely to be the explanation of the observed group difference. The remaining explanations are:
1) systematic error
2) true difference in the population
(something we conclude)
The observed difference in IQ between steroid users and the
general population is statistically significant (p < .05).
When p > alpha,
o chance cannot be ruled out as the explanation of the difference.
The observed difference was not statistically significant (p >
.05). We don't know whether there is an IQ difference or not.
p-values
tell us how likely we would be to see the data if our research hypothesis is INCORRECT.
o (probability we would see the data by chance)
o Used to test research hypotheses
o Two-sided p-values take into consideration that unusual outcomes may
occur in more than one way ("prediction-free test")
o One-sided p-values can be used in some special cases where only one direction matters (ex. men drink more than women.)
ex. do men drink more than women on average?
o men drink 1 more on average...
-is this a likely result due purely to chance?
One-tailed:
o the whole 5% lies in one tail of the sampling distribution.
less than a 5% chance of seeing a larger difference in one
direction. (a difference in the other direction is not tested)
Two-tailed:
o split the 5% chance into 2.5% on each side.
this pushes the cut-off out a bit.
(need to find a slightly larger difference)
One-tailed vs. two-tailed tests
Both tell us whether the effect is significant or not.
Use a one-tailed test...
o when you have a specific reason to believe the effect will be in a
particular direction, and you do not care if the effect is in the opposite
direction.
When the effect is in the predicted direction, a one-tailed test gives a smaller p value, and so a
greater chance of reaching significance for your directional
hypothesis (more power).
Otherwise...
o use a two-tailed test.
The decision of whether to perform a one-tailed or two-tailed test must be made
prior to data collection.
o Otherwise, everyone would look at the data, see which tail to target, and run a one-tailed
test (more powerful).
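The difference between one-tailed and two-tailed tests comes down to where the 5% is placed. The sketch below assumes a test statistic that follows the standard normal distribution (a z-test); the z value of 1.8 is made up for illustration, and the function name `p_values` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, SD 1

def p_values(z):
    """One-tailed p (predicted direction: positive) and two-tailed p for a z-score."""
    one_tailed = 1 - std_normal.cdf(z)
    two_tailed = 2 * (1 - std_normal.cdf(abs(z)))
    return one_tailed, two_tailed

# Cut-offs for alpha = .05: all 5% in one tail vs. 2.5% in each tail.
critical_one = std_normal.inv_cdf(0.95)
critical_two = std_normal.inv_cdf(0.975)
print("one-tailed cut-off:", round(critical_one, 3))
print("two-tailed cut-off:", round(critical_two, 3))

# A made-up result in the predicted direction.
one, two = p_values(1.8)
print("z = 1.8:  one-tailed p =", round(one, 4), " two-tailed p =", round(two, 4))

# The same size of effect in the opposite direction.
one_opp, two_opp = p_values(-1.8)
print("z = -1.8: one-tailed p =", round(one_opp, 4), " two-tailed p =", round(two_opp, 4))
```

A z of 1.8 passes the one-tailed cut-off but not the two-tailed one, while the same effect in the unpredicted direction is nowhere near significant on the one-tailed test. That asymmetry is why the choice has to be made before the data are seen.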
Type I error
Occurs when we reject the null hypothesis when it is true
o (saying there is a difference when there really isn't)
Likelihood is set by alpha (.05 usually)
(the probability of making a Type I error is alpha)
5% is a reasonably low probability of being wrong; it could be set lower.
Type II error
Failing to reject the null hypothesis when there really is a difference
o (saying there isn't a difference, when there really is)
The probability of a Type II error is given the label beta (β)
β refers to Type II error (as alpha refers to Type I error)
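Alpha and beta can both be read off the sampling distribution. The sketch below reuses the IQ setting (population mean 100, SD 15) with a one-tailed z-test, and assumes, purely for illustration, that the true mean for steroid users is 95; that value, the sample sizes, and the function name `error_rates` are hypothetical.

```python
from statistics import NormalDist

def error_rates(null_mean, true_mean, sd, n, alpha=0.05):
    """Type I and Type II error rates for a one-tailed (lower) z-test."""
    se = sd / n ** 0.5
    null_dist = NormalDist(null_mean, se)   # sampling distribution if H0 is true
    true_dist = NormalDist(true_mean, se)   # sampling distribution if H1 is true

    # Reject H0 when the sample mean falls below this cut-off.
    cutoff = null_dist.inv_cdf(alpha)

    type_1 = null_dist.cdf(cutoff)          # rejecting a true H0: equals alpha
    type_2 = 1 - true_dist.cdf(cutoff)      # missing a real difference: beta
    return type_1, type_2

# Assumed true mean of 95 (hypothetical), with two different sample sizes.
alpha_20, beta_20 = error_rates(100, 95, 15, n=20)
alpha_80, beta_80 = error_rates(100, 95, 15, n=80)

print("n = 20: alpha =", round(alpha_20, 3), " beta =", round(beta_20, 3))
print("n = 80: alpha =", round(alpha_80, 3), " beta =", round(beta_80, 3))
```

Alpha stays fixed at the level the researcher chose, whereas beta depends on the size of the true difference and shrinks as the sample size grows.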
How sure are we of our d