May 27th
Sampling
Introduction to probability sampling
What is probability sampling? – sampling techniques based on random selection
Why do it? – several reasons
o 1. Bias in samples
This exists when some members of the population have a greater chance of
being selected into the sample than other members of the population
e.g. standing on a street corner handing out surveys in an affluent part of the
city – poorer people will have little or no chance of being selected, which will result
in sample bias
o 2. Representative sample – way to avoid bias
Sample is representative of the population if the characteristics of the sample
closely approximate the characteristics of the pop.
i.e. if this is our population, the sample will be a miniature version of the larger
population
needs to be a ‘miniature version’ of larger pop.
it is impossible to have the sample be representative of the larger pop in every
aspect
but even though it is not necessary for the sample to rep pop in every way, it
must rep the key variables we are dealing with
e.g. we are doing a study on the political opinions of male and female students
at McMaster
this makes gender an important variable
i.e. if the pop is 60% m and 40% f, the sample should be approximately the same
o 3. Random selection of samples – in order to get rep samples, we need random
selection - no specific way to get selected
Randomness is important as it is the only way to make sure everyone has an
equal chance of being selected into the sample
e.g. study of political opinions of males and females – how to get a random
sample?
We will need to get a list of all the students currently enrolled at McMaster
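A minimal Python sketch of drawing a random sample from such a list. The list contents, the 60/40 gender split, the sample size of 100 and the helper name draw_random_sample are assumptions made up for illustration; a real study would use the actual enrolment list.

```python
import random

def draw_random_sample(sampling_list, sample_size, seed=None):
    """Return sample_size distinct entries; every entry is equally likely to be picked."""
    rng = random.Random(seed)
    return rng.sample(sampling_list, sample_size)

# Hypothetical enrolment list: 600 men and 400 women (60% m, 40% f).
students = [("M", i) for i in range(600)] + [("F", i) for i in range(400)]

sample = draw_random_sample(students, 100, seed=1)
share_male = sum(1 for gender, _ in sample if gender == "M") / len(sample)
print(f"Share of men in the sample: {share_male:.0%}")
```

Because selection is left to chance, the share of men in the sample will usually land near 60% but will rarely match it exactly.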
Key concepts in probability sampling
o 1. Elements – elements are the units that comprise the population
Usually these elements are individuals
Sometimes the units are groups or organizations
o 2. Population – theoretically specified aggregation (combination) of the elements in a
study
In our study of Mac students, our aggregations could be undergrad or grad,
either full time or part time
o 3. Study population – the aggregation of elements from which the
sample is actually selected – sometimes the researchers limit the population due to
practical or cost considerations, or perhaps some other reason not to study the entire population
In our study we might limit the population to full time undergraduate students
(leaving out part time and graduate students)
o 4. Sampling units – elements considered for selection in some stage of sampling
This would be each of the individuals on our list of full time undergraduate
students
The theory of probability sampling
Probability theory – is a branch of mathematics
o This is basis for sampling techniques that produce representative samples
Parameter – summary description of a given variable in a population
o e.g. mean (summary description) individual income of secretaries at McMaster
(population)
Sampling distribution – basis for estimating the parameters of a population
o 1. From one case to multiple cases – one case will give an estimate of the population
parameter
Other cases chosen randomly would give us the same or slightly different
estimates
e.g. one secretary chosen at random will constitute one case, and asking for her
income will give us some idea of what the income of secretaries at McMaster is
but if we ask several more secretaries, we are going to get similar results but
their answers will be slightly different as all secretaries do not have the same
income (seniority dependent)
sampling distribution can be represented on a graph
y axis – number of secretaries, and x- axis income (in thousands)
graph 1 – will give us a normal curve
o 2. From one sample to multiple samples
One sample would give us an estimate of the population parameter if it is
chosen randomly
If we get more samples, it will give us the same or slightly different estimates of
the population parameters
Every time we draw a sample, if we are choosing the sample randomly the
results should be about the same – maybe slightly different, but according to probability
theory they should be very similar
e.g. if we drew one sample of 30 secretaries, and another of 30 sec at Mac, and
then yet another sample of 30 sec at Mac and we try to determine the individual
income, we should be getting close to the same answer
Sample 1 – maybe $29 000
Sample 2 – maybe $30 000
Sample 3 – maybe $31 000
According to sampling theory, the results should be similar every time
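The idea of going from one sample to many samples can be illustrated with a small simulation. This is only a sketch: the incomes are invented (a hypothetical population of 500 secretaries earning between $24 000 and $36 000), and sample_mean is a helper name introduced here.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population (invented figures, not real data).
population = [rng.uniform(24_000, 36_000) for _ in range(500)]
population_mean = statistics.mean(population)  # the population parameter

def sample_mean(pop, n, rng):
    """Mean income of one random sample of n people, drawn without replacement."""
    return statistics.mean(rng.sample(pop, n))

# Three samples of 30, as in the example above.
three_estimates = [sample_mean(population, 30, rng) for _ in range(3)]

# Many samples of 30 approximate the sampling distribution of the mean.
many_estimates = [sample_mean(population, 30, rng) for _ in range(1000)]

print("Population mean:", round(population_mean))
print("Three sample means:", [round(m) for m in three_estimates])
print("Spread of 1000 sample means:", round(statistics.stdev(many_estimates)))
```

The sample means cluster around the population mean, and a histogram of many_estimates would look roughly like the normal curve mentioned for graph 1.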
Sampling error
o What is sampling error – it is the degree of error expected in probability sampling
o A major source of error in prob samp is size of sample being used
o When the sample is small, estimates of the population parameter are more likely to be
inaccurate
o When the sample is large, estimates of the population parameter are more accurate
o The basic principle is as the sample size increases, the sampling error decreases
o There is a formula for calculating sampling error, which is known as standard error
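The notes do not give the formula itself. One common version, for the mean of a simple random sample, is standard error = s / sqrt(n), where s is the sample standard deviation and n is the sample size. A short sketch under that assumption, with a made-up standard deviation of $5 000 and a hypothetical helper name:

```python
import math

def standard_error_of_mean(sample_sd, n):
    """Standard error of a sample mean: s / sqrt(n)."""
    return sample_sd / math.sqrt(n)

assumed_sd = 5_000  # invented value, for illustration only

# Each step quadruples the sample size.
for n in (30, 120, 480):
    se = standard_error_of_mean(assumed_sd, n)
    print(f"n = {n:>3}: standard error is about ${se:,.0f}")
```

Under this formula the sample has to be four times larger to cut the standard error in half, which is one way to read "as the sample size increases, the sampling error decreases".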
Why is sampling error useful to know? – because it tells us how good our estimates are
o Our estimates will have error, and we can use this to avoid larger amounts of error
o There are several propositions that we need to be familiar with
o Probability theory indicates that certain proportions of the sample estimates will fall
within specified increments (each equal to one standard error) from the population
parameters
o Probability theory indicates that approximately, 34% of the sample estimates will be
within one standard error above the population parameter
o Approximately 68% of the sample estimates will fall within + or – 1 standard error of the
pop parameter
o Furthermore, according to probability theory, approximately 95% of the sample estimates will fall
within + or – two standard errors of the pop parameter
o Probability theory indicates that approximately 99.7% of the sample estimates will fall within + or – 3
standard errors of the pop parameter
o So what we have is this: graph 2
o e.g. application to our study – we did survey research and selected a sample randomly
when we look at the results we find that the individual income of sec at Mac was $30 000
this particular sample had an error of $2000 – graph 3
what if we chose a smaller sample?
we are going to have much poorer estimates – more error
larger sample – less error – graph 4
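The proportions above and the $30 000 example can be worked through in a few lines. This sketch assumes the sample estimates follow a normal curve and treats the $2000 error as one standard error; share_within and interval are helper names introduced here.

```python
import math

def share_within(k):
    """Share of a normal distribution lying within +/- k standard errors of the centre."""
    return math.erf(k / math.sqrt(2))

def interval(estimate, standard_error, k):
    """Range from k standard errors below the estimate to k standard errors above it."""
    return (estimate - k * standard_error, estimate + k * standard_error)

estimate = 30_000       # sample estimate of income, from the example
standard_error = 2_000  # assumed to be one standard error

for k in (1, 2, 3):
    low, high = interval(estimate, standard_error, k)
    print(f"+/- {k} SE: ${low:,} to ${high:,} (about {share_within(k):.1%} of estimates)")
```

Read this way, roughly 68% of sample estimates would fall between $28 000 and $32 000, and roughly 95% between $26 000 and $34 000. A smaller sample would widen these ranges and a larger one would narrow them.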
The sampling frame
What is a sampling frame? – list of elements from which a probability sample is selected
Some sources for a sampling frame
organizational membership lists are used to sample individuals – e.g. lists of students or
faculty, employees of a company, members of professional associations
another example is a government agency list – gov agencies maintain lists of registered voters,
automobile owners, business permit holders, licensed professionals
telephone directories – less used now but important in socio research over past decades, used
to sample individuals
lists of organizations – used to sample organizations
street directories – city blocks and households (we will learn why this is important later)
these are five different sources of sampling frames
we need to keep in mind that there are problems with sampling frames, two specific types of
problems
o 1. Access – public lists, like telephone and street directories, are easy to draw
samples from
Problem occurs when the gov lists and other such lists are not public
o 2. Accuracy – these lists may be out of date or incomplete
e.g. organizational membership lists – people joining or leaving organizations – out of
date
e.g. telephone directories are incomplete, as many people have no landlines
Simple random sa

