18 Sampling Distribution Models
1) A population parameter is a numerical measure, such as the
mean, median, mode, range, variance, or standard deviation,
calculated from population data; it is written with Greek
letters, e.g. µ and σ.
- Usually unknown and constant
2) A sample statistic is a summary measure calculated from a
sample data set; it is written with Latin letters, e.g. $\bar{x}$ and s.
- Regarded as random before sample is selected
- Observed after sample is selected
3) The value of a statistic varies from sample to sample; this is
called sampling variability.
4) The distribution of all the values of a statistic is called its
sampling distribution.
Population and Sample Proportions
Suppose we are interested in just one characteristic that occurs in the
population of interest. For convenience, we will call the outcome
we are looking for “Success” (S). The population proportion, p,
is obtained by taking the ratio of the number of successes in the
population to the total number of elements in the population.
Example for population proportion:
- check for N students, how many are "nonresidents".
- check how many out of N patients survived at least five
years, after a specific cancer treatment.
Recall: N is population size.
Looking at an SRS of size n from a large population, the
probability p can be estimated by calculating the sample
proportion (relative frequency) of Successes, $\hat{p}$. That is,
$$\hat{p} = \frac{\text{number of Successes (S) in the sample}}{\text{sample size}}.$$
Example for sample proportion:
- flip n coins and observe if “Tail” was tossed.
- look at n random persons and survey how many have an IQ
- look at n random students and survey how many have more
than two siblings.
Example:
Suppose a hospital has a total of 10,000 patients and 7,000 of them
like to play basketball. A sample of 200 patients is selected from
this hospital, and 128 of them like to play basketball. Find the
proportion of patients who like to play basketball in the population
and in the sample.
Find the sampling error for this case while assuming that the
sample is random and no nonsampling error has been made.
Sampling error =
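The hospital example can be worked through numerically. The sketch below assumes the usual definition of sampling error as the sample statistic minus the population parameter (here $\hat{p} - p$); the variable names are illustrative only.

```python
# Hospital example: population and sample proportions.
N, successes_pop = 10_000, 7_000     # population size, patients who like basketball
n, successes_sample = 200, 128       # sample size, sampled patients who like basketball

p = successes_pop / N                # population proportion
p_hat = successes_sample / n         # sample proportion

# Assumed definition: sampling error = sample statistic - population parameter.
sampling_error = p_hat - p

print(f"p = {p:.2f}, p_hat = {p_hat:.2f}, sampling error = {sampling_error:.2f}")
```

Under that definition the sample proportion (0.64) falls 0.06 below the population proportion (0.70).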
The Sampling Distribution of a Sample Proportion ($\hat{p}$)
- Consider two different samples from a population, which you
want to use for estimating the proportion of people with more
than 2 siblings in the population. Use the statistic
"proportion" for both samples. Are the outcomes the same?
- Most likely not! This is known as sampling variability.
- Imagine we draw many samples and look at the sample
proportions for these samples.
- The histogram we’d get if we could see all the proportions
from all possible samples is called the sampling distribution
of the proportions.
- What would the histogram of all the sample proportions look
like?
- We would expect the histogram of the sample proportions to
center at the true proportion, p, in the population.
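- RULE 1: p is the mean of the sampling distribution of $\hat{p}$.
Notation: the mean of $\hat{p}$ is also denoted as $\mu_{\hat{p}}$.
- RULE 2: The standard deviation of the sampling
distribution of $\hat{p}$, $\sigma_{\hat{p}}$, is:
$$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{pq}{n}}, \quad q = 1 - p.$$
Notation: $\sigma_{\hat{p}}$ is also denoted as $SD(\hat{p})$.
NOTE: the standard deviation of a sampling distribution is
called a standard error. Thus, $SD(\hat{p})$ is called a standard
error.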
- As far as the shape of the histogram goes, we can simulate a
bunch of random samples that we didn’t really draw.
- It turns out that the histogram is unimodal, symmetric, and
centered at p.
- More specifically, the sample proportion $\hat{p}$ is
approximately normally distributed for large n (Central Limit
Theorem).
RULE 3: A rule of thumb states that the sample size is
considered to be sufficiently large if:
np > 10 and n(1 – p) > 10
NOTE: When n is large and p is not too close to 0 or 1, the
sampling distribution of $\hat{p}$ is approximately normal. The
further p is from 0.5, the larger n must be for an accurate
normal approximation of $\hat{p}$.
- A sampling distribution model for how a sample proportion
varies from sample to sample allows us to quantify that
variation and how likely it is that we’d observe a sample
proportion in any particular interval.
- So, the distribution of the sample proportions is modeled with
a probability model that is
$$N\left(p, \sqrt{\frac{pq}{n}}\right), \quad q = 1 - p.$$
- Because we have a Normal model, for example, we know
that 95% of Normally distributed values are within two
standard deviations of the mean. So we should not be
surprised if 95% of various polls gave results that were near
the mean but varied above and below that by no more than
two standard deviations.
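The rules above can be checked with a small simulation. This is a minimal sketch, not part of the original notes: the values p = 0.3 and n = 200, the number of simulated samples, and the helper name `simulate_sample_proportions` are all assumptions chosen for illustration.

```python
import math
import random
import statistics

def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples random samples of size n; return each sample proportion."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(num_samples)]

p, n = 0.3, 200                      # hypothetical population proportion and sample size
p_hats = simulate_sample_proportions(p, n, num_samples=5000)

theoretical_sd = math.sqrt(p * (1 - p) / n)   # RULE 2: sqrt(pq/n)
sim_mean = statistics.mean(p_hats)            # should be close to p (RULE 1)
sim_sd = statistics.pstdev(p_hats)            # should be close to theoretical_sd

# Share of simulated proportions within two standard deviations of p.
within_2sd = sum(abs(ph - p) <= 2 * theoretical_sd for ph in p_hats) / len(p_hats)

print(f"mean of p_hat: {sim_mean:.4f} (p = {p})")
print(f"SD of p_hat:   {sim_sd:.4f} (sqrt(pq/n) = {theoretical_sd:.4f})")
print(f"share within 2 SD of p: {within_2sd:.3f}")
```

With np = 60 and nq = 140 the sample size condition in RULE 3 is comfortably met, so the share within two standard deviations should come out close to 95%.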
Example: Which Brand of Pizza Do You Prefer?
• Two Choices: A or D.
• Assume that half of the population prefers Brand A and
half prefers Brand D.
• Take a random sample of n = 3 tasters.
Find the sampling distribution for the sample proportion. Find the
mean and standard deviation.
Sample     No. Prefer Pizza A   Proportion
(A,A,A)    3                    1
(A,A,D)    2                    2/3
(A,D,A)    2                    2/3
(D,A,A)    2                    2/3
(A,D,D)    1                    1/3
(D,A,D)    1                    1/3
(D,D,A)    1                    1/3
(D,D,D)    0                    0

Sample Proportion   Probability
0                   1/8
1/3                 3/8
2/3                 3/8
1                   1/8
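The pizza example is small enough to enumerate exactly. The sketch below lists all eight samples, tallies the sampling distribution of $\hat{p}$, and computes its mean and standard deviation. It relies on the stated assumption that p = 1/2, which makes every sample equally likely; the variable names are illustrative.

```python
import math
from collections import Counter
from fractions import Fraction
from itertools import product

p = Fraction(1, 2)   # half of the population prefers Brand A
n = 3                # number of tasters in each sample

# Enumerate all 2^3 samples; with p = 1/2 each has probability 1/8.
dist = Counter()
for sample in product("AD", repeat=n):
    p_hat = Fraction(sample.count("A"), n)
    dist[p_hat] += Fraction(1, 2 ** n)

# Mean and standard deviation of the sampling distribution of p_hat.
mean = sum(value * prob for value, prob in dist.items())
variance = sum((value - mean) ** 2 * prob for value, prob in dist.items())
sd = math.sqrt(variance)

for value in sorted(dist):
    print(f"p_hat = {value}: probability {dist[value]}")
print(f"mean = {mean}, variance = {variance}, SD = {sd:.4f}")
```

The mean equals p = 1/2 and the variance equals pq/n = 1/12, in agreement with RULE 1 and RULE 2. With n = 3 the sample is far too small for the Normal approximation, so only the mean and SD carry over here.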
Assumptions and Conditions
- Most models are useful only when specific assumptions are
true.
- There are two assumptions in the case of the model for the
distribution of sample proportions:
1. The Independence Assumption: The sampled values
must be independent of each other.
2. The Sample Size Assumption: The sample size, n,
must be large enough.
- Assumptions are hard—often impossible—to check. That’s
why we assume them.
- Still, we need to check whether the assumptions are
reasonable by checking conditions that provide information
about the assumptions.
- The corresponding conditions to check before using the
Normal to model the distribution of sample proportions:
Randomization Condition: The sample should be a
simple random sample of the population.
10% Condition: If sampling has not been made with
replacement, then the sample size, n, must be no larger
than 10% of the population.
Success/Failure Condition: The sample size has to be
big enough so that both np and nq are at least 10.
With these sampling distributions, we can apply standardization in
order to find the probability for the proportion of Successes. The z
value for a value of $\hat{p}$ is:
$$z = \frac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}} = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}},$$
which is well approximated by the standard normal distribution.
A study showed that the proportion of people in the 20 to 34 age
group with an IQ (on the Wechsler Intelligence Scale) of over 120
is about 0.35.
Let $\hat{p}$ = proportion of the sample with an IQ of at least 120.
a) Find the mean and standard deviation of the sample proportion.
b) What can you say about the distribution of the sample proportion?
c) Find the probability for the event that in a sample of 50 there
are more than 30 people with an IQ of at least 120.
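One way to work through parts a) to c) is sketched below. It assumes the sample size n = 50 from part c) applies throughout, uses the Normal approximation without a continuity correction, and takes the standard Normal CDF from Python's `statistics.NormalDist`; the variable names are illustrative.

```python
import math
from statistics import NormalDist

p, n = 0.35, 50                       # population proportion; sample size from part c)

# a) Mean and standard deviation of the sample proportion.
mean = p
sd = math.sqrt(p * (1 - p) / n)

# b) Success/Failure Condition: both expected counts should be at least 10.
conditions_ok = n * p >= 10 and n * (1 - p) >= 10

# c) "More than 30 people" in a sample of 50 corresponds to p_hat > 0.60.
p_hat_cutoff = 30 / n
z = (p_hat_cutoff - mean) / sd
prob = NormalDist().cdf(-z)           # upper-tail area, P(Z > z) = P(Z < -z)

print(f"mean = {mean}, SD = {sd:.4f}, conditions met: {conditions_ok}")
print(f"z = {z:.2f}, P(p_hat > {p_hat_cutoff:.2f}) = {prob:.6f}")
```

Since np = 17.5 and nq = 32.5, the Normal model is reasonable, and a z value of about 3.7 puts the probability on the order of 0.0001.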
In an experiment, 32 subjects made a total of 60,000 guesses on a
set of 5 symbol cards.
Pure chance would give around 12,000 correct guesses, but the
subjects had a total of 12,489 correct guesses.
a) Find the mean and standard deviation of the sample
proportion of correct guesses.
b) What can you say about the distribution of the sample proportion
of correct guesses?
c) Could this excess of 489 good guesses just be good luck? In
other words, calculate the probability for the event that in the
total guesses, there are more than 12489 correct guesses.
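A possible numerical treatment of the guessing experiment follows. It is a sketch under two assumptions not spelled out in the notes: the chance success rate is p = 12,000/60,000 = 0.2 (one correct symbol out of five), and all 60,000 guesses are treated as independent trials even though they come from only 32 subjects.

```python
import math
from statistics import NormalDist

n = 60_000                            # total number of guesses
p = 12_000 / n                        # assumed chance success rate, 1 in 5
observed = 12_489                     # correct guesses actually recorded

# a) Mean and standard deviation of the sample proportion under pure chance.
mean = p
sd = math.sqrt(p * (1 - p) / n)

# b) Success/Failure Condition for the Normal model.
conditions_ok = n * p >= 10 and n * (1 - p) >= 10

# c) Probability of a sample proportion above the observed one, under pure chance.
p_hat = observed / n
z = (p_hat - mean) / sd
prob = NormalDist().cdf(-z)           # upper-tail area, P(Z > z)

print(f"mean = {mean}, SD = {sd:.6f}, conditions met: {conditions_ok}")
print(f"p_hat = {p_hat:.5f}, z = {z:.2f}, upper-tail probability = {prob:.2e}")
```

The observed proportion lies about five standard deviations above 0.2, so under these assumptions an excess of 489 correct guesses would be very unlikely to arise from luck alone.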
Example: (Please try it on your own)
Suppose that the true proportion of people who have failed a
professional exam is 0.87. A sample consisting of 158 people is
selected.
a) Find the mean and standard deviation of the sample
proportion of people who failed the professional exam.
b) What can you say about the distribution of the sample
proportion?
c) Find the probability that the sample proportion of people
who failed the professional exam exceeds 0.94.
What about Quantitative Data?
- Proportions summarize categorical variables.
- The Normal sampling distribution model looks like it will be
very useful.
- Can we do something similar with quantitative data?
- We can indeed. Even more remarkable, not only can we use
all of the same concepts, but almost the same model.
The population is a class of 5 Stat 151 students. Let µ be the
population mean of their weights. Select a random sample of size 3
and observe the average weight $\bar{x}$. We must be careful to
distinguish this number $\bar{x}$ from µ.
How can $\bar{x}$, based on a sample of a small percentage of the class,
be an accurate estimate of µ? After all, a second sample would
give a different value of $\bar{x}$.
This basic fact is called sampling variability (the value of a
statistic varies in repeated random sampling).
Now suppose you look at every possible random sample from this
Stat 151 population and the corresponding sample mean. For
these numbers, you can create the sampling distribution.
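The enumeration of every possible sample can also be done programmatically. The sketch below uses the five weights from the population data for students A to E (70, 75, 75, 75, 80) and assumes simple random sampling without replacement, so that all samples of size 3 are equally likely; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

weights = {"A": 70, "B": 75, "C": 75, "D": 75, "E": 80}    # population data
mu = Fraction(sum(weights.values()), len(weights))           # population mean

# Every possible sample of size 3 (without replacement) and its sample mean.
sample_means = {
    "".join(students): Fraction(sum(weights[s] for s in students), 3)
    for students in combinations(weights, 3)
}

# Sampling distribution of x-bar: count how often each sample mean occurs.
dist = Counter(sample_means.values())
for x_bar in sorted(dist):
    print(f"x_bar = {float(x_bar):.4f}: probability {dist[x_bar]}/{len(sample_means)}")

# The mean of all sample means equals the population mean.
mean_of_means = sum(sample_means.values()) / len(sample_means)
print(f"population mean = {mu}, mean of sample means = {mean_of_means}")
```

Individual sample means differ from µ, but their average over all possible samples equals µ, which is why $\bar{x}$ is a sensible estimate despite sampling variability.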
Population Data
Student   Weight
A         70
B         75
C         75
D         75
E         80

All possible samples of size 3
Possible sample   Weights in the sample   $\bar{x}$
ABC               70, 75, 75              73.3333
ABD               70, 75, 75              73.3333
ABE               70, 75, 80              75
ACD               70, 75, 75              73.3333
ACE               70, 75, 80              75
ADE               70, 75, 80              75
BCD               75, 75, 75              75
BCE               75, 75, 80              76.6667
BDE               75, 75, 80              76.6667