COMMERCE 291 – Lecture Notes 2014 – © Jonathan Berkowitz
Not to be copied, used, or revised without explicit written permission from the copyright owner.
Summary of Lectures 13, 14, and 15
Setting the stage for statistical inference:
We previously defined (in Chapter 3) the terms population, sample, parameter and
statistic/estimate. Now we begin our study of statistical inference.
Inference means “generalization”, from the sample to the population. It is based on
answering the questions, “How trustworthy are the results? Would the results persist if
the study were repeated many times?”
A parameter has a value that doesn’t change. A statistic (i.e. estimate) will give a
different value each time a new sample is taken – this is called sampling variability. The
distribution of values that a statistic can take based on all possible samples of the same
size from the same population is called the sampling distribution. We will return to this idea repeatedly.
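A quick simulation makes the sampling distribution concrete (a sketch, not from the notes; the population proportion p = 0.40 and sample size n = 50 are made-up values):

```python
# Sketch (not from the notes): simulate the sampling distribution of a
# sample proportion. The true p = 0.40 and n = 50 are hypothetical.
import random

random.seed(1)
p, n, num_samples = 0.40, 50, 10_000

# Each p-hat is the proportion of successes in one random sample of size n.
p_hats = [sum(random.random() < p for _ in range(n)) / n
          for _ in range(num_samples)]

# The distribution of these 10,000 p-hats approximates the sampling
# distribution; its average sits near the true parameter p.
mean_p_hat = sum(p_hats) / num_samples
print(round(mean_p_hat, 2))
```

Plotting a histogram of `p_hats` would show the familiar bell shape centred at p.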
What makes an estimate a "good" estimate? Two properties: unbiasedness and low variability.
Unbiased is related to the idea of accuracy. If you sample over and over again, does the
average of the estimates get closer to the true population parameter? For example, if
you get on the bathroom scale six times and take the average of the six readings to be
your weight, you expect to be close to your true weight, assuming that the scale is
properly calibrated (i.e. zeroed) – in other words, the scale is unbiased. If the scale is set
to read –2 kg with no weight on it, the scale is biased.
Low variability is related to the idea of precision. Are all the estimates from the various samples close to one another? In the previous example, precision means that the six
readings on the bathroom scale are nearly the same (regardless of whether that value is
your true weight!).
To reduce bias, use random sampling.
To reduce variability, use a larger sample.
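The second point can be seen in a short simulation (my own sketch; the population value p = 0.5 is hypothetical): the spread of the estimates shrinks as the sample size grows.

```python
# Sketch (hypothetical population with p = 0.5): repeated sampling shows
# that a larger sample size reduces the variability of the estimate.
import random
import statistics

random.seed(2)
p = 0.5

def spread_of_p_hats(n, num_samples=2000):
    """SD of sample proportions across repeated samples of size n."""
    p_hats = [sum(random.random() < p for _ in range(n)) / n
              for _ in range(num_samples)]
    return statistics.stdev(p_hats)

small_n_spread = spread_of_p_hats(25)    # theory: about 0.100
large_n_spread = spread_of_p_hats(400)   # theory: about 0.025
print(small_n_spread > large_n_spread)   # True: bigger n, less variability
```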
Chapter 11. Confidence Intervals For Proportions
A Main Objective of Inference:
Draw conclusions about a larger universe based on the available data.
How do we do this?
Based on sample information, compute a "statistic" that can be used as a good guess
(i.e. "estimate") of the unknown parameter in a larger population of interest.
From now on we will use the word "estimate" to refer to such a "statistic".
Notation for some common parameters and their estimates
µ: population mean                                   𝑥̅: sample mean
σ: population standard deviation                     s: sample standard deviation (SD)
p: population proportion                             𝑝̂: sample proportion
µ1 − µ2: difference of two population means          𝑥̅1 − 𝑥̅2: difference of two sample means
p1 − p2: difference of two population proportions    𝑝̂1 − 𝑝̂2: difference of two sample proportions
Point estimates such as 𝑥̅ and 𝑝̂, are based on samples, and hence have sampling
variability. That is, if you sampled again, you would get different data and therefore
different point estimates. Confidence intervals are an excellent way of quantifying this
sampling variability and expressing it as a margin of error. The probability that a point
estimate equals the true parameter value is actually zero, but the probability that an
interval estimate “captures” the true parameter value can be made as high as you like,
say, for example, 95%. For example, if a physician gives a pregnant woman her “due
date” as a single day, the chance of a correct prediction is very small. But if, instead, the
woman is given a two-week range as a due date, the chance of a correct prediction is much higher.
Every opinion poll uses this strategy. There is always a disclaimer of the form, “the poll is
estimated to be accurate to within 3 percentage points 19 times out of 20.” This is
actually a 95% confidence interval. The pollster is quite sure (95% in fact) that his/her
poll will not miss the true percentage by more than 3% – that’s the poll’s margin of error.
The margin of error reminds you that effects may not be as large as the point estimate indicates; they could be much smaller, or they could be much larger. In other words, it reminds you not to get too excited about the accuracy of a point estimate.
So now that you know why you should compute confidence intervals, we’re ready to
discuss how to compute them.
We begin with a confidence interval for the population proportion p.
(In Chapter 13, we will construct a confidence interval for the population mean µ. In later chapters we will construct confidence intervals for the difference between two proportions or two means. The basic structure is the same for all the cases we will encounter, so learn it well here!)
The parameter of interest is p, and it is unknown. It represents the true proportion of
successes in the population and we will need to estimate it from the data, using the
sample proportion of successes, 𝑝̂.
How far away from p do we think 𝑝̂ is?
Use the sampling distribution, SD(𝑝̂) = √(pq/n), and the 68-95-99.7 Rule.
Thus in repeated samples:
About 68% of 𝑝̂'s will be within one SD(𝑝̂) of p.
About 95% of 𝑝̂'s will be within two SD(𝑝̂)'s of p.
Since we don't know p and q in the SD formula, use the standard error SE(𝑝̂) = √(𝑝̂𝑞̂/n) instead.
Thus any 𝑝̂ has a 95% chance of being within 2SEs of p.
Or, the interval 𝑝̂ ± 2SE(𝑝̂) has a 95% chance of capturing p.
This is called a Confidence Interval for p.
There are many confidence intervals (C.I.), one for each type of parameter or estimation
situation. All our confidence intervals will have the same form:
Point Estimate ± Margin of Error
During a class, JB picks up his coffee cup 50 times; 20 times he actually drinks from it.
Compute a 95% CI for the true proportion of times JB drinks from the cup.
𝑝̂ = 20/50 = 0.40
0.40 ± 2√((0.40)(0.60)/50) = 0.40 ± 0.14, or (0.26, 0.54), or (26%, 54%)
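The same arithmetic in a few lines of Python (the multiplier 2 follows the notes' use of the 68-95-99.7 Rule):

```python
# The coffee-cup interval, computed directly; the multiplier 2 comes
# from the 68-95-99.7 Rule, as in the notes.
from math import sqrt

successes, n = 20, 50
p_hat = successes / n                    # 0.40
se = sqrt(p_hat * (1 - p_hat) / n)       # SE of p-hat, about 0.069
me = 2 * se                              # margin of error, about 0.14
lower, upper = p_hat - me, p_hat + me
print(round(lower, 2), round(upper, 2))  # 0.26 0.54
```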
What if we want a higher level of confidence? Change the "multiplier", which the textbook calls the "critical value" and denotes by z*.
In practice, only three choices are ever used:
90% confidence: z* = 1.645
95% confidence: z* = 1.960
99% confidence: z* = 2.576
(You can find these values, approximately, in the Z-table; but it is easier just to memorize the three values, especially the middle one!)
Summary: The "one-proportion z-interval" (more commonly known as the "one-sample confidence interval for a proportion") is 𝑝̂ ± z*√(𝑝̂𝑞̂/n), where z* is chosen based on the desired "level of confidence" (i.e. 90%, 95%, 99%).
Two Notes on Interpretation:
A Confidence Interval is a statement about the estimate and the sample it came from,
not about the parameter. The 95% CI for p is an interval that has a 95% chance of containing
p. This is not the same as saying that there is a 95% chance that p is in the interval. The
probability has to do with your ability, using your random sample, to correctly capture the
parameter p. The parameter does not vary, but the confidence interval does.
A Confidence Interval contains a range of believable, sensible, or plausible values of the
true, but unknown, proportion p. It is an interval estimate that has a high likelihood of
containing the true population parameter.
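This interpretation can be checked by simulation (a sketch under assumed values p = 0.30 and n = 100, which are mine, not the notes'): build a 95% interval from each of many random samples and count how often the fixed parameter is captured.

```python
# Sketch (assumed true p = 0.30, n = 100): the "95%" belongs to the
# procedure. Build a 95% interval from each of many random samples and
# count how often the interval captures the fixed parameter p.
import random
from math import sqrt

random.seed(3)
p, n, trials = 0.30, 100, 5_000
captured = 0
for _ in range(trials):
    p_hat = sum(random.random() < p for _ in range(n)) / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    if p_hat - 1.96 * se <= p <= p_hat + 1.96 * se:
        captured += 1
print(captured / trials)  # close to 0.95
```

The parameter p never moves; it is the intervals that vary from sample to sample, and about 95% of them succeed.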
Assumptions and Conditions
As always, the sample must be a RANDOM sample.
• 10% Condition: n should be no more than 10% of the population
• Success/Failure Condition: n𝑝̂ > 10 and n𝑞̂ > 10.
Computing sample size for a required margin of error:
Denote the margin of error (or “plus or minus number”) by ME: ME = z*√(𝑝̂𝑞̂/n)
Solving for n gives n = (z*)²𝑝̂𝑞̂ / ME²
If you have a value of 𝑝̂ from previous work, substitute it in here, along with the desired
z* and ME and compute n.
If 𝑝̂ is unknown, use the most conservative value (i.e. the value that will give you the largest n); that value is 𝑝̂ = 0.5.
Then n = (z*)²(0.5)(0.5) / ME²
For 95% confidence (i.e. “19 times out of 20”), z* = 1.96, or approximately 2.
Substituting 2 into the above equation gives a handy Rule of Thumb, to be used as a
guideline for sample size determination.
n ≈ 1/ME²  or  ME ≈ 1/√n  (for 95% confidence)
Remember that this gives approximate values only.
The following gives the approximate required sample sizes for various margins of error.
For ME = ±10%, need n ≈ 100
For ME = ± 5%, need n ≈ 400
For ME = ± 3%, need n ≈ 1,100 (I just remember it as about 1,000)
For ME = ± 2%, need n ≈ 2,500
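The table above follows directly from the rule of thumb n ≈ 1/ME² (a quick sketch):

```python
# The rule of thumb n ≈ 1/ME² at 95% confidence (conservative p-hat = 0.5),
# reproducing the sample-size table above.
def sample_size(me):
    """Approximate n for a given margin of error at 95% confidence."""
    return 1 / me ** 2

for me in (0.10, 0.05, 0.03, 0.02):
    print(f"ME = ±{me:.0%}: n ≈ {sample_size(me):,.0f}")
```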
The small improvement in precision from 3% to 2% requires more than doubling the sample size from 1,100 to 2,500! This explains why most survey samples are about 1,000 in size.
Chapter 12. Testing Hypotheses About Proportions
Hypothesis Tests (also called Tests of Significance) address the issue of which of two
claims or hypotheses about a parameter (or parameters) is better supported by the data.
For example, is a target mean being achieved or not; does a treatment group have a
higher mean outcome than a control group; is there a greater proportion of successes in
one group than another?
All tests have the following components:
1. Hypotheses: Ho: null hypothesis
Ha: alternative hypothesis
These are statements about population parameters. The null hypothesis represents “no
change from the current position” or the default position. The alternative hypothesis is
the research hypothesis. The burden of proof is on the investigator to convince the
skeptic to abandon the null hypothesis.
Ho specifies a parameter and suggests a value for it.
Ha gives values of the parameter that would be believable if we rejected the null hypothesis.
Ho: p = po
Ha: p ≠ po
2. Test Statistic: Uses estimate(s) of the parameter(s), the standard error(s) of the
estimate(s), and information in the null hypothesis and puts them together in a “neat
package” with a known distribution. Certain values of this test statistic will support Ho while others will support Ha.
3. P-value: a probability that judges whether the data (via the value of the test statistic) are more consistent with Ho or with Ha.
Note: The P-value assumes that Ho is true and then evaluates the probability of getting a value of the test statistic as extreme as or more extreme than what you observed. The P-value is not the probability that Ho is true. The smaller the P-value, the greater the evidence against the null hypothesis.
4. Conclusion: a statement of decision in terms of the original question. It is not enough simply to write “reject Ho”.
Let's develop our first hypothesis test:
Examples: Is a coin "fair"? Are daily changes in the Dow Jones Industrial Average (DJIA)
equally likely to be up as down? For both of these situations the hypotheses are:
Ho: p = 0.5
Ha: p ≠ 0.5
Compute 𝑝̂ based on a random sample of size n. Compute SD(𝑝̂) and use it to see how far away from 0.5 (i.e. the po value) your 𝑝̂ is. This will tell you if 𝑝̂ is an unlikely or likely value.
Example: DJIA continued. From 1112 trading days, the DJIA is up 573 days.
So 𝑝̂ = 573/1112 = 0.5153 or 51.53%.
SD(𝑝̂) = √(poqo/n) = √((0.5)(0.5)/1112) = 0.015
So 𝑝̂ is about 1 SD(𝑝̂) away from po.
Let's be more accurate:
(𝑝̂ − po)/SD(𝑝̂) = (𝑝̂ − po)/√(poqo/n) = (0.5153 − 0.5)/0.015 = 1.02
How "likely" is a value of 1.02?
Think back to Sampling Distributions (Chapter 9);
To standardize 𝑝̂, we use Z = (𝑝̂ − p)/√(pq/n).
Here p and q = 1 − p are unknown, since p is the proportion in the population.
But if po is a reasonable choice for p (and qo = 1 − po for q), then (𝑝̂ − po)/√(poqo/n) should behave like a Z.
What does it mean to “behave like a Z?” It means that the value should be something
likely to be found on a Z-table, i.e. a value between about -2 and 2.
Use Table Z to find the area to the right of 1.02; that area is found to be 0.1539.
Since if the DJIA is not random it might be up more than half the time or down more than
half the time, we need the area to the left of -1.02; that is, the flip-side of the Z curve.
So the total area is double the single-tail area, giving 2 × 0.1539 = 0.3078, or about 31%.
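The whole DJIA calculation can be reproduced in a few lines (using the normal CDF via `math.erf` in place of Table Z):

```python
# The DJIA test, end to end: z statistic and two-sided P-value,
# using math.erf for the normal CDF instead of Table Z.
from math import sqrt, erf

def normal_cdf(z):
    """P(Z <= z) for a standard normal variable."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, up_days, p0 = 1112, 573, 0.5
p_hat = up_days / n                    # 0.5153
sd = sqrt(p0 * (1 - p0) / n)           # about 0.015
z = (p_hat - p0) / sd                  # about 1.02
p_value = 2 * (1 - normal_cdf(z))      # two-sided, about 0.31
print(round(z, 2), round(p_value, 2))  # 1.02 0.31
```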
This probability is called the P-value: how "surprising" are the data we observed if, in fact, the null hypothesis were true.
A low P-value means the data are unlikely given the Ho value, so we reject Ho.
A high P-value means the data support Ho, so we do not reject Ho.
How low is "low"? How "unlikely" is "unlikely"? What is "reasonable doubt"?
1 chance in 20? 1 chance in 100? 1 chance in 1000?
One-sample z-test for a single population proportion, p
(Note: The text calls this the one-proportion z-test)
Ho: p = po
Ha: p ≠ po
Remember that the confidence interval for p is: 𝑝̂ ± z*√(𝑝̂𝑞̂/n)
Test statistic: Z = (𝑝̂ − po)/√(poqo/n). (This test statistic has a z-distribution.)
Note the difference between the test statistic and the confidence interval. In the confidence interval, the SE of 𝑝̂ is used (√(𝑝̂𝑞̂/n)), but the hypothesis test uses √(poqo/n). That’s because in a hypothesis test we compute the test statistic assuming Ho is true.
Substitute in the values and compute the test statistic; call it z-stat.
P-value = 2 × Pr(Z > |z-stat|)
If the P-value is "small", then reject Ho.
Alpha Level and Significance
The threshold for a P-value (to decide whether it is "small") is called the alpha level (α)
or the level of significance. Sometimes it is stated as a percentage rather than a
proportion. Usually α = 0.05, but 0.10 or 0.01 are also used.
If P-value < α, then reject Ho; i.e. the result is "statistically significant".
If P-value > α, then do not reject Ho.
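Putting the pieces together, here is a sketch of the full two-sided test as a function (the function name and return format are mine, not the textbook's):

```python
# Sketch of the full two-sided one-proportion z-test; the function name
# and return format are mine, not the textbook's.
from math import sqrt, erf

def one_proportion_z_test(successes, n, p0, alpha=0.05):
    """Two-sided test of Ho: p = p0. Returns (z, P-value, reject Ho?)."""
    p_hat = successes / n
    sd0 = sqrt(p0 * (1 - p0) / n)          # SD uses p0, not p-hat
    z = (p_hat - p0) / sd0
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - cdf)                # two-sided P-value
    return z, p_value, p_value < alpha

z, p_value, reject = one_proportion_z_test(573, 1112, 0.5)
print(reject)  # False: P-value about 0.31 > 0.05, do not reject Ho
```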
Our Ha: p ≠ po is called a two-sided or two-tailed alternative.
One-sided or one-tailed alternatives are also possible: Ha: p > po or Ha: p < po.
You need to pick one of these three alternatives at the beginning of the test. The choice
depends on the wording of the question.
For either Ha: p > po or Ha: p < po, you compute the same test statistic, but the P-value = Pr(Z > |z-stat|), which is exactly half the P-value of the two-sided Ha: p ≠ po (assuming the sample result falls in the direction of Ha). That is, the P-value for a two-sided alternative is two times the P-value for a one-sided alternative. Isn't that easy?
As with the confidence interval, this test assumes large samples, and works well when npo > 10 and nqo > 10.