28 ANOVA (ANalysis Of VAriance)
Are the Means of Several Groups Equal?
- We already know how to test whether two groups have equal
means (Ch 24)
- When we want to test whether more than 2 groups have equal
means, we could compare each pair of groups with a t-test.
- However, we’d wind up increasing the probability of a Type I
error, since each test would bring with it its own α.
- Fortunately, there is a test that generalizes the t-test to any
number of treatment groups.
- For comparing several means, there is yet another sampling
distribution model, called the F-model.
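To see how quickly the Type I error rate grows when every pair of groups gets its own t-test, here is a minimal sketch. It assumes, as a simplification, that the pairwise tests are independent (they are not exactly independent, since they share groups); the function name `familywise_error` is illustrative only.

```python
from math import comb

def familywise_error(k, alpha=0.05):
    """Approximate chance of at least one Type I error over all pairwise tests.

    Assumes the tests are independent, which is only a rough approximation.
    """
    m = comb(k, 2)                    # number of pairwise comparisons among k groups
    return m, 1 - (1 - alpha) ** m    # P(at least one false rejection)

for k in (3, 4, 5):
    m, fwe = familywise_error(k)
    print(f"k={k}: {m} tests, P(at least one Type I error) ~ {fwe:.3f}")
```

Under this assumption, three groups already give roughly a 14% chance of at least one false rejection, well above the nominal 5%.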
Here, we introduce ANOVA, a tool for comparing the means of at
least two populations. It lets us draw one overall conclusion
with a single overall error probability α.
There are 2 possible conclusions:
1. There are some differences among the means at a level of α.
2. There are no differences among the means at a level of α.
Example: We want to compare three brands of gasoline.
Brand1 Brand2 Brand3
15 19 22
19 17 17
14 16 19
16 20 18
Average 16 18 19
St. Dev. 2.16 1.83 2.16
- It looks like these observations could have occurred from
treatments with the same means.
Let’s consider another scenario:
Brand1 Brand2 Brand3
15.2 18.5 19.6
16.1 17.5 19.3
16.8 18.2 18.4
15.9 17.8 18.7
Average 16 18 19
St. Dev. 0.66 0.44 0.55
- It’s easy to see that the means in the second set differ.
o It’s hard to imagine that the means could be that far apart
just from natural sampling variability alone.
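The summary rows of both tables can be reproduced with a short sketch using Python's standard `statistics` module. The dictionary names `scenario1` and `scenario2` and the helper `summarize` are illustrative, not part of the original notes.

```python
from statistics import mean, stdev

# Data copied from the two tables above
scenario1 = {"Brand1": [15, 19, 14, 16],
             "Brand2": [19, 17, 16, 20],
             "Brand3": [22, 17, 19, 18]}
scenario2 = {"Brand1": [15.2, 16.1, 16.8, 15.9],
             "Brand2": [18.5, 17.5, 18.2, 17.8],
             "Brand3": [19.6, 19.3, 18.4, 18.7]}

def summarize(groups):
    """Return {group: (sample mean, sample standard deviation)}."""
    return {name: (mean(values), stdev(values)) for name, values in groups.items()}

for label, data in (("Scenario 1", scenario1), ("Scenario 2", scenario2)):
    print(label)
    for name, (m, s) in summarize(data).items():
        print(f"  {name}: mean = {m:.2f}, st. dev. = {s:.2f}")
```

The means agree across the two scenarios, while the standard deviations in the second scenario are several times smaller.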
NOTE: the two sets of mean travel distances in both scenarios are
the same. (They are 16, 18, and 19, respectively.) Then why do the
figures look so different?
- In order to assess whether these sample means differed because
their respective population means actually are different, we need
to introduce variability.
- Two types of variability:
1) variability within (inside) the groups
2) variability between the groups (i.e., the differences between the
means of the groups)
- In the first scenario, the differences among the means look as
though they could have arisen just from natural sampling
variability from groups with equal means, so there’s not enough
evidence to reject H0.
- In the second scenario, the variation within each group is so small
that the differences between the means stand out.
- This is the central idea of the F-test, which is based on the ratio
(variation between the groups) / (variation within the groups)
o We compare the differences between the means of the groups
with the variation within the groups.
o When the differences between means are large compared
with the variation within the groups, we reject the null
hypothesis and conclude that the means are (probably) not
equal.
- We have an estimate from the variation within groups.
o Traditionally, it’s called the Error Mean Square (or
sometimes Within Mean Square) and written MS_E.
o It’s just the variance of the residuals.
o Because it’s a pooled variance, we write it s_p².
- We’ve got a separate estimate from the variation between the
groups.
o We call this quantity the Treatment Mean Square (or
sometimes Between Mean Square), denoted by MS_T.
o We expect it to estimate σ² as well, if we assume the null hypothesis
is true.
- When the null hypothesis is true and the treatment means are
equal, both MS_E and MS_T estimate σ², and their ratio should be
close to 1.
- We can use their ratio, the F-statistic F = MS_T / MS_E, to test the
null hypothesis.
- Just like Student’s t, the F-models are a family of distributions.
However, since we have 2 variance estimates, we have 2 degrees
of freedom parameters:
o MS_T estimates the variance of the treatment means and has
df1 = k – 1 when there are k groups.
o MS_E is the pooled estimate of the variance within groups. If
there are a total of N observations and there are k groups,
MS_E has df2 = N – k.
- You’ll often see the Mean Squares and other info put into a table
called the ANOVA table.
- For the soap example in the book, the ANOVA table is:
- The ANOVA table was originally designed to organize the
calculations.
- With advances in technology, we get all of this info, but we only
need to look at the F-ratio and the p-value.
- Usually, you’ll get the P-value for the F-statistic from technology.
Any software program performing an ANOVA will automatically
“look up” the appropriate one-sided p-value for the F-statistic.
- If you want to do it yourself, you’ll need an F-table.
The assumptions in ANOVA are:
1. We have k independent random samples, one from each of the
k populations.
2. The data within each treatment group must be independent.
3. The i-th population has a normal distribution with unknown
mean µ_i, where i = 1, …, k. The means may be different.
(Check by a histogram or a Normal probability plot of all the
residuals together; Check for outliers for each treatment group)
4. All the populations have the same standard deviation σ, whose
value is unknown.
The hypotheses in ANOVA are:
H0: µ_1 = µ_2 = … = µ_k
Ha: the means are not all equal
The F-statistic is: F = MS_T / MS_E = [SS_T / (k – 1)] / [SS_E / (N – k)]
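The F-statistic can be computed directly from raw data. The following is a minimal sketch applied to the two gasoline scenarios; the function name `one_way_anova` and the returned dictionary keys are illustrative choices, not notation from the notes.

```python
def one_way_anova(groups):
    """Compute the pieces of a one-way ANOVA for a list of samples."""
    k = len(groups)                                   # number of groups
    n_total = sum(len(g) for g in groups)             # total observations N
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-group (treatment) and within-group (error) sums of squares
    ss_t = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_e = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df1, df2 = k - 1, n_total - k
    ms_t, ms_e = ss_t / df1, ss_e / df2
    return {"SS_T": ss_t, "SS_E": ss_e, "df1": df1, "df2": df2,
            "MS_T": ms_t, "MS_E": ms_e, "F": ms_t / ms_e}

scenario1 = [[15, 19, 14, 16], [19, 17, 16, 20], [22, 17, 19, 18]]
scenario2 = [[15.2, 16.1, 16.8, 15.9], [18.5, 17.5, 18.2, 17.8],
             [19.6, 19.3, 18.4, 18.7]]

for label, data in (("Scenario 1", scenario1), ("Scenario 2", scenario2)):
    result = one_way_anova(data)
    print(f"{label}: MS_T = {result['MS_T']:.3f}, "
          f"MS_E = {result['MS_E']:.3f}, F = {result['F']:.2f}")
```

Both scenarios share the same MS_T because the group means are identical; only MS_E changes, which moves F from about 2.21 in the first scenario to about 30.2 in the second.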
The F distributions
1. are a family of distributions with two parameters: the degrees of
freedom in the numerator and denominator;
2. interchanging the degrees of freedom changes the distributions;
3. are right-skewed;
4. have no probability below 0;
5. the peak of the density curve is near 1;
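A small simulation can illustrate some of these properties. The sketch below assumes k = 3 groups of 4 observations, all drawn from the same normal population (so the null hypothesis is true by construction); the seed, sample sizes, and number of repetitions are arbitrary choices.

```python
import random
from statistics import mean, median

def f_statistic(groups):
    """One-way ANOVA F-statistic, MS_T / MS_E."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_t = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_e = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_t / (k - 1)) / (ss_e / (n_total - k))

random.seed(1)          # fixed seed so the run is repeatable
k, n = 3, 4             # same layout as the gasoline example
f_values = []
for _ in range(5000):
    # All groups come from the same N(0, 1) population, so H0 holds
    groups = [[random.gauss(0, 1) for _ in range(n)] for _ in range(k)]
    f_values.append(f_statistic(groups))

print("smallest F:", min(f_values))
print("median F:  ", median(f_values))
print("mean F:    ", mean(f_values))
```

The simulated F-values are never negative, and the mean sits above the median, which is what a right-skewed distribution produces. With these degrees of freedom (2 and 9) the theoretical mean is df2 / (df2 – 2) = 9/7, so the simulated mean should land a little above 1.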
p-value = P(F > F0) with df1 = k – 1 and df2 = N – k
(i.e., the p-value for F0 is the probability of obtaining a sample with
an F-statistic greater than the observed one if H0 is true).
NOTE: the alternative in the ANOVA is always the same. It is not
one-sided or two-sided, but multi-sided. P-values are always and
only calculated as the probability in the right tail of an F-distribution.
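For readers without an F-table, the right-tail probability can be approximated numerically. The sketch below is one possible approach, not the method software packages necessarily use: it relies on the standard identity P(F > F0) = I_x(df2/2, df1/2) with x = df2 / (df2 + df1·F0), where I_x is the regularized incomplete beta function, and evaluates the integral with Simpson's rule. The function name `f_right_tail` is hypothetical, and the code assumes F0 > 0 and df2 ≥ 2.

```python
from math import exp, lgamma

def f_right_tail(f0, df1, df2, steps=2000):
    """Approximate P(F > f0) for an F distribution with (df1, df2) degrees of freedom.

    Sketch only: assumes f0 > 0 and df2 >= 2; `steps` must be even.
    """
    a, b = df2 / 2, df1 / 2
    x = df2 / (df2 + df1 * f0)                          # upper limit, 0 < x < 1
    norm = exp(lgamma(a + b) - lgamma(a) - lgamma(b))   # 1 / B(a, b)

    def density(t):
        # Beta(a, b) density at t
        return norm * t ** (a - 1) * (1 - t) ** (b - 1)

    # Composite Simpson's rule on [0, x]
    h = x / steps
    total = density(0.0) + density(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return total * h / 3

# F-statistic of 2.210526 with df1 = 2 and df2 = 9
print(round(f_right_tail(2.210526, 2, 9), 6))
```

In practice a statistics package would normally do this lookup; for example, SciPy's `scipy.stats.f.sf(f0, df1, df2)` returns the same right-tail probability.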
ANOVA Table for k Independent Random Samples
Source       df      SS         MS                       F
Treatments   k-1     SS_T       MS_T = SS_T / (k-1)      MS_T / MS_E
Error        N-k     SS_E       MS_E = SS_E / (N-k)
Total        N-1     Total SS
Example (Continued):
Using the data from Scenario 1, the following ANOVA summary is
obtained:
Source of Variation   df   SS         MS         F          P-value    F crit
Between Groups        2    18.66667   9.333333   2.210526   0.165597   4.256492
Within Groups         9    38         4.222222
Total                 11   56.66667
Example:
In an effort to improve the quality of recording tapes, the effects of
four kinds of coatings A, B, C, D on the reproducing quality of
sound are compared.
The following values on distortion are obtained:
A B C D
10 14 17 12
15 18 16 15
8 21 14 17
12 15 15 16
15 17 15
Average 12 17 16 15
Variance 9.5 10 2 2.8
The following ANOVA output is obtained:
Source of Variation   SS    df   MS         F          P-value    F crit
Between Groups        68    3    22.66667   4.340426   0.018136   3.159911
Within Groups         94    18   5.222222
Total                 162   21
Using this sample, we want to decide whether the four
different coatings result in different mean distortions at α = 0.05.
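The decision can be traced from the ANOVA output with a short sketch. It rebuilds the mean squares and F from the sums of squares in the table, taking k = 4 coatings and N = 22 observations (implied by the total df of 21), and then compares F with the critical value. The critical value and P-value are copied from the output above rather than computed; the function name `anova_from_ss` is illustrative.

```python
def anova_from_ss(ss_t, ss_e, k, n_total):
    """Fill in df, MS and F from the two sums of squares."""
    df1, df2 = k - 1, n_total - k
    ms_t, ms_e = ss_t / df1, ss_e / df2
    return {"df1": df1, "df2": df2, "MS_T": ms_t, "MS_E": ms_e, "F": ms_t / ms_e}

table = anova_from_ss(ss_t=68, ss_e=94, k=4, n_total=22)

alpha = 0.05
f_crit = 3.159911     # "F crit" from the ANOVA output above
p_value = 0.018136    # "P-value" from the ANOVA output above

reject = table["F"] > f_crit      # same decision as p_value < alpha
print(f"F = {table['F']:.4f} with df = ({table['df1']}, {table['df2']})")
print("Reject H0" if reject else "Do not reject H0")
```

Here F exceeds the critical value (equivalently, the P-value of about 0.018 is below 0.05), so H0 is rejected: at the 5% level the sample gives evidence that the four coatings do not all have the same mean distortion.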
26 Comparing Counts
This chapter will cover 2 types of tests:
1. Tests of hypotheses for experiments with more than 2
categories, called goodness-of-fit tests.
2. Tests of hypotheses about contingency tables, called
independence and homogeneity tests.
Goodness-of-Fit Test or Univariate Test
In this section, the test for comparing the frequency distribution
of a categorical variable with more than 2 categories from a sample
with a given model probability distribution is introduced.
A company filling grass seed bags wants to evaluate their filling
machine. The following distribution is advertised on the bags:
Kinds of seeds Proportion (expected
The company wants to check if the seed distribution in the bags fits
the advertised distribution.
They take a sample of size 1000 and find the following summarized
Kinds of seeds Count
The χ² Goodness-of-Fit Test
Given a distribut