Ch. 28 - Comparing Several Means
Use t-tools? NO!
→ Reason? Compound uncertainty
- In any test, there is some chance that we reject $H_0$ when it is true (a Type I
error). If we compare multiple means using ONE t-test for each pair, the
“overall” Type I error will compound.
- For example, consider 3 means that are equal, with each t-test using α = 0.05.
Each test then has a 5% chance of showing a difference when there isn’t one
(recall $H_0$ assumes no diff.). With 3 pairwise tests, the chance of detecting at
least one difference among the three means is roughly $1 - 0.95^3 = 0.143$, or
14.3%, when the means are EQUAL! (Note: 14.3% is the “overall” Type I error.)
- For 5 means there are 10 pairwise tests, so the “overall” Type I error becomes
approximately $1 - 0.95^{10} \approx 0.40$, or 40%.
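The compounding arithmetic can be checked with a short sketch. It assumes one t-test per pair of means and treats the tests as independent, which is only an approximation (pairwise tests share groups), so the results are rough figures. The function name `familywise_error` is made up for illustration.

```python
from math import comb


def familywise_error(num_means, alpha=0.05):
    """Rough chance of at least one false rejection over all pairwise t-tests.

    Assumes the pairwise tests are independent (an approximation).
    """
    num_pairs = comb(num_means, 2)          # number of pairwise comparisons
    return 1 - (1 - alpha) ** num_pairs     # 1 - P(no false rejection at all)


for k in (3, 5):
    print(k, "means:", comb(k, 2), "tests, overall Type I error about",
          round(familywise_error(k), 3))
```

For 3 means this gives about 0.143 and for 5 means about 0.401, matching the figures quoted above.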
Def’n: ANalysis Of VAriance (ANOVA) is a procedure to test the equality of three or
more population means. NOTE: the name of the test refers to comparing different
sources of variability; it WILL test differences among means.
Test requires the following assumptions:
1. The populations are all normally distributed.
2. The populations all have the same standard deviation.
3. The samples from different populations are random and independent.
- Assumption #1 is checked with histograms/boxplots for each group.
- Assumption #2 is more critical but harder to assess. Still, we can use side-by-side
boxplots (or the informal rule from Ch. 24).
- Assumption #3 is checked by analyzing the experimental design.
Notation:
- $y_{ij}$ = observation for the $i$th subject in the $j$th group
- $j = 1, \dots, k$ indexes groups
- $i = 1, \dots, n_j$ indexes subjects within groups
- $n_j$ = # of observations in the $j$th group; $N = \sum_j n_j$ = total # of observations
- $\bar{y}_j$ and $s_j^2$ are the sample mean and variance for the $j$th group
- $\bar{y}$ = grand mean = mean for combined sample:
$$\bar{y} = \frac{1}{N} \sum_{j} \sum_{i} y_{ij} = \frac{1}{N} \sum_{j} n_j \bar{y}_j$$
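As a quick illustration of this notation, the sketch below computes the group sizes, group means and variances, and the grand mean in both of its forms. The data values are invented purely for the example.

```python
# Hypothetical data: k = 3 groups with unequal sizes (values made up).
groups = [
    [4.0, 5.0, 6.0],
    [7.0, 9.0],
    [1.0, 2.0, 3.0, 6.0],
]

n = [len(g) for g in groups]                      # n_j for each group
N = sum(n)                                        # total number of observations
group_means = [sum(g) / len(g) for g in groups]   # sample mean of each group
group_vars = [
    sum((y - m) ** 2 for y in g) / (len(g) - 1)   # sample variance of each group
    for g, m in zip(groups, group_means)
]

# Grand mean computed two ways: over all observations, and as a
# size-weighted average of the group means.
grand_mean_direct = sum(sum(g) for g in groups) / N
grand_mean_weighted = sum(nj * m for nj, m in zip(n, group_means)) / N

print("n_j:", n, "N:", N)
print("group means:", group_means)
print("group variances:", group_vars)
print("grand mean:", grand_mean_direct, grand_mean_weighted)
```

The two grand-mean values agree, which is the second equality in the formula above. Note that the grand mean is not the plain average of the group means unless all $n_j$ are equal.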
Statistical model, parameters, hypotheses:
Each observation can be represented by
$$Y_{ij} = \mu + \tau_j + \varepsilon_{ij}$$
where the $Y_{ij}$ are independent random observations, $\mu$ is the overall mean, $\tau_j$ is a
parameter associated with the $j$th group called the $j$th treatment effect, and
$\varepsilon_{ij}$ is a random error.
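To make the model concrete, here is a minimal simulation sketch. It assumes normal errors with a common standard deviation, in line with the ANOVA assumptions listed earlier. The function name `simulate_one_way` and all parameter values are hypothetical.

```python
import random


def simulate_one_way(mu, taus, group_sizes, sigma, seed=0):
    """Draw one data set from the model: overall mean + treatment effect + error.

    Errors are drawn as independent normal(0, sigma) values (an assumption
    matching the normality and equal-spread conditions).
    """
    rng = random.Random(seed)
    data = []
    for tau_j, n_j in zip(taus, group_sizes):
        # Each observation in group j is mu + tau_j plus its own random error.
        data.append([mu + tau_j + rng.gauss(0.0, sigma) for _ in range(n_j)])
    return data


# Hypothetical parameter values chosen only for illustration.
samples = simulate_one_way(mu=10.0, taus=[-1.0, 0.0, 1.0],
                           group_sizes=[4, 5, 6], sigma=2.0)
for j, group in enumerate(samples, start=1):
    print("group", j, [round(y, 2) for y in group])
```

Setting every treatment effect to 0 produces data for which the null hypothesis of equal means holds.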
$H_0$: $\mu_1 = \dots = \mu_k$
$H_a$: the $\mu_j$ are not all equal
(OR, at least 2 $\mu_j$ are different from each other; OR at least 1 $\mu_j$ is different)
ANOVA F test statistic:
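For sources of variability in the model above, the ANOVA identity is
$$\sum_{j=1}^{k} \sum_{i=1}^{n_j} (y_{ij} - \bar{y})^2 = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (\bar{y}_j - \bar{y})^2 + \sum_{j=1}^{k} \sum_{i=1}^{n_j} (y_{ij} - \bar{y}_j)^2$$
$$SS_Y = SS_T + SS_E$$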
where $SS_Y$ is the total sum of squares, $SS_T$ is the treatment sum of squares and $SS_E$ is the
error sum of squares. Also, the presence of “squares” suggests a ratio test. Thus, for the
above $H_0$, we have
$$SS_T = \sum_{j=1}^{k} n_j (\bar{y}_j - \bar{y})^2 \quad \text{(variability between samples)}$$
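The sum-of-squares decomposition can be verified numerically. The sketch below uses a small invented data set and computes the total, treatment, and error sums of squares directly from their definitions. It illustrates the identity only and is not a full ANOVA routine.

```python
# Hypothetical data: k = 3 groups (values made up for illustration).
groups = [
    [4.0, 5.0, 6.0],
    [7.0, 9.0],
    [1.0, 2.0, 3.0, 6.0],
]

N = sum(len(g) for g in groups)
grand_mean = sum(sum(g) for g in groups) / N
group_means = [sum(g) / len(g) for g in groups]

# Total sum of squares: every observation against the grand mean.
ss_total = sum((y - grand_mean) ** 2 for g in groups for y in g)

# Treatment sum of squares: group means against the grand mean,
# weighted by group size (variability between samples).
ss_treatment = sum(len(g) * (m - grand_mean) ** 2
                   for g, m in zip(groups, group_means))

# Error sum of squares: every observation against its own group mean.
ss_error = sum((y - m) ** 2
               for g, m in zip(groups, group_means) for y in g)

print("SS_Y:", round(ss_total, 4))
print("SS_T:", round(ss_treatment, 4))
print("SS_E:", round(ss_error, 4))
print("SS_T + SS_E:", round(ss_treatment + ss_error, 4))
```

Up to floating-point rounding, the total equals the sum of the treatment and error parts, as the identity states.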