Ch. 26 - Comparing Counts
Notation: k = # of categories of a qualitative variable
$p_i$ = true proportion of category i; i = 1,…, k (Note: $\sum_{i=1}^{k} p_i = 1$)
A random sample of size n will provide sample statistics of “observed counts”. These
values can be compared against “expected counts” of $np_i$ for each category. Consequently, a
single $H_0$ can test the validity of all the $p_i$ collectively. How?
Def’n: The “goodness-of-fit” test uses the chi-square statistic, $\chi^2$, computed by
$$\chi^2 = \sum_{\text{cells}} \frac{(Obs - Exp)^2}{Exp}$$
where Obs = “observed count”, Exp = “expected count”, and you sum over all categories.
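As a rough sketch of the formula above, the statistic can be computed in a few lines of Python. The function name `chi_square_statistic` and the die-roll counts are hypothetical and chosen only for illustration; they do not come from these notes.

```python
def chi_square_statistic(observed, proportions):
    """Return (expected counts, chi-square statistic) for a goodness-of-fit test."""
    n = sum(observed)                          # sample size
    expected = [n * p for p in proportions]    # Exp_i = n * p_i
    # Sum (Obs - Exp)^2 / Exp over all categories (cells).
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, stat

# Hypothetical data: 60 rolls of a die, with H0 claiming p_i = 1/6 for each face.
rolls = [8, 12, 9, 11, 6, 14]
expected, stat = chi_square_statistic(rolls, [1 / 6] * 6)
print(expected)
print(stat)
```

Each category contributes one term to the sum, so a category with a large gap between Obs and Exp is easy to spot by printing the individual terms.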
Sizeable differences between Obs and Exp of specific categories lead to large values of $\chi^2$
and subsequent rejection of $H_0$. For formal rejection/non-rejection, we need a formal test.
Aside: The chi-squared distribution has the following properties:
- like the t-distribution, it has only one parameter, df, that can take on any positive
integer value.
- skewed to the right for small df but becomes more symmetric as df increases.
- the curve lies entirely over nonnegative values, so all areas correspond to values ≥ 0.
- values denoted by $\chi^2$
When $H_0$ is correct and n is sufficiently large, $\chi^2$ approx. follows a $\chi^2$-dist’n with df = k – 1.
Using this dist’n, the corresponding P-value is the area to the right of $\chi^2$ under the $\chi^2_{k-1}$
curve (all curves found in Appendix Table X). For test validity, the following must hold:
1) Observed cell counts are based on a random sample.
2) The sample size is large (every expected count ≥ 5).
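In place of a table lookup, the right-tail area can be sketched with Python's standard library. The helper name `chi2_right_tail` is an assumption made for illustration. It relies on the standard recurrence for the regularized upper incomplete gamma function and is intended only for small integer df.

```python
import math

def chi2_right_tail(x, df):
    """Area to the right of x under the chi-square curve with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        a, tail = 1.0, math.exp(-half)               # starting point: df = 2
    else:
        a, tail = 0.5, math.erfc(math.sqrt(half))    # starting point: df = 1
    # Recurrence: Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    while a < df / 2.0:
        tail += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return tail

# Tabled value used in Ex26.1: 12.838 with df = 3.
print(chi2_right_tail(12.838, 3))
```

The printed area should come out close to 0.005, in agreement with the table entry cited in Ex26.1.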
Ex26.1) Table 26X0 - Number of Films in 2012 by Film Rating
Film Rating    Frequency (Obs)    Expected count (Exp)
G              15                 $np_G$ = 443(0.25) = 110.75
PG             62                 110.75
PG-13          145                110.75
R              221                110.75
Are film ratings evenly distributed among all the movies made in 2012? Use α = 0.05.
Assumptions: The data are the entire population of American films, not a random sample. We
will assume the random-sample condition holds, but cautiously. On the positive side, all
expected counts are greater than 5, so the “goodness-of-fit” test is possible.
$H_0$: $p_G$ = 0.25, $p_{PG}$ = 0.25, $p_{PG-13}$ = 0.25, $p_R$ = 0.25
$H_A$: at least one $p_i$ is not as claimed
$$\chi^2 = \frac{(15 - 110.75)^2}{110.75} + \frac{(62 - 110.75)^2}{110.75} + \frac{(145 - 110.75)^2}{110.75} + \frac{(221 - 110.75)^2}{110.75}$$
= 82.782 + 21.459 + 10.592 + 109.752
= 224.585
At $\chi^2_{k-1} = \chi^2_3$, 224.585 is higher than the largest tabled value of 12.838, which has a P-value of
0.005. Thus, the P-value range is (0, 0.005). With this range and the given α = 0.05,
reject $H_0$. In conclusion, there is enough evidence that the film ratings are not evenly
distributed.
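The arithmetic in Ex26.1 can be double-checked with a short script. This is only a sketch: the variable names are made up for illustration, and the right-tail formula used here is the closed form that holds specifically for df = 3.

```python
import math

observed = {"G": 15, "PG": 62, "PG-13": 145, "R": 221}   # counts from the table
n = sum(observed.values())                               # total number of films
expected = n * 0.25                                      # same Exp for every rating under H0

# One (Obs - Exp)^2 / Exp term per rating, then the total.
contributions = {r: (o - expected) ** 2 / expected for r, o in observed.items()}
stat = sum(contributions.values())

# Right-tail area for df = 3 only: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
half = stat / 2
p_value = math.erfc(math.sqrt(half)) + math.sqrt(2 * stat / math.pi) * math.exp(-half)

alpha = 0.05
reject = p_value < alpha
print(round(stat, 3), p_value, reject)
```

The R and G categories supply most of the total, which is consistent with those two ratings being furthest from the expected count of 110.75.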
Testing for Homogeneity
Def’n: A two-way frequency table (or a contingency table) summarizes categorical data.
Each cell in the table is a particular combination of categorical values.
Marginal totals are obtained by extending the table to include the sums of each row and
column. In addition, the grand total is included.
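As a small illustration of marginal totals, the sketch below sums the rows and columns of a two-way table. The counts are hypothetical placeholders and are not the responses from the table in these notes.

```python
# Hypothetical 2 x 3 table of counts (rows and columns are placeholders).
table = [
    [20, 15, 25],
    [30, 10, 40],
]

row_totals = [sum(row) for row in table]          # one total per row
col_totals = [sum(col) for col in zip(*table)]    # one total per column
grand_total = sum(row_totals)                     # equals sum(col_totals)

print(row_totals, col_totals, grand_total)
```

The row totals and the column totals must each add up to the same grand total, which gives a quick check on the arithmetic.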
Table 26X1 – 2-way table of responses
Hockey

More
Less