Page 80 - Applied Probability

4. Hypothesis Testing and Categorical Data

4.4 The $Z_{\max}$ Test
Consider a multinomial experiment with $n$ trials and $m$ categories. Denote
the probability of category $i$ by $p_i$ and the random number of outcomes in
category $i$ by $N_i$. The $Z_{\max}$ statistic [12, 15] is defined by
$$
Z_{\max} = \max_{1 \le i \le m} \frac{N_i - n p_i}{\sqrt{n p_i (1 - p_i)}}.
$$
This statistic is designed to detect departures from the multinomial
assumptions caused by the clustering of the observations in one or a few
categories. Consequently, a one-sided test is appropriate, and the
multinomial model is rejected when $Z_{\max}$ is too large. The specific form of
the $Z_{\max}$ statistic is suggested by the fact that the category-specific
statistics
$$
Z_i = \frac{N_i - n p_i}{\sqrt{n p_i (1 - p_i)}}
$$
are standardized to have mean 0 and variance 1. Furthermore, when $n$
is large, each $Z_i$ is approximately normally distributed. The usual rule of
thumb $n p_i \ge 3$ for normality is helpful, particularly if a continuity
correction is added to $Z_i$.
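The definition of $Z_i$ and $Z_{\max}$ above can be illustrated with a minimal Python sketch. The function names `z_scores` and `z_max` and the example counts are hypothetical, chosen only for illustration. The sketch assumes $n$ is the sum of the observed counts and omits the continuity correction.

```python
import math

def z_scores(counts, probs):
    """Standardized category statistics Z_i = (N_i - n p_i) / sqrt(n p_i (1 - p_i))."""
    n = sum(counts)  # total number of trials, assumed to equal the sum of the counts
    return [(N - n * p) / math.sqrt(n * p * (1.0 - p))
            for N, p in zip(counts, probs)]

def z_max(counts, probs):
    """Z_max statistic: the largest of the standardized category statistics."""
    return max(z_scores(counts, probs))

# Hypothetical data: 60 trials over 4 equally likely categories,
# with the observations clustered in the first category.
counts = [30, 10, 10, 10]
probs = [0.25, 0.25, 0.25, 0.25]
print(z_max(counts, probs))
```

Only the largest $Z_i$ enters the statistic, which is why the test is sensitive to an excess in a single category rather than to diffuse departures from the model.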
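To compute p-values for $Z_{\max}$, let $z_{\max}$ be the observed value of the
statistic, and define the events $A_i = \{Z_i \ge z_{\max}\}$. Then
\begin{align*}
\Pr(Z_{\max} \ge z_{\max}) &= \Pr\Bigl(\bigcup_{i=1}^{m} A_i\Bigr) \\
&\le \sum_{i=1}^{m} \Pr(A_i) \tag{4.1} \\
&\approx m[1 - \Phi(z_{\max})],
\end{align*}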
where $\Phi$ is the standard normal distribution function. Alternatively, each
$\Pr(A_i)$ can be computed exactly as a right-tail probability of a binomial
distribution with $n$ trials and success probability $p_i$.
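As a rough sketch of the two ways of evaluating the upper bound (4.1), the following Python code computes both the normal approximation $m[1 - \Phi(z_{\max})]$ and the sum of exact binomial right-tail probabilities. All function names are hypothetical. Capping the bound at 1 and the small floating-point tolerance used when converting $z_{\max}$ back to a count threshold are assumptions of this sketch, not part of the text.

```python
import math

def normal_upper_tail(z):
    """1 - Phi(z) for the standard normal distribution, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def upper_bound_normal(z_obs, m):
    """Normal approximation to bound (4.1): m [1 - Phi(z_obs)], capped at 1 (an assumption)."""
    return min(1.0, m * normal_upper_tail(z_obs))

def binomial_upper_tail(n, p, k):
    """Pr(N >= k) for N ~ Binomial(n, p); an empty sum (k > n) gives 0."""
    k = max(k, 0)
    return sum(math.comb(n, j) * p**j * (1.0 - p)**(n - j)
               for j in range(k, n + 1))

def upper_bound_exact(z_obs, n, probs):
    """Bound (4.1) with each Pr(A_i) computed as an exact binomial right tail."""
    total = 0.0
    for p in probs:
        # Z_i >= z_obs is equivalent to N_i >= n p + z_obs sqrt(n p (1 - p)).
        threshold = n * p + z_obs * math.sqrt(n * p * (1.0 - p))
        # Smallest integer count meeting the threshold; the tolerance guards
        # against rounding error and is an assumption of this sketch.
        k = math.ceil(threshold - 1e-9)
        total += binomial_upper_tail(n, p, k)
    return min(1.0, total)

# Hypothetical example: n = 60 trials, m = 4 equally likely categories,
# observed z_max = sqrt(20), which corresponds to a count of 30 in one category.
z_obs = math.sqrt(20.0)
print(upper_bound_normal(z_obs, 4))
print(upper_bound_exact(z_obs, 60, [0.25] * 4))
```

The exact version avoids the normal approximation in the tails, where it tends to be least reliable, at the cost of summing binomial terms for each category.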
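The upper bound (4.1) can be supplemented by the lower bound
\begin{align*}
\Pr\Bigl(\bigcup_{i=1}^{m} A_i\Bigr) &\ge \sum_{i=1}^{m} \Pr(A_i) - \sum_{i<j} \Pr(A_i \cap A_j) \\
&\ge \sum_{i=1}^{m} \Pr(A_i) - \sum_{i<j} \Pr(A_i)\Pr(A_j) \tag{4.2} \\
&= \sum_{i=1}^{m} \Pr(A_i) + \frac{1}{2}\sum_{i=1}^{m} \Pr(A_i)^2 - \frac{1}{2}\Bigl[\sum_{i=1}^{m} \Pr(A_i)\Bigr]^2 \\
&\approx m[1 - \Phi(z_{\max})] - \frac{m(m-1)}{2}[1 - \Phi(z_{\max})]^2.
\end{align*}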
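To see how tightly the two bounds bracket the p-value, the normal approximations to (4.1) and (4.2) can be evaluated side by side. The sketch below is illustrative only. The function name `zmax_pvalue_bounds` is hypothetical, and clipping the results to the interval $[0, 1]$ is an added assumption.

```python
import math

def normal_upper_tail(z):
    """1 - Phi(z) for the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def zmax_pvalue_bounds(z_obs, m):
    """Normal-approximation versions of the lower bound (4.2) and upper bound (4.1)."""
    q = normal_upper_tail(z_obs)                  # common value approximating each Pr(A_i)
    upper = m * q                                 # bound (4.1)
    lower = m * q - 0.5 * m * (m - 1) * q * q     # bound (4.2)
    # Clip to [0, 1]; this clipping is an assumption of the sketch.
    return max(0.0, lower), min(1.0, upper)

# Hypothetical example: m = 10 categories and an observed z_max of 3.
lower, upper = zmax_pvalue_bounds(3.0, 10)
print(lower, upper)
```

When $1 - \Phi(z_{\max})$ is small, the correction term is of second order, so the two bounds nearly coincide and the upper bound alone is a good approximation to the p-value.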
