Page 80 - Applied Probability

4. Hypothesis Testing and Categorical Data
4.4 The $Z_{\max}$ Test
Consider a multinomial experiment with $n$ trials and $m$ categories. Denote
the probability of category $i$ by $p_i$ and the random number of outcomes in
category $i$ by $N_i$. The $Z_{\max}$ statistic [12, 15] is defined by
\[
Z_{\max} = \max_{1 \le i \le m} \frac{N_i - n p_i}{\sqrt{n p_i (1 - p_i)}}.
\]
This statistic is designed to detect departures from the multinomial
assumptions caused by the clustering of the observations in one or a few
categories. Consequently, a one-sided test is appropriate, and the multinomial
model is rejected when $Z_{\max}$ is too large. The specific form of the
$Z_{\max}$ statistic is suggested by the fact that the category-specific
statistics
\[
Z_i = \frac{N_i - n p_i}{\sqrt{n p_i (1 - p_i)}}
\]
are standardized to have mean 0 and variance 1. Furthermore, when $n$ is
large, each $Z_i$ is approximately normally distributed. The usual rule of
thumb $n p_i \ge 3$ for normality is helpful, particularly if a continuity
correction is added to $Z_i$.
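As a minimal sketch of the definitions above, the following Python code computes the category-specific statistics $Z_i$ and their maximum from observed counts. The function names, the example data, and the `continuity` option are illustrative assumptions; in particular, the text does not spell out the form of the continuity correction, so the conventional right-tail choice of subtracting 1/2 from $N_i$ is assumed here.

```python
import math

def z_statistics(counts, probs, continuity=False):
    """Return the category-specific statistics Z_i.

    counts: observed N_i; probs: hypothesized p_i (assumed to sum to 1).
    continuity=True subtracts 1/2 from each N_i, the conventional
    right-tail continuity correction (an assumption, not from the text).
    """
    n = sum(counts)
    shift = 0.5 if continuity else 0.0
    return [
        (N - shift - n * p) / math.sqrt(n * p * (1.0 - p))
        for N, p in zip(counts, probs)
    ]

def z_max(counts, probs, continuity=False):
    """Return Z_max, the largest of the Z_i."""
    return max(z_statistics(counts, probs, continuity))

# Hypothetical data: 100 trials over 4 equally likely categories,
# with the observations clustered in the first category.
counts = [40, 20, 20, 20]
probs = [0.25, 0.25, 0.25, 0.25]
print(z_max(counts, probs))
print(z_max(counts, probs, continuity=True))
```

Because only the largest $Z_i$ matters, a single overloaded category drives the statistic, which matches the one-sided use of the test.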
To compute p-values for $Z_{\max}$, let $z_{\max}$ be the observed value of
the statistic, and define the events $A_i = \{Z_i \ge z_{\max}\}$. Then
\begin{align*}
\Pr(Z_{\max} \ge z_{\max}) &= \Pr\Bigl(\bigcup_{i=1}^{m} A_i\Bigr) \\
&\le \sum_{i=1}^{m} \Pr(A_i) \tag{4.1} \\
&\approx m[1 - \Phi(z_{\max})],
\end{align*}
where $\Phi$ is the standard normal distribution function. Alternatively, each
$\Pr(A_i)$ can be computed exactly as a right-tail probability of a binomial
distribution with $n$ trials and success probability $p_i$.
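The two ways of evaluating the bound (4.1) can be sketched in Python as follows. This is an illustration under stated assumptions rather than a definitive implementation: the function names and example data are hypothetical, the result is capped at 1 because it bounds a probability, and the exact binomial sum is written naively, so it is suitable only for moderate $n$.

```python
import math

def normal_tail(z):
    """Right-tail probability 1 - Phi(z) of the standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def binomial_tail(n, p, k):
    """Exact Pr(N >= k) for N ~ Binomial(n, p), summed term by term."""
    k = max(k, 0)
    return sum(
        math.comb(n, j) * p**j * (1.0 - p) ** (n - j)
        for j in range(k, n + 1)
    )

def upper_bound_normal(z_obs, m):
    """Approximate bound m[1 - Phi(z_max)] from (4.1), capped at 1."""
    return min(1.0, m * normal_tail(z_obs))

def upper_bound_exact(z_obs, n, probs):
    """Bound (4.1) with each Pr(A_i) an exact binomial right tail."""
    total = 0.0
    for p in probs:
        cutoff = n * p + z_obs * math.sqrt(n * p * (1.0 - p))
        # Smallest count k with Z_i >= z_obs; the small tolerance guards
        # against round-off when the cutoff is an integer.
        k = math.ceil(cutoff - 1e-9)
        total += binomial_tail(n, p, k)
    return min(1.0, total)

# Hypothetical data: n = 100 trials, 4 equally likely categories,
# and a largest observed count of 40.
n = 100
probs = [0.25, 0.25, 0.25, 0.25]
z_obs = (40 - n * 0.25) / math.sqrt(n * 0.25 * 0.75)
print(upper_bound_normal(z_obs, len(probs)))
print(upper_bound_exact(z_obs, n, probs))
```

The exact version converts the event $A_i$ into a threshold on the count $N_i$, which avoids relying on the normal approximation when some $n p_i$ are small.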
                                The upper bound (4.1) can be supplemented by the lower bound
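\begin{align*}
\Pr\Bigl(\bigcup_{i=1}^{m} A_i\Bigr)
&\ge \sum_{i=1}^{m} \Pr(A_i) - \sum_{i<j} \Pr(A_i \cap A_j) \\
&\ge \sum_{i=1}^{m} \Pr(A_i) - \sum_{i<j} \Pr(A_i)\Pr(A_j) \tag{4.2} \\
&= \sum_{i=1}^{m} \Pr(A_i) + \frac{1}{2}\sum_{i=1}^{m} \Pr(A_i)^2
   - \frac{1}{2}\Bigl[\sum_{i=1}^{m} \Pr(A_i)\Bigr]^2 \\
&\approx m[1 - \Phi(z_{\max})] - \frac{m(m-1)}{2}[1 - \Phi(z_{\max})]^2.
\end{align*}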
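The equality in (4.2) rests on the identity $\sum_{i<j} a_i a_j = \frac{1}{2}\bigl[(\sum_i a_i)^2 - \sum_i a_i^2\bigr]$, which lets the lower bound be computed from two running sums instead of a double loop. The Python sketch below illustrates this and pairs the lower bound with the upper bound (4.1) under the normal approximation. The function names are hypothetical, and clamping the results to the interval [0, 1] is an added assumption, justified only because the quantity being bounded is a probability.

```python
import math

def normal_tail(z):
    """Right-tail probability 1 - Phi(z) of the standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def lower_bound(tail_probs):
    """Lower bound (4.2) from the individual probabilities Pr(A_i)."""
    s1 = sum(tail_probs)                 # sum of Pr(A_i)
    s2 = sum(q * q for q in tail_probs)  # sum of Pr(A_i)^2
    return max(0.0, s1 + 0.5 * s2 - 0.5 * s1 * s1)

def pvalue_bounds_normal(z_obs, m):
    """Return (lower, upper) bounds on the p-value using 1 - Phi(z_max)."""
    q = normal_tail(z_obs)
    lower = max(0.0, m * q - 0.5 * m * (m - 1) * q * q)
    upper = min(1.0, m * q)
    return lower, upper

# Hypothetical example: m = 4 categories and an observed z_max of 3.0.
lower, upper = pvalue_bounds_normal(3.0, 4)
print(lower, upper)
```

When $m[1 - \Phi(z_{\max})]$ is small, the two bounds nearly coincide, so the upper bound (4.1) is then a close approximation to the p-value.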