Page 83 - Applied Probability
P. 83

4. Hypothesis Testing and Categorical Data
                              66
                                With these definitions in mind, note first the obvious initial values (a)
                              t 0,k,1 = 1 for k< d, (b) t 0,k,1 = 0 for k ≥ d, and (c) t j,k,1 = 1 for j> 0.
                              Now beginning with l = 1, compute t j,k,l recursively by conditioning on
                              how many observations fall in category l. Since at most d − 1 observations
                              can fall in category l without increasing W d by 1, the recurrence relation
                              for j =0 is
                                     t 0,k,l
                                                                i                  k−i
                                     min{d−1,k}
                                        	      k        p l                p l
                                 =                                 1 −                t 0,k−i,l−1 ,
                                               i    p 1 + ··· + p l   p 1 + ··· + p l
                                        i=0
                              and the recurrence relation for j> 0is
                                     t j,k,l
                                                                 i                 k−i
                                     min{d−1,k}
                                        	      k         p l               p l
                                  =                                1 −                t j,k−i,l−1
                                                i   p 1 + ··· + p l    p 1 + ··· + p l
                                        i=0
                                                             i
                                        k                                       k−i
                                       	    k        p l               p l
                                     +                         1 −                t j−1,k−i,l−1 .
                                            i   p 1 + ··· + p l    p 1 + ··· + p l
                                       i=d
                              These recurrence relations jointly permit replacing the matrix (t j,k,l−1 )by
                              the matrix (t j,k,l ). At the end of this recursive scheme on l =2,... ,m,we
                              extract the desired probability t w d −1,n,m .
                                This algorithm for computing the distribution function of W d relies on
                                                                     k
                                                                         i    k−i
                              evaluation of binomial probabilities b i,k =  r (1 − r)  . The naive way
                                                                     i
                              of computing the b i,k is to evaluate the binomial coefficient separately and
                              then multiply it by the two appropriate powers. The recurrence relations
                              b i,k = rb i−1,k−1 +(1 − r)b i,k−1 for 0 <i <k and the boundary recurrences
                              b 0,k =(1 − r)b 0,k−1 and b k,k = rb k−1,k−1 offer a faster and more stable
                              method. To start the recurrence, use the initial conditions b 0,1 =1 − r
                              and b 1,1 = r. It is noteworthy that the binomial recurrence increments the
                              number of trials k whereas the recurrence for the distribution function of
                              W d increments the number of categories l.
                              Example 4.5.1 Mutations in Hemoglobin α

                                Mutations in the human hemoglobin molecule have been observed in
                              many populations. Vogel and Motulsky [43] tabulate 66 mutations in the
                              141 amino acids of the hemoglobin α chain. Of these 141 amino acids, 16
                              show two or more mutations. With all p i =  1  , the mean of W 2 is λ =11.3.
                                                                    141
                              Under the Poisson approximation for W 2 , the associated p-value is .11. In
                              this example the Poisson approximation is poor, and the exact algorithm
                              yields the more impressive p-value of .028. Thus, the data suggest nonran-
                              domness. It may be that some amino acids are so essential for hemoglobin
   78   79   80   81   82   83   84   85   86   87   88