Page 91 - Applied Probability

4. Hypothesis Testing and Categorical Data
                                   as stated in the text. (Hints: It suffices to show that (4.7) holds when
                                   n = 1 and that the set of random vectors satisfying (4.7) is closed
                                   under the formation of sums of independent random vectors. For (4.8)
                                   consider the vectors $-N_1, \ldots, -N_m$.)
                                 4. Using the Chen-Stein method and probabilistic coupling, Barbour et
                                   al. [4] show that the statistic $W_d$ satisfies the inequality

                                   $$\sup_{A \subset \mathbb{N}} |\Pr(W_d \in A) - \Pr(Z \in A)| \le \frac{1 - e^{-\lambda}}{\lambda} [\lambda - \operatorname{Var}(W_d)], \tag{4.9}$$

                                   where $Z$ is a Poisson random variable having the same expectation
                                   $\lambda = \sum_{i=1}^m \mu_i$ as $W_d$, and where $\mathbb{N}$ denotes the set $\{0, 1, \ldots\}$ of
                                   nonnegative integers. Prove that

                                   $$\lambda - \operatorname{Var}(W_d) = \sum_i \mu_i^2 - \sum_i \sum_{j \ne i} \operatorname{Cov}(1_{\{N_i \ge d\}}, 1_{\{N_j \ge d\}}).$$

                                   In view of Problem 3, the random variables $1_{\{N_i \ge d\}}$ and $1_{\{N_j \ge d\}}$ are
                                   negatively correlated. It follows that the bound (4.9) is only useful
                                   when the number $\lambda^{-1}(1 - e^{-\lambda}) \sum_i \mu_i^2$ is small. What is the value of
                                   $\lambda^{-1}(1 - e^{-\lambda}) \sum_i \mu_i^2$ for the hemoglobin data when $d = 2$? Careful
                                   estimates of the difference $\lambda - \operatorname{Var}(W_d)$ are provided in [4].
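
As a rough numerical aid for the last question of Problem 4, the quantity $\lambda^{-1}(1 - e^{-\lambda}) \sum_i \mu_i^2$ can be evaluated once the $\mu_i$ are known. The sketch below assumes the multinomial setting of Problem 5, so that each $N_i$ is marginally binomial with $n$ trials and success probability $p_i$, and $\mu_i = \Pr(N_i \ge d)$ is a binomial tail. The function names and the example inputs are hypothetical; the hemoglobin data are not reproduced here.

```python
import math

def binom_tail(n, p, d):
    """Return Pr(N >= d) for N ~ Binomial(n, p)."""
    # Sum the probabilities of 0, ..., d-1 successes and take the complement.
    below = sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(min(d, n + 1))
    )
    return 1.0 - below

def poisson_bound_factor(n, probs, d):
    """Return (lambda, lambda^{-1} (1 - e^{-lambda}) sum_i mu_i^2).

    Assumes mu_i = Pr(N_i >= d) with N_i ~ Binomial(n, p_i),
    i.e. the marginal of a multinomial model (an assumption of this sketch).
    """
    mus = [binom_tail(n, p, d) for p in probs]
    lam = sum(mus)
    if lam == 0.0:
        return 0.0, 0.0
    factor = (1.0 - math.exp(-lam)) / lam * sum(mu**2 for mu in mus)
    return lam, factor

# Made-up inputs for illustration only: 50 equally likely categories, 20 trials.
lam, factor = poisson_bound_factor(n=20, probs=[0.02] * 50, d=2)
print(lam, factor)
```

Because the covariances in Problem 4 are nonpositive, the returned factor is a lower bound on the right-hand side of (4.9); a large value already signals that (4.9) is uninformative.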
                                 5. Consider a multinomial model with $m$ categories, $n$ trials, and
                                   probability $p_i$ attached to category $i$. Express the distribution function of
                                   the maximum number of counts $\max_i N_i$ observed in any category in
                                   terms of the distribution functions of the $W_d$. How can the algorithm
                                   for computing the distribution function of $W_d$ be simplified to give
                                   an algorithm for computing a p-value of $\max_i N_i$?
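
One way to experiment with Problem 5 numerically is to compute $\Pr(\max_i N_i < d)$, which is the event that no category reaches $d$ counts, that is, $W_d = 0$. The sketch below is not the algorithm from the text. It swaps in a generating-function computation based on the multinomial probability $n! \prod_i p_i^{n_i} / n_i!$ and checks it against brute-force enumeration. All function names are hypothetical, and the approach is only practical for moderate $n$.

```python
import math
from itertools import product

def prob_max_below(n, probs, d):
    """Return Pr(max_i N_i < d) for multinomial counts with n trials.

    Uses n! times the coefficient of x^n in
    prod_i sum_{j<d} (p_i x)^j / j!  (a swapped-in technique, see lead-in).
    """
    coef = [1.0] + [0.0] * n  # polynomial coefficients, truncated at degree n
    for p in probs:
        new = [0.0] * (n + 1)
        for t, c in enumerate(coef):
            if c == 0.0:
                continue
            # Category contributes j counts, with j < d and t + j <= n.
            for j in range(min(d, n - t + 1)):
                new[t + j] += c * p**j / math.factorial(j)
        coef = new
    return math.factorial(n) * coef[n]

def prob_max_below_brute(n, probs, d):
    """Same probability by enumerating every count vector (tiny cases only)."""
    total = 0.0
    for counts in product(range(n + 1), repeat=len(probs)):
        if sum(counts) != n or max(counts) >= d:
            continue
        pmf = float(math.factorial(n))
        for c, p in zip(counts, probs):
            pmf *= p**c / math.factorial(c)
        total += pmf
    return total

def pvalue_max(n, probs, observed_max):
    """Right-tail p-value Pr(max_i N_i >= observed_max)."""
    return 1.0 - prob_max_below(n, probs, observed_max)
```

A call such as `pvalue_max(5, [0.2, 0.3, 0.5], 4)` then gives the chance that some category collects at least 4 of the 5 trials under these made-up probabilities.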
                                 6. Continuing Problem 5, define the statistic $U_d$ to be the number of
                                   categories $i$ with $N_i < d$. Express the right-tail probability $\Pr(U_d \ge j)$
                                   in terms of the distribution function of $W_d$. This gives a method for
                                   computing p-values of the statistic $U_d$. In some circumstances $U_d$ has
                                   an approximate Poisson distribution. What do you conjecture about
                                   these circumstances?
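
To explore Problem 6 empirically, one can simulate multinomial counts and tabulate $W_d$ and $U_d$ side by side. The sketch below is a plain Monte Carlo experiment, not a method from the text. It takes $W_d$ to be the number of categories with $N_i \ge d$, consistent with the indicators in Problem 4, and every function name and parameter value is hypothetical.

```python
import random
from collections import Counter

def simulate_w_and_u(n, probs, d, reps=10_000, seed=0):
    """Simulate (W_d, U_d) pairs under a multinomial model.

    W_d counts categories with N_i >= d; U_d counts categories with N_i < d.
    """
    rng = random.Random(seed)
    m = len(probs)
    w_samples, u_samples = [], []
    for _ in range(reps):
        # Draw n trials, each landing in category i with probability probs[i].
        counts = Counter(rng.choices(range(m), weights=probs, k=n))
        w_samples.append(sum(1 for i in range(m) if counts[i] >= d))
        u_samples.append(sum(1 for i in range(m) if counts[i] < d))
    return w_samples, u_samples

def right_tail(samples, j):
    """Empirical Pr(X >= j)."""
    return sum(s >= j for s in samples) / len(samples)

def cdf(samples, k):
    """Empirical Pr(X <= k)."""
    return sum(s <= k for s in samples) / len(samples)

# Made-up example: 10 equally likely categories, 30 trials, threshold d = 2.
w, u = simulate_w_and_u(n=30, probs=[0.1] * 10, d=2, reps=2_000)
print(right_tail(u, 1), cdf(w, 9))
```

Comparing the empirical distribution of $U_d$ with a Poisson distribution of the same mean, for different choices of $n$, $d$, and the $p_i$, is one informal way to form the conjecture the problem asks for.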

                                 7. The nonparametric linkage test of de Vries et al. [10] uses affected
                                   sibling data. Consider a nuclear family with $s$ affected sibs and a
                                   heterozygous parent with genotype $a/b$ at some marker locus. Let $n_a$
                                   and $n_b$ count the number of affected sibs receiving the $a$ and $b$ alleles,
                                   respectively, from the parent. If the other parent is typed, then this
                                   determination is always possible unless both parents and the child
                                   are simultaneously of genotype $a/b$. de Vries et al. [10] suggest the
                                   statistic $T = |n_a - n_b|$. Under the null hypothesis of independent