Page 91 - Applied Probability


4. Hypothesis Testing and Categorical Data
as stated in the text. (Hints: It suffices to show that (4.7) holds when
$n = 1$ and that the set of random vectors satisfying (4.7) is closed
under the formation of sums of independent random vectors. For (4.8)
consider the vectors $-N_1, \ldots, -N_m$.)
4. Using the Chen-Stein method and probabilistic coupling, Barbour et
al. [4] show that the statistic $W_d$ satisfies the inequality

$$\sup_{A \subset \mathbb{N}} \left| \Pr(W_d \in A) - \Pr(Z \in A) \right| \le \frac{1 - e^{-\lambda}}{\lambda} \left[ \lambda - \operatorname{Var}(W_d) \right], \qquad (4.9)$$

where $Z$ is a Poisson random variable having the same expectation
$\lambda = \sum_{i=1}^{m} \mu_i$ as $W_d$, and where $\mathbb{N}$ denotes the set
$\{0, 1, \ldots\}$ of nonnegative integers. Prove that

$$\lambda - \operatorname{Var}(W_d) = \sum_i \mu_i^2 - \sum_i \sum_{j \ne i} \operatorname{Cov}\left( 1_{\{N_i \ge d\}}, 1_{\{N_j \ge d\}} \right).$$

In view of Problem 3, the random variables $1_{\{N_i \ge d\}}$ and $1_{\{N_j \ge d\}}$ are
negatively correlated, so $\lambda - \operatorname{Var}(W_d) \ge \sum_i \mu_i^2$. It follows
that the bound (4.9) is only useful when the number
$\lambda^{-1}(1 - e^{-\lambda}) \sum_i \mu_i^2$ is small. What is the value of
$\lambda^{-1}(1 - e^{-\lambda}) \sum_i \mu_i^2$ for the hemoglobin data when $d = 2$? Careful
estimates of the difference $\lambda - \operatorname{Var}(W_d)$ are provided in [4].
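The identity and the size of the bound can be checked numerically on a small example. The Python sketch below is illustrative only and assumes the multinomial setting in which $W_d$ counts the categories $i$ with $N_i \ge d$, so that $\mu_i = \Pr(N_i \ge d)$ with $N_i \sim \text{Binomial}(n, p_i)$. The function names, the brute-force enumeration over all count vectors, and the example inputs are hypothetical choices, not the method of Barbour et al. [4] and not the hemoglobin data. It computes $\operatorname{Var}(W_d)$ exactly, then reports both sides of the identity, the covariance sum, and $\lambda^{-1}(1 - e^{-\lambda}) \sum_i \mu_i^2$.

```python
import math

def compositions(n, m):
    """Yield every m-tuple of nonnegative integers summing to n."""
    if m == 1:
        yield (n,)
        return
    for k in range(n + 1):
        for rest in compositions(n - k, m - 1):
            yield (k,) + rest

def multinomial_pmf(counts, p):
    """Multinomial probability of the count vector `counts`."""
    coef = math.factorial(sum(counts))
    prob = 1.0
    for k, pi in zip(counts, p):
        coef //= math.factorial(k)  # stays an exact integer coefficient
        prob *= pi ** k
    return coef * prob

def chen_stein_quantities(n, p, d):
    """Return (lhs, rhs, cov_sum, factor) for a small Multinomial(n, p).

    lhs    = lambda - Var(W_d), with Var(W_d) found by full enumeration
    rhs    = sum_i mu_i^2 - sum_{i != j} Cov(1{N_i >= d}, 1{N_j >= d})
    factor = lambda^{-1} (1 - e^{-lambda}) sum_i mu_i^2
    """
    m = len(p)
    # Marginal tails mu_i = Pr(N_i >= d), N_i ~ Binomial(n, p_i).
    mu = [sum(math.comb(n, k) * pi ** k * (1 - pi) ** (n - k)
              for k in range(d, n + 1)) for pi in p]
    lam = sum(mu)
    second_moment = 0.0                     # E[W_d^2]
    joint = [[0.0] * m for _ in range(m)]   # Pr(N_i >= d, N_j >= d)
    for counts in compositions(n, m):
        pr = multinomial_pmf(counts, p)
        hits = [c >= d for c in counts]
        second_moment += pr * sum(hits) ** 2
        for i in range(m):
            for j in range(m):
                if i != j and hits[i] and hits[j]:
                    joint[i][j] += pr
    var = second_moment - lam ** 2
    cov_sum = sum(joint[i][j] - mu[i] * mu[j]
                  for i in range(m) for j in range(m) if i != j)
    sum_mu_sq = sum(x * x for x in mu)
    lhs = lam - var
    rhs = sum_mu_sq - cov_sum
    factor = (1 - math.exp(-lam)) / lam * sum_mu_sq
    return lhs, rhs, cov_sum, factor

# Hypothetical inputs, kept small so that enumeration is cheap.
lhs, rhs, cov_sum, factor = chen_stein_quantities(6, [0.2, 0.3, 0.5], 3)
print(lhs, rhs, cov_sum, factor)
```

The two sides `lhs` and `rhs` should agree up to rounding, and `cov_sum` should be nonpositive, as Problem 3 predicts. Enumeration visits $\binom{n+m-1}{m-1}$ count vectors, so this check is only practical for small $n$ and $m$.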
5. Consider a multinomial model with $m$ categories, $n$ trials, and
probability $p_i$ attached to category $i$. Express the distribution function of
the maximum number of counts $\max_i N_i$ observed in any category in
terms of the distribution functions of the $W_d$. How can the algorithm
for computing the distribution function of $W_d$ be simplified to give
an algorithm for computing a p-value of $\max_i N_i$?
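One way to compute the distribution function of $\max_i N_i$ directly uses the fact that $\max_i N_i < d$ exactly when $W_d = 0$, together with the truncated exponential generating function identity $\Pr(\max_i N_i \le c) = n!\,[t^n] \prod_{i=1}^{m} \sum_{k=0}^{c} (p_i t)^k / k!$, where $[t^n]$ extracts the coefficient of $t^n$. The Python sketch below is a hedged illustration: it relies on this polynomial-multiplication approach rather than the algorithm for $W_d$ given in the text, and the function names are hypothetical.

```python
import math

def prob_max_at_most(n, p, c):
    """Pr(max_i N_i <= c) for counts N ~ Multinomial(n, p).

    Equivalently Pr(W_{c+1} = 0).  Computed as
    n! * [t^n] prod_i sum_{k=0}^{c} (p_i t)^k / k!,
    keeping only coefficients up to degree n.  Uses plain floats, so this
    sketch is meant for moderate n (n! overflows a float past n = 170).
    """
    poly = [1.0]  # poly[a] = coefficient of t^a in the running product
    for pi in p:
        factor = [pi ** k / math.factorial(k) for k in range(min(c, n) + 1)]
        size = min(len(poly) + len(factor) - 1, n + 1)
        new = [0.0] * max(size, 0)
        for a, x in enumerate(poly):
            for b, y in enumerate(factor):
                if a + b <= n:
                    new[a + b] += x * y
        poly = new
    # If the product never reaches degree n, every outcome has some N_i > c.
    return math.factorial(n) * poly[n] if len(poly) > n else 0.0

def max_count_pvalue(n, p, observed_max):
    """Right-tail p-value Pr(max_i N_i >= observed_max)."""
    return 1.0 - prob_max_at_most(n, p, observed_max - 1)
```

For an observed maximum $M$, `max_count_pvalue(n, p, M)` returns $1 - \Pr(W_M = 0)$. Because each product is truncated at degree $n$, the cost is roughly $O(mnc)$ multiplications; a production version would work on a log scale to avoid overflow and underflow for large $n$.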
6. Continuing Problem 5, define the statistic $U_d$ to be the number of
categories $i$ with $N_i < d$. Express the right-tail probability $\Pr(U_d \ge j)$
in terms of the distribution function of $W_d$. This gives a method for
computing p-values of the statistic $U_d$. In some circumstances $U_d$ has
an approximate Poisson distribution. What do you conjecture about
these circumstances?
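Since every category satisfies either $N_i \ge d$ or $N_i < d$, we have $U_d = m - W_d$, and hence $\Pr(U_d \ge j) = \Pr(W_d \le m - j)$. The Python sketch below illustrates this relation under stated assumptions: as a hypothetical stand-in for the text's algorithm, it obtains the distribution of $W_d$ by brute-force enumeration of all count vectors, which is feasible only for small $n$ and $m$. The function names are illustrative.

```python
import math

def compositions(n, m):
    """Yield every m-tuple of nonnegative integers summing to n."""
    if m == 1:
        yield (n,)
        return
    for k in range(n + 1):
        for rest in compositions(n - k, m - 1):
            yield (k,) + rest

def w_distribution(n, p, d):
    """Exact pmf of W_d = #{i : N_i >= d} by brute-force enumeration."""
    m = len(p)
    pmf = [0.0] * (m + 1)
    for counts in compositions(n, m):
        pr = math.factorial(n)
        for k, pi in zip(counts, p):
            pr = pr / math.factorial(k) * pi ** k  # multinomial probability
        pmf[sum(c >= d for c in counts)] += pr
    return pmf

def u_right_tail(n, p, d, j):
    """Pr(U_d >= j) = Pr(W_d <= m - j), because U_d = m - W_d."""
    m = len(p)
    pmf = w_distribution(n, p, d)
    return sum(pmf[: max(m - j + 1, 0)])

# Hypothetical example: 5 trials, 3 categories, d = 2.
print(u_right_tail(5, [0.2, 0.3, 0.5], 2, 1))
```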

7. The nonparametric linkage test of de Vries et al. [10] uses affected
sibling data. Consider a nuclear family with $s$ affected sibs and a
heterozygous parent with genotype $a/b$ at some marker locus. Let $n_a$
and $n_b$ count the number of affected sibs receiving the $a$ and $b$ alleles,
respectively, from the parent. If the other parent is typed, then this
determination is always possible unless both parents and the child
are simultaneously of genotype $a/b$. de Vries et al. [10] suggest the
statistic $T = |n_a - n_b|$. Under the null hypothesis of independent
