Page 82 - Applied Probability


4. Hypothesis Testing and Categorical Data
chromosomes are highly contracted and can be distinguished on the basis
of size, position of their centromeres, and characteristic banding patterns.
To map a probe, the DNA trapped within a metaphase spread on a microscope
slide is denatured in situ and hybridized with a tritium-labeled or
fluorescent-labeled probe. A photographic emulsion immediately above the
spread registers the presence of the probe on one or more chromosomes.
When a human probe is hybridized to chromosomes of another mammalian
species, the probe and corresponding conserved sequence on the mammalian
chromosome may be sufficiently different that the hybridization signal is
weak. In such cases the probe can appear to hybridize preferentially to
several different chromosomal regions. To pick out the real peaks of
hybridization from purely random peaks, Ewens et al. [12] apply the
$Z_{\max}$ test. Table 4.2 reproduces their data on the hybridization of the
human ZYF probe, a zinc finger protein probe on the Y chromosome, to
homologous regions of the chromosomes of the Australian marsupial
Macropus eugenii. Fourteen chromosomal segments and 279 hybridization
events appear in the table. The observed $z_{\max}$ statistic of 7.030 is
significant at the 0.001 level and confirms the presence of a ZYF homologue
on the p arm of chromosome 5 of the marsupial. Recalculation of the
$Z_{\max}$ statistic with segment 5p omitted shows a second significant site
on region 1p. Further analysis identifies no other significant regions.

4.5 The $W_d$ Statistic

Another useful statistic is the number of categories $W_d$ having $d$ or more
observations, where $d$ is some fixed positive integer. This statistic has
mean $\lambda = \sum_{i=1}^{m} \mu_i$, where

$$\mu_i = \sum_{k=d}^{n} \binom{n}{k} p_i^k (1 - p_i)^{n-k}$$

is the probability that the count $N_i$ of category $i$ satisfies
$N_i \ge d$. If the variance of $W_d$ is close to $\lambda$, then as
discussed in Problem 4, $W_d$ follows an approximate Poisson distribution
with mean $\lambda$ [4].
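
As an illustration of the Poisson approximation, the following minimal Python sketch computes the binomial tail probabilities $\mu_i$, their sum $\lambda$, and the approximate p-value $\Pr(X \ge w_d)$ for a Poisson variate $X$ with mean $\lambda$. The function names and the example probabilities are hypothetical and are not taken from the text; the sketch also does not check whether the variance of $W_d$ is actually close to $\lambda$.

```python
from math import comb, exp


def tail_prob(n, p, d):
    """Pr(N >= d) for N ~ Binomial(n, p); this is mu_i when p = p_i."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(d, n + 1))


def wd_mean(n, probs, d):
    """Mean lambda of W_d: the sum of the per-category tail probabilities."""
    return sum(tail_prob(n, p, d) for p in probs)


def poisson_upper_tail(lam, w):
    """Pr(X >= w) for X ~ Poisson(lam), used as an approximate p-value."""
    term = exp(-lam)  # Pr(X = 0)
    cdf = 0.0
    for j in range(w):
        cdf += term            # accumulate Pr(X = j)
        term *= lam / (j + 1)  # advance to Pr(X = j + 1)
    return 1.0 - cdf


# Hypothetical example: 20 trials over 4 equally likely categories, d = 9.
lam = wd_mean(20, [0.25] * 4, 9)
approx_p = poisson_upper_tail(lam, 1)  # approximate Pr(W_d >= 1)
```

The upper tail is computed as one minus the lower sum, which is adequate for moderate p-values; for extremely small p-values a direct summation of the upper tail would lose less precision.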
As a supplement to this approximation, it is possible to compute the
distribution function $\Pr(W_d \le j)$ recursively by adapting a technique of
Sandell [32]. Once this is done, the p-value of an experimental result $w_d$
can be recovered via $\Pr(W_d \ge w_d) = 1 - \Pr(W_d \le w_d - 1)$. The
recursive scheme can be organized by defining $t_{j,k,l}$ to be the
probability that $W_d \le j$, given $k$ trials and $l$ categories. The indices
$j$, $k$, and $l$ are confined to the ranges $0 \le j \le w_d - 1$,
$0 \le k \le n$, and $1 \le l \le m$. The $l$ categories implicit in
$t_{j,k,l}$ refer to the first $l$ of the overall $m$ categories; the $i$th of
these $l$ categories is assigned the conditional probability
$p_i/(p_1 + \cdots + p_l)$.
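
The recursion itself is not stated in the passage above, so the following Python sketch should be read as one plausible way to fill the table $t_{j,k,l}$, not as the scheme of Sandell [32]. It assumes the recursion obtained by conditioning on the number of trials $i$ that fall in category $l$, which is binomial with success probability $p_l/(p_1 + \cdots + p_l)$: if $i < d$ the remaining $k - i$ trials must satisfy $W_d \le j$ over the first $l - 1$ categories, and if $i \ge d$ they must satisfy $W_d \le j - 1$. The function names are hypothetical, and all category probabilities are assumed to be strictly positive.

```python
from math import comb


def wd_cdf(n, probs, d, j):
    """Pr(W_d <= j) for n multinomial trials; probs must be positive."""
    if j < 0:
        return 0.0
    # Base case l = 1: the single category receives all k trials,
    # so W_d is 1 when k >= d and 0 otherwise.
    t = [[1.0 if (k < d or a >= 1) else 0.0 for k in range(n + 1)]
         for a in range(j + 1)]
    cum = probs[0]
    for l in range(2, len(probs) + 1):
        cum += probs[l - 1]
        q = probs[l - 1] / cum  # conditional probability of category l
        new = [[0.0] * (n + 1) for _ in range(j + 1)]
        for a in range(j + 1):
            for k in range(n + 1):
                total = 0.0
                for i in range(k + 1):  # i trials land in category l
                    w = comb(k, i) * q**i * (1 - q) ** (k - i)
                    if i < d:
                        total += w * t[a][k - i]
                    elif a >= 1:
                        total += w * t[a - 1][k - i]
                new[a][k] = total
        t = new
    return t[j][n]


def wd_pvalue(n, probs, d, w):
    """Pr(W_d >= w) = 1 - Pr(W_d <= w - 1)."""
    return 1.0 - wd_cdf(n, probs, d, w - 1)
```

Only two layers of the table (for $l - 1$ and $l$) are kept in memory at a time. The work grows roughly as $m \, w_d \, n^2$, so this direct form is suited to small illustrative problems.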
