Page 82 - Applied Probability
4. Hypothesis Testing and Categorical Data
chromosomes are highly contracted and can be distinguished on the basis
of size, position of their centromeres, and characteristic banding patterns.
To map a probe, the DNA trapped within a metaphase spread on a micro-
scope slide is denatured in situ and hybridized with a tritium-labeled or
fluorescent-labeled probe. A photographic emulsion immediately above the
spread registers the presence of the probe on one or more chromosomes.
When a human probe is hybridized to chromosomes of another mam-
malian species, the probe and corresponding conserved sequence on the
mammalian chromosome may be sufficiently different that the hybridiza-
tion signal is weak. In such cases the probe can appear to hybridize pref-
erentially to several different chromosomal regions. To pick out the real
peaks of hybridization from purely random peaks, Ewens et al. [12] apply
the $Z_{\max}$ test. Table 4.2 reproduces their data on the hybridization of the
human ZYF probe, a zinc finger protein probe on the Y chromosome, to
homologous regions of the chromosomes of the Australian marsupial Macro-
pus eugenii. Fourteen chromosomal segments and 279 hybridization events
appear in the table. The observed $z_{\max}$ statistic of 7.030 is significant at
the .001 level and confirms the presence of a ZYF homologue on the p arm
of chromosome 5 of the marsupial. Recalculation of the $Z_{\max}$ statistic with
segment 5p omitted shows a second significant site on region 1p. Further
analysis identifies no other significant regions.
4.5 The $W_d$ Statistic
Another useful statistic is the number of categories $W_d$ having $d$ or more
observations, where $d$ is some fixed positive integer. This statistic has mean
$\lambda = \sum_{i=1}^{m} \mu_i$, where

$$\mu_i = \sum_{k=d}^{n} \binom{n}{k} p_i^k (1 - p_i)^{n-k}$$

is the probability that the count of category $i$ satisfies $N_i \ge d$. If the
variance of $W_d$ is close to $\lambda$, then as discussed in Problem 4, $W_d$ follows an
approximate Poisson distribution with mean $\lambda$ [4].
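The mean $\lambda$ and the Poisson tail probability are easy to evaluate numerically. The Python sketch below is illustrative only. The function names `binomial_tail`, `wd_mean`, and `poisson_upper_tail` are hypothetical, and the number of trials $n$ and the category probabilities $p_1, \ldots, p_m$ are assumed to be supplied by the user. The sums are computed directly, which should be adequate for moderate $n$ but is not tuned for numerical stability.

```python
from math import comb, exp, factorial


def binomial_tail(n, p, d):
    """mu_i = Pr(N_i >= d) when N_i is binomial with n trials and success probability p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(d, n + 1))


def wd_mean(n, probs, d):
    """lambda = sum of mu_i over all categories, the mean of W_d."""
    return sum(binomial_tail(n, p, d) for p in probs)


def poisson_upper_tail(lam, w):
    """Pr(X >= w) for X Poisson with mean lam; approximates Pr(W_d >= w)."""
    return 1.0 - sum(exp(-lam) * lam**j / factorial(j) for j in range(w))


# Made-up illustration (not data from the text): 20 trials, 5 equally likely categories.
lam = wd_mean(20, [0.2] * 5, d=8)
approx_p_value = poisson_upper_tail(lam, w=2)
```

The approximation should be trusted only when the variance of $W_d$ is close to $\lambda$, as the text notes.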
As a supplement to this approximation, it is possible to compute the
distribution function $\Pr(W_d \le j)$ recursively by adapting a technique of
Sandell [32]. Once this is done, the p-value of an experimental result $w_d$
can be recovered via $\Pr(W_d \ge w_d) = 1 - \Pr(W_d \le w_d - 1)$. The recursive
scheme can be organized by defining $t_{j,k,l}$ to be the probability that $W_d \le j$,
given $k$ trials and $l$ categories. The indices $j$, $k$, and $l$ are confined to the
ranges $0 \le j \le w_d - 1$, $0 \le k \le n$, and $1 \le l \le m$. The $l$ categories implicit
in $t_{j,k,l}$ refer to the first $l$ of the overall $m$ categories; the $i$th of these $l$
categories is assigned the conditional probability $p_i/(p_1 + \cdots + p_l)$.
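One way to turn these definitions into a computation is to condition on the count $i$ that falls in category $l$. Given $k$ trials over the first $l$ categories, that count is binomial with success probability $p_l/(p_1 + \cdots + p_l)$, and the remaining $k - i$ trials are spread over the first $l - 1$ categories. Category $l$ contributes to $W_d$ exactly when $i \ge d$. The Python sketch below follows this conditioning argument. It is an assumption about how such a recursion may be organized, not necessarily the scheme of Sandell [32], and the names `wd_cdf` and `wd_pvalue` are hypothetical. All category probabilities are assumed positive.

```python
from math import comb


def wd_cdf(j, n, probs, d):
    """Pr(W_d <= j) for n multinomial trials with category probabilities probs."""
    if j < 0:
        return 0.0
    # One category (l = 1): all k trials land in it, so W_d is 1 if k >= d, else 0.
    # prev[jj][k] holds t_{jj,k,l} for the current number of categories l.
    prev = [[1.0 if (k < d or jj >= 1) else 0.0 for k in range(n + 1)]
            for jj in range(j + 1)]
    partial = probs[0]
    for p in probs[1:]:
        partial += p
        q = p / partial  # conditional probability of the newest category
        cur = [[0.0] * (n + 1) for _ in range(j + 1)]
        for k in range(n + 1):
            # Binomial probabilities for the count i in the newest category.
            pmf = [comb(k, i) * q**i * (1 - q) ** (k - i) for i in range(k + 1)]
            for jj in range(j + 1):
                total = 0.0
                for i in range(k + 1):
                    if i < d:
                        # Newest category falls short of d observations.
                        total += pmf[i] * prev[jj][k - i]
                    elif jj >= 1:
                        # Newest category counts toward W_d.
                        total += pmf[i] * prev[jj - 1][k - i]
                cur[jj][k] = total
        prev = cur
    return prev[j][n]


def wd_pvalue(w, n, probs, d):
    """Pr(W_d >= w) = 1 - Pr(W_d <= w - 1)."""
    return 1.0 - wd_cdf(w - 1, n, probs, d)
```

The work grows roughly as $w_d \, n^2 m$. Only two layers of the table, for $l - 1$ and $l$ categories, are held in memory at any time.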