Page 82 - Applied Probability

4. Hypothesis Testing and Categorical Data
                              chromosomes are highly contracted and can be distinguished on the basis
                              of size, position of their centromeres, and characteristic banding patterns.
                              To map a probe, the DNA trapped within a metaphase spread on a micro-
                              scope slide is denatured in situ and hybridized with a tritium-labeled or
                              fluorescent-labeled probe. A photographic emulsion immediately above the
                              spread registers the presence of the probe on one or more chromosomes.
                                When a human probe is hybridized to chromosomes of another mam-
                              malian species, the probe and corresponding conserved sequence on the
                              mammalian chromosome may be sufficiently different that the hybridiza-
                              tion signal is weak. In such cases the probe can appear to hybridize pref-
                              erentially to several different chromosomal regions. To pick out the real
                              peaks of hybridization from purely random peaks, Ewens et al. [12] apply
                              the $Z_{\max}$ test. Table 4.2 reproduces their data on the hybridization of the
                              human ZYF probe, a zinc finger protein probe on the Y chromosome, to
                              homologous regions of the chromosomes of the Australian marsupial Macro-
                              pus eugenii. Fourteen chromosomal segments and 279 hybridization events
                              appear in the table. The observed $z_{\max}$ statistic of 7.030 is significant at
                              the .001 level and confirms the presence of a ZYF homologue on the p arm
                              of chromosome 5 of the marsupial. Recalculation of the $Z_{\max}$ statistic with
                              segment 5p omitted shows a second significant site on region 1p. Further
                              analysis identifies no other significant regions.



                              4.5 The $W_d$ Statistic

                              Another useful statistic is the number of categories $W_d$ having $d$ or more
                              observations, where $d$ is some fixed positive integer. This statistic has mean
                              $\lambda = \sum_{i=1}^{m} \mu_i$, where

                              $$\mu_i = \sum_{k=d}^{n} \binom{n}{k} p_i^k (1 - p_i)^{n-k}$$

                              is the probability that the count of category $i$ satisfies $N_i \ge d$. If the
                              variance of $W_d$ is close to $\lambda$, then as discussed in Problem 4, $W_d$ follows an
                              approximate Poisson distribution with mean $\lambda$ [4].
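
The mean $\lambda$ and the Poisson approximation can be sketched numerically as follows. This is a minimal illustration, not code from the text: the function names `tail_prob`, `wd_mean`, and `poisson_upper_tail` are hypothetical, and the inputs in the usage lines are made-up values chosen only to exercise the formulas.

```python
from math import comb, exp

def tail_prob(n, p, d):
    """mu_i = Pr(N_i >= d) for a Binomial(n, p) count N_i."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(d, n + 1))

def wd_mean(n, probs, d):
    """lambda = sum of mu_i over all categories."""
    return sum(tail_prob(n, p, d) for p in probs)

def poisson_upper_tail(lam, w):
    """Pr(X >= w) for X ~ Poisson(lam), used as the approximate p-value."""
    term = exp(-lam)          # Pr(X = 0)
    cdf = 0.0
    for x in range(w):        # accumulate Pr(X <= w - 1)
        cdf += term
        term *= lam / (x + 1)
    return 1.0 - cdf

# Hypothetical inputs: 20 trials, 10 equally likely categories, threshold d = 5.
lam = wd_mean(20, [0.1] * 10, 5)
approx_p = poisson_upper_tail(lam, 2)   # approximate Pr(W_d >= 2)
```

The approximation is only as good as the agreement between the variance of $W_d$ and $\lambda$, so the Poisson tail should be read as a rough guide rather than an exact p-value.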
                                As a supplement to this approximation, it is possible to compute the
                              distribution function $\Pr(W_d \le j)$ recursively by adapting a technique of
                              Sandell [32]. Once this is done, the p-value of an experimental result $w_d$
                              can be recovered via $\Pr(W_d \ge w_d) = 1 - \Pr(W_d \le w_d - 1)$. The recursive
                              scheme can be organized by defining $t_{j,k,l}$ to be the probability that $W_d \le j$,
                              given $k$ trials and $l$ categories. The indices $j$, $k$, and $l$ are confined to the
                              ranges $0 \le j \le w_d - 1$, $0 \le k \le n$, and $1 \le l \le m$. The $l$ categories implicit
                              in $t_{j,k,l}$ refer to the first $l$ of the overall $m$ categories; the $i$th of these $l$
                              categories is assigned the conditional probability $p_i/(p_1 + \cdots + p_l)$.
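
One way to realize the quantities $t_{j,k,l}$ in code is to condition on the number of trials falling in category $l$, which is binomial with success probability $p_l/(p_1 + \cdots + p_l)$. The passage above does not state the recursion itself, so the update rule below is an assumption consistent with these definitions and not necessarily the scheme of Sandell or of the text. The names `wd_cdf` and `wd_pvalue` are hypothetical, and all category probabilities are assumed to be strictly positive.

```python
from math import comb
from functools import lru_cache

def wd_cdf(j, n, probs, d):
    """Pr(W_d <= j) for n multinomial trials over categories with probabilities probs."""
    # prefix[l] = p_1 + ... + p_l (assumed positive for every l >= 1)
    prefix = [0.0]
    for p in probs:
        prefix.append(prefix[-1] + p)

    @lru_cache(maxsize=None)
    def t(j, k, l):
        if j < 0:
            return 0.0                      # W_d cannot be negative
        if l == 1:
            # All k trials land in the single category, so W_d = 1 iff k >= d.
            return 1.0 if (j >= 1 or k < d) else 0.0
        r = probs[l - 1] / prefix[l]        # conditional probability of category l
        total = 0.0
        for i in range(k + 1):              # i = count in category l
            weight = comb(k, i) * r**i * (1 - r)**(k - i)
            # Category l uses up one unit of the budget j when i >= d.
            total += weight * t(j - (1 if i >= d else 0), k - i, l - 1)
        return total

    return t(j, n, len(probs))

def wd_pvalue(w, n, probs, d):
    """Pr(W_d >= w) = 1 - Pr(W_d <= w - 1)."""
    return 1.0 - wd_cdf(w - 1, n, probs, d)
```

As a small check, with two equally likely categories, $n = 2$ trials, and $d = 2$, the event $W_d \ge 1$ occurs exactly when both trials land in the same category, so `wd_pvalue(1, 2, [0.5, 0.5], 2)` should equal 0.5. Memoization keeps the work polynomial in $w_d$, $n$, and $m$, since each triple $(j, k, l)$ is evaluated once.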