Page 314 - Applied Probability
                                                                  14. Poisson Approximation
redundancy of a panel is desirable. In practice, the chromosome constitution of a clone cannot be predicted in advance, and the level of redundancy is random. Minimum Hamming distance is a natural measure of the redundancy of a panel. The Hamming distance $\rho(c_s, c_t)$ of two columns $c_s$ and $c_t$ counts the number of entries in which they differ. The minimum Hamming distance of a panel is then defined as $\min_{\{s,t\}} \rho(c_s, c_t)$, where $\{s,t\}$ ranges over all pairs of distinct columns from the panel.
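As a minimal sketch of this definition, the Python below computes the minimum Hamming distance of a panel stored as a list of 0/1 rows, one row per clone and one column per chromosome. The row-list representation and the function names `hamming` and `min_hamming_distance` are illustrative assumptions, not taken from the text.

```python
from itertools import combinations


def hamming(col_s, col_t):
    """Hamming distance rho(c_s, c_t): number of entries in which two columns differ."""
    return sum(a != b for a, b in zip(col_s, col_t))


def min_hamming_distance(panel):
    """Minimum of rho(c_s, c_t) over all pairs {s, t} of distinct columns.

    Assumed (hypothetical) layout: panel[i][s] is 1 if clone i retains
    chromosome s and 0 otherwise, so columns correspond to chromosomes.
    """
    columns = list(zip(*panel))  # transpose rows (clones) into columns (chromosomes)
    return min(hamming(c_s, c_t) for c_s, c_t in combinations(columns, 2))


# Three clones, three chromosomes: the columns are (1,0,1), (0,0,1), (1,1,0).
panel = [[1, 0, 1],
         [0, 0, 1],
         [1, 1, 0]]
print(min_hamming_distance(panel))  # 1: the first two columns differ only in the first clone
```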

When somatic cell hybrid panels are randomly created, it is reasonable to make three assumptions. First, each human chromosome is lost or retained independently during the formation of a stable clone. Second, there is a common retention probability $p$ applying to all chromosome pairs. This means that at least one member of each pair of homologous chromosomes is retained with probability $p$. Rushton [17] estimates a range of $p$ from .07 to .75. The value $p = \frac{1}{2}$ simplifies our theory considerably. Third, different clones behave independently in their retention patterns.
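The three assumptions translate directly into a simulation: every entry of the panel is an independent retention indicator equal to 1 with the common probability $p$. A minimal sketch follows; the function name `random_panel` and its optional seeded `rng` argument are hypothetical conveniences.

```python
import random


def random_panel(n, p, num_chromosomes=23, rng=None):
    """Simulate n clones under the three assumptions in the text.

    Entry [i][s] is 1 (retained) with probability p, independently across
    chromosomes s (first assumption), with one common p (second assumption),
    and independently across clones i (third assumption).
    """
    rng = rng if rng is not None else random.Random()
    return [[1 if rng.random() < p else 0 for _ in range(num_chromosomes)]
            for _ in range(n)]


panel = random_panel(n=25, p=0.5, rng=random.Random(2024))
print(len(panel), len(panel[0]))  # 25 23
```

Passing a seeded `random.Random` makes a simulated panel reproducible.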

Now denote column $s$ of a random panel of $n$ clones by $C_s^n$. For any two distinct columns $C_s^n$ and $C_t^n$, define $X_{\{s,t\}}^n$ to be the indicator of the event $\rho(C_s^n, C_t^n) < d$, where $d$ is some fixed Hamming distance. The random variable $Y_d^n = \sum_{\{s,t\}} X_{\{s,t\}}^n$ is 0 precisely when the minimum Hamming distance equals or exceeds $d$. There are $\binom{23}{2} = 253$ pairs $\alpha = \{s,t\}$ in the index set $I$, and each of the associated $X_\alpha^n$ has the same mean
$$p_\alpha = \sum_{i=0}^{d-1} \binom{n}{i} q^i (1-q)^{n-i},$$
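where $q = 2p(1-p)$ is the probability that $C_s^n$ and $C_t^n$ differ in any entry; the two entries differ exactly when one chromosome is retained and the other lost, an event of probability $p(1-p) + (1-p)p$.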
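The mean $p_\alpha$ is a binomial lower tail, since $\rho(C_s^n, C_t^n)$ is a sum of $n$ independent indicators, each equal to 1 with probability $q$. A short sketch using `math.comb` (Python 3.8 or later); the function name `p_alpha` is an assumption.

```python
import math


def p_alpha(n, d, p):
    """Pr(rho(C_s^n, C_t^n) < d) = sum_{i=0}^{d-1} C(n, i) q^i (1 - q)^(n - i)."""
    q = 2 * p * (1 - p)  # chance that two columns differ in a given clone
    # Terms with i > n vanish, so the sum is capped at i = n.
    return sum(math.comb(n, i) * q**i * (1 - q) ** (n - i)
               for i in range(min(d, n + 1)))


# With d = 1 the event is "the two columns are identical", of probability (1 - q)^n.
print(p_alpha(10, 1, 0.5))  # 0.0009765625, i.e. 2**-10
```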
This gives the mean of $Y_d^n$ as $\lambda = \binom{23}{2} p_\alpha$.
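Because all $\binom{23}{2} = 253$ indicators share the mean $p_\alpha$, computing $\lambda$ takes a single multiplication. The sketch below (function names hypothetical, parameter values chosen only for illustration) tabulates $\lambda$ for a few panel sizes at $p = 1/2$ and $d = 5$, showing how the expected number of close column pairs falls as clones are added.

```python
import math


def p_alpha(n, d, p):
    """Binomial lower tail Pr(rho < d) with q = 2p(1 - p)."""
    q = 2 * p * (1 - p)
    return sum(math.comb(n, i) * q**i * (1 - q) ** (n - i)
               for i in range(min(d, n + 1)))


def poisson_mean(n, d, p, num_columns=23):
    """lambda = C(23, 2) * p_alpha: expected number of column pairs at distance < d."""
    return math.comb(num_columns, 2) * p_alpha(n, d, p)


for n in (15, 20, 25, 30):
    print(n, round(poisson_mean(n, d=5, p=0.5), 4))
```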

The Chen-Stein heuristic suggests estimating $\Pr(Y_d^n > 0)$ by the Poisson tail probability $1 - e^{-\lambda}$. The error bound (14.3) on this approximation can be computed by defining the neighborhoods $B_\alpha = \{\beta : |\beta| = 2,\ \beta \cap \alpha \neq \emptyset\}$, where vertical bars enclosing a set indicate the number of elements in the set. Because $X_\beta^n$ depends only on the columns indexed by $\beta$, and distinct columns are independent, $X_\alpha^n$ is independent of those $X_\beta^n$ with $\beta$ outside $B_\alpha$.
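To see the heuristic at work, one can simulate panels and compare the empirical frequency of $Y_d^n > 0$ with $1 - e^{-\lambda}$. The sketch below stores each column as an $n$-bit integer, so a Hamming distance is the number of 1 bits in an XOR. The function names, the bit encoding, the trial count, and the values $n = 25$, $d = 5$, $p = 1/2$ are all illustrative assumptions.

```python
import math
import random


def p_alpha(n, d, p):
    """Binomial lower tail Pr(rho < d) with q = 2p(1 - p)."""
    q = 2 * p * (1 - p)
    return sum(math.comb(n, i) * q**i * (1 - q) ** (n - i)
               for i in range(min(d, n + 1)))


def count_close_pairs(columns, d):
    """Y_d^n: number of column pairs {s, t} with rho(C_s, C_t) < d."""
    return sum(1
               for s in range(len(columns))
               for t in range(s + 1, len(columns))
               if bin(columns[s] ^ columns[t]).count("1") < d)


def simulate_prob_close(n, d, p, trials, rng, num_columns=23):
    """Monte Carlo estimate of Pr(Y_d^n > 0)."""
    hits = 0
    for _ in range(trials):
        # Bit i of a column is 1 if clone i retains that chromosome.
        columns = [sum(1 << i for i in range(n) if rng.random() < p)
                   for _ in range(num_columns)]
        if count_close_pairs(columns, d) > 0:
            hits += 1
    return hits / trials


n, d, p = 25, 5, 0.5
lam = math.comb(23, 2) * p_alpha(n, d, p)
print("Poisson estimate:", 1 - math.exp(-lam))
print("Simulated:       ", simulate_prob_close(n, d, p, 2000, random.Random(1)))
```

The two printed values should agree up to Monte Carlo noise and the Chen-Stein error.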
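
The Chen-Stein constant $b_1$ reduces to $\binom{23}{2}|B_\alpha| p_\alpha^2$. An elementary counting argument, which subtracts from all $\binom{23}{2}$ pairs the $\binom{21}{2}$ pairs avoiding both members of $\alpha$, shows that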

$$|B_\alpha| = \binom{23}{2} - \binom{21}{2} = 43.$$
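The count of 43 can be checked by brute force. The sketch below (names hypothetical) labels the 23 columns 0 through 22, lists every pair $\beta$ meeting a fixed $\alpha$, and then evaluates $b_1 = \binom{23}{2}|B_\alpha| p_\alpha^2$ for one illustrative choice of $n$, $d$, and $p$. Note that $B_\alpha$ contains $\alpha$ itself.

```python
import math
from itertools import combinations


def neighborhood(alpha, num_columns=23):
    """B_alpha: all 2-element sets beta sharing at least one column with alpha."""
    return [frozenset(beta) for beta in combinations(range(num_columns), 2)
            if alpha & set(beta)]


def p_alpha(n, d, p):
    """Binomial lower tail Pr(rho < d) with q = 2p(1 - p)."""
    q = 2 * p * (1 - p)
    return sum(math.comb(n, i) * q**i * (1 - q) ** (n - i)
               for i in range(min(d, n + 1)))


alpha = frozenset({0, 1})
B = neighborhood(alpha)
print(len(B))  # 43 = C(23, 2) - C(21, 2)

# b_1 = C(23, 2) * |B_alpha| * p_alpha^2 for illustrative n = 25, d = 5, p = 1/2
b1 = math.comb(23, 2) * len(B) * p_alpha(25, 5, 0.5) ** 2
print(b1)
```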
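Since the joint probability $p_{\alpha\beta}$ does not depend on the particular pair $\beta \in B_\alpha \setminus \{\alpha\}$ chosen, the constant $b_2$ is $\binom{23}{2}(|B_\alpha| - 1)p_{\alpha\beta}$. Fortunately, $p_{\alpha\beta} = p_\alpha^2$ when $p = 1/2$. Indeed, condition on the value of the common column shared by $\alpha$ and $\beta$. When $p = 1/2$, each entry of either remaining column disagrees with the corresponding entry of the common column with probability 1/2, whatever that entry is. Hence, conditionally, the events $X_\alpha^n = 1$ and $X_\beta^n = 1$ are independent and each occurs with the constant probability $p_\alpha$; because this probability does not depend on the common column, the two events are unconditionally independent as well. The case $p \neq 1/2$ is more subtle, and we defer the details of computing $p_{\alpha\beta}$ to Problem 8. Table 14.1 provides some representative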