Page 319 - Applied Probability
P. 319

14. Poisson Approximation
                              308
                                                       2
                                                 2
                                                λ (m+1)
                                                           1
                                                          .
                              It follows that b 1
                                                   n
                                                          n
                                Evaluation of the constant b 2 is more difficult. Consider β ∈ B α such
                              that |β ∩ α| = k for 0 <k <m + 1. Let U (1) and U (k) be the positions of
                              the first and last of the k common points shared by β and α. If we condition
                              on the values U (1) = u (1) and U (k) = u (k) , then the indicator variables X α
                              and X β are independent and identically distributed. Hence,
                                              Pr(X α =1,X β =1 | U (1) = u (1) ,U (k) = u (k) )
                                                                              2
                                          =Pr(X α =1 | U (1) = u (1) ,U (k) = u (k) ) .
                              In order that X α = 1, the m+1−k remaining points in α must be within a
                              distance s to the left of u (1) and a distance s to the right of u (k) . It follows
                              that Pr(X α =1 | U (1) = u (1) ,U (k) = u (k) ) ≤ (3s) m+1−k  for u (k) − u (1) ≤ s.
                                This uniform bound yields a crude upper bound on p αβ if combined with
                              the probability that U (k) − U (1) ≤ s. This latter probability is
                                            Pr(U (k) − U (1) ≤ s)= ks k−1  − (k − 1)s k
                              for exactly the same reasons that produced equality (14.5). Thus, we can
                                                                k
                              assert that p αβ ≤ q k =[ks k−1 −(k−1)s ](3s) 2[m+1−k]  whenever |β∩α| = k.
                                              m+1    n−m−1
                              Since there are             such collections β for every α,
                                              k   m+1−k
                                                  m
                                                  	     n     m +1     n − m − 1
                                          b 2  ≤                                 q k .    (14.6)
                                                      m +1      k      m +1 − k
                                                  k=1
                                The upper bound (14.6) is hard to evaluate explicitly, but its dominant
                              contribution occurs when k = m. Indeed, because of the asymptotic rela-
                              tions n m+1 m    1 and q k   s 2m−k+1 , the kth term of the sum satisfies
                                        s
                                      n     m +1    n − m − 1          2m−k+2 2m−k+1

                                                              q k    n       s
                                    m +1      k     m +1 − k
                                                                                (2m−k+1)(m+1)
                                                                    n  2m−k+2 −      m
                                                                             n
                                                                       k−m−1
                                                                  = n    m  .
                              Thus, the dominant term of the sum is of order n −1/m , and for sufficiently
                              large n and sufficiently small s, the Poisson approximation Pr(S =0) ≈ e −λ
                              applies. The slow rate O(n −1/m ) of convergence of the total variation dis-
                              tance to 0 is rather disappointing in this example. The theoretical argu-
                              ments and numerical evidence presented by Glaz [9] and Roos [16] suggest
                              that a compound Poisson approximation performs better.
                              14.7 DNA Sequence Matching

                              A basic problem in DNA sequence analysis is to test whether two different
                              sequences share significant similarities. Strong similarity or homology often
   314   315   316   317   318   319   320   321   322   323   324