Page 307 - Applied Probability
P. 307

13. Sequence Analysis
                                                                                            295
                                                                       m
                                                                    p R s
                                                    F(s)
                                                          =
                                                                 m
                                                              p R s +(1 − s)Q(s)
                                                              Q(1)
                                                          =
                                                    F (1)
                                                                  ,
                                                               p R
                                                         n
                                   where Q(s)=    m−1    q n s . Note that F (1) is the expected time be-

                                                  n=0
                                   tween renewals. Explicitly calculate F (1) for the pattern GCGC.

                                   (Hints: To demonstrate equation (13.11), condition on the most re-
                                   cent renewal of the pattern. In proving equation (13.12), multiply
                                                      n
                                   equation (13.11) by s and sum on n. You will need to use the fact
                                   u j = 0 for 1 ≤ j ≤ m − 1.)
                                 5. Let x =(x 1 ,...,x m ) and y =(y 1 ,... ,y n ) be two strings such that
                                   m ≥ n. Show that:
                                                                             m

                                    (a) The number of alignments of length m is  .
                                                                             n
                                                                                 n
                                                                                     m+2
                                    (b) The number of alignments of length m +2 is      .
                                                                                 2   n
                                                                                       n
                                                                                           m+k
                                    (c) In general, the number of alignments of length m+k is  .
                                                                                       k   n
                                                                           n
                                                                             m+k
                                    (d) The total number of alignments is         .
                                                                        k  k   n

                                   Note that we distinguish the double pairs  x i −  and  −x i  .
                                                                         −y j     y j −

                                 6. If we refuse to distinguish the double pairs  x i −  and  −x i  in an
                                                                            −y j      y j −
                                   alignment between two strings x =(x 1 ,... ,x m ) and y =(y 1 ,...,y n),
                                                                                    m+n
                                   then show that the number of alignments reduces to   . (Hints:
                                                                                    m

                                   Argue that an alignment is completely determined by what pairs  x i
                                                                                             y j
                                                                                     n
                                                                    m
                                                                       n
                                                                                 m          m+n
                                   occur within it. Hence, there are     =             =
                                                                 k  k  k     k  m−k  k      m
                                   possible alignments.)
                                 7. A gap penalty g(k) ≥ 0 is said to be subadditive if g(k) ≤ kg(1)
                                   for all k ≥ 0 and concave if g(k +1) − g(k) ≥ g(k +2) − g(k +1)
                                   for all k ≥ 0. Show that a concave gap penalty g(k)with g(0) = 0
                                   is subadditive and that the particular choice g(k)= α ln(k +1) is
                                   concave for α ≥ 0. (Hint: Any concave function on the nonnegative
                                   integers can be extended to a concave function on [0, ∞) by linear
                                   interpolation. Subadditivity is then geometrically obvious.)
                                 8. Continuing Problem 7, let f(l) be a function on the nonnegative in-
                                   tegers, and define a function g k (l)= f(k + l) − f(k) for every integer
                                   k ≥ 0. Show that f(l) is concave if and only if each g k (l) is subaddi-
                                   tive.
                                 9. Consider a word of length n chosen from the four-letter DNA alphabet
                                   {A,T,C,G}. If letters are chosen for each position independently with
                                   probabilities p A, p T , p C , and p G , respectively, then let R A  be the
                                                                                       n
   302   303   304   305   306   307   308   309   310   311   312