Page 300 - Applied Probability
P. 300

13. Sequence Analysis
                              288
                              tions signal that the corresponding terms represent impossible conditions.
                              For example, E 0j = ∞ because it is impossible to end an alignment with

                               x 0
                                   when x is a null string.
                                −
                                These boundary values feed into the recurrences
                                     E ij  =min{ min {F i−k,j + g(k)}, min {G i−k,j + g(k)}}
                                                  1≤k≤i              1≤k≤i
                                     F ij  =min{ min {E i,j−k + g(k)}, min {G i,j−k + g(k)}} (13.6)
                                                  1≤k≤j              1≤k≤j
                                          =min{E i−1,j−1 ,F i−1,j−1 ,G i−1,j−1 } + d(x i ,y j ).
                                     G ij
                              In the first recurrence, the blank at the end of the alignment is part of
                              a larger gap involving k consecutive blanks. Overall the gap contributes

                              a gap distance g(k). Because the gap ends with  x i  , it must be flanked
                                                                           −

                              on the left by either  −  or  x i−k  . To the left of the gap, we take the
                                                   y j    y j
                              best possible alignment constructible from the partial strings (x 1 ,...,x i−k )
                              and (y 1 ,...,y j ) ending with the appropriate pair. The second recurrence
                              is proved similarly, and the third recurrence is even simpler because the
                              compatible alignments cannot end with a gap.
                                The computational complexity of algorithm (13.6) is of order
                                           m  n             n             m
                                          	 	              	    m +1     	    n +1
                                                (i + j)=               +
                                                                  2            2
                                          i=1 j=1          j=1           i=1
                                                                2
                                                                       2
                                                        = O(nm + mn ).
                              For an affine gap distance g(k)= α +(k − 1)β, the algorithm can be
                              revised so that it achieves the computational complexity O(mn) of the
                              original algorithm (13.5). The same boundary conditions apply, but now
                              the recurrences reduce to
                                       E ij  =min{E i−1,j + β, F i−1,j + α, G i−1,j + α}
                                       F ij  =min{E i,j−1 + α, F i,j−1 + β, G i,j−1 + α}  (13.7)
                                            =min{E i−1,j−1 ,F i−1,j−1 ,G i−1,j−1 } + d(x i ,y j ).
                                       G ij
                              The first of these recurrences is derived by considering the three possible

                              pairs  x i−1  ,  −  , and  x i−1  immediately preceding the end pair  x i  . The
                                     −     y j      y j                                 −
                              other recurrences are derived similarly.
                                Construction of a particular minimum distance alignment based on the
                              recurrences (13.7) again proceeds via a system of pointers with the twist
                              that a pointer in one matrix may point to an entry in another matrix. For
                              example, suppose in the first recurrence of (13.7) that the minimum term
                              is F i−1,j + α. Then the pointer at entry (i, j) of the E matrix points to
                              entry (i − 1,j) of the F matrix. In any event, starting with the minimum
                              term in D mn =min{E mn ,F mn ,G mn } and following the pointers back to a
                              boundary, one can trace out a minimum distance alignment.
   295   296   297   298   299   300   301   302   303   304   305