Page 302 - Applied Probability
P. 302

13. Sequence Analysis
                              290
                              here is proportional to n. It is true that this problem is solved recursively,
                              but there are only on the order of log m recursion levels to consider. To
                                                                2
                              simplify matters further, we now count only the minima involved in the
                              various invocations of the Needleman-Wunsch algorithm without traceback.
                              With this proviso, we claim that the complexity c(m, n) of the space-saving
                              algorithm satisfies c(m, n) ≤ 2mn. This inequality is trivially true for m
                              equal to 1. For larger values of m we argue inductively that
                                                     m     m
                                        c(m, n)  ≤ n    + n   + c(m/2,j)+ c(m/2,n − j)
                                                      2    2
                                                ≤ mn + mj + m(n − j)                      (13.9)
                                                =2mn.
                              The occurrences of nm/2 in the first inequality in (13.9) account for the two
                              initial applications of the unadorned Needleman-Wunsch algorithm. The
                              remaining terms c(m/2,j) and c(m/2,n − j) reflect the recursive nature of
                              the space-saving algorithm.
                              13.6 Maximum Similarity Alignment


                              There is nothing sacred in aligning strings by minimum distance. In simi-
                              larity alignment, a relatively large positive score s(x, x) > 0 is assigned to
                              a pairwise match, a relatively small positive score or a negative score s(x, y)
                              to a mismatch x  = y, and a negative score s(x, −)= s(−,x)= −δ< 0
                                                                                   ∗
                                                                            ∗
                              to an indel. The overall similarity of an alignment x and y of strings x

                                                       ∗
                              and y is the sum   s(x ,y ), where again only a finite number of terms
                                                    ∗
                                                i   i  i
                              contribute owing to the convention s(−, −) = 0. The similarity between x
                              and y is now defined as

                                                 S(x, y)  =  max     s(x ,y )
                                                                        ∗
                                                                           ∗
                                                                        i
                                                                           i
                                                             (x ,y )
                                                              ∗
                                                                ∗
                                                                   i
                              Calculation of S(x, y) proceeds just as the case of minimum distance align-
                              ment except that in the recurrence (13.5) and its boundary conditions we
                              substitute max for min and s(x, y) for d(x, y). If we penalize longer gaps
                              proportionately less severely than single gaps, then we assign a negative
                              score h(k) to a gap of length k. Extension of the underlying recurrences
                              and their associated boundary conditions for these more complicated gap
                              scores is left to the reader.
                                It is of some interest to determine when minimum distance and maximum
                              similarity give the same best alignment. The next proposition bears on this
                              question.
                              Proposition 13.6.1 Suppose that the metric d(x, y) and similarity score
                              s(x, y) are related by s(x, y)= c − d(x, y) and the gap distance g(k) and
   297   298   299   300   301   302   303   304   305   306   307