Page 302 - Applied Probability
P. 302
13. Sequence Analysis
290
here is proportional to n. It is true that this problem is solved recursively,
but there are only on the order of log m recursion levels to consider. To
2
simplify matters further, we now count only the minima involved in the
various invocations of the Needleman-Wunsch algorithm without traceback.
With this proviso, we claim that the complexity c(m, n) of the space-saving
algorithm satisfies c(m, n) ≤ 2mn. This inequality is trivially true for m
equal to 1. For larger values of m we argue inductively that
m m
c(m, n) ≤ n + n + c(m/2,j)+ c(m/2,n − j)
2 2
≤ mn + mj + m(n − j) (13.9)
=2mn.
The occurrences of nm/2 in the first inequality in (13.9) account for the two
initial applications of the unadorned Needleman-Wunsch algorithm. The
remaining terms c(m/2,j) and c(m/2,n − j) reflect the recursive nature of
the space-saving algorithm.
13.6 Maximum Similarity Alignment
There is nothing sacred in aligning strings by minimum distance. In simi-
larity alignment, a relatively large positive score s(x, x) > 0 is assigned to
a pairwise match, a relatively small positive score or a negative score s(x, y)
to a mismatch x = y, and a negative score s(x, −)= s(−,x)= −δ< 0
∗
∗
to an indel. The overall similarity of an alignment x and y of strings x
∗
and y is the sum s(x ,y ), where again only a finite number of terms
∗
i i i
contribute owing to the convention s(−, −) = 0. The similarity between x
and y is now defined as
S(x, y) = max s(x ,y )
∗
∗
i
i
(x ,y )
∗
∗
i
Calculation of S(x, y) proceeds just as the case of minimum distance align-
ment except that in the recurrence (13.5) and its boundary conditions we
substitute max for min and s(x, y) for d(x, y). If we penalize longer gaps
proportionately less severely than single gaps, then we assign a negative
score h(k) to a gap of length k. Extension of the underlying recurrences
and their associated boundary conditions for these more complicated gap
scores is left to the reader.
It is of some interest to determine when minimum distance and maximum
similarity give the same best alignment. The next proposition bears on this
question.
Proposition 13.6.1 Suppose that the metric d(x, y) and similarity score
s(x, y) are related by s(x, y)= c − d(x, y) and the gap distance g(k) and