Page 302 - Applied Probability

P. 302

13. Sequence Analysis
290
here is proportional to n. It is true that this problem is solved recursively,
but there are only on the order of log m recursion levels to consider. To
2
simplify matters further, we now count only the minima involved in the
various invocations of the Needleman-Wunsch algorithm without traceback.
With this proviso, we claim that the complexity c(m, n) of the space-saving
algorithm satisﬁes c(m, n) ≤ 2mn. This inequality is trivially true for m
equal to 1. For larger values of m we argue inductively that
m m
c(m, n) ≤ n + n + c(m/2,j)+ c(m/2,n − j)
2 2
≤ mn + mj + m(n − j) (13.9)
=2mn.
The occurrences of nm/2 in the ﬁrst inequality in (13.9) account for the two
initial applications of the unadorned Needleman-Wunsch algorithm. The
remaining terms c(m/2,j) and c(m/2,n − j) reﬂect the recursive nature of
the space-saving algorithm.
13.6 Maximum Similarity Alignment

There is nothing sacred in aligning strings by minimum distance. In simi-
larity alignment, a relatively large positive score s(x, x) > 0 is assigned to
a pairwise match, a relatively small positive score or a negative score s(x, y)
to a mismatch x = y, and a negative score s(x, −)= s(−,x)= −δ< 0
∗
∗
to an indel. The overall similarity of an alignment x and y of strings x

∗
and y is the sum s(x ,y ), where again only a ﬁnite number of terms
∗
i i i
contribute owing to the convention s(−, −) = 0. The similarity between x
and y is now deﬁned as

S(x, y) = max s(x ,y )
∗
∗
i
i
(x ,y )
∗
∗
i
Calculation of S(x, y) proceeds just as the case of minimum distance align-
ment except that in the recurrence (13.5) and its boundary conditions we
substitute max for min and s(x, y) for d(x, y). If we penalize longer gaps
proportionately less severely than single gaps, then we assign a negative
score h(k) to a gap of length k. Extension of the underlying recurrences
and their associated boundary conditions for these more complicated gap
scores is left to the reader.
It is of some interest to determine when minimum distance and maximum
similarity give the same best alignment. The next proposition bears on this
question.
Proposition 13.6.1 Suppose that the metric d(x, y) and similarity score
s(x, y) are related by s(x, y)= c − d(x, y) and the gap distance g(k) and

297 298 299 300 301 302 303 304 305 306 307