Page 300 - Applied Probability
P. 300
13. Sequence Analysis
288
tions signal that the corresponding terms represent impossible conditions.
For example, E 0j = ∞ because it is impossible to end an alignment with
x 0
when x is a null string.
−
These boundary values feed into the recurrences
E ij =min{ min {F i−k,j + g(k)}, min {G i−k,j + g(k)}}
1≤k≤i 1≤k≤i
F ij =min{ min {E i,j−k + g(k)}, min {G i,j−k + g(k)}} (13.6)
1≤k≤j 1≤k≤j
=min{E i−1,j−1 ,F i−1,j−1 ,G i−1,j−1 } + d(x i ,y j ).
G ij
In the first recurrence, the blank at the end of the alignment is part of
a larger gap involving k consecutive blanks. Overall the gap contributes
a gap distance g(k). Because the gap ends with x i , it must be flanked
−
on the left by either − or x i−k . To the left of the gap, we take the
y j y j
best possible alignment constructible from the partial strings (x 1 ,...,x i−k )
and (y 1 ,...,y j ) ending with the appropriate pair. The second recurrence
is proved similarly, and the third recurrence is even simpler because the
compatible alignments cannot end with a gap.
The computational complexity of algorithm (13.6) is of order
m n n m
m +1 n +1
(i + j)= +
2 2
i=1 j=1 j=1 i=1
2
2
= O(nm + mn ).
For an affine gap distance g(k)= α +(k − 1)β, the algorithm can be
revised so that it achieves the computational complexity O(mn) of the
original algorithm (13.5). The same boundary conditions apply, but now
the recurrences reduce to
E ij =min{E i−1,j + β, F i−1,j + α, G i−1,j + α}
F ij =min{E i,j−1 + α, F i,j−1 + β, G i,j−1 + α} (13.7)
=min{E i−1,j−1 ,F i−1,j−1 ,G i−1,j−1 } + d(x i ,y j ).
G ij
The first of these recurrences is derived by considering the three possible
pairs x i−1 , − , and x i−1 immediately preceding the end pair x i . The
− y j y j −
other recurrences are derived similarly.
Construction of a particular minimum distance alignment based on the
recurrences (13.7) again proceeds via a system of pointers with the twist
that a pointer in one matrix may point to an entry in another matrix. For
example, suppose in the first recurrence of (13.7) that the minimum term
is F i−1,j + α. Then the pointer at entry (i, j) of the E matrix points to
entry (i − 1,j) of the F matrix. In any event, starting with the minimum
term in D mn =min{E mn ,F mn ,G mn } and following the pointers back to a
boundary, one can trace out a minimum distance alignment.