Page 301 - Applied Probability
P. 301
289
13. Sequence Analysis
13.5 Parallel Processing and Memory Reduction
Computational complexity is not the only criterion for evaluating an algo-
rithm. Demands on memory and the potential for parallel processing are
also important. Proper implementation of the Needleman-Wunsch algo-
rithm leads to substantial improvements in both regards. Parallel process-
ing cannot be achieved by proceeding row by row or column by column in
the dynamic programming matrix, but it can be achieved by proceeding
along parallel diagonal lines. With a proportional gap penalty g(k)= kδ,
suppose we know the entries D i,j for all combinations (i, j)with i+j ≤ m.
Then each entry D i,j along the diagonal i + j = m + 1 is determined
independently via equation (13.5) from three entries in the two previous
diagonal lines i + j = m and i + j = m − 1. A multiprocessor machine with
shared memory could easily exploit this fact.
Memory reduction is more subtle. As presented, the Needleman-Wunsch
algorithm requires storing a pointer at each of mn cells. If it were not for
this storage requirement, then we could proceed row by row (or column by
column), discarding rows (or columns) as we go. Discarding already visited
rows or columns reduces the memory requirement to min{m, n}. It is possi-
ble to tweak the Needleman-Wunsch algorithm in a manner that retains this
storage advantage, preserves the traceback property, and merely doubles
the overall computational complexity. For the sake of convenience, assume
m
a proportional gap penalty and choose i = . Any optimal alignment of
2
x against y matches the letter x i with either a letter y j or a −. In the former
case, we perform the Needleman-Wunsch algorithm without traceback sep-
arately on the left substring (x 1 ,...,x i−1 ) and the full string y, and on the
right substring (x i+1 ,...,x m ) and the full string y. For the left substring of
(x 1 ,... ,x i−1 ), let the optimal alignment ending with the pair (i − 1,j − 1)
have distance L i−1,j−1 , and for the right substring (x i+1 ,...,x m ), let the
optimal alignment beginning with the pair (i +1,j + 1) have distance
R(i +1,j + 1). (It is trivial to modify the Needleman-Wunsch algorithm to
give optimal subalignments beginning with index pairs rather than ending
with index pairs.) If we match x i against −, then the pertinent left distance
for each j is L i−1,j rather than L i−1,j−1 . The best index j to match with
i solves the problem
min{L i−1,j−1 + d(x i ,y j )+ R i+1,j+1 ,L i−1,j + d(x i , −)+ R i+1,j+1 }.(13.8)
j
Once we identify j, we can fill out the rest of the optimal match by recur-
sively applying our modified algorithm to the pair of left x and y substrings
and to the pair of right x and y substrings.
Although it is clear that the maximum storage required by this scheme is
m+n, we must make certain the computational complexity does not balloon
out of control. One thing we can dismiss right off is the depicted additions
and minima involved in finding the solution to (13.8). The amount of work