Page 296 - Applied Probability

P. 296

13. Sequence Analysis
284
two alignments
-
GEN
S
P
GENOT Y
CS
I
E
GEN E T I - C E S
T
GENOT Y P E S
The ﬁrst alignment makes slightly more sense linguistically, but the second
alignment displays more aligned letter matches. Note that both alignments
involve the blank character -. In molecular applications, blank characters
are called insertions or deletions, or simply indels if there is no need to
distinguish between them. It is very convenient to add the blank character
to any alphabet. If there is metric d(x, y) deﬁned on the alphabet, we will
extend the metric so that d(−,x)= d(x, −)= δ> 0 for some constant δ.
It is also convenient to extend every ﬁnite string x =(x 1 ,...,x m )tobean
inﬁnite string by appending an inﬁnite number of blank characters to its
right end.
Aligning strings involves the insertion of a ﬁnite number of leading or in-
ternal blanks rather than an inﬁnite number of trailing blanks. However, we
are not allowed to insert leading and internal blanks willy-nilly in aligning

two strings; overlapping inserted blanks − are forbidden. This restriction
−
implies that there are only a ﬁnite number of possible alignments of two
strings x =(x 1 ,... ,x m ) and y =(y 1 ,...,y n)with m and n letters, re-
spectively. Indeed, if the derived strings x and y provide an alignment of
∗
∗
x and y, then the nontrivial part of the alignment extends from position
1 to at most position m + n. These extreme alignments are achieved by
appending n leading blanks to x to create x and taking y = y or by
∗
∗
∗
appending m leading blanks to y to create y and taking x = x.
∗
To calculate the number of alignments f(m, n) between two given strings
x =(x 1 ,...,x m ) and y =(y 1 ,... ,y n ), we derive a recurrence relation
whose concomitant initial condition is
f(0,j)= f(i, 0) = 1 (13.2)
for all nonnegative integers i and j. The recurrence relation (13.3) below
stems from the simple observation that the nontrivial part of any alignment

ends with one of the three pairs x m , − ,or x m . The remainder of the
− y n y n
alignment to the left of one of these pairs constitutes a legitimate alignment
between two shorter strings. Hence,
f(m, n)= f(m − 1,n)+ f(m, n − 1) + f(m − 1,n − 1). (13.3)
This recurrence relation is diﬃcult to solve, but we can easily show that it
grows extremely rapidly.
m+n m+n
Let g(m, n)= = . Then it is clear that the binomial coeﬃ-
m n
cient g(m, n) satisﬁes the same initial condition (13.2) as f(m, n) but the

291 292 293 294 295 296 297 298 299 300 301