Page 296 - Applied Probability
P. 296
13. Sequence Analysis
284
two alignments
-
GEN
S
P
GENOT Y
CS
I
E
GEN E T I - C E S
T
GENOT Y P E S
The first alignment makes slightly more sense linguistically, but the second
alignment displays more aligned letter matches. Note that both alignments
involve the blank character -. In molecular applications, blank characters
are called insertions or deletions, or simply indels if there is no need to
distinguish between them. It is very convenient to add the blank character
to any alphabet. If there is metric d(x, y) defined on the alphabet, we will
extend the metric so that d(−,x)= d(x, −)= δ> 0 for some constant δ.
It is also convenient to extend every finite string x =(x 1 ,...,x m )tobean
infinite string by appending an infinite number of blank characters to its
right end.
Aligning strings involves the insertion of a finite number of leading or in-
ternal blanks rather than an infinite number of trailing blanks. However, we
are not allowed to insert leading and internal blanks willy-nilly in aligning
two strings; overlapping inserted blanks − are forbidden. This restriction
−
implies that there are only a finite number of possible alignments of two
strings x =(x 1 ,... ,x m ) and y =(y 1 ,...,y n)with m and n letters, re-
spectively. Indeed, if the derived strings x and y provide an alignment of
∗
∗
x and y, then the nontrivial part of the alignment extends from position
1 to at most position m + n. These extreme alignments are achieved by
appending n leading blanks to x to create x and taking y = y or by
∗
∗
∗
appending m leading blanks to y to create y and taking x = x.
∗
To calculate the number of alignments f(m, n) between two given strings
x =(x 1 ,...,x m ) and y =(y 1 ,... ,y n ), we derive a recurrence relation
whose concomitant initial condition is
f(0,j)= f(i, 0) = 1 (13.2)
for all nonnegative integers i and j. The recurrence relation (13.3) below
stems from the simple observation that the nontrivial part of any alignment
ends with one of the three pairs x m , − ,or x m . The remainder of the
− y n y n
alignment to the left of one of these pairs constitutes a legitimate alignment
between two shorter strings. Hence,
f(m, n)= f(m − 1,n)+ f(m, n − 1) + f(m − 1,n − 1). (13.3)
This recurrence relation is difficult to solve, but we can easily show that it
grows extremely rapidly.
m+n m+n
Let g(m, n)= = . Then it is clear that the binomial coeffi-
m n
cient g(m, n) satisfies the same initial condition (13.2) as f(m, n) but the