Page 297 - Applied Probability
P. 297
13. Sequence Analysis
285
simpler recurrence
g(m, n)= g(m − 1,n)+ g(m, n − 1)
(13.4)
based on Pascal’s combinatorial triangle. We now argue inductively that
f(m, n) ≥ g(m, n). The induction is primed by the identical initial con-
dition (13.2). Assuming the inequality holds for all pairs of nonnegative
integers (m, n)with m + n = k − 1 > 0 for some integer k − 1, we advance
the induction from k − 1to k by neglecting the third term f(m − 1,n − 1)
on the right of recurrence (13.3), substituting g(m − 1,n) for f(m − 1,n)
and g(m, n − 1) for f(m, n − 1), and finally invoking recurrence (13.4).
This proves the inequality for all pairs (m, n)with m + n = k. The size
of g(n, n) can be accurately estimated by application of Stirling’s formula
√ 1
n! ∼ 2πn n+ 2 e −n . A straightforward calculation shows that
2 2n
g(n, n) ∼ √ .
πn
13.4 Minimum Distance Alignment
To an alignment x and y of x and y, we assign the distance d(x ,y ).
∗
∗
∗
∗
i i i
Although this is an infinite sum, all but a finite number of terms are 0 owing
to the fact d(−, −) = 0. The Levenshtein distance between the original
strings x and y we now define as
∗
D(x, y) = min d(x ,y ),
∗
i
i
(x ,y )
∗
∗
i
∗
∗
where the pair (x , y ) ranges over all possible alignments. The symmetry
property (a) characterizing a metric follows directly from the definition of
D(x, y). The inequality D(x, y) ≥ 0 is also obvious. Equality holds when
x = y because d(x, y) is a metric. Strict inequality holds when x = y
∗
∗
because there are only a finite number of alignment pairs (x , y ). Thus,
property (b) of a metric is also true.
The validity of the triangle inequality (c) is less obvious. One way of
∗
∗
proving it is to think of computing the alignment distance d(x ,y )by
i i i
transforming x into y by a sequence of steps, each step carrying its own
penalty. Consider the example of converting the word INDUSTRY into the
word INTEREST [5] in the following listing
IND U S TRY
IND U S TR -
IND U S T - -
IN R U S T - -
IN R E S T - -
IN T R E S T -
IN T E R E S T