Page 278 -
P. 278
266 6 Structural Pattern Recoenition
edit operations that are required to transform the string x into the string y. The most
common edit operations are:
1. Substitution of a symbol a in x by a symbol h in y;
2. Insertion of a symbol a in y;
3. Deletion of a symbol a in x.
We denote these operations as follows:
where A represents the null string.
Consider, for example, T = (h, d, u], x = duuh and y = hdhuh. String x can be
transformed into y as follows:
- insert h in x: hduuh;
- substitute u in x by h in y: hdhuh.
Another alternative is:
- substitute din x by h in y: huuh;
- insert din x: hduuh;
- substitute u in x by h in y: hdhuh.
In order to define a matching distance between x and y, we start by associating
costs with the edit operations: c(ei), where ei is any one of the edit operations.
Next, we compute the total cost of a sequence, s, of k edit operations that
transforms x into y, as:
The dissimilarity distance between x and y is defined as the minimum cost of all
sequences of edit operations that transform x into y:
d(x, y)= min{c(s), s transforms x into y] . (6-20)
An algorithm for computing this so-called Levenshtein distance is based on the
enumeration in matrix form of the computation of all sequences of edit operations
that transform x into y.
First, a matrix D(i, j) with (n+l) by (m+l) elements is built, as shown in Figure
6.17. In this matrix, any path from D(0,O) to D(n, m) represents a sequence of edit
operations transforming x into y. The top row of the matrix is labelled with the y