Page 278 -
P. 278

266    6 Structural Pattern Recoenition


                               edit operations that are required to transform the string x into the string y. The most
                               common edit operations are:

                               1. Substitution of a symbol a in x by a symbol h in y;
                               2.  Insertion of a symbol a in y;
                               3. Deletion of a symbol a in x.

                                 We denote these operations as follows:







                                 where A represents the null string.
                                 Consider, for example, T = (h, d, u], x = duuh and y = hdhuh. String x can be
                               transformed into y as follows:
                               - insert h in x: hduuh;
                               - substitute u in x by h in y: hdhuh.

                                 Another alternative is:

                               - substitute din x by h in y: huuh;
                               - insert din x: hduuh;
                               - substitute u in x by h in y: hdhuh.

                                 In order to define a matching distance between x and y, we start by  associating
                               costs  with  the edit  operations: c(ei), where  ei is  any  one  of  the edit  operations.
                               Next,  we  compute  the  total  cost  of  a  sequence,  s, of  k  edit  operations  that
                               transforms x into y, as:






                                 The dissimilarity distance between x and y is defined as the minimum cost of all
                               sequences of edit operations that transform x into y:

                                  d(x, y)= min{c(s), s transforms x into y] .                (6-20)

                                 An  algorithm for computing this so-called Levenshtein distance is based on the
                               enumeration in matrix form of the computation of all sequences of edit operations
                               that transform x into y.
                                 First, a matrix D(i, j) with (n+l) by (m+l) elements is built, as shown in Figure
                               6.17. In this matrix, any path from D(0,O) to D(n, m) represents a sequence of edit
                               operations transforming x into y. The top row of  the matrix is labelled with the y
   273   274   275   276   277   278   279   280   281   282   283