Page 279 -
P. 279

6.4 Structural Matchinn   267

                           symbols, and the first column with the symbols of x, with the seed element D(0, 0)
                           = 0 representing a situation where no edit operations have been  performed. From
                           this  seed element, the algorithm proceeds to  fill up  the matrix by  examining the
                           following three predecessors of D(i, j) (see Figure 6.17):
                            1.  D(i, j-1):  Progressing  from  D(i, j-1)  to  D(i, j)  means  the  insertion  of  the
                               corresponding top symbol of y into x, with a cost of Dl = D(i, j- 1) + c(3L + a).
                           2.  D(i-1, j):  Progressing  from  D(i-1, j)  to  D(ij) means  the  deletion  of  the
                               corresponding left symbol of x, with a cost of 02 = D(i-1, j) + c(a + A).
                            3.  D(i-1, j-1): Progressing from D(i-1, j-1) to D(i, j) means the substitution of the
                               corresponding left symbol of  x by  the corresponding top symbol of y, with a
                               cost of 03 = D(i-1, j-1) + c(a -+ b). If both symbols are equal, the cost of this
                               operation is zero.

                              The minimum of Dl, 02 and 03 is assigned to D(i, j).



                                                                   Insert
                                                       Y











                            Figure  6.17.  Computation of  the  Levenshtein distance  between  x  and  y,  using
                            unitary costs for all elementary edit operations. The dotted line corresponds to the
                            minimum cost path.



                              Let  us  use Figure 6.17 to  exemplify  this  procedure, assuming, for simplicity,
                            unitary costs for all edit operations. Applying rules 1  and 2 immediately fills up the
                            top  row  and  left column. Now  consider D(1, 1).  It can be reached from D(O, 0),
                            D(0,  I) and D(1, O), adding a cost of one to any of  these transitions. If  we now
                            consider D(3, 4), we find a minimum cost for the substitution, since both symbols
                            are equal, therefore we obtain D(3,4) = D(2,3).
                              As  shown  in  Figure  6.17  the  minimum  cost  of  transforming  x  into  y  is  2,
                            representing the two previously mentioned edit operations as the first alternative to
                            transforming x into y. The path corresponding to this sequence of operations is also
                            shown in Figure 6.17.
                              In  order  for  the  Levenshtein distance to  have  the  properties of  a  metric, the
                            following non-stringent conditions must be satisfied:
   274   275   276   277   278   279   280   281   282   283   284