Page 280 -
P. 280

268    6 Structural Pattern Recognition


                           1. c(a-+  b)2Oforany a, b;
                           2. c(a + b) = 0 if and only if a = b;
                           3. c(a -+ b) 5 c(h -+ b) + c(a -+ h) for any a, b;
                           4.  c(a -+ b)= c(b -+ a) for any a, b.


                             In practice, even if  these conditions are not satisfied, we still use d(x, y) as the
                           distance between x and y. In order to classify patterns described by strings, we can
                           now use the k-nearest neighbour classification approach, described in section 4.3.2,
                           for assigning patterns to classes, given a training set of patterns.
                             The basic algorithm for computing the Levenshtein distance can be modified in
                           order to cope with  a variety of  aspects, such as context dependent costs or string
                           attributes  (see  Bunke,  1990b,  for  details).  Let  us  consider  this  last  aspect,  of
                           matching strings x and y with d-dimensional attribute vectors a and b, respectively.
                           We define a new dissimilarity distance between x and y in the following way:




                           where d,(x, y) is the previous dissimilarity distance and d,(x,  y) is the dissimilarity
                           distance relative to the attributes.
                             This distance takes account of the attribute differences for the different symbols
                           as:

                                                                                        (6-2 la)


                           where Ixri - ysil takes account of attribute differences for homologous attributes and
                           wi is a suitably chosen weight.
                             The summation is taken for all x,  symbols that  substitute different y,  symbols.
                           As a matter of fact, this rule can be implemented by inserting an attribute cost into
                           steps 1-3 of the basic algorithm. A particular implementation of this procedure for
                           line segment primitives with length, I, and orientation angle, aattributes, is:







                              where I,  and D,  are the usual insertion and deletion costs. The length attribute is
                           divided by the string length, for the purpose of normalization. Function  f takes care
                           of the difference in angular attributes.
                              Let us see how this applies to the very  simple situation shown in Figure 6.18,
                           where  the only line segments used  are horizontal  and  vertical. Instead of  taking
                           into account an orientation attribute, we use a PDL approach so that the 3 symbols
                           u, h, d with length attribute are used. As the only structural operator used is +, we
                           can neglect it when computing distances.
                              From the results shown in Figure 6.18, obtained with the algorithm implemented
                           in  the  String Match.xls  file,  some  difficulties  of  this  approach  are  visible. For
   275   276   277   278   279   280   281   282   283   284   285