Page 280 -
P. 280
268 6 Structural Pattern Recognition
1. c(a-+ b)2Oforany a, b;
2. c(a + b) = 0 if and only if a = b;
3. c(a -+ b) 5 c(h -+ b) + c(a -+ h) for any a, b;
4. c(a -+ b)= c(b -+ a) for any a, b.
In practice, even if these conditions are not satisfied, we still use d(x, y) as the
distance between x and y. In order to classify patterns described by strings, we can
now use the k-nearest neighbour classification approach, described in section 4.3.2,
for assigning patterns to classes, given a training set of patterns.
The basic algorithm for computing the Levenshtein distance can be modified in
order to cope with a variety of aspects, such as context dependent costs or string
attributes (see Bunke, 1990b, for details). Let us consider this last aspect, of
matching strings x and y with d-dimensional attribute vectors a and b, respectively.
We define a new dissimilarity distance between x and y in the following way:
where d,(x, y) is the previous dissimilarity distance and d,(x, y) is the dissimilarity
distance relative to the attributes.
This distance takes account of the attribute differences for the different symbols
as:
(6-2 la)
where Ixri - ysil takes account of attribute differences for homologous attributes and
wi is a suitably chosen weight.
The summation is taken for all x, symbols that substitute different y, symbols.
As a matter of fact, this rule can be implemented by inserting an attribute cost into
steps 1-3 of the basic algorithm. A particular implementation of this procedure for
line segment primitives with length, I, and orientation angle, aattributes, is:
where I, and D, are the usual insertion and deletion costs. The length attribute is
divided by the string length, for the purpose of normalization. Function f takes care
of the difference in angular attributes.
Let us see how this applies to the very simple situation shown in Figure 6.18,
where the only line segments used are horizontal and vertical. Instead of taking
into account an orientation attribute, we use a PDL approach so that the 3 symbols
u, h, d with length attribute are used. As the only structural operator used is +, we
can neglect it when computing distances.
From the results shown in Figure 6.18, obtained with the algorithm implemented
in the String Match.xls file, some difficulties of this approach are visible. For