2.4 Measuring Data Similarity and Dissimilarity
“How is dissimilarity computed between objects described by nominal attributes?”
The dissimilarity between two objects i and j can be computed based on the ratio of
mismatches:
\[
d(i, j) = \frac{p - m}{p}, \tag{2.11}
\]
where m is the number of matches (i.e., the number of attributes for which i and j are in
the same state), and p is the total number of attributes describing the objects. Weights
can be assigned to increase the effect of m or to assign greater weight to the matches in
attributes having a larger number of states.
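As a minimal sketch of Eq. (2.11) in its unweighted form, the function below counts the matches m between two objects and returns (p − m)/p. The function name `nominal_dissimilarity` and the representation of an object as a sequence of attribute values are assumptions made for illustration; they do not come from the text.

```python
def nominal_dissimilarity(obj_i, obj_j):
    """Return (p - m) / p for two objects given as equal-length
    sequences of nominal attribute values (hypothetical representation)."""
    if len(obj_i) != len(obj_j):
        raise ValueError("objects must be described by the same attributes")
    p = len(obj_i)  # total number of attributes
    if p == 0:
        raise ValueError("at least one attribute is required")
    # m: number of attributes for which the two objects are in the same state
    m = sum(1 for a, b in zip(obj_i, obj_j) if a == b)
    return (p - m) / p
```

For instance, with three made-up attributes, `nominal_dissimilarity(["red", "round", "small"], ["red", "square", "small"])` has p = 3 and m = 2, so it evaluates to 1/3.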
Example 2.17 Dissimilarity between nominal attributes. Suppose that we have the sample data of
Table 2.2, except that only the object-identifier and the attribute test-1 are available,
where test-1 is nominal. (We will use test-2 and test-3 in later examples.) Let’s compute
the dissimilarity matrix (Eq. 2.9), that is,
\[
\begin{bmatrix}
0 & & & \\
d(2, 1) & 0 & & \\
d(3, 1) & d(3, 2) & 0 & \\
d(4, 1) & d(4, 2) & d(4, 3) & 0
\end{bmatrix}.
\]
Since here we have one nominal attribute, test-1, we set p = 1 in Eq. (2.11) so that d(i, j)
evaluates to 0 if objects i and j match, and 1 if the objects differ. Thus, we get
\[
\begin{bmatrix}
0 & & & \\
1 & 0 & & \\
1 & 1 & 0 & \\
0 & 1 & 1 & 0
\end{bmatrix}.
\]
From this, we see that all objects are dissimilar except objects 1 and 4 (i.e., d(4,1) = 0).
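The computation in this example can be reproduced with a short sketch. It assumes the test-1 values of Table 2.2 are stored in a dictionary keyed by object identifier; the names `test_1` and `dissimilarity_matrix` are hypothetical. Because p = 1, each entry is simply 0 for a match and 1 for a mismatch.

```python
# test-1 values from Table 2.2, keyed by object identifier
test_1 = {1: "code A", 2: "code B", 3: "code C", 4: "code A"}

def dissimilarity_matrix(values):
    """Build the lower-triangular dissimilarity matrix for a single
    nominal attribute (p = 1): 0 if the states match, 1 otherwise."""
    ids = sorted(values)
    rows = []
    for r, i in enumerate(ids):
        # compare object i with every object that precedes it
        row = [0 if values[i] == values[j] else 1 for j in ids[:r]]
        row.append(0)  # d(i, i) = 0 on the diagonal
        rows.append(row)
    return rows

matrix = dissimilarity_matrix(test_1)
for row in matrix:
    print(" ".join(str(d) for d in row))
```

The last row corresponds to object 4, whose first entry d(4, 1) is 0 because objects 1 and 4 both have the state code A.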
Table 2.2 A Sample Data Table Containing Attributes of Mixed Type

Object Identifier    test-1 (nominal)    test-2 (ordinal)    test-3 (numeric)
1                    code A              excellent           45
2                    code B              fair                22
3                    code C              good                64
4                    code A              excellent           28