2.4 Measuring Data Similarity and Dissimilarity
dissimilarity between i and j is
\[
d(i, j) = \frac{r + s}{q + r + s + t}. \tag{2.13}
\]
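The computation in Eq. (2.13) can be sketched in a few lines of Python. This is only an illustration: it assumes the usual contingency-table reading of the counts (q = attributes that are 1 for both objects, r = 1 for i but 0 for j, s = 0 for i but 1 for j, t = 0 for both), and the function names and the sample vectors are hypothetical.

```python
def contingency_counts(x, y):
    """Return (q, r, s, t) for two equal-length vectors of 0/1 values."""
    if len(x) != len(y):
        raise ValueError("both objects must have the same number of attributes")
    pairs = list(zip(x, y))
    q = sum(1 for a, b in pairs if a == 1 and b == 1)  # 1 in both objects
    r = sum(1 for a, b in pairs if a == 1 and b == 0)  # 1 in i, 0 in j
    s = sum(1 for a, b in pairs if a == 0 and b == 1)  # 0 in i, 1 in j
    t = sum(1 for a, b in pairs if a == 0 and b == 0)  # 0 in both objects
    return q, r, s, t


def symmetric_binary_dissimilarity(x, y):
    """Eq. (2.13): fraction of attributes on which the two objects differ."""
    q, r, s, t = contingency_counts(x, y)
    total = q + r + s + t
    if total == 0:
        raise ValueError("objects have no attributes")
    return (r + s) / total


# Hypothetical objects described by six binary attributes.
obj_i = [1, 0, 1, 0, 0, 0]
obj_j = [1, 1, 0, 0, 0, 0]

print(contingency_counts(obj_i, obj_j))  # (1, 1, 1, 3)
print(symmetric_binary_dissimilarity(obj_i, obj_j))
```

Because t appears in the denominator, every attribute on which both objects are 0 lowers the dissimilarity, which is appropriate only when the two states carry equal weight.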
For asymmetric binary attributes, the two states are not equally important, such as
the positive (1) and negative (0) outcomes of a disease test. Given two asymmetric binary
attributes, the agreement of two 1s (a positive match) is then considered more significant
than that of two 0s (a negative match). Therefore, such binary attributes are often
considered “monary” (having one state). The dissimilarity based on these attributes is
called asymmetric binary dissimilarity, where the number of negative matches, t, is
considered unimportant and is thus ignored in the following computation:
\[
d(i, j) = \frac{r + s}{q + r + s}. \tag{2.14}
\]
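A minimal sketch of Eq. (2.14) follows, again in Python with hypothetical names and data. It assumes the objects are given as equal-length lists of 0/1 values. The source does not say what to do when q + r + s = 0 (both objects are all zeros), so raising an error in that case is a choice made here, not part of the definition.

```python
def asymmetric_binary_dissimilarity(x, y):
    """Eq. (2.14): (r + s) / (q + r + s); negative matches (t) are ignored."""
    if len(x) != len(y):
        raise ValueError("both objects must have the same number of attributes")
    q = r = s = 0
    for a, b in zip(x, y):
        if a == 1 and b == 1:
            q += 1          # positive match
        elif a == 1 and b == 0:
            r += 1          # 1 in i, 0 in j
        elif a == 0 and b == 1:
            s += 1          # 0 in i, 1 in j
        # a == 0 and b == 0 is a negative match and is not counted
    if q + r + s == 0:
        # Assumption of this sketch: the measure is treated as undefined here.
        raise ValueError("dissimilarity is undefined when neither object has a 1")
    return (r + s) / (q + r + s)


# Hypothetical objects: one positive match, two mismatches, three negative matches.
obj_i = [1, 0, 1, 0, 0, 0]
obj_j = [1, 1, 0, 0, 0, 0]
d_short = asymmetric_binary_dissimilarity(obj_i, obj_j)

# Appending attributes that are 0 for both objects leaves the result unchanged.
d_padded = asymmetric_binary_dissimilarity(obj_i + [0] * 10, obj_j + [0] * 10)
print(d_short, d_padded)
```

The padded comparison illustrates why t is dropped: adding any number of negative matches does not make the two objects look more alike.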
Complementarily, we can measure the difference between two binary attributes based
on the notion of similarity instead of dissimilarity. For example, the asymmetric binary
similarity between the objects i and j can be computed as
\[
\mathrm{sim}(i, j) = \frac{q}{q + r + s} = 1 - d(i, j). \tag{2.15}
\]
The coefficient sim(i, j) of Eq. (2.15) is called the Jaccard coefficient and is widely
used in the literature.
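The Jaccard coefficient is often easiest to compute by treating each object as the set of attributes whose value is 1: the intersection then has q elements and the union has q + r + s. The Python sketch below takes that set-based view and checks the identity sim(i, j) = 1 − d(i, j) from Eq. (2.15). The function name, the attribute labels, and the handling of two empty sets are assumptions made for illustration.

```python
def jaccard_similarity(ones_i, ones_j):
    """Eq. (2.15): q / (q + r + s), computed as |intersection| / |union|."""
    union = ones_i | ones_j
    if not union:
        # Assumption of this sketch: undefined when neither object has a 1.
        raise ValueError("similarity is undefined when both sets are empty")
    return len(ones_i & ones_j) / len(union)


# Hypothetical objects, each given as the set of attributes that equal 1.
ones_i = {"a1", "a3"}
ones_j = {"a1", "a2"}

sim = jaccard_similarity(ones_i, ones_j)

# r + s is the size of the symmetric difference, so this is Eq. (2.14).
d = len(ones_i ^ ones_j) / len(ones_i | ones_j)

print(sim, d)
```

Storing only the attributes that equal 1 also suits sparse data, since the many negative matches never need to be represented.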
When both symmetric and asymmetric binary attributes occur in the same data set,
the mixed attributes approach described in Section 2.4.6 can be applied.
Example 2.18 Dissimilarity between binary attributes. Suppose that a patient record table (Table 2.4)
contains the attributes name, gender, fever, cough, test-1, test-2, test-3, and test-4, where
name is an object identifier, gender is a symmetric attribute, and the remaining attributes
are asymmetric binary.
For asymmetric attribute values, let the values Y (yes) and P (positive) be set to 1,
and the value N (no or negative) be set to 0. Suppose that the distance between objects
Table 2.4 Relational Table Where Patients Are Described by Binary Attributes

name   gender   fever   cough   test-1   test-2   test-3   test-4
Jack   M        Y       N       P        N        N        N
Jim    M        Y       Y       N        N        N        N
Mary   F        Y       N       P        N        P        N
...    ...      ...     ...     ...      ...      ...      ...