2.4 Measuring Data Similarity and Dissimilarity


                                 “How is dissimilarity computed between objects described by nominal attributes?”
                               The dissimilarity between two objects i and j can be computed based on the ratio of
                               mismatches:
\[
d(i, j) = \frac{p - m}{p}, \qquad (2.11)
\]
                               where m is the number of matches (i.e., the number of attributes for which i and j are in
                               the same state), and p is the total number of attributes describing the objects. Weights
                               can be assigned to increase the effect of m or to assign greater weight to the matches in
                               attributes having a larger number of states.
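
As a minimal sketch of Eq. (2.11), the mismatch ratio can be computed as follows. The function name `nominal_dissimilarity` and the two sample objects are illustrative assumptions, not part of the text; each object is assumed to be a sequence of nominal values listed in the same attribute order.

```python
from typing import Hashable, Sequence


def nominal_dissimilarity(obj_i: Sequence[Hashable],
                          obj_j: Sequence[Hashable]) -> float:
    """Return d(i, j) = (p - m) / p as in Eq. (2.11)."""
    if len(obj_i) != len(obj_j):
        raise ValueError("objects must be described by the same attributes")
    p = len(obj_i)  # total number of attributes
    if p == 0:
        raise ValueError("at least one attribute is required")
    # m counts the attributes on which the two objects are in the same state
    m = sum(1 for a, b in zip(obj_i, obj_j) if a == b)
    return (p - m) / p


# Hypothetical objects with three nominal attributes (not from the text).
x = ("red", "circle", "small")
y = ("red", "square", "small")
print(nominal_dissimilarity(x, y))  # one mismatch out of three attributes
```

The sketch treats every attribute equally; a weighting scheme such as the one mentioned above would need an extra, explicitly chosen set of weights.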

                 Example 2.17 Dissimilarity between nominal attributes. Suppose that we have the sample data of
                               Table 2.2, except that only the object-identifier and the attribute test-1 are available,
                               where test-1 is nominal. (We will use test-2 and test-3 in later examples.) Let’s compute
                               the dissimilarity matrix (Eq. 2.9), that is,

                                                                            
\[
\begin{bmatrix}
0       &         &         &   \\
d(2, 1) & 0       &         &   \\
d(3, 1) & d(3, 2) & 0       &   \\
d(4, 1) & d(4, 2) & d(4, 3) & 0
\end{bmatrix}.
\]
                               Since here we have one nominal attribute, test-1, we set p = 1 in Eq. (2.11) so that d(i, j)
                               evaluates to 0 if objects i and j match, and 1 if the objects differ. Thus, we get

                                                                     
\[
\begin{bmatrix}
0 &   &   &   \\
1 & 0 &   &   \\
1 & 1 & 0 &   \\
0 & 1 & 1 & 0
\end{bmatrix}.
\]
                               From this, we see that all objects are dissimilar except objects 1 and 4 (i.e., d(4,1) = 0).
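
The computation in Example 2.17 can be reproduced with a short sketch. The helper name `nominal_dissimilarity_matrix` is an assumption made for illustration; the test-1 values are taken from Table 2.2, and only the lower triangle is built, since d(i, j) = d(j, i) and d(i, i) = 0.

```python
def nominal_dissimilarity_matrix(values):
    """Lower-triangular dissimilarity matrix for a single nominal attribute.

    With one attribute, p = 1 in Eq. (2.11), so each entry is 0 when the
    two objects match and 1 when they differ.
    """
    p = 1  # a single nominal attribute describes each object
    rows = []
    for i in range(len(values)):
        row = []
        for j in range(i + 1):
            m = 1 if values[i] == values[j] else 0  # number of matches
            row.append((p - m) / p)
        rows.append(row)
    return rows


# test-1 values for objects 1 to 4, as listed in Table 2.2
test_1 = ["code A", "code B", "code C", "code A"]
matrix = nominal_dissimilarity_matrix(test_1)
for row in matrix:
    print(row)
```

The last row corresponds to object 4, whose first entry d(4, 1) is zero because objects 1 and 4 share the value code A.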


Table 2.2 A Sample Data Table Containing Attributes of Mixed Type

Object Identifier   test-1 (nominal)   test-2 (ordinal)   test-3 (numeric)
1                   code A             excellent          45
2                   code B             fair               22
3                   code C             good               64
4                   code A             excellent          28