Page 110 -
P. 110

3:15
                                                                           Page 73
                                                                                   #35
                                                             2011/6/1
                          HAN 09-ch02-039-082-9780123814791
                                                       2.4 Measuring Data Similarity and Dissimilarity  73


                               Symmetry: d(i, j) = d(j, i): Distance is a symmetric function.
                               Triangle inequality: d(i, j) ≤ d(i, k) + d(k, j): Going directly from object i to object j
                                 in space is no more than making a detour over any other object k.
                               A measure that satisfies these conditions is known as metric. Please note that the
                               non-negativity property is implied by the other three properties.

                 Example 2.19 Euclidean distance and Manhattan distance. Let x 1 = (1, 2) and x 2 = (3, 5) repre-
                               sent two objects as shown in Figure 2.23. The Euclidean distance between the two is
                               √
                                 2
                                     2
                                2 + 3 = 3.61. The Manhattan distance between the two is 2 + 3 = 5.
                                 Minkowski distance is a generalization of the Euclidean and Manhattan distances.
                               It is defined as
                                                  q
                                                                                      h
                                                                       h
                                                            h
                                           d(i, j) =  h  |x i1 − x j1 | + |x i2 − x j2 | + ··· + |x ip − x jp | ,  (2.18)
                               where h is a real number such that h ≥ 1. (Such a distance is also called L p norm in
                               some literature, where the symbol p refers to our notation of h. We have kept p as
                               the number of attributes to be consistent with the rest of this chapter.) It represents
                               the Manhattan distance when h = 1 (i.e., L 1 norm) and Euclidean distance when h = 2
                               (i.e., L 2 norm).
                                 The supremum distance (also referred to as L max , L ∞ norm and as the Chebyshev
                               distance) is a generalization of the Minkowski distance for h → ∞. To compute it, we
                               find the attribute f that gives the maximum difference in values between the two objects.
                               This difference is the supremum distance, defined more formally as:
                                                                      1
                                                        
                                                           p           h
                                                                            p
                                                          X         h
                                             d(i, j) = lim    |x if − x jf |    = max|x if − x jf |.  (2.19)
                                                    h→∞                     f
                                                          f =1
                               The L ∞  norm is also known as the uniform norm.



                                            x  = (3, 5)
                               5            2
                               4               Euclidean distance
                                 3             = (2  + 3 )  = 3.61
                                                 2
                                                    2 1/2
                               3
                                   x  = (1, 2)  Manhattan distance
                                    1
                               2               = 2 + 3 = 5
                               1       2       Supremum distance
                                               = 5 – 2 = 3
                                   1   2   3

                    Figure 2.23 Euclidean, Manhattan, and supremum distances between two objects.
   105   106   107   108   109   110   111   112   113   114   115