


          74    Chapter 2 Getting to Know Your Data          2011/6/1  3:15  Page 74  #36



           Example 2.20 Supremum distance. Let’s use the same two objects, x_1 = (1, 2) and x_2 = (3, 5), as in
                         Figure 2.23. The second attribute gives the greatest difference between values for the
                         objects, which is 5 − 2 = 3. This is the supremum distance between the two objects.
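
                         The arithmetic in Example 2.20 can be checked with a minimal sketch. The function
                         name supremum_distance is a hypothetical helper introduced here for illustration;
                         it simply takes the largest absolute attribute-wise difference.

```python
def supremum_distance(x, y):
    """Return the largest absolute attribute-wise difference between x and y."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    return max(abs(a - b) for a, b in zip(x, y))


# The two objects from Example 2.20.
x1 = (1, 2)
x2 = (3, 5)

# Attribute differences are |1 - 3| = 2 and |2 - 5| = 3; the larger one wins.
print(supremum_distance(x1, x2))  # 3
```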

                           If each attribute is assigned a weight according to its perceived importance, the
                         weighted Euclidean distance can be computed as
                                 d(i, j) = \sqrt{w_1 |x_{i1} - x_{j1}|^2 + w_2 |x_{i2} - x_{j2}|^2 + \cdots + w_p |x_{ip} - x_{jp}|^2}.  (2.20)
                         Weighting can be applied to other distance measures as well.
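
                         As a rough illustration of Eq. (2.20), the sketch below computes a weighted
                         Euclidean distance for the objects of Example 2.20. The function name
                         weighted_euclidean and the weight values are assumptions chosen for the example;
                         they do not come from the text.

```python
import math


def weighted_euclidean(x, y, weights):
    """Weighted Euclidean distance: sqrt(sum_f w_f * |x_f - y_f|^2)."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y, and weights must have the same length")
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


x1 = (1, 2)
x2 = (3, 5)

# With unit weights this reduces to the ordinary Euclidean distance, sqrt(4 + 9).
print(weighted_euclidean(x1, x2, (1.0, 1.0)))

# Hypothetical weights that emphasize the second attribute: sqrt(0.5*4 + 2.0*9).
print(weighted_euclidean(x1, x2, (0.5, 2.0)))
```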
                   2.4.5 Proximity Measures for Ordinal Attributes

                         The values of an ordinal attribute have a meaningful order or ranking,
                         yet the magnitude between successive values is unknown (Section 2.1.4). An example
                         is the sequence small, medium, large for a size attribute. Ordinal attributes
                         may also be obtained from the discretization of numeric attributes by splitting the value
                         range into a finite number of categories. These categories are organized into ranks. That
                         is, the range of a numeric attribute can be mapped to an ordinal attribute f having M_f
                         states. For example, the range of the interval-scaled attribute temperature (in Celsius)
                         can be organized into the following states: −30 to −10, −10 to 10, 10 to 30, representing
                         the categories cold temperature, moderate temperature, and warm temperature,
                         respectively. Let M_f represent the number of possible states that an ordinal attribute f can
                         have. These ordered states define the ranking 1, ..., M_f.
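
                         The temperature discretization can be sketched as follows. The text leaves open
                         which state a boundary value such as −10 belongs to, so this sketch assumes that
                         each boundary falls into the higher state; that convention, the names BOUNDARIES,
                         LABELS, and temperature_rank, and the range check are all assumptions made for
                         illustration.

```python
from bisect import bisect_right

# Interior cut points between the three states (assumed to belong to the higher state).
BOUNDARIES = [-10, 10]
LABELS = ["cold", "moderate", "warm"]  # ranks 1, 2, 3


def temperature_rank(celsius):
    """Map a temperature in [-30, 30] to its ordinal rank in {1, 2, 3}."""
    if not -30 <= celsius <= 30:
        raise ValueError("temperature outside the discretized range")
    # bisect_right counts how many cut points are <= celsius.
    return bisect_right(BOUNDARIES, celsius) + 1


for t in (-20, -10, 0, 25):
    r = temperature_rank(t)
    print(t, r, LABELS[r - 1])
```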
                           “How are ordinal attributes handled?” The treatment of ordinal attributes is
                         quite similar to that of numeric attributes when computing dissimilarity between
                         objects. Suppose that f is an attribute from a set of ordinal attributes describing
                         n objects. The dissimilarity computation with respect to f involves the following
                         steps:

                         1. The value of f for the ith object is x_{if}, and f has M_f ordered states, representing the
                           ranking 1, ..., M_f. Replace each x_{if} by its corresponding rank, r_{if} ∈ {1, ..., M_f}.
                         2. Since each ordinal attribute can have a different number of states, it is often
                           necessary to map the range of each attribute onto [0.0, 1.0] so that each attribute
                           has equal weight. We perform such data normalization by replacing the rank r_{if}
                           of the ith object in the fth attribute by

                                              z_{if} = \frac{r_{if} - 1}{M_f - 1}.                      (2.21)

                         3. Dissimilarity can then be computed using any of the distance measures described
                           in Section 2.4.4 for numeric attributes, using z_{if} to represent the value of f for the ith
                           object.
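
                         The three steps above can be sketched for a single ordinal attribute. The sample
                         values and the names ORDER, normalized_ranks, and dissimilarity_matrix are
                         hypothetical. With only one attribute, the Euclidean and Manhattan distances both
                         reduce to the absolute difference of the normalized ranks, which is what the
                         sketch uses.

```python
# Ordered states of a hypothetical size attribute, lowest rank first.
ORDER = ["small", "medium", "large"]


def normalized_ranks(values, ordered_states):
    """Steps 1 and 2: replace each value by its rank r, then by (r - 1) / (M - 1)."""
    m = len(ordered_states)
    if m < 2:
        raise ValueError("need at least two ordered states to normalize")
    rank = {state: r for r, state in enumerate(ordered_states, start=1)}
    return [(rank[v] - 1) / (m - 1) for v in values]


def dissimilarity_matrix(z):
    """Step 3 for one attribute: pairwise absolute differences of normalized ranks."""
    return [[abs(a - b) for b in z] for a in z]


# Hypothetical sample of four objects.
sizes = ["large", "small", "medium", "large"]
z = normalized_ranks(sizes, ORDER)
print(z)  # [1.0, 0.0, 0.5, 1.0]

for row in dissimilarity_matrix(z):
    print(row)
```

                         Objects 1 and 4 share the same state, so their dissimilarity is 0, while the
                         extreme states small and large are at the maximum dissimilarity of 1.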