Page 111 -
P. 111
HAN 09-ch02-039-082-9780123814791
74 Chapter 2 Getting to Know Your Data 2011/6/1 3:15 Page 74 #36
Example 2.20 Supremum distance. Let’s use the same two objects, x 1 = (1, 2) and x 2 = (3, 5), as in
Figure 2.23. The second attribute gives the greatest difference between values for the
objects, which is 5 − 2 = 3. This is the supremum distance between both objects.
If each attribute is assigned a weight according to its perceived importance, the
weighted Euclidean distance can be computed as
q
2
2
2
d(i, j) = w 1 |x i1 − x j1 | + w 2 |x i2 − x j2 | + ··· + w m |x ip − x jp | . (2.20)
Weighting can also be applied to other distance measures as well.
2.4.5 Proximity Measures for Ordinal Attributes
The values of an ordinal attribute have a meaningful order or ranking about them,
yet the magnitude between successive values is unknown (Section 2.1.4). An exam-
ple includes the sequence small, medium, large for a size attribute. Ordinal attributes
may also be obtained from the discretization of numeric attributes by splitting the value
range into a finite number of categories. These categories are organized into ranks. That
is, the range of a numeric attribute can be mapped to an ordinal attribute f having M f
states. For example, the range of the interval-scaled attribute temperature (in Celsius)
can be organized into the following states: −30 to −10, −10 to 10, 10 to 30, repre-
senting the categories cold temperature, moderate temperature, and warm temperature,
respectively. Let M represent the number of possible states that an ordinal attribute can
have. These ordered states define the ranking 1,..., M f .
“How are ordinal attributes handled?” The treatment of ordinal attributes is
quite similar to that of numeric attributes when computing dissimilarity between
objects. Suppose that f is an attribute from a set of ordinal attributes describing
n objects. The dissimilarity computation with respect to f involves the following
steps:
1. The value of f for the ith object is x if , and f has M f ordered states, representing the
ranking 1,..., M f . Replace each x if by its corresponding rank, r if ∈ {1,..., M f }.
2. Since each ordinal attribute can have a different number of states, it is often
necessary to map the range of each attribute onto [0.0, 1.0] so that each attribute
has equal weight. We perform such data normalization by replacing the rank r if
of the ith object in the f th attribute by
r if − 1
z if = . (2.21)
M f − 1
3. Dissimilarity can then be computed using any of the distance measures described
in Section 2.4.4 for numeric attributes, using z if to represent the f value for the ith
object.