Page 110 -
P. 110
3:15
Page 73
#35
2011/6/1
HAN 09-ch02-039-082-9780123814791
2.4 Measuring Data Similarity and Dissimilarity 73
Symmetry: d(i, j) = d(j, i): Distance is a symmetric function.
Triangle inequality: d(i, j) ≤ d(i, k) + d(k, j): Going directly from object i to object j
in space is no more than making a detour over any other object k.
A measure that satisfies these conditions is known as metric. Please note that the
non-negativity property is implied by the other three properties.
Example 2.19 Euclidean distance and Manhattan distance. Let x 1 = (1, 2) and x 2 = (3, 5) repre-
sent two objects as shown in Figure 2.23. The Euclidean distance between the two is
√
2
2
2 + 3 = 3.61. The Manhattan distance between the two is 2 + 3 = 5.
Minkowski distance is a generalization of the Euclidean and Manhattan distances.
It is defined as
q
h
h
h
d(i, j) = h |x i1 − x j1 | + |x i2 − x j2 | + ··· + |x ip − x jp | , (2.18)
where h is a real number such that h ≥ 1. (Such a distance is also called L p norm in
some literature, where the symbol p refers to our notation of h. We have kept p as
the number of attributes to be consistent with the rest of this chapter.) It represents
the Manhattan distance when h = 1 (i.e., L 1 norm) and Euclidean distance when h = 2
(i.e., L 2 norm).
The supremum distance (also referred to as L max , L ∞ norm and as the Chebyshev
distance) is a generalization of the Minkowski distance for h → ∞. To compute it, we
find the attribute f that gives the maximum difference in values between the two objects.
This difference is the supremum distance, defined more formally as:
1
p h
p
X h
d(i, j) = lim |x if − x jf | = max|x if − x jf |. (2.19)
h→∞ f
f =1
The L ∞ norm is also known as the uniform norm.
x = (3, 5)
5 2
4 Euclidean distance
3 = (2 + 3 ) = 3.61
2
2 1/2
3
x = (1, 2) Manhattan distance
1
2 = 2 + 3 = 5
1 2 Supremum distance
= 5 – 2 = 3
1 2 3
Figure 2.23 Euclidean, Manhattan, and supremum distances between two objects.