3.4 Dimensional Reduction
Multidimensional scaling is another method of representing data in a smaller
number of dimensions while preserving as much as possible the similarity structure
of the data. Unlike factor analysis, it is an iterative method. It has a broader
application than the principal components analysis that we have just described,
since it makes no assumptions as to the metric used for assessing similarity. It can
even be used with similarity measures that do not satisfy all the properties listed
in (2-8).
The basic idea is to try, in an iterative way, to position the feature vectors in the
reduced dimension space so that the distance or dissimilarity values among them
are preserved as much as possible. For this purpose, the following quadratic error
measure, known as stress, is iteratively minimized:

E = \frac{\sum_{i<j} \left[ f\left(d(x_i, x_j)\right) - d^{*}(x_i, x_j) \right]^{2}}{\sum_{i<j} \left[ d^{*}(x_i, x_j) \right]^{2}}

where:
- x_i, x_j are any pair of distinct patterns (i ≠ j);
- d(x_i, x_j) is the original dissimilarity between x_i and x_j;
- d*(x_i, x_j) is the transformed dissimilarity in the lower dimensional space;
- f is a monotonic transformation function.
The details of this iterative process are beyond the scope of this book. The
interested reader can refer, for instance, to (Sammon, 1969) and (de Leeuw and
Heiser, 1982).
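As a rough illustration of the idea only, the following sketch minimizes a raw stress (with f taken as the identity) by plain gradient descent; it is not the algorithm of the references above. The dissimilarity matrix D reproduces the legible entries for items A-G of Figure 3.14, with the rows completed by symmetry and the truncated column H omitted.

```python
import numpy as np

# Euclidean dissimilarities between the Food items A..G, taken from the
# legible columns of Figure 3.14 (rows completed by symmetry; column H is
# omitted because its values are truncated in the figure).
D = np.array([
    [  0.0,  33.8,  10.3,  12.3,  98.1,  25.3,   8.6],
    [ 33.8,   0.0,  36.4,  31.7,  76.7,  35.4,  34.7],
    [ 10.3,  36.4,   0.0,  15.5,  95.9,  27.6,  11.6],
    [ 12.3,  31.7,  15.5,   0.0,  96.2,  20.9,   9.6],
    [ 98.1,  76.7,  95.9,  96.2,   0.0, 101.4, 101.4],
    [ 25.3,  35.4,  27.6,  20.9, 101.4,   0.0,  20.9],
    [  8.6,  34.7,  11.6,   9.6, 101.4,  20.9,   0.0],
])

def mds_by_gradient_descent(D, q=2, n_iter=5000, lr=0.001, seed=0):
    """Position n points in q dimensions so that their pairwise Euclidean
    distances approximate the dissimilarity matrix D (raw quadratic stress,
    f = identity), using plain gradient descent."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    X = rng.normal(scale=D.mean(), size=(n, q))   # random initial configuration
    for _ in range(n_iter):
        diff = X[:, None, :] - X[None, :, :]      # x_i - x_j for every pair
        dist = np.sqrt((diff ** 2).sum(axis=-1))  # d*(x_i, x_j) in the reduced space
        np.fill_diagonal(dist, 1.0)               # avoid division by zero on the diagonal
        coef = (dist - D) / dist                  # derivative factor of each squared error
        np.fill_diagonal(coef, 0.0)
        grad = 2.0 * (coef[:, :, None] * diff).sum(axis=1)
        X -= lr * grad                            # one descent step on the stress
    return X

Y = mds_by_gradient_descent(D)                    # 2-dimensional configuration of items A..G
print(np.round(Y, 1))
```

In practice a library routine would normally be preferred, for example scikit-learn's sklearn.manifold.MDS with dissimilarity='precomputed', which performs a more robust SMACOF-type minimization.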
         A      B      C      D      E      F      G      H
  A    0.0   33.8   10.3   12.3   98.1   25.3    8.6   25.
  B   33.8    0.0   36.4   31.7   76.7   35.4   34.7   50.
  C   10.3   36.4    0.0   15.5   95.9   27.6   11.6   24.
  D   12.3   31.7   15.5    0.0   96.2   20.9    9.6   21.
  E   98.1   76.7   95.9   96.2    0.0  101.4  101.4  115.
  F   25.3   35.4   27.6   20.9  101.4    0.0   20.9   26.
Figure 3.14. Dissimilarity matrix for the Food dataset. Dissimilarity values
correspond to Euclidean distances.
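For reference, a dissimilarity matrix of this kind can be computed directly from the feature vectors. A minimal sketch, assuming a hypothetical food_features array holding the numeric features of the Food dataset (placeholder values are used below):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# food_features is a hypothetical (n_items x n_features) array with the
# numeric features of the Food dataset; placeholder values are used here.
food_features = np.random.default_rng(1).normal(size=(8, 5))

# Symmetric matrix of pairwise Euclidean distances, zero on the diagonal,
# in the same layout as Figure 3.14.
D = squareform(pdist(food_features, metric='euclidean'))
print(np.round(D, 1))
```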