56 3 Data Clustering
Therefore, minimizing E corresponds in a certain sense to minimizing the
cluster volumes, which seems a sensible decision. If we use a Euclidean metric to
evaluate the distance between the pairs of feature vectors, we obtain the cluster
solution depicted in Figure 3.2a. This solution is quite distinct from the visual
clustering solution, shown in Figure 3.2b, which is obtained using the city-block
metric. Had we used the xCross data instead, the only adequate metric would
have been the Chebychev metric.
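
To make the dependence on the metric concrete, the following minimal sketch computes the three metrics mentioned above and asks which of two points is closer to a reference point. The coordinates are hypothetical and are not taken from the cross data; the function names are also our own choice.

```python
# Three metrics for feature vectors given as equal-length tuples.
def euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

def city_block(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def chebychev(x, y):
    return max(abs(a - b) for a, b in zip(x, y))

def nearest(point, candidates, metric):
    """Return the candidate closest to `point` under the given metric."""
    return min(candidates, key=lambda c: metric(point, c))

# Hypothetical points, chosen only for illustration.
origin = (0.0, 0.0)
on_axis = (3.0, 0.0)       # lies along one coordinate axis
on_diagonal = (2.0, 2.0)   # lies along the diagonal

for metric in (euclidean, city_block, chebychev):
    closest = nearest(origin, [on_axis, on_diagonal], metric)
    print(metric.__name__, closest)
```

With these assumed coordinates, the Euclidean and Chebychev metrics select the diagonal point as the nearest one, whereas the city-block metric selects the point on the axis. A clustering method based on such distances can therefore group the same patterns differently depending on the metric.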
As we see from the cross data example, clustering solutions may depend
drastically on the metric used. There is, however, a subtle point concerning the
measurement scale of the features that must be emphasized. The classifiers that
we will design in the following chapters using supervised methods are in
principle independent of (or only moderately dependent on) the measurement scale
of the features. When performing data clustering, by contrast, the solutions
obtained may also vary drastically with the measurement scale or type of
standardization of the features.
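
As an illustration of what "type of standardization" may mean in practice, the sketch below applies two common choices to a feature: subtracting the mean and dividing by the standard deviation, and rescaling to the interval [0, 1]. These two particular choices and the feature values are assumptions made for the example and are not prescribed by the text.

```python
from statistics import mean, pstdev

def z_score(values):
    """Centre on the mean and divide by the (population) standard deviation.
    Assumes the feature is not constant, otherwise the deviation is zero."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def min_max(values):
    """Rescale linearly so that the smallest value maps to 0 and the largest to 1.
    Assumes the feature is not constant."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical features with very different value ranges.
feature_a = [1.0, 2.0, 3.0, 4.0]
feature_b = [100.0, 300.0, 200.0, 400.0]

for name, feature in (("feature_a", feature_a), ("feature_b", feature_b)):
    print(name, z_score(feature), min_max(feature))
```

After either transformation the two features occupy comparable ranges, so neither dominates a distance computation merely because of its measurement unit. The two transformations do not, in general, produce the same relative distances, which is one reason why clustering solutions may change with the standardization chosen.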
Figure 3.3. Visual clustering of crimes data with different coordinate axes. (a)
{Faro, V. Real}, {Setubal, Viseu}; (b) {Viseu, V. Real}, {Setubal, Faro}.
An example of the influence of feature standardization can be seen in Figure
3.3, where the same patterns of the Crimes dataset are depicted with different
positions and scales of the coordinate axes. In Figure 3.3a two clusters can be
identified visually: {Faro, V. Real} and {Setubal, Viseu}; in Figure 3.3b the visual
clustering is different: {Viseu, V. Real} and {Faro, Setubal}. Notice that this
contradictory result is a consequence of a different representation scale of the
coordinate axes, which corresponds to situations of different true scales: in (a)
with a shrunken persons scale; in (b) with a shrunken property scale. If the true
scales were used, one would tend to choose only one cluster. The problem is
aggravated when the features are of a different nature, measured in different
measurement units and occupying quite disparate value ranges. This scaling aspect
was already referred to in section 2.3 (see Figure 2.14) when discussing features'
contribution to pattern discrimination.
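
The effect of axis scaling on the grouping can be reproduced with a small sketch. The four points A, B, C and D below are hypothetical (they are not the Crimes data), and pairing each point with its nearest neighbor under the Euclidean metric is only a crude stand-in for visual clustering.

```python
import math

# Hypothetical patterns at the corners of a rectangle (feature 1, feature 2).
points = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.0, 2.0), "D": (1.0, 2.0)}

def rescale(points, sx, sy):
    """Multiply the first feature by sx and the second by sy."""
    return {name: (x * sx, y * sy) for name, (x, y) in points.items()}

def distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def nearest_neighbor(points, name):
    """Name of the pattern closest to `name` (Euclidean metric)."""
    others = (n for n in points if n != name)
    return min(others, key=lambda n: distance(points[name], points[n]))

def pairs(points):
    """Set of {pattern, nearest neighbor} pairs, used here as a rough grouping."""
    return {frozenset((n, nearest_neighbor(points, n))) for n in points}

print(pairs(points))                     # grouping with the original scales
print(pairs(rescale(points, 4.0, 1.0)))  # grouping after stretching feature 1
```

With the original scales the pairs are {A, B} and {C, D}; after stretching the first feature by a factor of 4 they become {A, C} and {B, D}. Nothing about the patterns has changed except the scale of one axis, which mirrors the contradictory visual clusterings described above.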