With these error criteria, different clustering results can be compared,
provided that K is kept constant. Clustering results found for varying K
cannot be compared. The pitfall of using criteria like (7.10) is that the
optimal number of clusters is K = N_S, i.e. the case in which each object
belongs to its own cluster. For the average-distance criterion this results
in the trivial solution J = 0.
The choice of the number of clusters K is a very fundamental problem.
In some applications, an expected number of clusters is known beforehand.
Using this number for the clustering does not always yield optimal results
for all types of clustering methods: it might happen that, due to noise, a
local minimum is reached. Using a slightly larger number of clusters is
therefore sometimes preferred. In most cases the number of clusters to look
for is one of the research questions in the investigation of a data set. Often,
the data is repeatedly clustered using a range of values of K, and the
clustering criterion values are compared. When a significant decrease in a
criterion value appears, a 'natural' clustering has probably been found.
Unfortunately, in practice it is hard to say objectively when a significant
drop occurs. Truly automatic optimization of the number of clusters is
possible in only a very few situations, when the cluster shapes are given.
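The procedure of clustering for a range of K and comparing criterion values can be sketched as follows. This is a minimal illustration, not the book's own code; it assumes MATLAB's kmeans (Statistics Toolbox) and an N x D data matrix X with one object per row, and uses the total within-cluster error returned by kmeans as the criterion J.

  % Cluster the data for a range of K and record the criterion value.
  Ks = 1:10;
  J  = zeros(size(Ks));
  for i = 1:length(Ks)
      % 'Replicates' restarts kmeans several times to reduce the risk of
      % ending up in a poor local minimum.
      [~, ~, sumd] = kmeans(X, Ks(i), 'Replicates', 5);
      J(i) = sum(sumd);          % total within-cluster error for this K
  end
  plot(Ks, J, 'o-');
  xlabel('number of clusters K'); ylabel('criterion J');
  % A 'natural' number of clusters shows up as the K after which J stops
  % dropping significantly.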
In the coming sections we will discuss four methods for performing
a clustering: hierarchical clustering, K-means clustering, mixtures of
Gaussians and finally self-organizing maps.
7.2.1 Hierarchical clustering
The basic idea of hierarchical clustering (Johnson, 1967) is to collect
objects into clusters by repeatedly merging the closest objects and
clusters into larger clusters, until all objects are in one cluster. An
important advantage is that the objects are not just placed in K distinct
groups, but in a hierarchy of clusters. This gives more information about
the structure in the data set, and shows which clusters are similar or
dissimilar. It makes it possible to detect subclusters within large
clusters, or to detect outlier clusters.
Figure 7.4 provides an example. The distance matrix of the 13 world
cities given in Table 7.1 is used to find a hierarchical cluster structure.
The figure shows that at the highest level the cities in the southern
hemisphere are separated from those in the northern hemisphere. At a
distance level of about 5000 km we see the following clusters: ‘East
Asiatic cities’, ‘European/Arabic cities’, ‘North American cities’, ‘African
city’, ‘South American cities’, ‘Australian city’.
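A clustering of this kind can be reproduced with standard MATLAB functions. The following is a sketch, not the book's own code: it assumes the symmetric 13 x 13 distance matrix D (in km, as in Table 7.1) and a cell array cityNames with the city names are available; 'average' linkage is chosen purely for illustration, other linkage types give different hierarchies.

  % Build and visualize the cluster hierarchy from the distance matrix.
  Z = linkage(squareform(D), 'average');   % agglomerative clustering
  dendrogram(Z, 'Labels', cityNames);      % plot the hierarchy
  % Cutting the hierarchy at a distance level of 5000 km yields the
  % clusters described in the text:
  labels = cluster(Z, 'cutoff', 5000, 'criterion', 'distance');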