

              With these error criteria, different clustering results can be compared
            provided that K is kept constant. Clustering results found for varying K
            cannot be compared. The pitfall of using criteria like (7.10) is that the
            optimal number of clusters is K = N_S, i.e. the case in which each object
            belongs to its own cluster. For the average-distance criterion this results
            in the trivial solution: J = 0.
              The choice of the number of clusters K is a fundamental problem.
            In some applications, an expected number of clusters is known beforehand.
            Using this number for the clustering does not always yield optimal results
            for all types of clustering methods: it might happen that, due to noise, a
            local minimum is reached. Using a slightly larger number of clusters is
            therefore sometimes preferred. In most cases the number of clusters to look
            for is one of the research questions in the investigation of a data set. Often,
            the data is repeatedly clustered using a range of values of K, and the
            clustering criterion values are compared. When a significant decrease in a
            criterion value appears, a ‘natural’ clustering is probably found. Unfortu-
            nately, in practice, it is very hard to say objectively when a significant drop
            occurs. Truly automatic optimization of the number of clusters is possible
            only in a few situations, namely when the cluster shapes are given.
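              The procedure of scanning a range of values of K and comparing the
            criterion values can be sketched in a few lines of MATLAB. The sketch
            below is only an illustration, not the book's own routine: it assumes the
            Statistics Toolbox function kmeans, uses a sum-of-squared-distances
            criterion, and the data set z is a hypothetical placeholder for the data
            under investigation.

            z = randn(100,2);                  % hypothetical two-dimensional data set
            Kmax = 10;
            J = zeros(Kmax,1);
            for K = 1:Kmax
               % repeat each clustering a few times to avoid local minima
               [lab,mu,sumd] = kmeans(z,K,'Replicates',5);
               J(K) = sum(sumd);               % sum of within-cluster squared distances
            end
            plot(1:Kmax,J,'o-');
            xlabel('number of clusters K'); ylabel('criterion value J');

            The resulting curve always decreases as K grows (the pitfall mentioned
            above); a 'natural' clustering is suggested by a knee in the curve, i.e. a
            value of K beyond which the decrease becomes small.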
              In the coming sections we will discuss four methods for performing
            a clustering: hierarchical clustering, K-means clustering, mixtures of
            Gaussians and finally self-organizing maps.



            7.2.1  Hierarchical clustering

            The basic idea of hierarchical clustering (Johnson, 1967) is to collect
            objects into clusters by merging the closest objects and clusters into
            larger clusters, until all objects are in one cluster. An important advan-
            tage is that the objects are not just placed in K distinct groups, but are
            placed in a hierarchy of clusters. This gives more information about the
            structure in the data set, and shows which clusters are similar or dissimi-
            lar. This makes it possible to detect subclusters in large clusters, or to
            detect outlier clusters.
              Figure 7.4 provides an example. The distance matrix of the 13 world
            cities given in Table 7.1 is used to find a hierarchical cluster structure.
            The figure shows that at the highest level the cities in the southern
            hemisphere are separated from those in the northern hemisphere. At a
            distance level of about 5000 km we see the following clusters: ‘East
            Asiatic cities’, ‘European/Arabic cities’, ‘North American cities’, ‘African
            city’, ‘South American cities’, ‘Australian city’.
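              A minimal MATLAB sketch of such a hierarchical clustering, assuming
            the Statistics Toolbox functions pdist, linkage, dendrogram and cluster,
            is given below. The data set is a hypothetical placeholder and complete
            linkage is chosen merely as an example; for Figure 7.4 one would instead
            pass the distance matrix of Table 7.1 (converted with squareform) directly
            to linkage.

            z = [randn(20,2); randn(20,2)+4];   % hypothetical data with two groups
            d = pdist(z);                       % pairwise distances in vector form
            Z = linkage(d,'complete');          % repeatedly merge the closest clusters
            dendrogram(Z);                      % plot the hierarchy of clusters
            lab = cluster(Z,'maxclust',2);      % cut the hierarchy into two clusters

            The dendrogram shows at which distance levels clusters are merged, so
            that subclusters and outlier clusters can be inspected by eye.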