Page 302 -
P. 302

Section 9.3  Image Segmentation by Clustering Pixels  270


                                 tween the closest elements as the inter-cluster distance, which tends to yield
                                 extended clusters (statisticians call this method single-link clustering). An-
                                 other natural choice is the maximum distance between an element of the first
                                 cluster and one of the second, which tends to yield rounded clusters (statis-
                                 ticians call this method complete-link clustering). Finally, one could use an
                                 average of distances between elements in the cluster, which also tends to yield
                                 “rounded” clusters (statisticians call this method group average clustering).
                               • How many clusters are there? This is an intrinsically difficult task if there
                                 is no model for the process that generated the clusters. The algorithms we
                                 have described generate a hierarchy of clusters. Usually, this hierarchy is
                                 displayed to a user in the form of a dendrogram—a representation of the
                                 structure of the hierarchy of clusters that displays inter-cluster distances—
                                 and an appropriate choice of clusters is made from the dendrogram (see the
                                 example in Figure 9.15).








                                      6


                                           5
                                                                       distance
                                     4                 1

                                                           2
                                                      3




                                                                                    1  2345       6

                            FIGURE 9.15: Left, a data set; right, a dendrogram obtained by agglomerative clustering
                            using single-link clustering. If one selects a particular value of distance, then a horizontal
                            line at that distance splits the dendrogram into clusters. This representation makes it
                            possible to guess how many clusters there are and to get some insight into how good the
                            clusters are.
                                 The main difficulty in using either agglomerative or divisive clustering meth-
                            ods directly is that there are an awful lot of pixels in an image. There is no rea-
                            sonable prospect of examining a dendrogram because the quantity of data means
                            that it will be too big. In practice, this means that the segmenters decide when
                            to stop splitting or merging by using a set of threshold tests. For example, an
                            agglomerative segmenter might stop merging when the distance between clusters
                            is sufficiently low or when the number of clusters reaches some value. A divisive
                            clusterer might stop splitting when the resulting clusters meet some similarity test.
   297   298   299   300   301   302   303   304   305   306   307