
3.6 Cluster Validation   75

The second dataset is clustered using the k-means algorithm under the same conditions as in step 2. The centroids derived in this step for the Rocks dataset S2 are shown in Table 3.2. Notice their proximity to the centroids of Table 3.1.
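As a rough illustration of this step, the sketch below runs plain (Lloyd's) k-means in NumPy. It assumes that "the same conditions" means the same number of clusters and the same initial centroids as in step 2. The function name `kmeans`, the `max_iter` limit and the toy data are hypothetical and do not reproduce the Rocks results.

```python
import numpy as np

def kmeans(X, init_centroids, max_iter=100):
    """Lloyd's k-means started from given centroids (hypothetical helper).

    Assumption: 'same conditions as in step 2' is modelled as the same k
    and the same initial centroids.
    """
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(init_centroids, dtype=float)
    for _ in range(max_iter):
        # Squared Euclidean distance from every pattern to every centroid, shape (n, k)
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centroid to the mean of its patterns; keep it if the cluster is empty
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(len(centroids))
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels

# Toy second dataset (hypothetical values) with two well-separated groups
S2 = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centroids, labels = kmeans(S2, init_centroids=[[0.0, 0.0], [10.0, 10.0]])
print(centroids.tolist())  # [[0.0, 0.5], [10.0, 10.5]]
print(labels.tolist())     # [0, 0, 1, 1]
```

Fixing the initial centroids makes the run on S2 deterministic, so any difference from the S1 centroids reflects the data rather than a different random start.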

5. Compute a measure of agreement between the clustering of S2 based on the nearest centroid of S1 and the direct clustering of S2.
For the Rocks dataset S2, only two patterns changed their assignments: from cluster #1 according to the centroids of S1 to the neighbouring cluster #2 (see Figure 3.19).
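Step 5 can be sketched as follows. First, assign every pattern of S2 to the nearest centroid of S1. Next, do the same with the centroids of the direct clustering of S2; for k-means, the final labels are exactly these nearest-centroid labels. Finally, count the patterns whose assignment differs. The sketch assumes the cluster numbering already corresponds between the two runs, which the proximity of the centroids makes plausible. The helper name and all numbers are hypothetical, not the Rocks data.

```python
import numpy as np

def nearest_centroid(X, centroids):
    """Index of the closest centroid (squared Euclidean distance) for each pattern."""
    X = np.asarray(X, dtype=float)
    C = np.asarray(centroids, dtype=float)
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

# Hypothetical S2 patterns and centroids
S2 = [[0.0, 0.0], [4.9, 4.9], [5.2, 5.2], [10.0, 10.0]]
centroids_S1 = [[0.0, 0.0], [10.0, 10.0]]  # stand-in for the centroids of S1
centroids_S2 = [[0.0, 0.0], [9.0, 9.0]]    # stand-in for the direct clustering of S2

labels_via_S1 = nearest_centroid(S2, centroids_S1)
labels_direct = nearest_centroid(S2, centroids_S2)
changed = int(np.count_nonzero(labels_via_S1 != labels_direct))
print(labels_via_S1.tolist(), labels_direct.tolist(), changed)  # [0, 0, 1, 1] [0, 1, 1, 1] 1
```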



Table 3.3. Agreement table for the two clustering methods of the Rocks dataset.

Cluster #1   Cluster #2   Cluster #3   Nr of occurrences
    2            0            0               33
    1            1            0                1
    1            1            0                1
    0            2            0                6
    0            0            2               27




The agreement between the two clustering methods (assigning the S2 patterns to the nearest S1 centroid, or clustering S2 directly) is shown in Table 3.3. Each entry under a cluster column is the number of times a pattern was assigned to that cluster, i.e., by how many of the two methods. The "Nr of occurrences" column indicates how many times this event occurred. For instance, both methods unanimously assigned a pattern to cluster #1 thirty-three times.
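The condensed table can be rebuilt mechanically from two label sequences. For each pattern, count how many methods chose each cluster, then count identical rows. The sketch below uses label sequences consistent with Table 3.3: the two patterns that moved are labelled #1 via the S1 centroids and #2 by direct clustering. The ordering of the patterns is arbitrary and the function name is hypothetical. Unlike Table 3.3, which lists the two (1, 1, 0) patterns on separate rows, the sketch merges them into a single row with 2 occurrences.

```python
from collections import Counter

def agreement_table(assignments, c):
    """Condense per-pattern assignments into an agreement table.

    assignments: one label sequence per method, labels in 1..c.
    Returns a dict mapping the row (n_i1, ..., n_ic) to its number of occurrences.
    """
    rows = Counter()
    for labels in zip(*assignments):  # labels given to one pattern by all methods
        row = [0] * c
        for lab in labels:
            row[lab - 1] += 1
        rows[tuple(row)] += 1
    return dict(rows)

# Label sequences consistent with Table 3.3 (pattern order is arbitrary)
via_S1 = [1] * 35 + [2] * 6 + [3] * 27
direct = [1] * 33 + [2] * 8 + [3] * 27
table = agreement_table([via_S1, direct], c=3)
for row, occurrences in table.items():
    print(row, occurrences)
# (2, 0, 0) 33
# (1, 1, 0) 2
# (0, 2, 0) 6
# (0, 0, 2) 27
```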
A measure of agreement can be computed using Cohen's κ statistic. As this method is of interest in a broad class of pattern recognition situations, namely for comparing classifiers, we will describe here its major aspects; the details can be found in, e.g., Siegel and Castellan (1988).
Consider an "agreement table" such as Table 3.3, with n objects assigned by k judges (classifiers, methods) to one of c categories (clusters, classes). In the Rocks example we have n = 68, k = 2 and c = 3. Instead of filling in a table with 68 rows, we condensed it by adding the extra "Nr of occurrences" column. Let us denote by $n_{ij}$ the number of times object i is assigned to category j.
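For k judges, the form of κ given by Siegel and Castellan (1988) is $\kappa = (P(A) - P(E))/(1 - P(E))$. Here $P(A) = \frac{1}{n k (k-1)} \sum_i \sum_j n_{ij}(n_{ij}-1)$ is the observed proportion of agreeing judge pairs, and $P(E) = \sum_j p_j^2$, with $p_j$ the proportion of all $nk$ assignments that fall in category j. Taking that form as an assumption, the sketch below computes κ directly from the condensed rows of Table 3.3. The function name and input layout are hypothetical.

```python
def kappa_k_judges(rows):
    """Kappa for k judges and c categories from a condensed agreement table.

    rows: list of (counts, occurrences), where counts[j] = n_ij for every
    object summarised by that row. Assumes every object has the same k.
    """
    n = sum(occ for _, occ in rows)  # number of objects
    k = sum(rows[0][0])              # judges per object
    c = len(rows[0][0])              # number of categories
    # P(A): observed proportion of agreeing judge pairs
    agreeing = sum(occ * sum(nij * (nij - 1) for nij in counts) for counts, occ in rows)
    p_a = agreeing / (n * k * (k - 1))
    # p_j: proportion of all n*k assignments that fall in category j
    p = [sum(occ * counts[j] for counts, occ in rows) / (n * k) for j in range(c)]
    p_e = sum(pj ** 2 for pj in p)   # P(E): agreement expected by chance
    return (p_a - p_e) / (1 - p_e)

table_3_3 = [((2, 0, 0), 33), ((1, 1, 0), 1), ((1, 1, 0), 1), ((0, 2, 0), 6), ((0, 0, 2), 27)]
kappa = kappa_k_judges(table_3_3)
print(f"{kappa:.3f}")  # 0.949
```

Because every term is weighted by the number of occurrences, listing the two (1, 1, 0) patterns separately or as a single row with 2 occurrences gives the same κ.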
The κ statistic is given by the formula: