
            7.2.2  K-means clustering

            K-means clustering (Bishop, 1995) differs in two important aspects from
            hierarchical clustering. First, K-means clustering requires the number of
            clusters K to be specified beforehand. Second, it is not hierarchical;
            instead, it partitions the data set into K disjoint subsets. Again, the
            clustering is essentially determined by the distances between objects.
            The K-means algorithm has the following structure:


            Algorithm 7.3: K-means clustering

            1. Assign each object randomly to one of the clusters k = 1, ..., K.
            2. Compute the mean of each cluster:

                   \( m_k = \frac{1}{N_k} \sum_{z_i \in C_k} z_i \)              (7.12)
            3. Reassign each object z_i to the cluster with the closest mean m_k.
            4. Return to step 2 until the means of the clusters do not change
               anymore.
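
            As a concrete illustration, the following is a minimal MATLAB sketch
            of Algorithm 7.3. The function name kmeans_sketch and the data layout
            (one object z_i per row of Z) are our own assumptions, not code from
            the book:

               function [labels, m] = kmeans_sketch(Z, K)
               % KMEANS_SKETCH  Minimal implementation of Algorithm 7.3.
               %   Z is an N-by-D matrix with one object per row; K is the
               %   requested number of clusters.
                 [N, D] = size(Z);
                 labels = randi(K, N, 1);      % step 1: random initial assignment
                 m = zeros(K, D);
                 while true
                   % Step 2: compute the mean of each cluster, equation (7.12).
                   for k = 1:K
                     members = (labels == k);
                     if any(members)           % an empty cluster keeps its old mean
                       m(k, :) = mean(Z(members, :), 1);
                     end
                   end
                   % Step 3: reassign each object to the cluster with the
                   % closest mean (squared Euclidean distances; the implicit
                   % expansion Z - m(k, :) requires MATLAB R2016b or later).
                   d = zeros(N, K);
                   for k = 1:K
                     d(:, k) = sum((Z - m(k, :)).^2, 2);
                   end
                   [~, newlabels] = min(d, [], 2);
                   % Step 4: stop when no assignment changes anymore.
                   if all(newlabels == labels), break; end
                   labels = newlabels;
                 end
               end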

            The initialization step can be adapted to speed up convergence.
            Instead of randomly labelling the data, K randomly chosen objects are
            taken as the initial cluster means. The procedure then enters the loop
            at step 3. Note again that the procedure depends on distances, in this
            case between the objects z_i and the means m_k. Here, too, scaling the
            feature space will change the final clustering result. An advantage of
            K-means clustering is that it is very easy to implement. On the other
            hand, it is unstable: running the procedure several times will give
            different results. Depending on the random initialization, the
            algorithm converges to different (local) minima. In particular, when a
            large number of clusters is requested, it often happens that some
            clusters do not gain sufficient support and are effectively ignored.
            The effective number of clusters then becomes much smaller than K.
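
            Both remedies can be sketched in a few lines of MATLAB. The restart
            loop below wraps the kmeans_sketch function given earlier and keeps
            the run with the smallest within-cluster sum of squared distances;
            the number of restarts is our own choice. (With the Statistics and
            Machine Learning Toolbox, the single call
            kmeans(Z, K, 'Start', 'sample', 'Replicates', 10) combines the
            random-object initialization and the restarts.)

               best_sse = inf;
               for r = 1:10                      % 10 random restarts
                 [lab, m] = kmeans_sketch(Z, K);
                 % m(lab, :) is the mean assigned to each object; the total
                 % within-cluster scatter measures the quality of this run.
                 sse = sum(sum((Z - m(lab, :)).^2, 2));
                 if sse < best_sse
                   best_sse = sse; labels = lab; means = m;
                 end
               end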
              In Figure 7.7 the result of a K-means clustering is shown for a
            simple 2D data set. The means are indicated by circles. At the start
            of the optimization, i.e. at the start of each trajectory, each mean
            coincides with a data object. After 10 iteration steps the solution
            has converged; the positions of the means after the final iteration
            are marked in the figure. In this case, the number of clusters in the
            data matches the predefined K = 3, and a fairly stable solution is
            found.
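
            A small synthetic experiment in the spirit of Figure 7.7 can be set
            up as follows; the data below are our own and the book's data set is
            not reproduced here:

               rng(3);                           % fix the seed for repeatability
               % three 2D Gaussian blobs (implicit expansion, R2016b or later)
               Z = [randn(50, 2); randn(50, 2) + 4; randn(50, 2) + [4 -4]];
               [labels, m] = kmeans_sketch(Z, 3);    % K = 3, as in the figure
               scatter(Z(:, 1), Z(:, 2), 20, labels, 'filled'); hold on
               plot(m(:, 1), m(:, 2), 'ko', 'MarkerSize', 10); hold off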