7.2.2 K-means clustering
K-means clustering (Bishop, 1995) differs in two important respects from
hierarchical clustering. First, K-means clustering requires the number of
clusters K to be specified beforehand. Second, it is not hierarchical;
instead, it partitions the data set into K disjoint subsets. Again, the
clustering is essentially determined by the distances between objects.
The K-means algorithm has the following structure:
Algorithm 7.3: K-means clustering
1. Assign each object randomly to one of the clusters k = 1, ..., K.
2. Compute the means of each of the clusters:

   $\mathbf{m}_k = \frac{1}{N_k} \sum_{\mathbf{z}_i \in C_k} \mathbf{z}_i$   (7.12)

3. Reassign each object z_i to the cluster with the closest mean m_k.
4. Return to step 2 until the means of the clusters do not change anymore.
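A compact MATLAB sketch of Algorithm 7.3 could look as follows. The function name kmeans_simple and the convention that z is an N x D matrix with one object per row are our own choices for illustration, not the book's listing; the Statistics Toolbox function kmeans provides a production implementation.

function [lab, m] = kmeans_simple(z, K)
% z: N x D data matrix (one object per row); K: number of clusters.
N = size(z, 1);
lab = ceil(K * rand(N, 1));      % step 1: random initial labels
oldlab = zeros(N, 1);
m = zeros(K, size(z, 2));
while any(lab ~= oldlab)         % step 4: stable labels imply stable means
   oldlab = lab;
   for k = 1:K                   % step 2: compute the cluster means
      idx = (lab == k);
      if any(idx)
         m(k, :) = mean(z(idx, :), 1);
      end                        % an empty cluster keeps its old mean
   end
   d = zeros(N, K);              % step 3: squared distance to each mean
   for k = 1:K
      d(:, k) = sum((z - repmat(m(k, :), N, 1)).^2, 2);
   end
   [dmin, lab] = min(d, [], 2);  % reassign to the closest mean
end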
The initialization step can be adapted to speed up convergence: instead
of randomly labelling the data, K randomly chosen objects are taken as
the initial cluster means, and the procedure then enters the loop at step 3.
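In MATLAB this faster initialization takes two lines; z, N and K are the same illustrative quantities assumed in the sketch above:

p = randperm(N);        % random permutation of the object indices
m = z(p(1:K), :);       % K distinct objects as the initial means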
Note again that the procedure depends on distances, in this case between
the objects z_i and the means m_k. Here, too, scaling the feature space
will change the final clustering result. An advantage of K-means
clustering is that it is very easy to implement. On the other hand, it is
unstable: running the procedure several times will generally give
different results. Depending on the random initialization, the algorithm
converges to different (local) minima. In particular, when a large number
of clusters is requested, some clusters often fail to gain sufficient
support and are effectively ignored, so that the effective number of
clusters becomes much smaller than K.
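This instability is easy to demonstrate by running the sketch several times and printing the within-cluster scatter J of each run; different runs typically converge to local minima with different J values (kmeans_simple and the variable names are the illustrative ones introduced above):

for run = 1:5
   [lab, m] = kmeans_simple(z, 3);
   J = 0;                        % sum of squared distances to the means
   for k = 1:3
      idx = (lab == k);
      J = J + sum(sum((z(idx, :) - repmat(m(k, :), sum(idx), 1)).^2));
   end
   fprintf('run %d: J = %g\n', run, J);
end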
In Figure 7.7 the result of a K-means clustering is shown for a simple
2D data set. The means are indicated by the circles. At the start of the
optimization, i.e. at the start of each trajectory, each mean coincides
with a data object. After 10 iteration steps the solution has converged;
the result of the last iteration is indicated by ' '. In this case the
number of clusters in the data matches the predefined K = 3, and a fairly
stable solution is found.