Page 245 - Classification, Parameter Estimation and State Estimation: An Engineering Approach Using MATLAB
UNSUPERVISED LEARNING
The algorithm is implemented in PRTools with the function kmeans.
See Listing 7.5.
Listing 7.5
PRTools code for fitting and plotting a K-means clustering, with K = 4.
load nutsbolts_unlabeled;     % Load the data set z
lab = kmeans(z,4);            % Perform k-means clustering
y = dataset(z,lab);           % Label by cluster assignment
figure; clf; scatterd(y);     % and plot it
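PRTools is MATLAB-specific. For readers working outside MATLAB, the same fit can be sketched with a minimal NumPy implementation of Lloyd's algorithm (the `nutsbolts_unlabeled` data set is not available here, so a synthetic two-cluster sample stands in for z; function and variable names are illustrative):

```python
import numpy as np

def kmeans(z, K, n_iter=100, rng=None):
    """Minimal Lloyd's algorithm: returns (labels, cluster centres)."""
    rng = np.random.default_rng(rng)
    # Initialize the centres with K distinct objects drawn from the data
    centres = z[rng.choice(len(z), K, replace=False)]
    for _ in range(n_iter):
        # Assign each object to the nearest centre (Euclidean distance)
        d = np.linalg.norm(z[:, None, :] - centres[None, :, :], axis=2)
        lab = d.argmin(axis=1)
        # Recompute each centre as the mean of its assigned objects
        new = np.array([z[lab == k].mean(axis=0) if np.any(lab == k)
                        else centres[k] for k in range(K)])
        if np.allclose(new, centres):   # converged
            break
        centres = new
    return lab, centres

# Synthetic stand-in for the unlabeled nuts-and-bolts data
rng = np.random.default_rng(0)
z = np.vstack([rng.normal(0, 0.3, (50, 2)),
               rng.normal(3, 0.3, (50, 2))])
lab, centres = kmeans(z, 2, rng=0)
```

As in the PRTools listing, the result is a label per object; plotting `z` coloured by `lab` plays the role of `scatterd(y)`.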
7.2.3 Mixture of Gaussians
In the K-means clustering algorithm, spherical clusters were implicitly assumed, because the Euclidean distance to the cluster centres m_k is computed. All objects on a circle or hypersphere around a cluster centre have the same resemblance to that cluster. However, in many cases clusters have more structure than that. In the mixture of Gaussians model (Dempster et al., 1977; Bishop, 1995), it is assumed that the objects in each of the K clusters are distributed according to a Gaussian distribution. That means that each cluster is characterized not only by a mean m_k but also by a covariance matrix C_k. In effect, a complete density estimate of the data is performed, where the density is modelled by:
p(z) = \sum_{k=1}^{K} \pi_k N(z | m_k, C_k)        (7.13)
N(z | m_k, C_k) stands for the multivariate Gaussian distribution, and the π_k are the mixing parameters (for which \sum_{k=1}^{K} \pi_k = 1 and \pi_k \geq 0). The mixing parameter π_k can be regarded as the probability that z is produced by a random number generator with probability density N(z | m_k, C_k).
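In code, the density of equation (7.13) is just a π_k-weighted sum of Gaussian densities. A minimal NumPy sketch (the means, covariances and mixing parameters below are illustrative, not taken from the book's data):

```python
import numpy as np

def gauss_pdf(z, m, C):
    """Multivariate Gaussian density N(z | m, C)."""
    d = len(m)
    diff = z - m
    return (np.exp(-0.5 * diff @ np.linalg.solve(C, diff))
            / np.sqrt((2 * np.pi) ** d * np.linalg.det(C)))

def mixture_pdf(z, pis, ms, Cs):
    """p(z) = sum_k pi_k N(z | m_k, C_k) -- equation (7.13)."""
    return sum(pi * gauss_pdf(z, m, C) for pi, m, C in zip(pis, ms, Cs))

# Illustrative two-component mixture in two dimensions
pis = [0.3, 0.7]                        # mixing parameters, sum to 1
ms  = [np.zeros(2), np.array([3., 3.])]  # cluster means m_k
Cs  = [np.eye(2), np.diag([2., 0.5])]    # cluster covariances C_k
p = mixture_pdf(np.array([0., 0.]), pis, ms, Cs)
```

Note that, unlike the spherical clusters of K-means, each component here carries its own covariance matrix, so the density contours of the clusters can be elliptical and differently oriented.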
The parameters to be estimated are: the number K of mixing components, the mixing parameters \pi_1, \ldots, \pi_K, the mean vectors m_k and the covariance matrices C_k. We will denote this set of free parameters by \Psi = \{\pi_k, m_k, C_k \mid k = 1, \ldots, K\}. Compared with the K-means algorithm, this increase in the number of free parameters means that more training objects are needed to estimate the cluster parameters reliably. On the other hand, because more flexible cluster shapes are applied, fewer clusters might be needed to approximate the structure in the data.
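The growth in free parameters can be made concrete with a simple tally (an illustrative count: K-means stores only the K centres, while the Gaussian mixture adds a symmetric d-by-d covariance per cluster and K - 1 independent mixing parameters):

```python
def kmeans_params(K, d):
    # K cluster centres, each a d-dimensional vector
    return K * d

def mog_params(K, d):
    # K mean vectors, K symmetric covariance matrices with d(d+1)/2
    # free entries each, and K mixing parameters constrained to sum
    # to one (hence K - 1 of them are free)
    return K * d + K * d * (d + 1) // 2 + (K - 1)

# For K = 4 clusters in d = 2 dimensions:
# K-means: 4*2 = 8; mixture of Gaussians: 8 + 12 + 3 = 23
```

Already in two dimensions the mixture model has nearly three times as many free parameters as K-means, which is why it demands more training objects.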