visualization simpler, let us only consider the L*u* coordinates, as shown in Figure 5.16c.
How many obvious (elongated) clusters do you see? How would you go about finding these
clusters?
The k-means and mixtures of Gaussians techniques use a parametric model of the den-
sity function to answer this question, i.e., they assume the density is the superposition of a
small number of simpler distributions (e.g., Gaussians) whose locations (centers) and shape
(covariance) can be estimated. Mean shift, on the other hand, smoothes the distribution and
finds its peaks as well as the regions of feature space that correspond to each peak. Since
a complete density is being modeled, this approach is called non-parametric (Bishop 2006).
Let us look at these techniques in more detail.
5.3.1 K-means and mixtures of Gaussians
While k-means implicitly models the probability density as a superposition of spherically
symmetric distributions, it does not require any probabilistic reasoning or modeling (Bishop
2006). Instead, the algorithm is given the number of clusters k it is supposed to find; it
then iteratively updates the cluster center location based on the samples that are closest to
each center. The algorithm can be initialized by randomly sampling k centers from the input
feature vectors. Techniques have also been developed for splitting or merging cluster centers
based on their statistics, and for accelerating the process of finding the nearest mean center
(Bishop 2006).
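As a concrete illustration of this loop, here is a minimal NumPy sketch of k-means (random initialization from the input feature vectors, then alternating nearest-center assignment and center updates); the function name, iteration cap, and convergence test are illustrative choices rather than prescribed by the text, and the splitting/merging and acceleration techniques mentioned above are omitted:

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Cluster the rows of X (an N x D array of feature vectors) into k groups."""
    rng = np.random.default_rng(seed)
    # Initialize by randomly sampling k centers from the input feature vectors.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign each sample to its nearest center (squared Euclidean distance).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update each center to the mean of its assigned samples
        # (keeping the old center if a cluster happens to go empty).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # assignments have stabilized
        centers = new_centers
    return centers, labels
```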
In mixtures of Gaussians, each cluster center is augmented by a covariance matrix whose
values are re-estimated from the corresponding samples. Instead of using nearest neighbors
to associate input samples with cluster centers, a Mahalanobis distance (Appendix B.1.1) is
used:
$$ d(x_i, \mu_k; \Sigma_k) = \| x_i - \mu_k \|^2_{\Sigma_k^{-1}} = (x_i - \mu_k)^T \Sigma_k^{-1} (x_i - \mu_k) \tag{5.26} $$
where $x_i$ are the input samples, $\mu_k$ are the cluster centers, and $\Sigma_k$ are their covariance estimates. Samples can be associated with the nearest cluster center (a hard assignment of membership) or can be softly assigned to several nearby clusters.
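As a sketch, the distance in (5.26) can be evaluated as follows; solving a linear system rather than explicitly inverting $\Sigma_k$ is a standard numerical choice, and the function name is ours:

```python
import numpy as np

def mahalanobis(x, mu, Sigma):
    """Mahalanobis distance d(x, mu; Sigma) of (5.26)."""
    diff = x - mu
    # Solve Sigma @ y = diff instead of forming Sigma^{-1} explicitly.
    return diff @ np.linalg.solve(Sigma, diff)
```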
This latter, more commonly used, approach corresponds to iteratively re-estimating the
parameters for a mixture of Gaussians density function,
$$ p(x \mid \{\pi_k, \mu_k, \Sigma_k\}) = \sum_k \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k), \tag{5.27} $$
where $\pi_k$ are the mixing coefficients, $\mu_k$ and $\Sigma_k$ are the Gaussian means and covariances, and
$$ \mathcal{N}(x \mid \mu_k, \Sigma_k) = \frac{1}{\sqrt{|\Sigma_k|}} \, e^{-d(x, \mu_k; \Sigma_k)} \tag{5.28} $$
is the normal (Gaussian) distribution (Bishop 2006).
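Putting (5.27) and (5.28) together, a minimal sketch of evaluating the mixture density looks like this; note that, following the text's convention, the Gaussian omits the usual $(2\pi)^{D/2}$ normalization factor, and all names are illustrative:

```python
import numpy as np

def gaussian(x, mu, Sigma):
    """The Gaussian of (5.28), without the (2*pi)^{D/2} factor, as in the text."""
    diff = x - mu
    d = diff @ np.linalg.solve(Sigma, diff)  # Mahalanobis distance (5.26)
    return np.exp(-d) / np.sqrt(np.linalg.det(Sigma))

def mixture_density(x, pis, mus, Sigmas):
    """Mixture of Gaussians density (5.27): sum_k pi_k N(x | mu_k, Sigma_k)."""
    return sum(pi * gaussian(x, mu, S) for pi, mu, S in zip(pis, mus, Sigmas))
```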
To iteratively compute a (local) maximum likelihood estimate for the unknown mixture parameters $\{\pi_k, \mu_k, \Sigma_k\}$, the expectation maximization (EM) algorithm (Dempster, Laird, and Rubin 1977) proceeds in two alternating stages:
1. The expectation stage (E step) estimates the responsibilities
$$ z_{ik} = \frac{1}{Z_i} \, \pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k) \quad \text{with} \quad \sum_k z_{ik} = 1, \tag{5.29} $$