Page 307 -

P. 307

Section 9.3 Image Segmentation by Clustering Pixels 275

or equivalently, when
2
x i −y || )
y = i x i g(|| h .
x i −y 2
i g(|| h || )
The mean shift procedure involves producing a series of estimates y (j) where

x i −y
(j) 2
x i g(|| h || )
i
(j+1)
y = .
x i −y
(j) 2
g(|| || )
i h
The procedure gets its name from the fact that we are shifting to a point which
has the form of a weighted mean (see Algorithm 9.5).
Start with an estimate of the mode y (0) and a set of n data vectors x i
of dimension d, a scaling constant h,and g the derivative of the kernel proﬁle

Until the update is tiny
Form the new estimate
x i −y || )
(j) 2
y (j+1) = i x i g(|| h
x i −y (j) 2
g(|| || )
i h
Algorithm 9.5: Finding a Mode with Mean Shift.

9.3.5 Clustering and Segmentation with Mean Shift
Clustering with mean shift is, in principle, straightforward. We start the mean shift
procedure at every data point, producing a mode for each data point. Because we
are working with continuous variables, every one of these modes is diﬀerent, but
we expect the modes to be very tightly clustered. There should be a small set of
actual modes, and each of these estimates is very close to one of them. These esti-
mates are themselves useful, because they represent a form of ﬁltering of the image.
We could replace each pixel with its mode representation; this gives a signiﬁcant
smoothing of the image in a way that respects image boundaries (Figure 9.20). To
cluster the data, we apply, say, an agglomerative clusterer to the mode estimates.
Because we expect the modes to be very tightly clustered, group average distance
is a good choice of distance, and we can stop clustering when this distance exceeds
a small threshold. This will produce a set of small, tight clusters that are widely
separated. We now map each data point to the cluster center corresponding to its
mode (Algorithm 9.6).
This recipe can be applied nearly directly to image segmentation. We repre-
sent each pixel with a feature vector, then cluster the feature vectors; each cluster
center represents a segment, and we replace each pixel with the number of its cluster
center. Improved performance can be obtained by a representation that balances
spatial and appearance features more explicitly. In particular, we represent the ith
s
pixel with a feature vector x i which has two components: x , which has dimension
i

302 303 304 305 306 307 308 309 310 311 312