Page 307 -
P. 307

Section 9.3  Image Segmentation by Clustering Pixels  275


                            or equivalently, when
                                                                       2
                                                                  x i −y  || )
                                                     y =    i  x i g(||  h  .
                                                                 x i −y 2
                                                             i  g(||  h  || )
                            The mean shift procedure involves producing a series of estimates y (j)  where

                                                                   x i −y
                                                                       (j)  2
                                                              x i g(||  h  || )
                                                             i
                                                    (j+1)
                                                  y     =                   .
                                                                  x i −y
                                                                      (j)  2
                                                               g(||      || )
                                                              i     h
                            The procedure gets its name from the fact that we are shifting to a point which
                            has the form of a weighted mean (see Algorithm 9.5).
                            Start with an estimate of the mode y (0)  and a set of n data vectors x i
                            of dimension d, a scaling constant h,and g the derivative of the kernel profile

                            Until the update is tiny
                                Form the new estimate
                                                 x i −y  || )
                                                     (j) 2
                                  y (j+1)  =  i  x i g(||  h
                                                x i −y (j) 2
                                              g(||    || )
                                             i     h
                                          Algorithm 9.5: Finding a Mode with Mean Shift.


                     9.3.5 Clustering and Segmentation with Mean Shift
                            Clustering with mean shift is, in principle, straightforward. We start the mean shift
                            procedure at every data point, producing a mode for each data point. Because we
                            are working with continuous variables, every one of these modes is different, but
                            we expect the modes to be very tightly clustered. There should be a small set of
                            actual modes, and each of these estimates is very close to one of them. These esti-
                            mates are themselves useful, because they represent a form of filtering of the image.
                            We could replace each pixel with its mode representation; this gives a significant
                            smoothing of the image in a way that respects image boundaries (Figure 9.20). To
                            cluster the data, we apply, say, an agglomerative clusterer to the mode estimates.
                            Because we expect the modes to be very tightly clustered, group average distance
                            is a good choice of distance, and we can stop clustering when this distance exceeds
                            a small threshold. This will produce a set of small, tight clusters that are widely
                            separated. We now map each data point to the cluster center corresponding to its
                            mode (Algorithm 9.6).
                                 This recipe can be applied nearly directly to image segmentation. We repre-
                            sent each pixel with a feature vector, then cluster the feature vectors; each cluster
                            center represents a segment, and we replace each pixel with the number of its cluster
                            center. Improved performance can be obtained by a representation that balances
                            spatial and appearance features more explicitly. In particular, we represent the ith
                                                                                s
                            pixel with a feature vector x i which has two components: x , which has dimension
                                                                                i
   302   303   304   305   306   307   308   309   310   311   312