Page 280 -
P. 280

5.3 Mean shift and mode finding                                                         259


               Comaniciu and Meer (2002) prove that this algorithm converges to a local maximum of f(x)
               under reasonably weak conditions on the kernel k(r), i.e., that it is monotonically decreasing.
               This convergence is not guaranteed for regular gradient descent unless appropriate step size
               control is used.
                  The two kernels that Comaniciu and Meer (2002) studied are the Epanechnikov kernel,
                                          k E (r) = max(0, 1 − r),                  (5.40)

               which is a radial generalization of a bilinear kernel, and the Gaussian (normal) kernel,
                                                         1

                                           k N (r) = exp − r .                      (5.41)
                                                         2
               The corresponding derivative kernels g(r) are a unit ball and another Gaussian, respectively.
               Using the Epanechnikov kernel converges in a finite number of steps, while the Gaussian
               kernel has a smoother trajectory (and produces better results), but converges very slowly near
               a mode (Exercise 5.5).
                  The simplest way to apply mean shift is to start a separate mean-shift mode estimate
               y at every input point x i and to iterate for a fixed number of steps or until the mean-shift
               magnitude is below a threshold. A faster approach is to randomly subsample the input points
               x i and to keep track of each point’s temporal evolution. The remaining points can then be
               classified based on the nearest evolution path (Comaniciu and Meer 2002). Paris and Durand
               (2007) review a number of other more efficient implementations of mean shift, including their
               own approach, which is based on using an efficient low-resolution estimate of the complete
               multi-dimensional space of f(x) along with its stationary points.
                  The color-based segmentation shown in Figure 5.16 only looks at pixel colors when deter-
               mining the best clustering. It may therefore cluster together small isolated pixels that happen
               to have the same color, which may not correspond to a semantically meaningful segmentation
               of the image.
                  Better results can usually be obtained by clustering in the joint domain of color and lo-
               cation. In this approach, the spatial coordinates of the image x s =(x, y), which are called
               the spatial domain, are concatenated with the color values x r , which are known as the range
               domain, and mean-shift clustering is applied in this five-dimensional space x j . Since location
               and color may have different scales, the kernels are adjusted accordingly, i.e., we use a kernel
               of the form
                                                     2         2

                                                  x r       x s
                                      K(x j )= k        k         ,                 (5.42)
                                                  h 2 r      h 2 s
               where separate parameters h s and h r are used to control the spatial and range bandwidths of
               the filter kernels. Figure 5.18 shows an example of mean-shift clustering in the joint domain,
               with parameters (h s ,h r ,M) = (16, 19, 40), where spatial regions containing less than M
               pixels are eliminated.
                  The form of the joint domain filter kernel (5.42) is reminiscent of the bilateral filter kernel
               (3.34–3.37) discussed in Section 3.3.1. The difference between mean shift and bilateral fil-
               tering, however, is that in mean shift the spatial coordinates of each pixel are adjusted along
               with its color values, so that the pixel migrates more quickly towards other pixels with similar
               colors, and can therefore later be used for clustering and segmentation.
                  Determining the best bandwidth parameters h to use with mean shift remains something
               of an art, although a number of approaches have been explored. These include optimizing
   275   276   277   278   279   280   281   282   283   284   285