Page 306 -
P. 306

Section 9.3  Image Segmentation by Clustering Pixels  274


                                 This is a quite general strategy that applies to many different bump functions.
                            We will use a specific kernel smoother, writing

                                                                              2
                                                         (2π) (−d/2)     1 ||x||
                                               K(x; h)=           exp −
                                                            h d          2  h
                            for the bump function. We introduce a positive scale parameter h, which we can
                            adjust to get the best representation. Then, our model of the density is

                                                             n
                                                           1
                                                  f(x)=          K (x i − x; h)
                                                           n
                                                              i=1
                            (you should check that this is a density, i.e., that it is non-negative, and that its
                            integral is one). We can estimate h by maximizing the average likelihood of held-
                            out data. In this procedure, we repeat the following experiment numerous times:
                            hold out one data point at random, fit a density model to the remaining data, and
                            then compute the likelihood of the held-out data point as a function of h (perhaps
                            by computing values at a set of different sample points in h). We average the
                            likelihoods computed in this way, then use the h that maximizes that likelihood.
                                                                             
  1
                                 We can simplify notation by writing k(u)=exp − u (this is called the
                                                                                2
                                                 (2π) (−d/2)
                            kernel profile)and C =       ,so that
                                                   nh d
                                                             n
                                                                   x − x i  2
                                                   f(x)= C     k ||      ||                    (9.1)
                                                                     h
                                                            i=1
                            Now write g =  d  k(u). Starting from some point x 0 , we should like to find a
                                           du
                            nearby point that maximizes the density value. We will use this local maximum
                            (local mode) as a cluster center. The mean shift procedure maximizes expressions
                            of the form of Equation 9.1. We are seeking y such that the gradient ∇f vanishes
                            at that point. We must require that

                                 ∇f(x) |x=y  =   0
                                                           x i − y
                                                                  2
                                             =   C    ∇k(||      || )
                                                             h
                                                    i
                                                               -          2  .
                                                   2               x i − y
                                             =   C      [x i − y] g(||   || )
                                                   h                 h
                                                     i
                                                    #                    $   #               $
                                                              x i −y 2
                                                   2    i  x i g(||  h  || )        x i − y  2
                                             =   C                2   − y ×      g(||     || ) .
                                                   h         x i −y                   h
                                                         i  g(||  h  || )      i
                                                x i −y 2
                            We expect that   g(||    || ) is nonzero, so that the maximum occurs when
                                            i     h
                                                   #                   $
                                                            x i −y 2
                                                       x i g(||  || )
                                                      i       h     − y =0,
                                                           x i −y  2
                                                       i  g(||  h  || )
   301   302   303   304   305   306   307   308   309   310   311