Page 314 -
P. 314

Section 9.4  Segmentation, Clustering, and Graphs  282


                            pixel. We adopt the convention that δ i = −1if the ith pixel is background, and
                            δ i = 1 if the ith pixel is foreground. We will find a value for this binary variable
                            that minimizes an energy function. The energy will have two types of term. The
                            first type of term will encourage pixels similar to the foreground (resp. background)
                            model to have foreground (resp. background) labels. The second will encourage
                            pixels to have the same label as their neighbors.
                                 Write p for a vector representing the ith pixel. The vector could contain
                                       i
                            intensity; intensity and color; intensity, color and texture; or other information.
                            Write d f (p) for a function that compares the pixel vector p with the foreground
                            model; this function is large when the pixel is not like the foreground, and small
                            when it is like the foreground. Similarly, write d b (p) for a function that compares
                            the pixel vector with the background. Write N(i) for the neighbors of pixel i.Write
                            B(p , p ) for a non-negative, symmetric function that compares two pixels, which
                               i  j
                            we will use as a cost for assigning neighboring pixels to different models. This could
                            be as simple as a constant, which just encourages neighboring pixels to have the
                            same label. More complicated B should be large for two pixels that are similar,
                            and small for different pixels; in this case, we will encourage the label to change
                            between pixels that look different.
                                            1
                                 Notice that ( )(1 − δ i δ j ) has the value 1 when δ i and δ j are different, and 0
                                            2
                            otherwise. Write I for the set of all pixels, U for the set of unknown pixels, F for
                            the set of known foreground pixels, and B for the set of known background pixels.
                            Now we can write an energy function
                                                            1               1

                                           ∗
                                         E (δ)  =      d f (p ) (1 + δ i )+ d b (p ) (1 − δ i )+
                                                           i
                                                                          i
                                                            2               2
                                                    i∈I
                                                                     1

                                                            B(p , p )( )(1 − δ i δ j )
                                                                i
                                                                  j
                                                                     2
                                                    i∈I j∈N(i)
                            which we must minimize subject to δ k =1 for k ∈F and δ k =0 for k ∈B. Notice
                            that we can make this energy function small by labelling pixels that agree with the
                            foreground model with δ = 1, those that agree with the background model with
                            δ = −1, and ensuring that labels change at pixels that look different (i.e., where
                            B is small). Minimizing this energy might be hard, because it is a combinatorial
                            problem (δ j can take only two values).
                                 It turns out that minimizing E can be rephrased as minimizing a cut on a
                            graph. The easiest way to see this is with a figure. Imagine a cut on the graph of
                            Figure 9.23. In this graph, each pixel is represented by a vertex, the source ver-
                            tex corresponds to the foreground label, and the target vertex corresponds to the
                            background label. There is one edge connecting each pixel to the source and one
                            connecting it to the target; we can cut the graph by cutting only one of these two
                            edges, and if we cut both, the cut is not minimal. We can interpret a cut that cuts
                            only one of these edges as a map from a pixel to foreground (resp. background)
                            depending on whether the edge to the source (resp. target) remains uncut. Fur-
                            thermore, the value of a cut that cuts only one of these two edges for each pixel
                            is the same as the value of the energy function E for the corresponding labelling.
                            As a result, we can minimize the energy function by computing the minimum cut.
                            This is known to be polynomial (from the references in Section 9.4.1), but in fact
   309   310   311   312   313   314   315   316   317   318   319