Page 314 -
P. 314
Section 9.4 Segmentation, Clustering, and Graphs 282
pixel. We adopt the convention that δ i = −1if the ith pixel is background, and
δ i = 1 if the ith pixel is foreground. We will find a value for this binary variable
that minimizes an energy function. The energy will have two types of term. The
first type of term will encourage pixels similar to the foreground (resp. background)
model to have foreground (resp. background) labels. The second will encourage
pixels to have the same label as their neighbors.
Write p for a vector representing the ith pixel. The vector could contain
i
intensity; intensity and color; intensity, color and texture; or other information.
Write d f (p) for a function that compares the pixel vector p with the foreground
model; this function is large when the pixel is not like the foreground, and small
when it is like the foreground. Similarly, write d b (p) for a function that compares
the pixel vector with the background. Write N(i) for the neighbors of pixel i.Write
B(p , p ) for a non-negative, symmetric function that compares two pixels, which
i j
we will use as a cost for assigning neighboring pixels to different models. This could
be as simple as a constant, which just encourages neighboring pixels to have the
same label. More complicated B should be large for two pixels that are similar,
and small for different pixels; in this case, we will encourage the label to change
between pixels that look different.
1
Notice that ( )(1 − δ i δ j ) has the value 1 when δ i and δ j are different, and 0
2
otherwise. Write I for the set of all pixels, U for the set of unknown pixels, F for
the set of known foreground pixels, and B for the set of known background pixels.
Now we can write an energy function
1 1
∗
E (δ) = d f (p ) (1 + δ i )+ d b (p ) (1 − δ i )+
i
i
2 2
i∈I
1
B(p , p )( )(1 − δ i δ j )
i
j
2
i∈I j∈N(i)
which we must minimize subject to δ k =1 for k ∈F and δ k =0 for k ∈B. Notice
that we can make this energy function small by labelling pixels that agree with the
foreground model with δ = 1, those that agree with the background model with
δ = −1, and ensuring that labels change at pixels that look different (i.e., where
B is small). Minimizing this energy might be hard, because it is a combinatorial
problem (δ j can take only two values).
It turns out that minimizing E can be rephrased as minimizing a cut on a
graph. The easiest way to see this is with a figure. Imagine a cut on the graph of
Figure 9.23. In this graph, each pixel is represented by a vertex, the source ver-
tex corresponds to the foreground label, and the target vertex corresponds to the
background label. There is one edge connecting each pixel to the source and one
connecting it to the target; we can cut the graph by cutting only one of these two
edges, and if we cut both, the cut is not minimal. We can interpret a cut that cuts
only one of these edges as a map from a pixel to foreground (resp. background)
depending on whether the edge to the source (resp. target) remains uncut. Fur-
thermore, the value of a cut that cuts only one of these two edges for each pixel
is the same as the value of the energy function E for the corresponding labelling.
As a result, we can minimize the energy function by computing the minimum cut.
This is known to be polynomial (from the references in Section 9.4.1), but in fact