Section 9.6 Notes 288
algorithms that solve the relevant two-label Markov random field (see Vogler et al.
(2000); Boykov and Jolly (2001); or Boykov and Funka-Lea (2006)). There are now
many important variants. Grabcut is due to Rother et al. (2004); Objcut uses prior
information about object shapes to improve the cut (Kumar et al. 2010); and see
also Duchenne et al. (2008). There are numerous matting methods, which Wang
and Cohen (2007) survey in detail.
The normalized cuts formalism is due to Shi and Malik (1997, 2000).
Variants include applications to motion segmentation (Shi and Malik 1998a) and
methods for deducing similarity metrics from outputs (Shi and Malik 1998b). There
are numerous alternative criteria (e.g., Cox et al. (1996); Perona and Freeman (1998)).
There is a considerable early literature on the evaluation of segmentation.
Useful references include: Zhang (1996a); Zhang (1997); Beauchemin and Thomson
(1997); Zhang and Gerbrands (1994); Correia and Pereira (2003); Lei and
Udupa (2003); Warfield et al. (2004); Paglieroni (2004); Cardoso and Corte-Real
(2005); Cardoso and Corte-Real (2006); Cardoso et al. (2009); Carleer et al. (2005);
and Crum et al. (2006). Evaluation is easier in the context of a specific task;
papers dealing with assorted tasks include Yasnoff et al. (1977), Hartley et al. (1982),
Zhang (1996b), and Ranade and Prewitt (1980). Martin et al. (2001) introduced
the Berkeley segmentation dataset, which is now a standard for evaluation, but
there are a variety of criteria one can use. Unnikrishnan et al. (2007) use the Rand
index; Polak et al. (2009) use multiple object boundaries and give a
detailed evaluation of four segmentation algorithms; Hanbury and Stottinger (2008)
compare metrics; and Zhang et al. (2008) give a recent survey of evaluation methods.
Good image segments are most likely internally coherent, but making that
idea useful is hard (Bagon et al. 2008).
Since it is hard to get a segmentation right, Russell et al. (2006) suggest
working with multiple segmentations and then choosing good pieces. This idea is
now very influential. Multiple segmentations have been used to improve estimates
of support (Malisiewicz and Efros 2007), and to drive recognition (Pantofaru et
al. 2008; Malisiewicz and Efros 2008). One could organize the multiple segments
into an inclusion hierarchy (Tabb and Ahuja 1997); the hierarchies yield object
models (Todorovic and Ahuja 2008b), and can be matched (Todorovic and Ahuja
2008a).
We haven’t discussed some aspects of perceptual organization in great detail,
mainly because our emphasis is on exposition rather than historical accuracy, and
these methods follow from the unified view. For example, there is a long thread of
literature on clustering image edge points or line segments into configurations that
are unlikely to have arisen by accident. We cover some of these ideas in the following
chapter, but also draw the reader's attention to Amir and Lindenbaum (1996),
Huttenlocher and Wayner (1992), Lowe (1985), Mohan and Nevatia (1992), Sarkar
and Boyer (1993), and Sarkar and Boyer (1994). In building user interfaces, it can
(as we hinted before) be helpful to know what is perceptually salient (e.g., Saund
and Moran (1995)).