held out test people. Modern segmenters can do quite well at this test, but not as
well as people do (Figure 9.25).
The Berkeley Segmentation Data Set consists of 300 manually segmented images,
and is distributed at
http://www.eecs.berkeley.edu/Research/Projects/CS/vision/bsds/. This page also
maintains up-to-date benchmarks on that dataset. A more recent version
(BSDS-500) has 500 manually segmented images; see
http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/resources.html.
Again, there is a set of benchmarks on that dataset available. The Lotus Hill
Institute provides a large dataset, free for academic use, at
http://www.imageparsing.com/. Annotations are much richer than just region
structure, and extend to a detailed semantic hierarchy of region relations.
9.6 NOTES
Segmentation is a difficult topic, and there is a huge variety of methods. Surveys
of mainly historical interest are Riseman and Arbib (1977), Fu and Mui (1981),
Haralick and Shapiro (1985), Nevatia (1986), and Pal and Pal (1993).
One reason for this variety is that it is typically quite hard to assess the performance of a
segmenter at a level more useful than that of showing some examples. The original
clustering segmenter is Ohlander et al. (1978). Clustering methods tend to be rather
arbitrary—remember, this doesn’t mean they’re not useful—because there really
isn’t much theory available to predict what should be clustered and how. It is clear
that what we should be doing is forming clusters that are helpful to a particular
application, but this criterion hasn’t been formalized in any useful way. In this
chapter, we have attempted to give the big picture while ignoring detail, because a
detailed record of what has been done would be unenlightening. Everyone should
know about agglomerative clustering, divisive clustering, k-means, mean shift, and
at least one graph-based clustering algorithm (your choice!), because these ideas
are just so useful for so many applications; segmentation is just one application of
clustering.
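To make the point that segmentation is just one application of clustering concrete, here is a minimal sketch of k-means applied to pixel colors. It is an illustration under stated assumptions, not the procedure of any particular paper cited here: the function names `kmeans` and `segment_by_color`, the random initialization from data points, and the use of raw color as the only feature are all our choices for this example.

```python
import numpy as np

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd-style k-means. Returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    # Assumption: initialise the centers with k randomly chosen data points.
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared distance from every point to every center: shape (n, k).
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its members; keep it if it has none.
        new_centers = centers.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members):
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment, so the labels agree with the returned centers.
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1)

def segment_by_color(image, k):
    """Label each pixel of an (h, w, c) image by its color cluster."""
    h, w, c = image.shape
    _, labels = kmeans(image.reshape(-1, c), k)
    return labels.reshape(h, w)

# Toy image: dark left half, bright right half.
img = np.zeros((4, 6, 3))
img[:, :3] = 0.1
img[:, 3:] = 0.9
seg = segment_by_color(img, 2)
```

Nothing in `kmeans` knows that its input came from an image; the same function would cluster any set of feature vectors. Adding pixel position to the feature vector is one common way to encourage spatially coherent segments.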
There is a large literature on the role of grouping in human visual perception.
Standard Gestalt handbooks include Kanizsa (1979) and Koffka (1935). Subjective
contours were first described by Kanizsa; there is a broad summary discussion
in Kanizsa (1976). The authoritative book by Palmer (1999) gives a much
broader picture than we can supply here. There is a great deal of information
about the development of different theories of vision and the origins of Gestalt
thinking in Gordon (1997). Some groups appear to be formed remarkably early in
the visual process, a phenomenon known as pop out (Treisman 1982).
We believe the watershed is originally due to Digabel and Lantuéjoul (1978);
see Vincent and Soille (1991). Fukunaga and Hostetler (1975) first described mean
shift, but it was largely ignored until the work of Cheng (1995). It is now a main-
stay of computer vision research; as we shall see in the following chapters, it has
numerous applications.
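As a reminder of what the mean shift procedure does, the following is a minimal sketch of mode seeking from a single starting point. It assumes a Gaussian kernel with a fixed bandwidth; the function name `mean_shift_mode` and its parameters are ours, chosen for illustration, and the code is not taken from any of the papers cited above.

```python
import numpy as np

def mean_shift_mode(points, start, bandwidth, iters=100, tol=1e-6):
    """Climb from `start` to a nearby mode of the kernel density estimate."""
    points = np.asarray(points, dtype=float)
    x = np.asarray(start, dtype=float)
    for _ in range(iters):
        # Gaussian kernel weight of every data point, seen from x.
        w = np.exp(-((points - x) ** 2).sum(axis=1) / (2.0 * bandwidth ** 2))
        # The weighted mean of the data is the next estimate of the mode.
        new_x = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(new_x - x) < tol:
            return new_x
        x = new_x
    return x

# Two small, well-separated clusters centred on (0, 0) and (5, 5).
offsets = np.array([[0, 0], [0.2, 0], [-0.2, 0], [0, 0.2], [0, -0.2]])
data = np.vstack([offsets, offsets + 5.0])
mode_a = mean_shift_mode(data, start=[0.5, 0.5], bandwidth=1.0)
mode_b = mean_shift_mode(data, start=[4.5, 4.5], bandwidth=1.0)
```

Running the procedure from every data point and grouping the points whose iterations end at the same mode gives a clustering; the bandwidth controls how many modes survive.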
A variety of graph theoretical clustering methods have been used in vision
(see Sarkar and Boyer (1998), and Wu and Leahy (1993); there is a summary
in Weiss (1999)).
Interactive segmentation became possible because of extremely fast min-cut