


                            held out test people. Modern segmenters can do quite well at this test, but not as
                            well as people do (Figure 9.25).
                                 The Berkeley Segmentation Data Set consists of 300 manually segmented
                            images, and is distributed at
                            http://www.eecs.berkeley.edu/Research/Projects/CS/vision/bsds/. This page also
                            maintains up-to-date benchmarks on that dataset. A more recent version
                            (BSDS-500) has 500 manually segmented images; see
                            http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/resources.html.
                            Again, a set of benchmarks on that dataset is available. The Lotus Hill
                            Institute provides a large dataset, free for academic use, at
                            http://www.imageparsing.com/. Annotations are much richer than just region
                            structure, and extend to a detailed semantic hierarchy of region relations.

                     9.6 NOTES
                            Segmentation is a difficult topic, and there is a huge variety of methods. Surveys
                            of mainly historical interest are Riseman and Arbib (1977), Fu and Mui (1981),
                            Haralick and Shapiro (1985), Nevatia (1986), and Pal and Pal (1993).
                                 One reason for this variety is that it is typically quite hard to assess the
                            performance of a segmenter at a level more useful than that of showing some
                            examples. The original
                            clustering segmenter is Ohlander et al. (1978). Clustering methods tend to be rather
                            arbitrary—remember, this doesn’t mean they’re not useful—because there really
                            isn’t much theory available to predict what should be clustered and how. It is clear
                            that what we should be doing is forming clusters that are helpful to a particular
                            application, but this criterion hasn’t been formalized in any useful way. In this
                            chapter, we have attempted to give the big picture while ignoring detail, because a
                            detailed record of what has been done would be unenlightening. Everyone should
                            know about agglomerative clustering, divisive clustering, k-means, mean shift, and
                            at least one graph-based clustering algorithm (your choice!), because these ideas
                            are just so useful for so many applications; segmentation is just one application of
                            clustering.
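
                                 As a reminder of how little machinery clustering-based segmentation
                            needs, here is a minimal sketch of k-means applied to pixel colors. It is an
                            illustration under stated assumptions, not the method of any paper cited here:
                            the function name kmeans_segment, the choice of raw color as the only feature,
                            and the defaults for k, n_iters, and seed are all ours.

```python
import numpy as np

def kmeans_segment(image, k=3, n_iters=20, seed=0):
    """Cluster pixel colors with k-means.

    image: (H, W, C) array. Returns an (H, W) label map and the (k, C) centers.
    Assumption: color alone is the feature vector (no position, no texture).
    """
    h, w, c = image.shape
    pixels = image.reshape(-1, c).astype(float)
    rng = np.random.default_rng(seed)
    # Initialize the centers from k randomly chosen pixels.
    centers = pixels[rng.choice(len(pixels), size=k, replace=False)].copy()
    for _ in range(n_iters):
        # Assign each pixel to its nearest center (squared Euclidean distance).
        d2 = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its pixels; leave empty clusters alone.
        for j in range(k):
            members = pixels[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels.reshape(h, w), centers
```

                                 Calling kmeans_segment(img, k=2) on an image whose left half is black
                            and whose right half is white should give each half its own label. Nothing in
                            the code knows what a useful segment is; the result depends entirely on the
                            feature vector and on k, which is the arbitrariness discussed above.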
                                 There is a large literature on the role of grouping in human visual perception.
                            Standard Gestalt handbooks include Kanizsa (1979) and Koffka (1935). Subjective
                            contours were first described by Kanizsa; there is a broad summary discussion
                            in Kanizsa (1976). The authoritative book by Palmer (1999) gives a much
                            broader picture than we can supply here. There is a great deal of information
                            about the development of different theories of vision and the origins of Gestalt
                            thinking in Gordon (1997). Some groups appear to be formed remarkably early in
                            the visual process, a phenomenon known as pop out (Treisman 1982).
                                 We believe the watershed is originally due to Digabel and Lantuéjoul (1978);
                            see Vincent and Soille (1991). Fukunaga and Hostetler (1975) first described mean
                            shift, but it was largely ignored until the work of Cheng (1995). It is now a main-
                            stay of computer vision research; as we shall see in the following chapters, it has
                            numerous applications.
                                 A variety of graph theoretical clustering methods have been used in vision
                            (see Sarkar and Boyer (1998) and Wu and Leahy (1993); there is a summary
                            in Weiss (1999)).
                                 Interactive segmentation became possible because of extremely fast min-cut