Figure 4.2 Two pairs of images to be matched. What kinds of feature might one use to establish a set of correspondences between these images?

The second approach is to independently detect features in all the images under consideration and then match features based on their local appearance (Section 4.1.3). The former approach is more suitable when images are taken from nearby viewpoints or in rapid succession (e.g., video sequences), while the latter is more suitable when a large amount of motion or appearance change is expected, e.g., in stitching together panoramas (Brown and Lowe 2007), establishing correspondences in wide baseline stereo (Schaffalitzky and Zisserman 2002), or performing object recognition (Fergus, Perona, and Zisserman 2007).
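
To make the detection-and-matching approach concrete, here is a minimal sketch in Python using OpenCV's SIFT detector and a brute-force matcher with a ratio test; the image filenames and the 0.8 ratio threshold are illustrative placeholders, not values prescribed by the text.

    import cv2

    # Load the two images to be matched (grayscale is sufficient here).
    img1 = cv2.imread("image1.jpg", cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread("image2.jpg", cv2.IMREAD_GRAYSCALE)

    # Feature detection and description: find keypoints and compute local
    # descriptors independently in each image.
    sift = cv2.SIFT_create()
    kps1, desc1 = sift.detectAndCompute(img1, None)
    kps2, desc2 = sift.detectAndCompute(img2, None)

    # Feature matching: keep only matches whose best candidate is clearly
    # better than the second best (ratio test).
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    candidates = matcher.knnMatch(desc1, desc2, k=2)
    good = [m for m, n in candidates if m.distance < 0.8 * n.distance]
    print(len(good), "putative correspondences")
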
In this section, we split the keypoint detection and matching pipeline into four separate stages. During the feature detection (extraction) stage (Section 4.1.1), each image is searched for locations that are likely to match well in other images. At the feature description stage (Section 4.1.2), each region around detected keypoint locations is converted into a more compact and stable (invariant) descriptor that can be matched against other descriptors. The feature matching stage (Section 4.1.3) efficiently searches for likely matching candidates in other images. The feature tracking stage (Section 4.1.4) is an alternative to the third stage that only searches a small neighborhood around each detected feature and is therefore more suitable for video processing.
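
As an illustration of the tracking alternative, the following is a minimal sketch that detects features once and then follows them from frame to frame, assuming OpenCV's Shi-Tomasi corner detector and pyramidal Lucas-Kanade optical flow as the small-neighborhood search; the video filename and parameter values are placeholders chosen for illustration.

    import cv2

    cap = cv2.VideoCapture("video.mp4")   # placeholder video source
    ok, prev = cap.read()
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

    # Detect an initial set of good features to track in the first frame.
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                  qualityLevel=0.01, minDistance=7)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Track each feature by searching a small neighborhood around its
        # previous position with pyramidal Lucas-Kanade optical flow.
        new_pts, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
        pts = new_pts[status.ravel() == 1].reshape(-1, 1, 2)
        prev_gray = gray

Because the search is confined to a small window around each feature's last known position, this kind of tracker assumes small inter-frame motion, which is why it suits video rather than widely separated views.
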
A wonderful example of all of these stages can be found in David Lowe’s (2004) paper, which describes the development and refinement of his Scale Invariant Feature Transform (SIFT). Comprehensive descriptions of alternative techniques can be found in a series of survey and evaluation papers covering both feature detection (Schmid, Mohr, and Bauckhage 2000; Mikolajczyk, Tuytelaars, Schmid et al. 2005; Tuytelaars and Mikolajczyk 2007) and feature descriptors (Mikolajczyk and Schmid 2005). Shi and Tomasi (1994) and Triggs (2004) also provide nice reviews of feature detection techniques.