
200                                                          4 Feature detection and matching
                Figure 4.21 Recognizing objects in a cluttered scene (Lowe 2004) © 2004 Springer. Two of the training images
                in the database are shown on the left. These are matched to the cluttered scene in the middle using SIFT features,
                shown as small squares in the right image. The affine warp of each recognized database image onto the scene is
                shown as a larger parallelogram in the right image.


                                4.1.3 Feature matching

                                Once we have extracted features and their descriptors from two or more images, the next
                                step is to establish some preliminary feature matches between these images. In this section,
                                we divide this problem into two separate components. The first is to select a matching
                                strategy, which determines which correspondences are passed on to the next stage for
                                further processing. The second is to devise efficient data structures and algorithms to
                                perform this matching as quickly as possible. (See the discussion of related techniques in
                                Section 14.3.2.)
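                                The simplest matching strategy is a brute-force nearest-neighbor search with a distance
                                threshold. The following sketch illustrates this baseline (the function name, threshold
                                value, and array layout are illustrative assumptions, not taken from the text):

```python
# Minimal sketch of brute-force feature matching between two images,
# assuming descriptors are stored as rows of NumPy arrays.
import numpy as np

def match_features(desc1, desc2, max_dist=0.7):
    """Return (i, j) index pairs whose Euclidean distance is below max_dist.

    desc1: (N1, D) descriptors from image 1
    desc2: (N2, D) descriptors from image 2
    """
    # Pairwise squared distances via broadcasting, shape (N1, N2).
    d2 = ((desc1[:, None, :] - desc2[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)  # index of best candidate in image 2
    return [(i, int(j)) for i, j in enumerate(nearest)
            if np.sqrt(d2[i, j]) < max_dist]
```

                                This O(N1·N2) scan is fine for small feature sets; the efficient data structures mentioned
                                above (e.g., spatial indexing) replace it when the database of potential matches grows large.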


                                Matching strategy and error rates
                                Determining which feature matches are reasonable to process further depends on the context
                                in which the matching is being performed. Say we are given two images that overlap to a fair
                                amount (e.g., for image stitching, as in Figure 4.16, or for tracking objects in a video). We
                                know that most features in one image are likely to match the other image, although some may
                                not match because they are occluded or their appearance has changed too much.
                                   On the other hand, if we are trying to recognize how many known objects appear in a
                                cluttered scene (Figure 4.21), most of the features may not match. Furthermore, a large
                                number of potentially matching objects must be searched, which requires more efficient
                                strategies, as described below.
                                   To begin with, we assume that the feature descriptors have been designed so that
                                Euclidean (vector magnitude) distances in feature space can be directly used for ranking
                                potential matches. If it turns out that certain parameters (axes) in a descriptor are more
                                reliable than others, it is usually preferable to re-scale these axes ahead of time, e.g., by
                                determining how much they vary when compared against other known good matches (Hua,
                                Brown, and Winder 2007). A more general process, which involves transforming feature
                                vectors into a new scaled basis, is called whitening and is discussed in more detail in the
                                context of eigenface-based face recognition (Section 14.2.1).
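                                The per-axis re-scaling described above can be sketched as follows. This is a simple
                                diagonal special case of whitening, not the full change-of-basis transform; the function
                                name and the form of the `good_pairs` input are illustrative assumptions:

```python
# Hedged sketch: re-scale each descriptor axis by its variability across
# known good matches, so that noisy axes contribute less to Euclidean
# distances. A diagonal approximation to whitening.
import numpy as np

def rescale_axes(descriptors, good_pairs):
    """Divide each axis by the std of its differences over matched pairs.

    descriptors: (N, D) array of feature descriptors
    good_pairs:  list of (i, j) indices of known good matches
    """
    diffs = np.array([descriptors[i] - descriptors[j] for i, j in good_pairs])
    sigma = diffs.std(axis=0) + 1e-8  # per-axis variability (avoid /0)
    return descriptors / sigma
```

                                Axes that vary a lot even between correct matches are shrunk, while stable axes are
                                expanded, so that a plain Euclidean distance on the re-scaled descriptors better reflects
                                match quality.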