
200                                                          4 Feature detection and matching
                Figure 4.21 Recognizing objects in a cluttered scene (Lowe 2004) © 2004 Springer. Two of the training images
                in the database are shown on the left. These are matched to the cluttered scene in the middle using SIFT features,
                shown as small squares in the right image. The affine warp of each recognized database image onto the scene is
                shown as a larger parallelogram in the right image.


                                4.1.3 Feature matching

                                Once we have extracted features and their descriptors from two or more images, the next
                                step is to establish some preliminary feature matches between these images. In this section,
                                we divide this problem into two separate components. The first is to select a matching
                                strategy, which determines which correspondences are passed on to the next stage for
                                further processing. The second is to devise efficient data structures and algorithms to
                                perform this matching as quickly as possible. (See the discussion of related techniques in
                                Section 14.3.2.)
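                                The simplest matching strategy is a brute-force nearest-neighbor search with a distance
                                threshold. The following sketch illustrates this baseline (the function name, threshold
                                value, and array layout are illustrative assumptions, not taken from the text):

```python
# Minimal sketch of brute-force feature matching between two images,
# assuming descriptors are stored as rows of NumPy arrays.
import numpy as np

def match_features(desc1, desc2, max_dist=0.7):
    """Return (i, j) index pairs whose Euclidean distance is below max_dist.

    desc1: (N1, D) descriptors from image 1
    desc2: (N2, D) descriptors from image 2
    """
    # Pairwise squared distances via broadcasting, shape (N1, N2).
    d2 = ((desc1[:, None, :] - desc2[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)  # index of best candidate in image 2
    return [(i, int(j)) for i, j in enumerate(nearest)
            if np.sqrt(d2[i, j]) < max_dist]
```

                                This O(N1·N2) scan is fine for small feature sets; the efficient data structures mentioned
                                above (e.g., spatial indexing) replace it when the database of potential matches grows large.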


                                Matching strategy and error rates
                                Determining which feature matches are reasonable to process further depends on the context
                                in which the matching is being performed. Say we are given two images that overlap to a fair
                                amount (e.g., for image stitching, as in Figure 4.16, or for tracking objects in a video). We
                                know that most features in one image are likely to match the other image, although some may
                                not match because they are occluded or their appearance has changed too much.
                                   On the other hand, if we are trying to recognize how many known objects appear in a
                                cluttered scene (Figure 4.21), most of the features may not match. Furthermore, a large
                                number of potentially matching objects must be searched, which requires more efficient
                                strategies, as described below.
                                   To begin with, we assume that the feature descriptors have been designed so that
                                Euclidean (vector magnitude) distances in feature space can be directly used for ranking
                                potential matches. If it turns out that certain parameters (axes) in a descriptor are more
                                reliable than others, it is usually preferable to re-scale these axes ahead of time, e.g., by
                                determining how much they vary when compared against other known good matches (Hua,
                                Brown, and Winder 2007). A more general process, which involves transforming feature
                                vectors into a new scaled basis, is called whitening and is discussed in more detail in the
                                context of eigenface-based face recognition (Section 14.2.1).
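                                The per-axis re-scaling described above can be sketched as follows. This is a simple
                                diagonal special case of whitening, not the full change-of-basis transform; the function
                                name and the form of the `good_pairs` input are illustrative assumptions:

```python
# Hedged sketch: re-scale each descriptor axis by its variability across
# known good matches, so that noisy axes contribute less to Euclidean
# distances. A diagonal approximation to whitening.
import numpy as np

def rescale_axes(descriptors, good_pairs):
    """Divide each axis by the std of its differences over matched pairs.

    descriptors: (N, D) array of feature descriptors
    good_pairs:  list of (i, j) indices of known good matches
    """
    diffs = np.array([descriptors[i] - descriptors[j] for i, j in good_pairs])
    sigma = diffs.std(axis=0) + 1e-8  # per-axis variability (avoid /0)
    return descriptors / sigma
```

                                Axes that vary a lot even between correct matches are shrunk, while stable axes are
                                expanded, so that a plain Euclidean distance on the re-scaled descriptors better reflects
                                match quality.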