Page 226 -

P. 226

4.1 Points and patches 205

Figure 4.26 The three Haar wavelet coefﬁcients used for hashing the MOPS descriptor devised by Brown,
Szeliski, and Winder (2005) are computed by summing each 8 × 8 normalized patch over the light and dark gray
regions and taking their difference.

Efﬁcient matching

Once we have decided on a matching strategy, we still need to search efﬁciently for poten-
tial candidates. The simplest way to ﬁnd all corresponding feature points is to compare all
features against all other features in each pair of potentially matching images. Unfortunately,
this is quadratic in the number of extracted features, which makes it impractical for most
applications.
A better approach is to devise an indexing structure, such as a multi-dimensional search
tree or a hash table, to rapidly search for features near a given feature. Such indexing struc-
tures can either be built for each image independently (which is useful if we want to only
consider certain potential matches, e.g., searching for a particular object) or globally for all
the images in a given database, which can potentially be faster, since it removes the need to it-
erate over each image. For extremely large databases (millions of images or more), even more
efﬁcient structures based on ideas from document retrieval (e.g., vocabulary trees,(Nist´ er and
Stew´ enius 2006)) can be used (Section 14.3.2).
One of the simpler techniques to implement is multi-dimensional hashing, which maps
descriptors into ﬁxed size buckets based on some function applied to each descriptor vector.
At matching time, each new feature is hashed into a bucket, and a search of nearby buckets
is used to return potential candidates, which can then be sorted or graded to determine which
are valid matches.
A simple example of hashing is the Haar wavelets used by Brown, Szeliski, and Winder
(2005) in their MOPS paper. During the matching structure construction, each 8 × 8 scaled,
oriented, and normalized MOPS patch is converted into a three-element index by perform-
ing sums over different quadrants of the patch (Figure 4.26). The resulting three values are
normalized by their expected standard deviations and then mapped to the two (of b =10)
nearest 1D bins. The three-dimensional indices formed by concatenating the three quantized
3
values are used to index the 2 =8 bins where the feature is stored (added). At query time,
only the primary (closest) indices are used, so only a single three-dimensional bin needs to
be examined. The coefﬁcients in the bin can then be used to select k approximate nearest
neighbors for further processing (such as computing the NNDR).
A more complex, but more widely applicable, version of hashing is called locality sen-
sitive hashing, which uses unions of independently computed hashing functions to index
the features (Gionis, Indyk, and Motwani 1999; Shakhnarovich, Darrell, and Indyk 2006).
Shakhnarovich, Viola, and Darrell (2003) extend this technique to be more sensitive to the

221 222 223 224 225 226 227 228 229 230 231