Page 226 -
P. 226
4.1 Points and patches 205
Figure 4.26 The three Haar wavelet coefficients used for hashing the MOPS descriptor devised by Brown,
Szeliski, and Winder (2005) are computed by summing each 8 × 8 normalized patch over the light and dark gray
regions and taking their difference.
Efficient matching
Once we have decided on a matching strategy, we still need to search efficiently for poten-
tial candidates. The simplest way to find all corresponding feature points is to compare all
features against all other features in each pair of potentially matching images. Unfortunately,
this is quadratic in the number of extracted features, which makes it impractical for most
applications.
A better approach is to devise an indexing structure, such as a multi-dimensional search
tree or a hash table, to rapidly search for features near a given feature. Such indexing struc-
tures can either be built for each image independently (which is useful if we want to only
consider certain potential matches, e.g., searching for a particular object) or globally for all
the images in a given database, which can potentially be faster, since it removes the need to it-
erate over each image. For extremely large databases (millions of images or more), even more
efficient structures based on ideas from document retrieval (e.g., vocabulary trees,(Nist´ er and
Stew´ enius 2006)) can be used (Section 14.3.2).
One of the simpler techniques to implement is multi-dimensional hashing, which maps
descriptors into fixed size buckets based on some function applied to each descriptor vector.
At matching time, each new feature is hashed into a bucket, and a search of nearby buckets
is used to return potential candidates, which can then be sorted or graded to determine which
are valid matches.
A simple example of hashing is the Haar wavelets used by Brown, Szeliski, and Winder
(2005) in their MOPS paper. During the matching structure construction, each 8 × 8 scaled,
oriented, and normalized MOPS patch is converted into a three-element index by perform-
ing sums over different quadrants of the patch (Figure 4.26). The resulting three values are
normalized by their expected standard deviations and then mapped to the two (of b =10)
nearest 1D bins. The three-dimensional indices formed by concatenating the three quantized
3
values are used to index the 2 =8 bins where the feature is stored (added). At query time,
only the primary (closest) indices are used, so only a single three-dimensional bin needs to
be examined. The coefficients in the bin can then be used to select k approximate nearest
neighbors for further processing (such as computing the NNDR).
A more complex, but more widely applicable, version of hashing is called locality sen-
sitive hashing, which uses unions of independently computed hashing functions to index
the features (Gionis, Indyk, and Motwani 1999; Shakhnarovich, Darrell, and Indyk 2006).
Shakhnarovich, Viola, and Darrell (2003) extend this technique to be more sensitive to the