Page 228 -
P. 228
4.1 Points and patches 207
slicing that uses a series of 1D binary searches on the point list sorted along different dimen-
sions to efficiently cull down a list of candidate points that lie within a hypercube of the query
point. Grauman and Darrell (2005) reweight the matches at different levels of an indexing
tree, which allows their technique to be less sensitive to discretization errors in the tree con-
struction. Nist´ er and Stew´ enius (2006) use a metric tree, which compares feature descriptors
to a small number of prototypes at each level in a hierarchy. The resulting quantized visual
words can then be used with classical information retrieval (document relevance) techniques
to quickly winnow down a set of potential candidates from a database of millions of images
(Section 14.3.2). Muja and Lowe (2009) compare a number of these approaches, introduce a
new one of their own (priority search on hierarchical k-means trees), and conclude that mul-
tiple randomized k-d trees often provide the best performance. Despite all of this promising
work, the rapid computation of image feature correspondences remains a challenging open
research problem.
Feature match verification and densification
Once we have some hypothetical (putative) matches, we can often use geometric alignment
(Section 6.1) to verify which matches are inliers and which ones are outliers. For example,
if we expect the whole image to be translated or rotated in the matching view, we can fit a
global geometric transform and keep only those feature matches that are sufficiently close to
this estimated transformation. The process of selecting a small set of seed matches and then
verifying a larger set is often called random sampling or RANSAC (Section 6.1.4). Once an
initial set of correspondences has been established, some systems look for additional matches,
e.g., by looking for additional correspondences along epipolar lines (Section 11.1)orinthe
vicinity of estimated locations based on the global transform. These topics are discussed
further in Sections 6.1, 11.2, and 14.3.1.
4.1.4 Feature tracking
An alternative to independently finding features in all candidate images and then matching
them is to find a set of likely feature locations in a first image and to then search for their
corresponding locations in subsequent images. This kind of detect then track approach is
more widely used for video tracking applications, where the expected amount of motion and
appearance deformation between adjacent frames is expected to be small.
The process of selecting good features to track is closely related to selecting good features
for more general recognition applications. In practice, regions containing high gradients in
both directions, i.e., which have high eigenvalues in the auto-correlation matrix (4.8), provide
stable locations at which to find correspondences (Shi and Tomasi 1994).
In subsequent frames, searching for locations where the corresponding patch has low
squared difference (4.1) often works well enough. However, if the images are undergo-
ing brightness change, explicitly compensating for such variations (8.9) or using normalized
cross-correlation (8.11) may be preferable. If the search range is large, it is also often more
efficient to use a hierarchical search strategy, which uses matches in lower-resolution images
to provide better initial guesses and hence speed up the search (Section 8.1.1). Alternatives
to this strategy involve learning what the appearance of the patch being tracked should be and
then searching for it in the vicinity of its predicted position (Avidan 2001; Jurie and Dhome