192 4 Feature detection and matching
Figure 4.10 Multi-scale oriented patches (MOPS) extracted at five pyramid levels (Brown, Szeliski, and Winder
2005) © 2005 IEEE. The boxes show the feature orientation and the region from which the descriptor vectors are
sampled.
point locations. Based on this work, Lowe (2004) proposed computing a set of sub-octave
Difference of Gaussian filters (Figure 4.11a), looking for 3D (space+scale) extrema (maxima
and minima) in the resulting structure (Figure 4.11b), and then computing a sub-pixel
space+scale location using a quadratic fit (Brown and Lowe 2002). After careful empirical
investigation, the number of sub-octave levels was set to three, which corresponds to a
quarter-octave pyramid, the same spacing used by Triggs (2004).
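The detection steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not Lowe's implementation: it processes a single octave (no downsampling), blurs the input directly to each scale σ0·2^(i/s) (the defaults σ0 = 1.6 and s = 3 are illustrative), differences adjacent blurs, keeps samples that are larger or smaller than all 26 space+scale neighbours and exceed a hypothetical contrast threshold, and takes a single Newton step on a second-order Taylor expansion of the DoG as the quadratic fit. All function names are hypothetical.

```python
import numpy as np


def gaussian_blur(img, sigma):
    """Separable Gaussian blur (kernel truncated at 3 sigma, reflected borders)."""
    radius = max(1, int(np.ceil(3.0 * sigma)))
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="reflect")
    conv = lambda v: np.convolve(v, kernel, mode="valid")
    rows = np.apply_along_axis(conv, 1, padded)  # blur along x
    return np.apply_along_axis(conv, 0, rows)    # blur along y


def dog_octave(img, sigma0=1.6, s=3):
    """Blur at s + 3 scales sigma0 * 2**(i/s), then difference adjacent levels."""
    sigmas = [sigma0 * 2.0 ** (i / s) for i in range(s + 3)]
    gauss = np.stack([gaussian_blur(img, sg) for sg in sigmas])
    return gauss[1:] - gauss[:-1], sigmas


def dog_extrema(dog, threshold=0.01):
    """(level, y, x) samples that are the max or min of their 3x3x3 neighbourhood."""
    n_levels, h, w = dog.shape
    found = []
    for k in range(1, n_levels - 1):
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                v = dog[k, y, x]
                if abs(v) < threshold:  # skip low-contrast samples
                    continue
                cube = dog[k - 1:k + 2, y - 1:y + 2, x - 1:x + 2]
                if v == cube.max() or v == cube.min():
                    found.append((k, y, x))
    return found


def refine_quadratic(dog, k, y, x):
    """Offset (dk, dy, dx) to the extremum of a 3D quadratic fit by finite differences."""
    d = dog
    c = d[k, y, x]
    grad = 0.5 * np.array([d[k + 1, y, x] - d[k - 1, y, x],
                           d[k, y + 1, x] - d[k, y - 1, x],
                           d[k, y, x + 1] - d[k, y, x - 1]])
    dkk = d[k + 1, y, x] - 2 * c + d[k - 1, y, x]
    dyy = d[k, y + 1, x] - 2 * c + d[k, y - 1, x]
    dxx = d[k, y, x + 1] - 2 * c + d[k, y, x - 1]
    dky = 0.25 * (d[k + 1, y + 1, x] - d[k + 1, y - 1, x] - d[k - 1, y + 1, x] + d[k - 1, y - 1, x])
    dkx = 0.25 * (d[k + 1, y, x + 1] - d[k + 1, y, x - 1] - d[k - 1, y, x + 1] + d[k - 1, y, x - 1])
    dyx = 0.25 * (d[k, y + 1, x + 1] - d[k, y + 1, x - 1] - d[k, y - 1, x + 1] + d[k, y - 1, x - 1])
    hess = np.array([[dkk, dky, dkx], [dky, dyy, dyx], [dkx, dyx, dxx]])
    # Newton step: solve hess @ offset = -grad (lstsq tolerates a singular hess).
    return np.linalg.lstsq(hess, -grad, rcond=None)[0]
```

On a synthetic 64×64 image containing a single bright Gaussian blob, one detection falls exactly on the blob centre as a DoG minimum, and its refined spatial offset is essentially zero by symmetry. A fuller implementation would iterate the refinement when an offset exceeds half a sample and would repeat the process over successively downsampled octaves.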
As with the Harris operator, pixels where there is strong asymmetry in the local curvature
of the indicator function (in this case, the DoG) are rejected. This is implemented by first
computing the local Hessian of the difference image D,
$$
\mathbf{H} = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix}, \tag{4.12}
$$
and then rejecting keypoints for which
$$
\frac{\operatorname{Tr}(\mathbf{H})^2}{\operatorname{Det}(\mathbf{H})} > 10. \tag{4.13}
$$
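The edge-rejection test can be sketched as follows, assuming D is a single DoG level stored as a 2D NumPy array and that its second derivatives are estimated with central finite differences (the function name and the finite-difference scheme are illustrative choices, not taken from the text). If the eigenvalues of H are α and β with r = α/β, then Tr(H)²/Det(H) = (r+1)²/r, which is smallest (4) for r = 1 and grows as the curvature becomes more one-sided, so the test rejects elongated, edge-like responses. Samples with Det(H) ≤ 0 (curvatures of opposite sign) are rejected outright, and because the ratio is unchanged when D is negated, the same test serves DoG maxima and minima. Lowe (2004) phrases the test as a bound r = 10 on the curvature ratio, i.e. a threshold of 12.1; the sketch exposes the threshold as a parameter whose default is the value 10 from (4.13).

```python
import numpy as np


def passes_edge_test(D, y, x, threshold=10.0):
    """True if D[y, x] survives the Tr(H)^2 / Det(H) test of (4.13).

    H is the 2x2 spatial Hessian of D, estimated with central finite differences.
    """
    dxx = D[y, x + 1] - 2.0 * D[y, x] + D[y, x - 1]
    dyy = D[y + 1, x] - 2.0 * D[y, x] + D[y - 1, x]
    dxy = 0.25 * (D[y + 1, x + 1] - D[y + 1, x - 1] - D[y - 1, x + 1] + D[y - 1, x - 1])
    tr = dxx + dyy
    det = dxx * dyy - dxy ** 2
    if det <= 0:
        # Opposite-sign (saddle) or degenerate curvature: not a stable extremum.
        return False
    return bool(tr ** 2 / det <= threshold)
```

An isotropic bowl gives the minimum ratio of 4 and is kept, while a trough whose curvature is 100 times stronger across than along it gives a ratio of about 102 and is discarded.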
While Lowe’s Scale Invariant Feature Transform (SIFT) performs well in practice, it is not
based on the same theoretical foundation of maximum spatial stability as the auto-correlation-
based detectors. (In fact, its detection locations are often complementary to those produced
by such techniques and can therefore be used in conjunction with these other approaches.)
In order to add a scale selection mechanism to the Harris corner detector, Mikolajczyk and
Schmid (2004) evaluate the Laplacian of Gaussian function at each detected Harris point (in
a multi-scale pyramid) and keep only those points for which the Laplacian is extremal (larger
or smaller than both its coarser and finer-level values). An optional iterative refinement for
both scale and position is also proposed and evaluated. Additional examples of scale invariant