Page 213 -


192 4 Feature detection and matching

Figure 4.10 Multi-scale oriented patches (MOPS) extracted at five pyramid levels (Brown, Szeliski, and Winder
2005) © 2005 IEEE. The boxes show the feature orientation and the region from which the descriptor vectors are
sampled.

point locations. Based on this work, Lowe (2004) proposed computing a set of sub-octave
Difference of Gaussian filters (Figure 4.11a), looking for 3D (space+scale) maxima in the
resulting structure (Figure 4.11b), and then computing a sub-pixel space+scale location using a
quadratic fit (Brown and Lowe 2002). The number of sub-octave levels was determined, after
careful empirical investigation, to be three. This corresponds to a quarter-octave pyramid,
the same as that used by Triggs (2004).
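
The first two steps described above (building a sub-octave Difference of Gaussian stack and finding 3D space+scale extrema) can be sketched as follows. This is a minimal NumPy illustration, not Lowe's implementation: the function names, the base blur `sigma0 = 1.6`, the number of blurred images, the contrast threshold, and the separable blur with reflected borders are all assumptions made for the example. The default scale step of 2^(1/4) follows the quarter-octave spacing mentioned in the text. The sketch handles a single octave, tests for both maxima and minima, and omits the sub-pixel quadratic fit.

```python
import numpy as np

def gaussian_blur(image, sigma):
    """Separable Gaussian blur with reflected borders (illustrative helper)."""
    radius = max(1, int(np.ceil(4 * sigma)))       # truncate the kernel at 4 sigma
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(np.asarray(image, dtype=float), radius, mode="reflect")
    # Filter along rows, then along columns; "valid" trims the padding again.
    rows = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 0, rows)

def dog_stack(image, sigma0=1.6, scale_step=2 ** 0.25, num_blurred=6):
    """Return a (num_blurred - 1, H, W) stack of Difference of Gaussian images."""
    sigmas = sigma0 * scale_step ** np.arange(num_blurred)
    blurred = np.stack([gaussian_blur(image, s) for s in sigmas])
    return blurred[1:] - blurred[:-1]

def local_extrema_3d(dog, threshold=0.01):
    """Return (scale, y, x) indices that are strict extrema among 26 neighbors."""
    n_s, n_y, n_x = dog.shape
    core = dog[1:-1, 1:-1, 1:-1]
    is_max = np.ones(core.shape, dtype=bool)
    is_min = np.ones(core.shape, dtype=bool)
    for ds in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if ds == 0 and dy == 0 and dx == 0:
                    continue
                neighbor = dog[1 + ds:n_s - 1 + ds,
                               1 + dy:n_y - 1 + dy,
                               1 + dx:n_x - 1 + dx]
                is_max &= core > neighbor
                is_min &= core < neighbor
    # Hypothetical contrast threshold: discard very weak responses.
    keep = (is_max | is_min) & (np.abs(core) > threshold)
    return np.argwhere(keep) + 1   # shift back to indices in the full stack
```

Calling `local_extrema_3d(dog_stack(image))` on a grayscale image scaled to [0, 1] returns candidate keypoints as rows of `(scale index, y, x)`. A bright blob yields a negative DoG response at its center, which is why minima are tested alongside maxima.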
As with the Harris operator, pixels where there is strong asymmetry in the local curvature
of the indicator function (in this case, the DoG) are rejected. This is implemented by first
computing the local Hessian of the difference image D,

$$H = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix}, \tag{4.12}$$

and then rejecting keypoints for which

$$\frac{\mathrm{Tr}(H)^2}{\mathrm{Det}(H)} > 10. \tag{4.13}$$
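
The rejection test in (4.12) and (4.13) might be sketched as below for a single candidate keypoint. This is an illustrative sketch under stated assumptions: the function name `passes_edge_test` is hypothetical, the Hessian entries are estimated with central finite differences on the difference image D, and keypoints with a non-positive determinant are also rejected (a convention not stated in the text above).

```python
import numpy as np

def passes_edge_test(D, y, x, max_ratio=10.0):
    """Return True if the keypoint at (y, x) survives the test of (4.13).

    D is a 2D difference image; (y, x) must not lie on the image border.
    """
    # Second derivatives by central finite differences (an assumption).
    d_xx = D[y, x + 1] - 2.0 * D[y, x] + D[y, x - 1]
    d_yy = D[y + 1, x] - 2.0 * D[y, x] + D[y - 1, x]
    d_xy = (D[y + 1, x + 1] - D[y + 1, x - 1]
            - D[y - 1, x + 1] + D[y - 1, x - 1]) / 4.0

    trace = d_xx + d_yy
    det = d_xx * d_yy - d_xy ** 2

    # Assumed convention: a non-positive determinant means the principal
    # curvatures differ in sign (or one vanishes), so the point is rejected.
    if det <= 0:
        return False
    return trace ** 2 / det <= max_ratio
```

If the two principal curvatures have ratio r, then Tr(H)^2 / Det(H) = (r + 1)^2 / r, which equals 4 for an isotropic blob and grows as the structure becomes more edge-like.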

While Lowe’s Scale Invariant Feature Transform (SIFT) performs well in practice, it is not
based on the same theoretical foundation of maximum spatial stability as the auto-correlation-
based detectors. (In fact, its detection locations are often complementary to those produced
by such techniques and can therefore be used in conjunction with these other approaches.)
In order to add a scale selection mechanism to the Harris corner detector, Mikolajczyk and
Schmid (2004) evaluate the Laplacian of Gaussian function at each detected Harris point (in
a multi-scale pyramid) and keep only those points for which the Laplacian is extremal (larger
or smaller than both its coarser and finer-level values). An optional iterative refinement for
both scale and position is also proposed and evaluated. Additional examples of scale invariant
