
192                                                          4 Feature detection and matching

                Figure 4.10 Multi-scale oriented patches (MOPS) extracted at five pyramid levels (Brown, Szeliski, and Winder
                2005) © 2005 IEEE. The boxes show the feature orientation and the region from which the descriptor vectors are
                sampled.


                                point locations. Based on this work, Lowe (2004) proposed computing a set of sub-octave
                                Difference of Gaussian filters (Figure 4.11a), looking for 3D (space+scale) maxima in the re-
                                sulting structure (Figure 4.11b), and then computing a sub-pixel space+scale location using a
                                quadratic fit (Brown and Lowe 2002). After careful empirical investigation, the number of
                                sub-octave levels was set to three, which corresponds to a quarter-octave pyramid, the same
                                as that used by Triggs (2004).
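
The extremum search and quadratic fit described above can be illustrated with a minimal sketch. It assumes the DoG responses are already available as a nested list `D[s][y][x]` (scale, row, column) and that the tested sample is not on the boundary of the stack. The function names `is_extremum`, `solve3`, and `refine_extremum` are hypothetical, and the finite-difference derivatives are one common choice rather than necessarily the exact scheme used in the cited papers.

```python
def is_extremum(D, s, y, x):
    """True if D[s][y][x] is strictly above or strictly below all 26 neighbours."""
    v = D[s][y][x]
    neighbours = [
        D[s + ds][y + dy][x + dx]
        for ds in (-1, 0, 1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (ds, dy, dx) != (0, 0, 0)
    ]
    return v > max(neighbours) or v < min(neighbours)


def solve3(A, b):
    """Solve the 3x3 system A t = b by Cramer's rule; None if A is near singular."""
    def det3(M):
        return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
                - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
                + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

    d = det3(A)
    if abs(d) < 1e-12:
        return None
    solution = []
    for k in range(3):
        M = [row[:] for row in A]
        for r in range(3):
            M[r][k] = b[r]  # replace column k with the right-hand side
        solution.append(det3(M) / d)
    return solution


def refine_extremum(D, s, y, x):
    """Offset (dx, dy, ds) of the stationary point of a local quadratic fit.

    The quadratic is built from central differences around D[s][y][x];
    the offset t solves H t = -g (assumed scheme, for illustration only).
    """
    def f(ds, dy, dx):
        return D[s + ds][y + dy][x + dx]

    c = f(0, 0, 0)
    # Gradient, ordered (x, y, s).
    g = [(f(0, 0, 1) - f(0, 0, -1)) / 2.0,
         (f(0, 1, 0) - f(0, -1, 0)) / 2.0,
         (f(1, 0, 0) - f(-1, 0, 0)) / 2.0]
    # Second derivatives.
    dxx = f(0, 0, 1) - 2.0 * c + f(0, 0, -1)
    dyy = f(0, 1, 0) - 2.0 * c + f(0, -1, 0)
    dss = f(1, 0, 0) - 2.0 * c + f(-1, 0, 0)
    dxy = (f(0, 1, 1) - f(0, 1, -1) - f(0, -1, 1) + f(0, -1, -1)) / 4.0
    dxs = (f(1, 0, 1) - f(1, 0, -1) - f(-1, 0, 1) + f(-1, 0, -1)) / 4.0
    dys = (f(1, 1, 0) - f(1, -1, 0) - f(-1, 1, 0) + f(-1, -1, 0)) / 4.0
    H = [[dxx, dxy, dxs], [dxy, dyy, dys], [dxs, dys, dss]]
    return solve3(H, [-gi for gi in g])
```

For a stack sampled from an exact quadratic, the returned offset recovers the true peak position relative to the integer sample. On real DoG data it is only a local approximation, and an offset larger than half a sample in any dimension suggests the extremum lies closer to a neighbouring sample.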
                                   As with the Harris operator, pixels where there is strong asymmetry in the local curvature
                                of the indicator function (in this case, the DoG) are rejected. This is implemented by first
                                computing the local Hessian of the difference image D,

                                $$H = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix}, \tag{4.12}$$
                                and then rejecting keypoints for which
                                $$\frac{\operatorname{Tr}(H)^2}{\operatorname{Det}(H)} > 10. \tag{4.13}$$
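
The rejection test of (4.12) and (4.13) can be sketched as follows. This is a minimal illustration, assuming the difference image is a nested list `D[y][x]` and that the Hessian entries are estimated by central differences. The names `hessian_2x2` and `is_edge_like` are hypothetical, and treating a non-positive determinant as a rejection is an added assumption that the text above does not spell out.

```python
def hessian_2x2(D, y, x):
    """Finite-difference estimates (Dxx, Dxy, Dyy) of the Hessian of D at (y, x)."""
    c = D[y][x]
    dxx = D[y][x + 1] - 2.0 * c + D[y][x - 1]
    dyy = D[y + 1][x] - 2.0 * c + D[y - 1][x]
    dxy = (D[y + 1][x + 1] - D[y + 1][x - 1]
           - D[y - 1][x + 1] + D[y - 1][x - 1]) / 4.0
    return dxx, dxy, dyy


def is_edge_like(D, y, x, threshold=10.0):
    """True if the keypoint at (y, x) should be rejected under Eq. (4.13)."""
    dxx, dxy, dyy = hessian_2x2(D, y, x)
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0.0:
        # Assumption: principal curvatures of opposite sign (or a degenerate
        # Hessian) are also treated as unstable and rejected.
        return True
    return trace * trace / det > threshold
```

An isotropic blob, whose two principal curvatures are equal, gives a ratio of 4 and is kept. A ridge with one curvature much smaller than the other gives a large ratio and is rejected.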

                                   While Lowe’s Scale Invariant Feature Transform (SIFT) performs well in practice, it is not
                                based on the same theoretical foundation of maximum spatial stability as the auto-correlation-
                                based detectors. (In fact, its detection locations are often complementary to those produced
                                by such techniques and can therefore be used in conjunction with these other approaches.)
                                In order to add a scale selection mechanism to the Harris corner detector, Mikolajczyk and
                                Schmid (2004) evaluate the Laplacian of Gaussian function at each detected Harris point (in
                                a multi-scale pyramid) and keep only those points for which the Laplacian is extremal (larger
                                or smaller than both its coarser and finer-level values). An optional iterative refinement for
                                both scale and position is also proposed and evaluated. Additional examples of scale invariant