However, this is not optimal from a statistical point of view, since the denominator D, which was used to multiply each equation, can vary quite a bit from point to point.⁶
One way to compensate for this is to reweight each equation by the inverse of the current
estimate of the denominator, D,
$$
\frac{1}{D}
\begin{bmatrix} \hat{x} - x \\ \hat{y} - y \end{bmatrix}
=
\frac{1}{D}
\begin{bmatrix}
x & y & 1 & 0 & 0 & 0 & -\hat{x}x & -\hat{x}y \\
0 & 0 & 0 & x & y & 1 & -\hat{y}x & -\hat{y}y
\end{bmatrix}
\begin{bmatrix} h_{00} \\ \vdots \\ h_{21} \end{bmatrix}.
\tag{6.22}
$$
While this may at first seem to be the exact same set of equations as (6.21), because least
squares is being used to solve the over-determined set of equations, the weightings do matter
and produce a different set of normal equations that performs better in practice.
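As a concrete illustration, here is a minimal NumPy sketch of this reweighted linear solve. The function name, the parameter ordering h = (h00, h01, h02, h10, h11, h12, h20, h21), the fixed iteration count, and the use of np.linalg.lstsq are illustrative assumptions, not code from the book:

```python
import numpy as np

def reweighted_homography(pts, pts_hat, n_iters=5):
    """Estimate h = (h00, ..., h21) by iteratively reweighting the
    linear equations (6.22) with the current denominator estimate D.
    pts, pts_hat: iterables of corresponding (x, y) and (x^, y^) points.
    Note: h_ii is defined as the difference from unity and h22 = 1."""
    h = np.zeros(8)
    for _ in range(n_iters):
        A, b = [], []
        for (x, y), (xh, yh) in zip(pts, pts_hat):
            D = h[6] * x + h[7] * y + 1.0      # current denominator estimate
            A.append(np.array([x, y, 1, 0, 0, 0, -xh * x, -xh * y]) / D)
            A.append(np.array([0, 0, 0, x, y, 1, -yh * x, -yh * y]) / D)
            b += [(xh - x) / D, (yh - y) / D]  # reweighted displacements
        h, *_ = np.linalg.lstsq(np.asarray(A), np.asarray(b), rcond=None)
    return h
```

On the first pass h = 0, so D = 1 and this reduces to the unweighted equations (6.21); subsequent passes reweight each equation by the current denominator estimate.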
The most principled way to do the estimation, however, is to directly minimize the squared residual equations (6.13) using the Gauss–Newton approximation, i.e., performing a first-order Taylor series expansion in p, as shown in (6.14), which yields the set of equations
$$
\begin{bmatrix} \hat{x} - \tilde{x} \\ \hat{y} - \tilde{y} \end{bmatrix}
=
\frac{1}{D}
\begin{bmatrix}
x & y & 1 & 0 & 0 & 0 & -\tilde{x}x & -\tilde{x}y \\
0 & 0 & 0 & x & y & 1 & -\tilde{y}x & -\tilde{y}y
\end{bmatrix}
\begin{bmatrix} \Delta h_{00} \\ \vdots \\ \Delta h_{21} \end{bmatrix}.
\tag{6.23}
$$
While these look similar to (6.22), they differ in two important respects. First, the left-hand side consists of unweighted prediction errors rather than point displacements, and the solution vector is a perturbation Δp to the parameter vector p. Second, the quantities inside the Jacobian J involve predicted feature locations (x̃, ỹ) instead of sensed feature locations (x̂, ŷ). Both of these differences are subtle, yet they lead to an algorithm that, when combined with proper checking for downhill steps (as in the Levenberg–Marquardt algorithm), will converge to a local minimum. Note that iterating Equations (6.22) is not guaranteed to converge, since it is not minimizing a well-defined energy function.
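The sketch below shows one such Gauss–Newton step under the same illustrative parameterization as before; a real implementation would wrap it in a Levenberg–Marquardt loop that adds damping and only accepts steps that decrease the residual:

```python
import numpy as np

def gauss_newton_step(h, pts, pts_hat):
    """One additive Gauss-Newton update (6.23) for h = (h00, ..., h21)."""
    A, r = [], []
    for (x, y), (xh, yh) in zip(pts, pts_hat):
        D = h[6] * x + h[7] * y + 1.0
        xt = ((1 + h[0]) * x + h[1] * y + h[2]) / D   # predicted x~
        yt = (h[3] * x + (1 + h[4]) * y + h[5]) / D   # predicted y~
        # Jacobian rows use the *predicted* locations and the 1/D scaling.
        A.append(np.array([x, y, 1, 0, 0, 0, -xt * x, -xt * y]) / D)
        A.append(np.array([0, 0, 0, x, y, 1, -yt * x, -yt * y]) / D)
        r += [xh - xt, yh - yt]                       # unweighted errors
    dh, *_ = np.linalg.lstsq(np.asarray(A), np.asarray(r), rcond=None)
    return h + dh
```

Note how the residuals are the unweighted prediction errors and the Jacobian rows are evaluated at the predicted locations, exactly the two differences from (6.22) noted above.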
Equation (6.23) is analogous to the additive algorithm for direct intensity-based registration (Section 8.2), since the change to the full transformation is being computed. If we prepend an incremental homography to the current homography instead, i.e., we use a compositional algorithm (described in Section 8.2), we get D = 1 (since p = 0) and the above formula simplifies to
$$
\begin{bmatrix} \hat{x} - x \\ \hat{y} - y \end{bmatrix}
=
\begin{bmatrix}
x & y & 1 & 0 & 0 & 0 & -x^2 & -xy \\
0 & 0 & 0 & x & y & 1 & -xy & -y^2
\end{bmatrix}
\begin{bmatrix} \Delta h_{00} \\ \vdots \\ \Delta h_{21} \end{bmatrix},
\tag{6.24}
$$
where we have replaced (x̃, ỹ) with (x, y) for conciseness. (Notice how this results in the same Jacobian as (8.63).)
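A matrix form of this compositional step, again as an illustrative sketch rather than the book's code: the composition order H ← ΔH·H below assumes the increment acts on the current predictions, and the function name is hypothetical.

```python
import numpy as np

def compositional_step(H, pts, pts_hat):
    """One compositional update (6.24): solve for an incremental
    homography at the current predictions (D = 1 since the increment
    starts at identity), then compose it with the 3x3 matrix H."""
    A, r = [], []
    for (x0, y0), (xh, yh) in zip(pts, pts_hat):
        q = H @ np.array([x0, y0, 1.0])
        x, y = q[0] / q[2], q[1] / q[2]        # current predicted point
        A.append([x, y, 1, 0, 0, 0, -x * x, -x * y])
        A.append([0, 0, 0, x, y, 1, -x * y, -y * y])
        r += [xh - x, yh - y]                  # prediction errors
    dh, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(r), rcond=None)
    dH = np.array([[1 + dh[0], dh[1],     dh[2]],
                   [dh[3],     1 + dh[4], dh[5]],
                   [dh[6],     dh[7],     1.0]])
    return dH @ H   # increment composed with the current homography
```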
⁶ Hartley and Zisserman (2004) call this strategy of forming linear equations from rational equations the direct linear transform, but that term is more commonly associated with pose estimation (Section 6.2). Note also that our definition of the h_{ij} parameters differs from that used in their book, since we define h_{ii} to be the difference from unity and we do not leave h_{22} as a free parameter, which means that we cannot handle certain extreme homographies.