286 6 Feature-based alignment
$d_i^2$ can be computed as ratios of successive $d_i^{2n+2}/d_i^{2n}$ estimates, and these can be averaged to obtain a final estimate of $d_i^2$ (and hence $d_i$).
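The sketch below is a small numeric illustration of this averaging step. Because the preceding derivation lies outside this excerpt, it assumes that the solver returns a vector proportional to $[d_i^8, d_i^6, d_i^4, d_i^2, 1]$ with an unknown nonzero scale, such as a null-space vector from an SVD. The helper names `estimate_d_squared` and `estimate_distance` are hypothetical.

```python
import math


def estimate_d_squared(v):
    """Average the ratios v[k] / v[k + 1] of a vector proportional to
    successive even powers of d, e.g. [d^8, d^6, d^4, d^2, 1].

    Any nonzero overall scale on v (for example, from an SVD null-space
    solve) cancels in each ratio.
    """
    if len(v) < 2:
        raise ValueError("need at least two successive powers")
    # Each ratio of successive entries is one estimate of d^2.
    ratios = [v[k] / v[k + 1] for k in range(len(v) - 1)]
    # Averaging the estimates reduces the effect of noise in v.
    d2 = sum(ratios) / len(ratios)
    if d2 <= 0.0:
        raise ValueError("estimated d^2 is not positive")
    return d2


def estimate_distance(v):
    """Return d, the square root of the averaged d^2 estimate."""
    return math.sqrt(estimate_d_squared(v))


# Usage: a vector for d = 2 carrying an arbitrary scale of -0.5.
v = [-0.5 * 2.0 ** p for p in (8, 6, 4, 2, 0)]
print(estimate_distance(v))  # 2.0
```

Every ratio is scale-invariant, so neither the sign nor the magnitude of the solved vector matters. With noisy data, a more robust combination such as a median could replace the plain mean, although the text describes simple averaging.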
Once the individual estimates of the $d_i$ distances have been computed, we can generate
a 3D structure consisting of the scaled point directions $d_i \hat{\mathbf{x}}_i$, which can then be aligned with
the 3D point cloud $\{\mathbf{p}_i\}$ using absolute orientation (Section 6.1.5) to obtain the desired
pose estimate. Quan and Lan (1999) give accuracy results for this and other techniques,
which use fewer points but require more complicated algebraic manipulations. The paper by
Moreno-Noguer, Lepetit, and Fua (2007) reviews more recent alternatives and also gives a
lower complexity algorithm that typically produces more accurate results.
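The final alignment step can be sketched as follows. The sketch assumes that the distances $d_i$ and the unit viewing directions $\hat{\mathbf{x}}_i$ (in camera coordinates) are already available. For the absolute orientation solve, it uses Horn's closed-form unit-quaternion method, with a small Jacobi eigen-solver so that only the Python standard library is needed. All function names are hypothetical. The returned pose maps world points into the camera frame as $\mathbf{R}\mathbf{p}_i + \mathbf{t}$.

```python
import math


def _jacobi_eigen(a, sweeps=30):
    """Eigen-decompose a small symmetric matrix with cyclic Jacobi rotations.

    Returns (eigenvalues, V); column m of V is the eigenvector for eigenvalue m.
    """
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-300:
                    continue
                # Choose the rotation that zeroes the off-diagonal entry a[p][q].
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V P accumulates eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def quat_to_rot(w, x, y, z):
    """Rotation matrix of a unit quaternion (w, x, y, z)."""
    return [[1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
            [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
            [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)]]


def absolute_orientation(src, dst):
    """Rigid (R, t) minimizing sum_i ||dst_i - (R src_i + t)||^2 (Horn's quaternion method)."""
    n = len(src)
    cs = [sum(p[a] for p in src) / n for a in range(3)]
    cd = [sum(q[a] for q in dst) / n for a in range(3)]
    # Cross-covariance of the centred sets: S[a][b] = sum_i src'_i[a] * dst'_i[b].
    S = [[sum((p[a] - cs[a]) * (q[b] - cd[b]) for p, q in zip(src, dst))
          for b in range(3)] for a in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = S
    N = [[sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
         [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
         [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
         [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz]]
    # The optimal rotation is the eigenvector of N with the largest eigenvalue.
    vals, vecs = _jacobi_eigen(N)
    m = max(range(4), key=lambda i: vals[i])
    q = [vecs[r][m] for r in range(4)]
    norm = math.sqrt(sum(c * c for c in q))
    R = quat_to_rot(*(c / norm for c in q))
    t = [cd[a] - sum(R[a][b] * cs[b] for b in range(3)) for a in range(3)]
    return R, t


def pose_from_distances(points_world, directions, distances):
    """Scale the unit rays x_i by d_i, then align the world points to them."""
    cam_points = [[d * u for u in x] for d, x in zip(distances, directions)]
    return absolute_orientation(points_world, cam_points)


# Usage: synthesize camera-frame rays and distances from a known pose, then recover it.
qn = math.sqrt(0.9 ** 2 + 0.1 ** 2 + 0.2 ** 2 + 0.3 ** 2)
R_true = quat_to_rot(0.9 / qn, 0.1 / qn, -0.2 / qn, 0.3 / qn)
t_true = [0.5, -0.3, 8.0]
world = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
cam = [[sum(R_true[a][b] * p[b] for b in range(3)) + t_true[a] for a in range(3)] for p in world]
dists = [math.sqrt(sum(c * c for c in x)) for x in cam]
dirs = [[c / d for c in x] for x, d in zip(cam, dists)]
R_est, t_est = pose_from_distances(world, dirs, dists)
```

An SVD-based solution to absolute orientation would serve equally well. The quaternion form is used here only because its 4×4 symmetric eigenproblem is easy to solve without external libraries.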
Unfortunately, because minimal PnP solutions can be quite noise sensitive and also suffer
from bas-relief ambiguities (e.g., depth reversals) (Section 7.4.3), it is often preferable to use
the linear six-point algorithm to guess an initial pose and then optimize this estimate using
the iterative technique described in Section 6.2.2.
An alternative pose estimation algorithm involves starting with a scaled orthographic projection model and then iteratively refining this initial estimate using a more accurate perspective projection model (DeMenthon and Davis 1995). The attraction of this model, as stated in the paper’s title, is that it can be implemented “in 25 lines of [Mathematica] code”.
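The following is a minimal sketch of the DeMenthon and Davis (1995) approach, commonly called POSIT. It is written from the general description of the method, not from the paper's own code. It makes several assumptions:

* there are at least four non-coplanar object points;
* the focal length in pixels is known;
* image coordinates are measured relative to the principal point;
* the object is small compared with its distance from the camera.

Setting every correction term `eps` to zero gives the scaled orthographic starting point. Each later pass re-solves the same linear system with image points corrected toward full perspective. All names are hypothetical.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]


def _unit(a):
    n = math.sqrt(_dot(a, a))
    return [x / n for x in a]


def _inv3(m):
    """Inverse of a 3x3 matrix via its adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("object points are (nearly) coplanar")
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adj]


def posit(obj, img, focal, iters=100, tol=1e-12):
    """Pose (R, t), with camera point = R p + t, from >= 4 non-coplanar points.

    obj: 3D object points (obj[0] is the reference point M0);
    img: matching (x, y) pixels relative to the principal point;
    focal: focal length in pixels.
    """
    m0 = obj[0]
    A = [[p[k] - m0[k] for k in range(3)] for p in obj[1:]]  # rows M0Mi
    AtA = [[sum(r[a] * r[b] for r in A) for b in range(3)] for a in range(3)]
    inv = _inv3(AtA)
    # Pseudo-inverse B = (A^T A)^{-1} A^T, a 3 x (n-1) matrix, computed once.
    B = [[_dot(inv[a], r) for r in A] for a in range(3)]
    x0, y0 = img[0]
    eps = [0.0] * len(A)  # eps = 0: scaled orthographic starting model
    for _ in range(iters):
        # POS step: solve the linear scaled-orthographic equations for I and J.
        xs = [x * (1.0 + e) - x0 for (x, _y), e in zip(img[1:], eps)]
        ys = [y * (1.0 + e) - y0 for (_x, y), e in zip(img[1:], eps)]
        I = [_dot(row, xs) for row in B]
        J = [_dot(row, ys) for row in B]
        s = 0.5 * (math.sqrt(_dot(I, I)) + math.sqrt(_dot(J, J)))  # scale f / Z0
        i_row, j_row = _unit(I), _unit(J)
        k_row = _unit(_cross(i_row, j_row))
        z0 = focal / s
        # Perspective correction for the next pass: eps_i = (M0Mi . k) / Z0.
        new_eps = [_dot(r, k_row) / z0 for r in A]
        done = max(abs(a - b) for a, b in zip(new_eps, eps)) < tol
        eps = new_eps
        if done:
            break
    R = [i_row, j_row, k_row]  # rows are only approximately orthonormal with noisy data
    c0 = [x0 * z0 / focal, y0 * z0 / focal, z0]  # camera-frame position of M0
    t = [c0[a] - _dot(R[a], m0) for a in range(3)]
    return R, t


# Usage: project a small object about 10 units from the camera, then recover its pose.
ca, sa = math.cos(0.3), math.sin(0.3)
R_true = [[ca, 0.0, sa], [0.0, 1.0, 0.0], [-sa, 0.0, ca]]  # rotation about the y axis
t_true = [0.4, -0.2, 10.0]
obj = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
cam = [[_dot(R_true[a], p) + t_true[a] for a in range(3)] for p in obj]
img = [(800.0 * X / Z, 800.0 * Y / Z) for X, Y, Z in cam]
R_est, t_est = posit(obj, img, 800.0)
```

The pseudo-inverse of the object-point matrix is computed once and reused in every pass, which helps keep the method short. Convergence is not guaranteed for objects close to the camera. In that case the result is best treated as an initial estimate for the iterative refinement of Section 6.2.2.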
6.2.2 Iterative algorithms
The most accurate (and flexible) way to estimate pose is to directly minimize the squared (or
robust) reprojection error for the 2D points as a function of the unknown pose parameters in
(R, t) and optionally K using non-linear least squares (Tsai 1987; Bogart 1991; Gleicher
and Witkin 1992). We can write the projection equations as
$$\mathbf{x}_i = \mathbf{f}(\mathbf{p}_i; \mathbf{R}, \mathbf{t}, \mathbf{K}) \tag{6.42}$$
and iteratively minimize the robustified linearized reprojection errors
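$$E_{\mathrm{NLP}} = \sum_i \rho\left(\frac{\partial \mathbf{f}}{\partial \mathbf{R}} \Delta\mathbf{R} + \frac{\partial \mathbf{f}}{\partial \mathbf{t}} \Delta\mathbf{t} + \frac{\partial \mathbf{f}}{\partial \mathbf{K}} \Delta\mathbf{K} - \mathbf{r}_i\right), \tag{6.43}$$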
where $\mathbf{r}_i = \tilde{\mathbf{x}}_i - \hat{\mathbf{x}}_i$ is the current residual vector (2D error in predicted position) and the
partial derivatives are with respect to the unknown pose parameters (rotation, translation, and
optionally calibration). Note that if full 2D covariance estimates are available for the 2D
feature locations, the above squared norm can be weighted by the inverse point covariance
matrix, as in Equation (6.11).
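The robustified problem in (6.43) can be sketched as a Gauss-Newton loop with iteratively reweighted least squares (IRLS). The sketch makes several assumptions that the text does not fix:

* $\mathbf{K}$ is known and held fixed, so only $\mathbf{R}$ and $\mathbf{t}$ are refined;
* $\mathbf{K}$ has a single focal length and a principal point $(c_x, c_y)$;
* rotation updates are applied as small rotation vectors;
* derivatives come from central finite differences rather than analytic formulas;
* $\rho$ is a Huber function, whose IRLS weight is $\min(1, \delta/\|\mathbf{r}_i\|)$.

All names are hypothetical.

```python
import math


def rodrigues(w):
    """Rotation matrix exp([w]_x) for a rotation vector w (axis times angle)."""
    theta = math.sqrt(sum(c * c for c in w))
    if theta < 1e-15:  # first-order approximation for tiny angles
        return [[1.0, -w[2], w[1]], [w[2], 1.0, -w[0]], [-w[1], w[0], 1.0]]
    kx, ky, kz = (v / theta for v in w)
    c, s = math.cos(theta), math.sin(theta)
    C = 1.0 - c
    return [[c + kx * kx * C, kx * ky * C - kz * s, kx * kz * C + ky * s],
            [ky * kx * C + kz * s, c + ky * ky * C, ky * kz * C - kx * s],
            [kz * kx * C - ky * s, kz * ky * C + kx * s, c + kz * kz * C]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def project(p, R, t, f, cx, cy):
    """Pinhole projection with K = [[f, 0, cx], [0, f, cy], [0, 0, 1]]."""
    X, Y, Z = (sum(R[a][b] * p[b] for b in range(3)) + t[a] for a in range(3))
    return f * X / Z + cx, f * Y / Z + cy


def solve(H, g):
    """Solve H x = g by Gaussian elimination with partial pivoting."""
    n = len(g)
    M = [H[i][:] + [g[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def refine_pose(pts3d, pts2d, R, t, f, cx, cy, iters=50, huber=2.0, h=1e-6):
    """Gauss-Newton / IRLS refinement of (R, t) with the calibration held fixed."""
    def perturb(R, t, a, step):
        # Parameters 0-2: small rotation applied on the left; 3-5: translation.
        if a < 3:
            w = [0.0, 0.0, 0.0]
            w[a] = step
            return matmul(rodrigues(w), R), t
        t2 = list(t)
        t2[a - 3] += step
        return R, t2

    for _ in range(iters):
        H = [[0.0] * 6 for _ in range(6)]
        g = [0.0] * 6
        for p, (u, v) in zip(pts3d, pts2d):
            pu, pv = project(p, R, t, f, cx, cy)
            r = (u - pu, v - pv)  # r_i = measured - predicted
            # Central-difference Jacobian of the prediction: one 2-vector per parameter.
            cols = []
            for a in range(6):
                up, vp = project(p, *perturb(R, t, a, h), f, cx, cy)
                um, vm = project(p, *perturb(R, t, a, -h), f, cx, cy)
                cols.append(((up - um) / (2 * h), (vp - vm) / (2 * h)))
            # Huber IRLS weight: 1 for small residuals, huber/|r| for large ones.
            rn = math.hypot(*r)
            wt = 1.0 if rn <= huber else huber / rn
            for a in range(6):
                g[a] += wt * (cols[a][0] * r[0] + cols[a][1] * r[1])
                for b in range(6):
                    H[a][b] += wt * (cols[a][0] * cols[b][0] + cols[a][1] * cols[b][1])
        delta = solve(H, g)  # weighted normal equations: H delta = g
        R = matmul(rodrigues(delta[:3]), R)
        t = [t[k] + delta[3 + k] for k in range(3)]
        if max(abs(d) for d in delta) < 1e-12:
            break
    return R, t


# Usage: refine a perturbed pose against exact projections of six points.
R_true = rodrigues([0.1, -0.2, 0.15])
t_true = [0.3, -0.1, 6.0]
pts3d = [[-1, -1, 0.5], [1, -1, 0], [1, 1, 0.3], [-1, 1, -0.2], [0, 0, 1], [0.5, -0.5, -0.5]]
pts2d = [project(p, R_true, t_true, 500.0, 320.0, 240.0) for p in pts3d]
R0 = matmul(rodrigues([0.03, 0.02, -0.04]), R_true)
t0 = [0.5, -0.2, 6.5]
R_est, t_est = refine_pose(pts3d, pts2d, R0, t0, 500.0, 320.0, 240.0)
```

Applying the rotation update as $\mathbf{R} \leftarrow \exp([\Delta\boldsymbol{\omega}]_\times)\mathbf{R}$ keeps $\mathbf{R}$ a valid rotation, up to rounding, without re-orthonormalization. The full 2D covariance weighting mentioned above would replace the scalar weight `wt` with a $2 \times 2$ inverse covariance when accumulating `H` and `g`.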
A version of the above non-linear regression problem that is easier to understand (and to implement) can be constructed by rewriting the projection equations as a concatenation of simpler steps, each of which transforms a 4D homogeneous coordinate $\mathbf{p}_i$ by a simple transformation such as translation, rotation, or perspective division (Figure 6.5). The resulting projection equations can be written as
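$$\mathbf{y}^{(1)} = \mathbf{f}_T(\mathbf{p}_i; \mathbf{c}_j) = \mathbf{p}_i - \mathbf{c}_j, \tag{6.44}$$
$$\mathbf{y}^{(2)} = \mathbf{f}_R(\mathbf{y}^{(1)}; \mathbf{q}_j) = \mathbf{R}(\mathbf{q}_j)\, \mathbf{y}^{(1)}, \tag{6.45}$$
$$\mathbf{y}^{(3)} = \mathbf{f}_P(\mathbf{y}^{(2)}) = \frac{\mathbf{y}^{(2)}}{z^{(2)}}, \tag{6.46}$$
$$\mathbf{x}_i = \mathbf{f}_C(\mathbf{y}^{(3)}; \mathbf{k}) = \mathbf{K}(\mathbf{k})\, \mathbf{y}^{(3)}. \tag{6.47}$$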
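The four stages (6.44) through (6.47) can be sketched as separate, composable functions. The sketch makes three assumptions:

* points are handled as 3D inhomogeneous vectors rather than 4D homogeneous ones;
* $\mathbf{q}_j$ is a quaternion $(w, x, y, z)$ that is normalized before use;
* the calibration vector $\mathbf{k}$ is (focal length, $c_x$, $c_y$), with no skew or aspect ratio.

The function names mirror the subscripts in the equations but are otherwise hypothetical.

```python
import math


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]


def f_T(p, c):
    """Translation stage (6.44): y1 = p - c, with c the camera centre."""
    return [p[a] - c[a] for a in range(3)]


def f_R(y, q):
    """Rotation stage (6.45): y2 = R(q) y for a quaternion q = (w, x, y, z)."""
    n = math.sqrt(sum(v * v for v in q))
    w, u = q[0] / n, [v / n for v in q[1:]]
    uxy = _cross(u, y)
    uxuxy = _cross(u, uxy)
    # Quaternion rotation: v' = v + 2 w (u x v) + 2 u x (u x v).
    return [y[a] + 2.0 * w * uxy[a] + 2.0 * uxuxy[a] for a in range(3)]


def f_P(y):
    """Perspective division stage (6.46): y3 = y / z."""
    return [y[0] / y[2], y[1] / y[2], 1.0]


def f_C(y, k):
    """Calibration stage (6.47): pixel coordinates of K(k) y, assuming k = (focal, cx, cy)."""
    focal, cx, cy = k
    return [focal * y[0] + cx, focal * y[1] + cy]


def project_chain(p, c, q, k):
    """Full projection x_i as the composition f_C(f_P(f_R(f_T(p))))."""
    return f_C(f_P(f_R(f_T(p, c), q)), k)


# Usage: identity rotation, camera at the origin, focal length 500, centre (320, 240).
print(project_chain([1.0, 2.0, 10.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0], (500.0, 320.0, 240.0)))
# [370.0, 340.0]
```

Each stage depends only on its own parameters ($\mathbf{c}_j$, $\mathbf{q}_j$, or $\mathbf{k}$). The derivatives needed in (6.43) can therefore be assembled stage by stage with the chain rule.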