
42 2 Image formation

easier to express exact rotations. When the angle is in radians, the derivatives of R with
respect to ω can easily be computed (2.36).
Quaternions, on the other hand, are better if you want to keep track of a smoothly moving
camera, since there are no discontinuities in the representation. It is also easier to interpolate
between rotations and to chain rigid transformations (Murray, Li, and Sastry 1994; Bregler
and Malik 1998).
My usual preference is to use quaternions, but to update their estimates using an
incremental rotation, as described in Section 6.2.2.
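
The interpolation advantage mentioned above can be illustrated with a small sketch. It uses spherical linear interpolation (slerp), one common way to interpolate unit quaternions; the function names and the (x, y, z, w) component ordering are assumptions made for this example, not something fixed by the text.

```python
import math


def normalize(q):
    """Scale a quaternion (x, y, z, w) to unit length."""
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)


def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions, t in [0, 1]."""
    q0, q1 = normalize(q0), normalize(q1)
    dot = sum(a * b for a, b in zip(q0, q1))
    # q and -q encode the same rotation; flip one to take the shorter arc.
    if dot < 0.0:
        q1 = tuple(-c for c in q1)
        dot = -dot
    # For nearly identical rotations, fall back to normalized linear
    # interpolation to avoid dividing by a very small sine.
    if dot > 0.9995:
        return normalize(tuple(a + t * (b - a) for a, b in zip(q0, q1)))
    theta = math.acos(dot)
    w0 = math.sin((1.0 - t) * theta) / math.sin(theta)
    w1 = math.sin(t * theta) / math.sin(theta)
    return tuple(w0 * a + w1 * b for a, b in zip(q0, q1))


# Interpolate halfway between no rotation and a 90 degree turn about z.
identity = (0.0, 0.0, 0.0, 1.0)
quarter_turn_z = (0.0, 0.0, math.sin(math.pi / 4), math.cos(math.pi / 4))
halfway = slerp(identity, quarter_turn_z, 0.5)
```

Under these assumptions, `halfway` is the unit quaternion for a 45 degree rotation about the z axis, and intermediate values of `t` move smoothly along the arc between the two rotations.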

2.1.5 3D to 2D projections

Now that we know how to represent 2D and 3D geometric primitives and how to transform
them spatially, we need to specify how 3D primitives are projected onto the image plane. We
can do this using a linear 3D to 2D projection matrix. The simplest model is orthography,
which requires no division to get the final (inhomogeneous) result. The more commonly used
model is perspective, since this more accurately models the behavior of real cameras.

Orthography and para-perspective
An orthographic projection simply drops the z component of the three-dimensional
coordinate p to obtain the 2D point x. (In this section, we use p to denote 3D points and x to denote
2D points.) This can be written as
\[ \mathbf{x} = [\mathbf{I}_{2\times 2} \mid \mathbf{0}]\, \mathbf{p}. \tag{2.46} \]

If we are using homogeneous (projective) coordinates, we can write
\[ \tilde{\mathbf{x}} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \tilde{\mathbf{p}}, \tag{2.47} \]
i.e., we drop the z component but keep the w component. Orthography is an approximate
model for long focal length (telephoto) lenses and objects whose depth is shallow relative
to their distance to the camera (Sawhney and Hanson 1991). It is exact only for telecentric
lenses (Baker and Nayar 1999, 2001).
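
As a minimal sketch of the two forms above, the following plain-Python example applies (2.46) by dropping z directly and (2.47) by multiplying with the 3×4 matrix. The helper names (`orthographic_project`, `project_homogeneous`, `dehomogenize`) and the sample point are hypothetical, chosen only for illustration.

```python
def orthographic_project(p):
    """Drop the z component of a 3D point p = (x, y, z), as in (2.46)."""
    x, y, _z = p
    return (x, y)


# The 3x4 homogeneous orthographic matrix from (2.47):
# it discards z but passes w through unchanged.
P_ORTHO = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]


def project_homogeneous(P, p_h):
    """Multiply a 3x4 matrix P by a homogeneous 4-vector p_h = (x, y, z, w)."""
    return tuple(sum(P[i][j] * p_h[j] for j in range(4)) for i in range(3))


def dehomogenize(x_h):
    """Convert a homogeneous 2D point (x, y, w) to inhomogeneous (x/w, y/w)."""
    x, y, w = x_h
    return (x / w, y / w)


point = (2.0, -1.0, 5.0)
x_inhomogeneous = orthographic_project(point)                   # (2.0, -1.0)
x_homogeneous = project_homogeneous(P_ORTHO, point + (1.0,))    # (2.0, -1.0, 1.0)
```

Note that the depth value 5.0 has no effect on either result, which is why no division by z is needed in this model.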
In practice, world coordinates (which may measure dimensions in meters) need to be
scaled to fit onto an image sensor (physically measured in millimeters, but ultimately
measured in pixels). For this reason, scaled orthography is actually more commonly used,

\[ \mathbf{x} = [s\,\mathbf{I}_{2\times 2} \mid \mathbf{0}]\, \mathbf{p}. \tag{2.48} \]
This model is equivalent to first projecting the world points onto a local fronto-parallel image
plane and then scaling this image using regular perspective projection. The scaling can be the
same for all parts of the scene (Figure 2.7b) or it can be different for objects that are being
modeled independently (Figure 2.7c). More importantly, the scaling can vary from frame to
frame when estimating structure from motion, which can better model the scale change that
occurs as an object approaches the camera.
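
A small sketch may help make the equivalence concrete. It assumes a pinhole perspective model x = f X / Z and a fronto-parallel reference plane at depth `z_ref`, so that the orthographic scale is s = f / z_ref; the names `f`, `z_ref`, and the two helper functions are assumptions introduced for this example.

```python
def scaled_orthographic_project(p, s):
    """Apply (2.48): drop z, then scale x and y by s."""
    x, y, _z = p
    return (s * x, s * y)


def perspective_project(p, f):
    """Assumed pinhole model: divide by depth, then scale by focal length f."""
    x, y, z = p
    return (f * x / z, f * y / z)


f = 500.0        # hypothetical focal length, in pixels
z_ref = 100.0    # hypothetical depth of the fronto-parallel reference plane
s = f / z_ref    # scale that maps the reference plane onto the image

# A point slightly behind the reference plane (depth 101 instead of 100).
point = (2.0, -1.0, 101.0)
x_scaled = scaled_orthographic_project(point, s)
x_persp = perspective_project(point, f)

# A point lying exactly on the reference plane projects identically
# under both models.
on_plane = (2.0, -1.0, z_ref)
x_scaled_on_plane = scaled_orthographic_project(on_plane, s)
x_persp_on_plane = perspective_project(on_plane, f)
```

With these numbers the two projections of `point` differ by about a tenth of a pixel, which illustrates why the approximation works well when the depth variation is small relative to the distance to the camera. Letting `s` change from frame to frame corresponds to `z_ref` changing as the object moves toward or away from the camera.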
Scaled orthography is a popular model for reconstructing the 3D shape of objects far away
from the camera, since it greatly simplifies certain computations. For example, pose (camera
