
42                                                                        2 Image formation


                                easier to express exact rotations. When the angle is in radians, the derivatives of R with
                                respect to ω can easily be computed (2.36).
                                   Quaternions, on the other hand, are better if you want to keep track of a smoothly moving
                                camera, since there are no discontinuities in the representation. It is also easier to interpolate
                                between rotations and to chain rigid transformations (Murray, Li, and Sastry 1994; Bregler
                                and Malik 1998).
                                   My usual preference is to use quaternions, but to update their estimates using an incre-
                                mental rotation, as described in Section 6.2.2.
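
The claim that quaternions are easy to interpolate can be illustrated with a minimal sketch of spherical linear interpolation (slerp). The (x, y, z, w) component ordering and the helper names `quat_from_axis_angle` and `slerp` are assumptions made for this example, not notation taken from the text.

```python
import math

def quat_from_axis_angle(axis, theta):
    """Unit quaternion (x, y, z, w) for a rotation of theta radians about axis.

    The (x, y, z, w) ordering is an assumption of this sketch.
    """
    n = math.sqrt(sum(a * a for a in axis))
    s = math.sin(theta / 2.0) / n
    return (axis[0] * s, axis[1] * s, axis[2] * s, math.cos(theta / 2.0))

def slerp(q0, q1, alpha):
    """Spherical linear interpolation between unit quaternions q0 and q1."""
    dot = sum(a * b for a, b in zip(q0, q1))
    # q and -q encode the same rotation; flip q1 so we follow the shorter arc.
    if dot < 0.0:
        q1 = tuple(-b for b in q1)
        dot = -dot
    dot = min(dot, 1.0)          # guard acos against rounding slightly above 1
    omega = math.acos(dot)       # angle between q0 and q1 on the unit 3-sphere
    if omega < 1e-8:
        return q0                # the two rotations are (nearly) identical
    w0 = math.sin((1.0 - alpha) * omega) / math.sin(omega)
    w1 = math.sin(alpha * omega) / math.sin(omega)
    return tuple(w0 * a + w1 * b for a, b in zip(q0, q1))

# Halfway between "no rotation" and a 90 degree rotation about the z axis.
q_start = quat_from_axis_angle((0.0, 0.0, 1.0), 0.0)
q_end = quat_from_axis_angle((0.0, 0.0, 1.0), math.pi / 2)
q_mid = slerp(q_start, q_end, 0.5)
```

In this example `q_mid` should match the quaternion for a 45 degree rotation about the same axis, and the interpolated path has no jumps as `alpha` sweeps from 0 to 1.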

                                2.1.5 3D to 2D projections

                                Now that we know how to represent 2D and 3D geometric primitives and how to transform
                                them spatially, we need to specify how 3D primitives are projected onto the image plane. We
                                can do this using a linear 3D to 2D projection matrix. The simplest model is orthography,
                                which requires no division to get the final (inhomogeneous) result. The more commonly used
                                model is perspective, since it models the behavior of real cameras more accurately.


                                Orthography and para-perspective
                                An orthographic projection simply drops the z component of the three-dimensional coordi-
                                nate p to obtain the 2D point x. (In this section, we use p to denote 3D points and x to denote
                                2D points.) This can be written as
                                $$x = [\, I_{2\times 2} \mid 0 \,]\, p. \tag{2.46}$$

                                If we are using homogeneous (projective) coordinates, we can write
                                $$\tilde{x} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \tilde{p}, \tag{2.47}$$
                                i.e., we drop the z component but keep the w component. Orthography is an approximate
                                model for long focal length (telephoto) lenses and objects whose depth is shallow relative
                                to their distance to the camera (Sawhney and Hanson 1991). It is exact only for telecentric
                                lenses (Baker and Nayar 1999, 2001).
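
As a small sketch of (2.46) and (2.47), the following applies orthographic projection in both inhomogeneous and homogeneous form. The names `orthographic`, `P_ORTHO`, and `project_homogeneous`, and the sample point, are hypothetical and chosen only for illustration.

```python
def orthographic(p):
    """Apply x = [I_{2x2} | 0] p: keep (x, y) and drop z."""
    x, y, _z = p
    return (x, y)

# Homogeneous form of the same projection: the 3x4 matrix of (2.47),
# which drops the z component but keeps the w component.
P_ORTHO = (
    (1.0, 0.0, 0.0, 0.0),
    (0.0, 1.0, 0.0, 0.0),
    (0.0, 0.0, 0.0, 1.0),
)

def project_homogeneous(P, p_h):
    """Multiply a 3x4 projection matrix by a homogeneous 4-vector."""
    return tuple(sum(P[i][j] * p_h[j] for j in range(4)) for i in range(3))

p = (2.0, -1.0, 5.0)                               # sample 3D point (x, y, z)
x_inhom = orthographic(p)                          # 2D point (x, y)
x_hom = project_homogeneous(P_ORTHO, p + (1.0,))   # homogeneous 2D point (x, y, w)
```

Because the last row of the matrix only copies w, no division by depth is involved; dividing `x_hom` by its last component recovers `x_inhom`.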
                                   In practice, world coordinates (which may measure dimensions in meters) need to be
                                scaled to fit onto an image sensor (physically measured in millimeters, but ultimately mea-
                                sured in pixels). For this reason, scaled orthography is actually more commonly used,

                                $$x = [\, s I_{2\times 2} \mid 0 \,]\, p. \tag{2.48}$$
                                This model is equivalent to first projecting the world points onto a local fronto-parallel image
                                plane and then scaling this image using regular perspective projection. The scaling can be the
                                same for all parts of the scene (Figure 2.7b) or it can be different for objects that are being
                                modeled independently (Figure 2.7c). More importantly, the scaling can vary from frame to
                                frame when estimating structure from motion, which can better model the scale change that
                                occurs as an object approaches the camera.
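
To make the relationship to perspective concrete, here is a rough sketch comparing scaled orthography (2.48) with an idealized perspective projection for an object whose depth range is small relative to its distance. The focal length `f`, the reference depth `z0` of the fronto-parallel plane, the choice `s = f / z0`, and the simple pinhole formula x = f X / Z are all assumptions of this example rather than definitions given above.

```python
def scaled_orthographic(p, s):
    """Apply x = [s I_{2x2} | 0] p: drop z, then scale by s."""
    x, y, _z = p
    return (s * x, s * y)

def perspective(p, f):
    """Idealized pinhole projection x = f X / Z (assumed form for this sketch)."""
    x, y, z = p
    return (f * x / z, f * y / z)

f = 1.0          # assumed focal length
z0 = 100.0       # assumed depth of the local fronto-parallel plane
s = f / z0       # scale that perspective would apply to points on that plane

# Points whose depths differ from z0 by about 1 percent (a "shallow" object).
points = [(1.0, 2.0, 99.0), (-3.0, 0.5, 100.0), (2.0, -1.0, 101.0)]

errors = []
for p in points:
    xo = scaled_orthographic(p, s)
    xp = perspective(p, f)
    errors.append(max(abs(a - b) for a, b in zip(xo, xp)))
```

Under these assumptions the two models agree for the point lying on the plane at `z0`, and the discrepancy for the other points grows with their depth offset from that plane. Re-estimating `s` for each frame is one way to follow the scale change as the object moves toward or away from the camera.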
                                   Scaled orthography is a popular model for reconstructing the 3D shape of objects far away
                                from the camera, since it greatly simplifies certain computations. For example, pose (camera