                  The conversion between the various focal length representations is straightforward, e.g.,
               to go from a unitless f to one expressed in pixels, multiply by W/2, while to convert from an
               f expressed in pixels to the equivalent 35mm focal length, multiply by 35/W.
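
As a quick illustration (not from the text), these two conversions can be written as follows, where W is the image width in pixels and the helper names are my own:

```python
# Illustrative helpers (names are mine, not standard) for the conversions above,
# where W is the image width in pixels.

def unitless_to_pixels(f_unitless, W):
    """Unitless focal length -> focal length in pixels (multiply by W/2)."""
    return f_unitless * W / 2.0

def pixels_to_35mm_equivalent(f_pixels, W):
    """Focal length in pixels -> equivalent 35mm focal length (multiply by 35/W)."""
    return f_pixels * 35.0 / W
```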


               Camera matrix
               Now that we have shown how to parameterize the calibration matrix K, we can put the
               camera intrinsics and extrinsics together to obtain a single 3 × 4 camera matrix


\[
P = K \begin{bmatrix} R & t \end{bmatrix} . \tag{2.63}
\]
               It is sometimes preferable to use an invertible 4 × 4 matrix, which can be obtained by not
               dropping the last row in the P matrix,

\[
\tilde{P} = \begin{bmatrix} K & 0 \\ 0^T & 1 \end{bmatrix}
\begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix} = \tilde{K} E, \tag{2.64}
\]
where $E$ is a 3D rigid-body (Euclidean) transformation and $\tilde{K}$ is the full-rank calibration
matrix. The 4 × 4 camera matrix $\tilde{P}$ can be used to map directly from 3D world coordinates
$\bar{p}_w = (x_w, y_w, z_w, 1)$ to screen coordinates (plus disparity), $x_s = (x_s, y_s, 1, d)$,
\[
x_s \sim \tilde{P} \bar{p}_w, \tag{2.65}
\]
where $\sim$ indicates equality up to scale. Note that after multiplication by $\tilde{P}$, the vector is
divided by the third element of the vector to obtain the normalized form $x_s = (x_s, y_s, 1, d)$.
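
As a concrete illustration, the following NumPy sketch composes the 4 × 4 matrix of (2.64) and applies the projection (2.65), normalizing by the third element to obtain $(x_s, y_s, 1, d)$. The intrinsic values, pose, sample point, and variable names are assumptions made for this example, not taken from the text.

```python
import numpy as np

# Illustrative sketch (not from the text): compose the 4 x 4 camera matrix
# P_tilde = K_tilde @ E of (2.64) and project a world point as in (2.65).
# The intrinsics, pose, and sample point below are arbitrary example values.

K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])   # assumed 3 x 3 calibration matrix K
R = np.eye(3)                           # assumed rotation
t = np.zeros(3)                         # assumed translation

K_tilde = np.eye(4)                     # full-rank 4 x 4 calibration matrix
K_tilde[:3, :3] = K
E = np.eye(4)                           # 3D rigid-body (Euclidean) transformation
E[:3, :3] = R
E[:3, 3] = t

P_tilde = K_tilde @ E                   # Equation (2.64)

p_w = np.array([0.2, -0.1, 4.0, 1.0])   # homogeneous world point (x_w, y_w, z_w, 1)
x_s = P_tilde @ p_w                     # Equation (2.65), defined only up to scale
x_s = x_s / x_s[2]                      # divide by third element -> (x_s, y_s, 1, d)
print(x_s)                              # here d = 1/z_w, since the last row is [0 0 0 1]
```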

               Plane plus parallax (projective depth)
In general, when using the 4 × 4 matrix $\tilde{P}$, we have the freedom to remap the last row to
whatever suits our purpose (rather than just being the “standard” interpretation of disparity as
inverse depth). Let us re-write the last row of $\tilde{P}$ as $p_3 = s_3 [\hat{n}_0 \,|\, c_0]$, where $\|\hat{n}_0\| = 1$. We
then have the equation
\[
d = \frac{s_3}{z} (\hat{n}_0 \cdot p_w + c_0), \tag{2.66}
\]
where $z = p_2 \cdot \bar{p}_w = r_z \cdot (p_w - c)$ is the distance of $p_w$ from the camera center $C$ (2.25)
along the optical axis $Z$ (Figure 2.11). Thus, we can interpret $d$ as the projective disparity
or projective depth of a 3D scene point $p_w$ from the reference plane $\hat{n}_0 \cdot p_w + c_0 = 0$
               (Szeliski and Coughlan 1997; Szeliski and Golland 1999; Shade, Gortler, He et al. 1998;
               Baker, Szeliski, and Anandan 1998). (The projective depth is also sometimes called parallax
               in reconstruction algorithms that use the term plane plus parallax (Kumar, Anandan, and
Hanna 1994; Sawhney 1994).) Setting $\hat{n}_0 = 0$ and $c_0 = 1$, i.e., putting the reference plane
at infinity, results in the more standard $d = 1/z$ version of disparity (Okutomi and Kanade
               1993).
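
The relationship in (2.66) can be checked numerically. The sketch below, again with assumed intrinsics, plane parameters, and names of my own choosing, replaces the last row of $\tilde{P}$ with $s_3 [\hat{n}_0 \,|\, c_0]$ and confirms that the fourth component of the normalized projection equals the projective depth given by the formula.

```python
import numpy as np

# Numerical check of (2.66), with assumed values: replace the last row of
# P_tilde by s3 * [n_hat0 | c0] and compare the resulting disparity with
# the projective depth formula d = (s3 / z) * (n_hat0 . p_w + c0).

n_hat0 = np.array([0.0, 0.0, 1.0])      # unit normal of the reference plane
c0 = -3.0                               # plane n_hat0 . p + c0 = 0 sits at z = 3
s3 = 1.0                                # arbitrary scale of the last row

P_tilde = np.eye(4)
P_tilde[:3, :3] = np.array([[500.0,   0.0, 320.0],
                            [  0.0, 500.0, 240.0],
                            [  0.0,   0.0,   1.0]])
P_tilde[3, :3] = s3 * n_hat0
P_tilde[3, 3] = s3 * c0

p_w = np.array([0.2, -0.1, 4.0, 1.0])   # homogeneous world point
x_s = P_tilde @ p_w
x_s = x_s / x_s[2]                      # normalize by the third element

z = p_w[2]                              # camera at the origin looking along Z
d = (s3 / z) * (n_hat0 @ p_w[:3] + c0)  # Equation (2.66)
assert np.isclose(x_s[3], d)            # projected disparity matches (2.66)
```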
   Another way to see this is to invert the $\tilde{P}$ matrix so that we can map pixels plus disparity
directly back to 3D points,
\[
\tilde{p}_w = \tilde{P}^{-1} x_s. \tag{2.67}
\]
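
For completeness, here is a minimal sketch of the back-mapping in (2.67), reusing the assumed intrinsics from the earlier example (with $R = I$, $t = 0$, and the reference plane at infinity):

```python
import numpy as np

# Minimal sketch of (2.67): invert the 4 x 4 camera matrix to map a pixel-plus-
# disparity vector back to a 3D point. P_tilde reuses the assumed intrinsics
# above, with R = I, t = 0, and the reference plane at infinity.

K_tilde = np.eye(4)
K_tilde[:3, :3] = np.array([[500.0,   0.0, 320.0],
                            [  0.0, 500.0, 240.0],
                            [  0.0,   0.0,   1.0]])
P_tilde = K_tilde

x_s = np.array([345.0, 227.5, 1.0, 0.25])   # (x_s, y_s, 1, d) from the earlier sketch
p_w = np.linalg.inv(P_tilde) @ x_s          # Equation (2.67), up to scale
p_w = p_w / p_w[3]                          # recover (x_w, y_w, z_w, 1) = (0.2, -0.1, 4, 1)
```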
In general, we can choose $\tilde{P}$ to have whatever form is convenient, i.e., to sample space
using an arbitrary projection. This can come in particularly handy when setting up multi-view