
46                                                                        2 Image formation


The combined 2D to 3D projection can then be written as

$$
\mathbf{p} = \begin{bmatrix} \mathbf{R}_s & \mathbf{c}_s \end{bmatrix}
\begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x_s \\ y_s \\ 1 \end{bmatrix}
= \mathbf{M}_s \bar{\mathbf{x}}_s. \tag{2.53}
$$
                                The first two columns of the 3 × 3 matrix M s are the 3D vectors corresponding to unit steps
                                in the image pixel array along the x s and y s directions, while the third column is the 3D
                                image array origin c s .
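As a minimal numerical sketch of Equation (2.53), the following constructs $\mathbf{M}_s$ from illustrative (hypothetical) values of $\mathbf{R}_s$, $\mathbf{c}_s$, and the pixel spacings, and maps a homogeneous pixel address to its 3D point on the sensor plane:

```python
import numpy as np

# Hypothetical sensor parameters (illustrative values, not from the text).
s_x, s_y = 0.002, 0.002             # pixel spacings (e.g., mm per pixel)
R_s = np.eye(3)                     # orientation of the sensor plane
c_s = np.array([-1.0, -0.75, 1.0])  # 3D position of the image array origin

# Eq. (2.53): M_s = [R_s | c_s] times the 4x3 scaling matrix.
S = np.array([[s_x, 0.0, 0.0],
              [0.0, s_y, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
M_s = np.hstack([R_s, c_s[:, None]]) @ S   # 3x3

# The first two columns of M_s are unit pixel steps in 3D; the third is c_s.
x_bar = np.array([100.0, 50.0, 1.0])       # homogeneous pixel address
p = M_s @ x_bar                            # 3D point on the sensor plane
```

A unit increment of the pixel coordinate $x_s$ moves the 3D point by exactly the first column of $\mathbf{M}_s$, matching the interpretation given above.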
                                   The matrix M s is parameterized by eight unknowns: the three parameters describing
                                the rotation R s , the three parameters describing the translation c s , and the two scale factors
                                (s x ,s y ). Note that we ignore here the possibility of skew between the two axes on the image
                                plane, since solid-state manufacturing techniques render this negligible. In practice, unless
                                we have accurate external knowledge of the sensor spacing or sensor orientation, there are
                                only seven degrees of freedom, since the distance of the sensor from the origin cannot be
                                teased apart from the sensor spacing, based on external image measurement alone.
                                   However, estimating a camera model M s with the required seven degrees of freedom
                                (i.e., where the first two columns are orthogonal after an appropriate re-scaling) is impractical,
                                so most practitioners assume a general 3 × 3 homogeneous matrix form.
The relationship between the 3D pixel center $\mathbf{p}$ and the 3D camera-centered point $\mathbf{p}_c$ is given by an unknown scaling $s$, $\mathbf{p} = s\mathbf{p}_c$. We can therefore write the complete projection between $\mathbf{p}_c$ and a homogeneous version of the pixel address $\tilde{\mathbf{x}}_s$ as

$$
\tilde{\mathbf{x}}_s = \alpha \mathbf{M}_s^{-1} \mathbf{p}_c = \mathbf{K}\mathbf{p}_c. \tag{2.54}
$$
The 3 × 3 matrix K is called the calibration matrix and describes the camera intrinsics (as opposed to the camera’s orientation in space, which is described by the extrinsics).
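Equation (2.54) can be checked numerically: inverting an $\mathbf{M}_s$ (here with assumed, illustrative entries) yields a $\mathbf{K}$ that maps camera-centered points to homogeneous pixel addresses. This is a sketch, not the book's calibration procedure:

```python
import numpy as np

# Hypothetical M_s (illustrative values): 0.002 pixel spacing, image array
# origin at (-1, -0.75, 1) in camera coordinates, axes aligned.
M_s = np.array([[0.002, 0.0,   -1.0 ],
                [0.0,   0.002, -0.75],
                [0.0,   0.0,    1.0 ]])
alpha = 1.0
K = alpha * np.linalg.inv(M_s)       # Eq. (2.54): K = alpha * M_s^{-1}

# Project a camera-centered point p_c to a homogeneous pixel address.
p_c = np.array([0.0, 0.0, 2.0])      # point 2 units along the optical axis
x_tilde = K @ p_c
x_s = x_tilde[:2] / x_tilde[2]       # dehomogenize to pixel coordinates
```

With these assumed values, a point on the optical axis lands at the center of a notional 1000 × 750 pixel array, as one would expect.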
                                   From the above discussion, we see that K has seven degrees of freedom in theory and
                                eight degrees of freedom (the full dimensionality of a 3×3 homogeneous matrix) in practice.
                                Why, then, do most textbooks on 3D computer vision and multi-view geometry (Faugeras
                                1993; Hartley and Zisserman 2004; Faugeras and Luong 2001) treat K as an upper-triangular
                                matrix with five degrees of freedom?
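The conventional five-parameter, upper-triangular form these textbooks use collects two focal lengths, a skew, and a principal point. A minimal sketch (the numeric values are illustrative assumptions):

```python
import numpy as np

def calibration_matrix(fx, fy, skew, cx, cy):
    """Standard five-parameter upper-triangular calibration matrix."""
    return np.array([[fx,  skew, cx ],
                     [0.0, fy,   cy ],
                     [0.0, 0.0,  1.0]])

# Illustrative intrinsics for a notional 640x480 camera.
K = calibration_matrix(fx=500.0, fy=500.0, skew=0.0, cx=320.0, cy=240.0)

# An upper-triangular 3x3 homogeneous matrix has six free entries, but its
# overall scale is arbitrary, leaving the five intrinsic parameters.
```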
                                   While this is usually not made explicit in these books, it is because we cannot recover
                                the full K matrix based on external measurement alone. When calibrating a camera (Chap-
                                ter 6) based on external 3D points or other measurements (Tsai 1987), we end up estimating
                                the intrinsic (K) and extrinsic (R, t) camera parameters simultaneously using a series of
                                measurements,

$$
\tilde{\mathbf{x}}_s = \mathbf{K}\begin{bmatrix}\mathbf{R} & \mathbf{t}\end{bmatrix}\mathbf{p}_w = \mathbf{P}\mathbf{p}_w, \tag{2.55}
$$

where $\mathbf{p}_w$ are known 3D world coordinates and
$$
\mathbf{P} = \mathbf{K}[\mathbf{R}\,|\,\mathbf{t}] \tag{2.56}
$$
is known as the camera matrix. Inspecting this equation, we see that we can post-multiply $\mathbf{K}$ by $\mathbf{R}_1$ and pre-multiply $[\mathbf{R}\,|\,\mathbf{t}]$ by $\mathbf{R}_1^T$, and still end up with a valid calibration. Thus, it is impossible based on image measurements alone to know the true orientation of the sensor and the true camera intrinsics.
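This rotation ambiguity is easy to verify numerically: slipping any rotation $\mathbf{R}_1$ between the two factors of $\mathbf{P}$ leaves the camera matrix unchanged. The following sketch uses assumed, illustrative intrinsics and extrinsics:

```python
import numpy as np

# Hypothetical intrinsics and extrinsics (illustrative values).
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)
t = np.array([[0.1], [0.2], [2.0]])
P = K @ np.hstack([R, t])            # Eq. (2.56): camera matrix

# Slip an arbitrary rotation R1 between the two factors.
theta = 0.3
R1 = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0,            0.0,           1.0]])
K2 = K @ R1                          # post-multiply the intrinsics
Rt2 = R1.T @ np.hstack([R, t])       # pre-multiply the extrinsics
P2 = K2 @ Rt2                        # K R1 R1^T [R|t] = K [R|t] = P
```

Note that `K2` is no longer upper-triangular, which is exactly why imposing the upper-triangular convention on K removes this ambiguity.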