The combined 2D to 3D projection can then be written as
$$
p = \begin{bmatrix} R_s & c_s \end{bmatrix}
\begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x_s \\ y_s \\ 1 \end{bmatrix}
= M_s \bar{x}_s. \tag{2.53}
$$
The first two columns of the $3 \times 3$ matrix $M_s$ are the 3D vectors corresponding to unit steps in the image pixel array along the $x_s$ and $y_s$ directions, while the third column is the 3D image array origin $c_s$.
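To make (2.53) concrete, here is a minimal numerical sketch, assuming NumPy; the values chosen for $R_s$, $c_s$, $s_x$, and $s_y$ are purely illustrative, not taken from any real sensor.

```python
import numpy as np

# Hypothetical sensor pose and spacing (purely illustrative values).
R_s = np.eye(3)                          # sensor plane orientation in camera coordinates
c_s = np.array([-0.018, -0.012, 0.05])   # 3D origin of the image array (metres)
s_x, s_y = 1e-5, 1e-5                    # pixel spacings (metres per pixel)

# The 4x3 scaling matrix from (2.53): maps (x_s, y_s, 1) to homogeneous sensor coordinates.
S = np.array([[s_x, 0.0, 0.0],
              [0.0, s_y, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])

# M_s = [R_s | c_s] S is 3x3: its first two columns are unit pixel steps,
# its third column is the image array origin c_s.
M_s = np.hstack([R_s, c_s[:, None]]) @ S

x_bar = np.array([640.0, 480.0, 1.0])    # homogeneous pixel address (x_s, y_s, 1)
p = M_s @ x_bar                          # 3D position of that pixel centre
print(p)
```

With $R_s = I$, the first two columns of $M_s$ reduce to $(s_x, 0, 0)$ and $(0, s_y, 0)$, i.e., one-pixel steps along the sensor axes.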
The matrix $M_s$ is parameterized by eight unknowns: the three parameters describing the rotation $R_s$, the three parameters describing the translation $c_s$, and the two scale factors $(s_x, s_y)$. Note that we ignore here the possibility of skew between the two axes on the image plane, since solid-state manufacturing techniques render this negligible. In practice, unless we have accurate external knowledge of the sensor spacing or sensor orientation, there are only seven degrees of freedom, since the distance of the sensor from the origin cannot be teased apart from the sensor spacing based on external image measurement alone.
However, estimating a camera model $M_s$ with the required seven degrees of freedom (i.e., where the first two columns are orthogonal after an appropriate re-scaling) is impractical, so most practitioners assume a general $3 \times 3$ homogeneous matrix form.
The relationship between the 3D pixel center $p$ and the 3D camera-centered point $p_c$ is given by an unknown scaling $s$, $p = s p_c$. We can therefore write the complete projection between $p_c$ and a homogeneous version of the pixel address $\tilde{x}_s$ as
$$
\tilde{x}_s = \alpha M_s^{-1} p_c = K p_c. \tag{2.54}
$$
The $3 \times 3$ matrix $K$ is called the calibration matrix and describes the camera intrinsics (as opposed to the camera's orientation in space, whose parameters are called the extrinsics).
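Continuing the sketch above (still with hypothetical values), $K$ can be obtained from $M_s^{-1}$ as in (2.54); the scale $\alpha$ is arbitrary and is chosen here so that the bottom-right entry of $K$ equals 1.

```python
import numpy as np

# Hypothetical M_s carried over from the sketch above (R_s = I, 10 um pixels,
# image array origin c_s = (-0.018, -0.012, 0.05) m); values are illustrative.
M_s = np.array([[1e-5, 0.0,  -0.018],
                [0.0,  1e-5, -0.012],
                [0.0,  0.0,   0.05 ]])

# K = alpha * inv(M_s); alpha is arbitrary and is chosen here so that K[2, 2] = 1.
K = np.linalg.inv(M_s)
K /= K[2, 2]

p_c = np.array([0.1, 0.05, 2.0])    # camera-centred 3D point (metres)
x_tilde = K @ p_c                   # homogeneous pixel coordinates, as in (2.54)
print(x_tilde[:2] / x_tilde[2])     # inhomogeneous pixel address
```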
From the above discussion, we see that $K$ has seven degrees of freedom in theory and eight degrees of freedom (the full dimensionality of a $3 \times 3$ homogeneous matrix) in practice. Why, then, do most textbooks on 3D computer vision and multi-view geometry (Faugeras 1993; Hartley and Zisserman 2004; Faugeras and Luong 2001) treat $K$ as an upper-triangular matrix with five degrees of freedom?
While this is usually not made explicit in these books, it is because we cannot recover the full $K$ matrix based on external measurement alone. When calibrating a camera (Chapter 6) based on external 3D points or other measurements (Tsai 1987), we end up estimating the intrinsic ($K$) and extrinsic ($R$, $t$) camera parameters simultaneously using a series of measurements,
$$
\tilde{x}_s = K [R \,|\, t]\, p_w = P p_w, \tag{2.55}
$$
where $p_w$ are known 3D world coordinates and
$$
P = K [R \,|\, t] \tag{2.56}
$$
is known as the camera matrix.
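A brief sketch of (2.55) and (2.56), assuming an already-known $K$ and pose $(R, t)$; all numerical values below are illustrative.

```python
import numpy as np

# Hypothetical intrinsics and extrinsics (illustrative values only).
K = np.array([[5000.0,    0.0, 1800.0],
              [   0.0, 5000.0, 1200.0],
              [   0.0,    0.0,    1.0]])

theta = np.deg2rad(10.0)                       # small rotation about the y axis
R = np.array([[ np.cos(theta), 0.0, np.sin(theta)],
              [ 0.0,           1.0, 0.0          ],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([0.2, 0.0, 1.5])

P = K @ np.hstack([R, t[:, None]])             # 3x4 camera matrix P = K [R | t], (2.56)

p_w = np.array([0.3, -0.1, 4.0, 1.0])          # homogeneous 3D world point
x_tilde = P @ p_w                              # projection (2.55)
print(x_tilde[:2] / x_tilde[2])                # projected pixel coordinates
```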
Inspecting this equation, we see that we can post-multiply $K$ by $R_1$ and pre-multiply $[R \,|\, t]$ by $R_1^T$, and still end up with a valid calibration. Thus, it is impossible based on image measurements alone to know the true orientation of the sensor and the true camera intrinsics.
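This ambiguity is easy to verify numerically: for any rotation $R_1$, replacing $K$ by $K R_1$ and $[R \,|\, t]$ by $R_1^T [R \,|\, t]$ leaves $P$, and hence every projected point, unchanged. A small sketch with hypothetical values:

```python
import numpy as np

def rot_z(angle_rad):
    """Rotation about the z axis (any rotation works for this demonstration)."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

# Hypothetical calibration and pose (illustrative values only).
K  = np.array([[5000.0,    0.0, 1800.0],
               [   0.0, 5000.0, 1200.0],
               [   0.0,    0.0,    1.0]])
Rt = np.hstack([rot_z(np.deg2rad(10.0)), np.array([[0.2], [0.0], [1.5]])])

# Absorb an arbitrary rotation R_1 into the factorization: the camera matrix is unchanged.
R_1 = rot_z(np.deg2rad(25.0))
P_original = K @ Rt
P_rotated  = (K @ R_1) @ (R_1.T @ Rt)

print(np.allclose(P_original, P_rotated))   # True: the two factorizations are indistinguishable
```

Note that $K R_1$ is in general no longer upper triangular; fixing $K$ to be upper triangular is precisely what removes this rotational ambiguity and yields the five-parameter form used in the textbooks cited above.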