2.1 Geometric primitives and transformations
The conversion between the various focal length representations is straightforward, e.g.,
to go from a unitless f to one expressed in pixels, multiply by W/2, while to convert from an
f expressed in pixels to the equivalent 35mm focal length, multiply by 35/W.
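The two conversions above can be sketched in a few lines of Python. The function names and the sample values (a 640-pixel-wide image, a unitless f of 1.5) are hypothetical and chosen only for illustration; the factors W/2 and 35/W are taken directly from the text.

```python
def unitless_to_pixels(f_unitless, width_px):
    """Convert a unitless focal length to pixels by multiplying by W/2."""
    return f_unitless * width_px / 2.0


def pixels_to_35mm(f_pixels, width_px):
    """Convert a focal length in pixels to its 35mm equivalent (factor 35/W)."""
    return f_pixels * 35.0 / width_px


W = 640                               # hypothetical image width in pixels
f_px = unitless_to_pixels(1.5, W)     # 480.0 pixels
f_mm = pixels_to_35mm(f_px, W)        # 26.25 mm equivalent
print(f_px, f_mm)
```

Note that chaining the two conversions cancels W, so the 35mm-equivalent focal length depends only on the unitless f (here 1.5 × 35/2 = 26.25).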
Camera matrix
Now that we have shown how to parameterize the calibration matrix K, we can put the
camera intrinsics and extrinsics together to obtain a single 3 × 4 camera matrix
$$P = K \left[\, R \mid t \,\right]. \tag{2.63}$$
It is sometimes preferable to use an invertible 4 × 4 matrix, which can be obtained by not
dropping the last row in the P matrix,
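$$\tilde{P} = \begin{bmatrix} K & \mathbf{0} \\ \mathbf{0}^T & 1 \end{bmatrix} \begin{bmatrix} R & t \\ \mathbf{0}^T & 1 \end{bmatrix} = \tilde{K} E, \tag{2.64}$$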
where $E$ is a 3D rigid-body (Euclidean) transformation and $\tilde{K}$ is the full-rank calibration
matrix. The 4 × 4 camera matrix $\tilde{P}$ can be used to map directly from 3D world coordinates
$\bar{p}_w = (x_w, y_w, z_w, 1)$ to screen coordinates (plus disparity), $x_s = (x_s, y_s, 1, d)$,
$$x_s \sim \tilde{P} \bar{p}_w, \tag{2.65}$$
where ∼ indicates equality up to scale. Note that after multiplication by $\tilde{P}$, the vector is
divided by the third element of the vector to obtain the normalized form $x_s = (x_s, y_s, 1, d)$.
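As a minimal sketch of equations (2.64) and (2.65), the following pure-Python example assembles the 4 × 4 camera matrix from K, R, and t and projects one world point. The helper names and the camera parameters (f = 500 pixels, principal point (320, 240), identity pose) are assumptions made for illustration, not values from the text.

```python
def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def camera_matrix_4x4(K, R, t):
    """Build the 4x4 matrix K~ E from a 3x3 K, a 3x3 rotation R, and a 3-vector t."""
    last_row = [0.0, 0.0, 0.0, 1.0]
    K_tilde = [K[i] + [0.0] for i in range(3)] + [last_row]
    E = [R[i] + [t[i]] for i in range(3)] + [last_row]
    return matmul(K_tilde, E)


def project(P_tilde, p_w):
    """Map a 3D world point to (x_s, y_s, 1, d) by dividing by the third element."""
    p_bar = list(p_w) + [1.0]
    x = [sum(m * p for m, p in zip(row, p_bar)) for row in P_tilde]
    return [v / x[2] for v in x]


# Hypothetical camera: f = 500 px, principal point (320, 240), identity pose.
K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]

P_tilde = camera_matrix_4x4(K, R, t)
x_s = project(P_tilde, (1.0, 0.5, 4.0))
print(x_s)  # [445.0, 302.5, 1.0, 0.25]
```

With the last row left as (0, 0, 0, 1), the fourth element of the result is the inverse depth, here d = 1/4.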
Plane plus parallax (projective depth)
In general, when using the 4 × 4 matrix $\tilde{P}$, we have the freedom to remap the last row to
whatever suits our purpose (rather than just being the “standard” interpretation of disparity as
inverse depth). Let us re-write the last row of $\tilde{P}$ as $p_3 = s_3 [\hat{n}_0 \mid c_0]$, where $\|\hat{n}_0\| = 1$. We
then have the equation
$$d = \frac{s_3}{z} (\hat{n}_0 \cdot p_w + c_0), \tag{2.66}$$
where $z = p_2 \cdot \bar{p}_w = r_z \cdot (p_w - c)$ is the distance of $p_w$ from the camera center $C$ (2.25)
along the optical axis $Z$ (Figure 2.11). Thus, we can interpret $d$ as the projective disparity
or projective depth of a 3D scene point $p_w$ from the reference plane $\hat{n}_0 \cdot p_w + c_0 = 0$
(Szeliski and Coughlan 1997; Szeliski and Golland 1999; Shade, Gortler, He et al. 1998;
Baker, Szeliski, and Anandan 1998). (The projective depth is also sometimes called parallax
in reconstruction algorithms that use the term plane plus parallax (Kumar, Anandan, and
Hanna 1994; Sawhney 1994).) Setting $\hat{n}_0 = 0$ and $c_0 = 1$, i.e., putting the reference plane
at infinity, results in the more standard $d = 1/z$ version of disparity (Okutomi and Kanade
1993).
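The projective depth formula (2.66) can be evaluated directly, as in the sketch below. It assumes an identity camera pose (so that z equals the world z coordinate) and a scale s_3 = 1; the function name, the sample point, and the choice of reference plane z_w = 2 are all hypothetical.

```python
def projective_depth(p_w, z, n0, c0, s3=1.0):
    """Evaluate d = (s3 / z) * (n0 . p_w + c0), as in equation (2.66)."""
    plane_term = sum(n * p for n, p in zip(n0, p_w)) + c0
    return (s3 / z) * plane_term


# Assumed identity pose: the distance along the optical axis is z = z_w.
p_w = (1.0, 0.5, 4.0)
z = p_w[2]

# Reference plane at infinity (n0 = 0, c0 = 1): d reduces to 1/z.
d_inf = projective_depth(p_w, z, n0=(0.0, 0.0, 0.0), c0=1.0)     # 0.25

# Reference plane z_w = 2, i.e. n0 = (0, 0, 1) and c0 = -2.
d_plane = projective_depth(p_w, z, n0=(0.0, 0.0, 1.0), c0=-2.0)  # (4 - 2) / 4 = 0.5

# A point lying on the reference plane has zero projective depth.
d_on = projective_depth((1.0, 0.5, 2.0), 2.0, n0=(0.0, 0.0, 1.0), c0=-2.0)  # 0.0

print(d_inf, d_plane, d_on)
```

The third case illustrates why d is called parallax relative to the plane: points on the reference plane map to d = 0, and d grows with the signed distance from the plane divided by depth.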
Another way to see this is to invert the $\tilde{P}$ matrix so that we can map pixels plus disparity
directly back to 3D points,
$$\tilde{p}_w = \tilde{P}^{-1} x_s. \tag{2.67}$$
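A minimal sketch of the inverse mapping (2.67) follows. It uses a small Gauss-Jordan routine written for this example rather than a library call, and the camera matrix values (f = 500 pixels, principal point (320, 240), identity pose) are hypothetical. The result is a homogeneous point defined up to scale, so it is divided by its last element, which assumes d is nonzero.

```python
def invert(M):
    """Invert a square matrix (nested lists) by Gauss-Jordan elimination."""
    n = len(M)
    # Augment M with the identity matrix.
    A = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [v / scale for v in A[col]]
        for r in range(n):
            if r != col:
                factor = A[r][col]
                A[r] = [v - factor * w for v, w in zip(A[r], A[col])]
    return [row[n:] for row in A]


def matvec(M, v):
    """Matrix-vector product for nested-list matrices."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


# Hypothetical 4x4 camera matrix: f = 500 px, principal point (320, 240), identity pose.
P_tilde = [[500.0, 0.0, 320.0, 0.0],
           [0.0, 500.0, 240.0, 0.0],
           [0.0, 0.0, 1.0, 0.0],
           [0.0, 0.0, 0.0, 1.0]]

x_s = [445.0, 302.5, 1.0, 0.25]               # pixel (445, 302.5) with disparity 0.25
p_tilde = matvec(invert(P_tilde), x_s)        # homogeneous 3D point, up to scale
p_w = [v / p_tilde[3] for v in p_tilde[:3]]   # normalize (requires d != 0)
print(p_w)
```

For this camera the recovered point is approximately (1, 0.5, 4), the point whose inverse depth is 0.25.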
In general, we can choose $\tilde{P}$ to have whatever form is convenient, i.e., to sample space using
an arbitrary projection. This can come in particularly handy when setting up multi-view