
6.2 Pose estimation                                                                    287



               [Figure 6.5 diagram: x_i <- f_C(x) = Kx <- y^(3) <- f_P(x) = p/z <- y^(2) <- f_R(x) = Rx <- y^(1) <- f_T(x) = x - c <- p_i,
               with the parameters k, q_j, and c_j feeding f_C, f_R, and f_T, respectively.]

               Figure 6.5 A set of chained transforms for projecting a 3D point p_i to a 2D measurement x_i through a series of
               transformations f^(k), each of which is controlled by its own set of parameters. The dashed lines indicate the flow
               of information as partial derivatives are computed during a backward pass.


               Note that in these equations, we have indexed the camera centers c_j and camera rotation
               quaternions q_j by an index j, in case more than one pose of the calibration object is being
               used (see also Section 7.4). We are also using the camera center c_j instead of the world
               translation t_j, since this is a more natural parameter to estimate.
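
The relationship between the two parameterizations can be checked numerically. The following is a minimal sketch, assuming the usual convention that a world point p maps to camera coordinates as R(p - c), so that the equivalent translation is t = -Rc. The function names and the numeric values are illustrative and do not come from the text.

```python
def mat_vec(R, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(R[i][k] * v[k] for k in range(3)) for i in range(3)]


def center_to_translation(R, c):
    """Translation t = -R c that corresponds to a camera centered at c."""
    return [-x for x in mat_vec(R, c)]


def to_camera_with_center(R, c, p):
    """Camera-frame point computed as R (p - c)."""
    return mat_vec(R, [p[k] - c[k] for k in range(3)])


def to_camera_with_translation(R, t, p):
    """Camera-frame point computed as R p + t."""
    Rp = mat_vec(R, p)
    return [Rp[k] + t[k] for k in range(3)]


# Illustrative values (assumptions): a 90-degree rotation about the z axis.
R = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
c = [1.0, 2.0, 3.0]
p = [4.0, 6.0, 8.0]

t = center_to_translation(R, c)
a = to_camera_with_center(R, c, p)
b = to_camera_with_translation(R, t, p)
```

Both forms give the same camera-frame point. One practical difference is that c_j is expressed in world coordinates and does not change when the rotation is updated, whereas t_j depends on R.
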
                  The advantage of this chained set of transformations is that each one has a simple partial
               derivative with respect both to its parameters and to its input. Thus, once the predicted value
               of x̃_i has been computed based on the 3D point location p_i and the current values of the pose
               parameters (c_j, q_j, k), we can obtain all of the required partial derivatives using the chain
               rule
                    \frac{\partial r_i}{\partial p^{(k)}} = \frac{\partial r_i}{\partial y^{(k)}} \frac{\partial y^{(k)}}{\partial p^{(k)}},                      (6.48)
               where p^(k) indicates one of the parameter vectors that is being optimized. (This same “trick”
               is used in neural networks as part of the backpropagation algorithm (Bishop 2006).)
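
The chain rule above can be sketched as a forward pass that stores the intermediate values y^(k), followed by a backward pass that multiplies the per-stage Jacobians. This is a minimal illustration under stated assumptions, not the book's implementation: the rotation is passed directly as a matrix R rather than a quaternion, the calibration parameters are assumed to be k = (f, cx, cy) (a focal length and a principal point), the Jacobians are those of the predicted point (which match those of the residual up to the sign convention chosen for r_i), and all function names are hypothetical. A finite-difference routine is included only to check the analytic result.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def project(p, c, R, k):
    """Forward pass: returns the predicted 2D point and the intermediates."""
    f, cx, cy = k
    y1 = [p[i] - c[i] for i in range(3)]                             # f_T
    y2 = [sum(R[i][j] * y1[j] for j in range(3)) for i in range(3)]  # f_R
    y3 = [y2[0] / y2[2], y2[1] / y2[2]]                              # f_P
    x = [f * y3[0] + cx, f * y3[1] + cy]                             # f_C
    return x, (y1, y2, y3)


def jacobians(p, c, R, k):
    """Backward pass: Jacobians of the predicted point w.r.t. k and c."""
    f, cx, cy = k
    x, (y1, y2, y3) = project(p, c, R, k)
    # Stage f_C: derivative w.r.t. its parameters and w.r.t. its input y3.
    dx_dk = [[y3[0], 1.0, 0.0], [y3[1], 0.0, 1.0]]
    dx_dy3 = [[f, 0.0], [0.0, f]]
    # Stage f_P: perspective division has no parameters, only an input.
    z = y2[2]
    dy3_dy2 = [[1.0 / z, 0.0, -y2[0] / z ** 2],
               [0.0, 1.0 / z, -y2[1] / z ** 2]]
    dx_dy2 = matmul(dx_dy3, dy3_dy2)
    # Stage f_R: the derivative w.r.t. the input y1 is R itself.
    dx_dy1 = matmul(dx_dy2, R)
    # Stage f_T: y1 = p - c, so the derivative w.r.t. c is minus identity.
    dx_dc = [[-v for v in row] for row in dx_dy1]
    return dx_dk, dx_dc


def numeric_jacobian(fun, v, eps=1e-6):
    """Central-difference Jacobian of fun at v (used only as a check)."""
    cols = []
    for j in range(len(v)):
        hi = list(v)
        lo = list(v)
        hi[j] += eps
        lo[j] -= eps
        fh, fl = fun(hi), fun(lo)
        cols.append([(a - b) / (2 * eps) for a, b in zip(fh, fl)])
    return [[cols[j][i] for j in range(len(v))] for i in range(len(cols[0]))]


# Illustrative values (assumptions, not taken from the text).
R = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
p = [4.0, 6.0, 8.0]
c = [1.0, 2.0, 3.0]
k = [500.0, 320.0, 240.0]

dx_dk, dx_dc = jacobians(p, c, R, k)
check_c = numeric_jacobian(lambda cc: project(p, cc, R, k)[0], c)
check_k = numeric_jacobian(lambda kk: project(p, c, R, kk)[0], k)
```

Each stage contributes only a small local Jacobian, and the product accumulated from the output side (dx_dy3, dx_dy2, dx_dy1) is reused for every parameter vector that enters further down the chain.
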
                  The one special case in this formulation that can be considerably simplified is the
               computation of the rotation update. Instead of directly computing the derivatives of the 3×3 rotation
               matrix R(q) as a function of the unit quaternion entries, you can prepend the incremental
               rotation matrix ΔR(ω) given in Equation (2.35) to the current rotation matrix and compute the
               partial derivative of the transform with respect to these parameters, which results in a simple
               cross product of the backward chaining partial derivative and the outgoing 3D vector (2.36).
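
The cross-product form of the rotation derivative can be verified numerically. The sketch below assumes that the incremental rotation behaves to first order as ΔR(ω) ≈ I + [ω]_×, so that the derivative of ΔR(ω)v with respect to ω at ω = 0 is -[v]_×. Under that assumption, the gradient of a scalar g · (ΔR(ω)v) with respect to ω is v × g, where v is the outgoing (already rotated) vector and g is the backward-chained partial derivative. The function names and numeric values are illustrative only, and the exact rotation (Rodrigues' formula) is used solely for the finite-difference check.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dot(a, b):
    """Dot product of two vectors."""
    return sum(x * y for x, y in zip(a, b))


def rotate(omega, v):
    """Rotate v by the axis-angle vector omega using Rodrigues' formula."""
    theta = math.sqrt(dot(omega, omega))
    if theta < 1e-12:
        return list(v)
    n = [w / theta for w in omega]
    n_cross_v = cross(n, v)
    n_dot_v = dot(n, v)
    return [v[i] * math.cos(theta)
            + n_cross_v[i] * math.sin(theta)
            + n[i] * n_dot_v * (1.0 - math.cos(theta))
            for i in range(3)]


def rotation_gradient(v, g):
    """Gradient of g . (dR(omega) v) w.r.t. omega, evaluated at omega = 0."""
    return cross(v, g)


def numeric_rotation_gradient(v, g, eps=1e-6):
    """Central-difference version of the same gradient (check only)."""
    grad = []
    for j in range(3):
        hi = [0.0, 0.0, 0.0]
        lo = [0.0, 0.0, 0.0]
        hi[j] = eps
        lo[j] = -eps
        grad.append((dot(g, rotate(hi, v)) - dot(g, rotate(lo, v))) / (2 * eps))
    return grad


# Illustrative values (assumptions): v plays the role of the rotated point
# y^(2), and g is one row of the Jacobian of the output with respect to y^(2).
v = [-4.0, 3.0, 5.0]
g = [100.0, 0.0, 80.0]

analytic = rotation_gradient(v, g)
numeric = numeric_rotation_gradient(v, g)
```

Because the update is always linearized around ω = 0, the three-parameter increment avoids differentiating the nine entries of R(q) with respect to the quaternion and sidesteps the unit-norm constraint during the derivative computation.
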


               6.2.3 Application: Augmented reality

               A widely used application of pose estimation is augmented reality, where virtual 3D images
               or annotations are superimposed on top of a live video feed, either through the use of
               see-through glasses (a head-mounted display) or on a regular computer or mobile device screen
               (Azuma, Baillot, Behringer et al. 2001; Haller, Billinghurst, and Thomas 2007). In some
               applications, a special pattern printed on cards or in a book is tracked to perform the
               augmentation (Kato, Billinghurst, Poupyrev et al. 2000; Billinghurst, Kato, and Poupyrev 2001).
               For a desktop application, a grid of dots printed on a mouse pad can be tracked by a camera
               embedded in an augmented mouse to give the user control of a full six degrees of freedom
               over their position and orientation in a 3D space (Hinckley, Sinclair, Hanson et al. 1999), as
               shown in Figure 6.6.
                  Sometimes, the scene itself provides a convenient object to track, such as the rectangle
               defining a desktop used in through-the-lens camera control (Gleicher and Witkin 1992). In