
Section 12.1  Registering Rigid Objects  374


























                            FIGURE 12.4: The two mountain images of Figure 12.3, now rectified with a homography.
                            Notice how well all features line up; this transformation involves more than just rotation
                            and translation, as you can see from the fact that the corner of the second image (which can
                            be seen in the middle, near the top) is no longer a right angle. Notice also that intensity
                            effects in the camera far field mean that the boundary where the two images overlap is
                            unpleasantly obvious. This figure was originally published as Figure 1 of M. Brown and D.
                            Lowe, “Recognizing Panoramas,” Proc. ICCV 2003, © IEEE, 2003.


                                 Registering images into mosaics gets more interesting when there are more
                            than two images. Imagine we have three images, $I_1$, $I_2$, and $I_3$. We could register
                            image one to image two, then image two to image three. But if image three has
                            some features that match features in image one, this might not be wise, because
                            the chain never uses those matches. Write $T_{2 \to 1}$ for the estimated transformation
                            that takes image two into image one’s frame (and so on). The problem is that
                            $T_{2 \to 1} \circ T_{3 \to 2}$ might not be a good estimate of $T_{3 \to 1}$, the
                            transformation from image three’s frame to image one’s frame. The error might not be
                            all that large in the case of just three images, but it can accumulate as more
                            images are chained.
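
The accumulation can be illustrated with a small numerical sketch. It is not taken from the text: it assumes, purely for illustration, that every pairwise map is a rigid 2D transformation in homogeneous coordinates and that each estimate is off by the same small rotation error. The function names (`rigid`, `compose`, `apply`, `chain_error`) and all numerical values are hypothetical.

```python
import math

def rigid(theta, tx, ty):
    """3x3 homogeneous matrix: rotate by theta, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]]

def compose(a, b):
    """Matrix product a @ b, i.e. the map 'apply b first, then a'."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(t, point):
    """Apply a 3x3 homogeneous transformation to a 2D point."""
    x, y = point
    u, v, w = (row[0] * x + row[1] * y + row[2] for row in t)
    return (u / w, v / w)

def chain_error(n_links, angle_err=0.002, point=(500.0, 300.0)):
    """Distance, in image one's frame, between where the true chained map and
    the estimated chained map send `point` after composing n_links pairwise maps.

    Assumption for illustration: every true pairwise map is the same rigid
    motion, and every estimate has the same small rotation error.
    """
    true_step = rigid(0.05, 40.0, 5.0)
    est_step = rigid(0.05 + angle_err, 40.0, 5.0)
    t_true = rigid(0.0, 0.0, 0.0)  # identity: image one to itself
    t_est = rigid(0.0, 0.0, 0.0)
    for _ in range(n_links):
        # T_{k+1 -> 1} = T_{k -> 1} o T_{k+1 -> k}
        t_true = compose(t_true, true_step)
        t_est = compose(t_est, est_step)
    (xt, yt), (xe, ye) = apply(t_true, point), apply(t_est, point)
    return math.hypot(xe - xt, ye - yt)

if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        print(n, round(chain_error(n), 2))
```

Under these assumptions the error of the chained estimate grows with the number of links, even though every pairwise estimate is equally good. A direct estimate of the map from the last image to the first, where matches exist, would not inherit the errors of the intermediate links.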
                                 To deal with this accumulation, we need some method to estimate all
                            registrations in one go, using all error terms. Doing so is often called bundle
                            adjustment, by analogy with the relevant term in structure from motion (Section 8.3.3).
                            A natural method is to choose a coordinate frame within which to work (for example, the
                            frame of the first image), then search for a set of maps that take each other image
                            into that frame and minimize the sum of squared errors between all matching pairs
                            of points. For our example, write $(x_j^{(i)}, x_j^{(k)})$ for the $j$th tuple consisting
                            of a point $x_j^{(i)}$ in image $i$ that matches a point $x_j^{(k)}$ in image $k$. We
                            would estimate $T_{2 \to 1}$ and