
Section 12.2  Model-based Vision: Registering Rigid Objects with Projection
(Figure 12.6, three panels, left to right: Model, Input image, Overlaid.)

FIGURE 12.6: A plane object registered to an image. On the left, an image of an object; in the center, an image containing two instances of this object, along with some other stuff (the popular term is clutter). Feature points are detected, and then correspondences between groups—in this case, triples of points—are searched; each correspondence gives rise to an affine transformation from the model to the image. Satisfactory correspondences align many model edge points with image edge points, as in the figure on the right, which is why the method is sometimes called alignment. The images in this figure come from one of the earliest papers on the subject and are affected by the poor reproduction techniques of the time. This figure was originally published as Figure 7 of “Object recognition using alignment,” D.P. Huttenlocher and S. Ullman, Proc. IEEE ICCV, 1986. © IEEE, 1986.
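The step in the caption that turns a triple of point correspondences into an affine transformation is a small linear solve: three non-collinear model-to-image correspondences fix the six unknowns of a 2D affine map exactly. A minimal sketch in NumPy follows; the function name and the (3, 2) array layout are illustrative choices, not anything taken from the original paper.

import numpy as np

def affine_from_three_points(model_pts, image_pts):
    """Solve image = A @ model + t from three point correspondences.

    model_pts, image_pts: (3, 2) arrays of corresponding 2D points.
    Three non-collinear correspondences give six linear equations in the
    six unknowns a11, a12, a21, a22, tx, ty.
    """
    M = np.zeros((6, 6))
    b = np.zeros(6)
    for i, ((mx, my), (ix, iy)) in enumerate(zip(model_pts, image_pts)):
        M[2 * i] = [mx, my, 0, 0, 1, 0]      # x equation: a11*mx + a12*my + tx = ix
        M[2 * i + 1] = [0, 0, mx, my, 0, 1]  # y equation: a21*mx + a22*my + ty = iy
        b[2 * i], b[2 * i + 1] = ix, iy
    a11, a12, a21, a22, tx, ty = np.linalg.solve(M, b)
    return np.array([[a11, a12], [a21, a22]]), np.array([tx, ty])

The recovered (A, t) would then be applied to all model edge points, and the hypothesis kept only if enough of them land near image edges, which is exactly the verification question taken up next.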



12.2.1 Verification: Comparing Transformed and Rendered Source to Target
The main difficulty with a RANSAC-style search for a transformation that registers a 3D object with an image is that, in practical cases, a good score is difficult to get. A strategy for computing a scoring function is straightforward, if we recall the term render, a general-purpose description for producing an image from models, encompassing everything from constructing line drawings to producing physically accurate shaded images. We take the estimated transformation, apply it to the object model, then render the transformed object model using our camera model. We now take the rendering, and compare it to the image. The difficulty lies in the form of the comparison (which will determine what we need to render).
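As a minimal, concrete version of the transform-then-render step, the sketch below applies an estimated pose to 3D model points and projects them through a pinhole camera. Here "rendering" means nothing more than predicting 2D point positions, which is all the edge-based comparison described next requires; the function name and argument conventions (R, t, K as rotation, translation, and intrinsic matrix) are assumptions for illustration.

import numpy as np

def project_model_points(points_3d, R, t, K):
    """Apply an estimated pose (R, t) to Nx3 model points, then project
    them through a pinhole camera with 3x3 intrinsic matrix K.

    This is the cheapest possible rendering: predicted 2D positions of
    model points, which is all the silhouette/edge comparison needs.
    """
    cam = points_3d @ R.T + t          # model frame -> camera frame
    proj = cam @ K.T                   # homogeneous image coordinates
    return proj[:, :2] / proj[:, 2:3]  # perspective divide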
We need a scoring function that can take into account all available image evidence. This could include tokens such as corners or edge points, which can be difficult to identify with certainty, or such evidence as image texture. If we know all the lighting conditions under which the object is being viewed, we might even be able to use pixel intensity (this hardly ever happens in practice). Usually, all we know about the illumination is that it is bright enough that we can find some tokens, which is why we have a registration hypothesis to test. This means that comparisons should be robust to changes in illumination. By far the most important test in practice is to render the silhouette of the object and then compare it to edge points in an image.
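Any edge detector can supply the image edge points used as evidence; a crude but serviceable sketch, thresholding gradient magnitude relative to its maximum with SciPy's Sobel filters, is given below. The relative threshold of 0.2 is an arbitrary illustrative value, not a recommendation.

import numpy as np
from scipy.ndimage import sobel

def edge_map(gray, rel_threshold=0.2):
    """Boolean edge map from gradient magnitude.

    gray: 2D float array of image intensities. Thresholding the gradient
    magnitude relative to its maximum gives a crude, illumination-tolerant
    set of edge points; any real edge detector could be substituted.
    """
    g = gray.astype(float)
    gx = sobel(g, axis=1)   # horizontal derivative
    gy = sobel(g, axis=0)   # vertical derivative
    mag = np.hypot(gx, gy)
    return mag > rel_threshold * mag.max()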
A natural test is to overlay object silhouette edges on the image using the camera model, and then score the hypothesis by comparing these points with actual image edge points. The usual score is the fraction of the length of predicted silhouette edges that lies near actual image edge points. This is invariant to rotation and translation in the camera frame, which is a good thing, but changes with scale, which might not be a bad thing. It is usual to allow edge points to contribute
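A common way to implement this score is with a distance transform of the image edge map, so that every predicted silhouette point can be tested for proximity to an image edge with a single lookup. The sketch below assumes the silhouette has been sampled into roughly evenly spaced points (so the fraction of points approximates the fraction of length); the function name and the tolerance are illustrative, not the text's prescription.

import numpy as np
from scipy.ndimage import distance_transform_edt

def silhouette_edge_score(silhouette_pts, edge_map, tol=2.0):
    """Fraction of predicted silhouette points lying within tol pixels of
    an actual image edge point.

    silhouette_pts: (N, 2) array of (row, col) positions sampled roughly
        uniformly along the rendered silhouette, so the point fraction
        approximates the length fraction described in the text.
    edge_map: boolean image, True at detected edge pixels.
    """
    # Distance from every pixel to the nearest edge pixel.
    dist_to_edge = distance_transform_edt(~edge_map)
    rows = np.clip(np.round(silhouette_pts[:, 0]).astype(int), 0, edge_map.shape[0] - 1)
    cols = np.clip(np.round(silhouette_pts[:, 1]).astype(int), 0, edge_map.shape[1] - 1)
    return float(np.mean(dist_to_edge[rows, cols] <= tol))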