
CHAPTER 7


                            Stereopsis




                            Fusing the pictures recorded by our two eyes and exploiting the difference (or
                            disparity) between them allows us to gain a strong sense of depth. This chapter is
                            concerned with the design and implementation of algorithms that mimic our ability
                            to perform this task, known as stereopsis. Reliable computer programs for
                            stereoscopic perception are of course invaluable in visual robot navigation (Figure 7.1),
                            cartography, aerial reconnaissance, and close-range photogrammetry. They are also
                            of great interest in tasks such as image segmentation for object recognition or the
                            construction of three-dimensional scene models for computer graphics applications.

















                            FIGURE 7.1: Left: The Stanford cart sports a single camera moving in discrete increments
                            along a straight line and providing multiple snapshots of outdoor scenes. Center: The
                            INRIA mobile robot uses three cameras to map its environment. Right: The NYU
                            mobile robot uses two stereo cameras, each capable of delivering an image pair. As shown
                            by these examples, although two eyes are sufficient for stereo fusion, mobile robots are
                            sometimes equipped with three (or more) cameras. The bulk of this chapter is concerned
                            with binocular perception but stereo algorithms using multiple cameras are discussed in
                            Section 7.6. Photos courtesy of Hans Moravec, Olivier Faugeras, and Yann LeCun.

                                 Stereo vision involves two processes: The fusion of features observed by two
                            (or more) eyes and the reconstruction of their three-dimensional preimage. The
                            latter is relatively simple: The preimage of matching points can (in principle) be
                            found at the intersection of the rays passing through these points and the
                            associated pupil centers (or pinholes; see Figure 7.2, left). Thus, when a single image
                            feature is observed at any given time, stereo vision is easy. However, each picture
                            typically consists of millions of pixels, with tens of thousands of image features
                            such as edge elements, and some method must be devised to establish the correct
                            correspondences and avoid erroneous depth measurements (Figure 7.2, right).
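
                                 The reconstruction step described above can be illustrated with a minimal
                            sketch. It assumes that each matched image point has already been converted into
                            a ray, given by a pinhole position and a direction expressed in a common world
                            frame. Because two measured rays rarely intersect exactly, the sketch returns the
                            midpoint of the shortest segment joining them, which is one common choice among
                            several and is not claimed to be the method developed later in this chapter. The
                            function name triangulate_midpoint and its parameters are hypothetical and
                            introduced only for illustration.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product of two 3-vectors."""
    return sum(a * b for a, b in zip(u, v))


def triangulate_midpoint(c1: Sequence[float], d1: Sequence[float],
                         c2: Sequence[float], d2: Sequence[float],
                         eps: float = 1e-12) -> Vec3:
    """Approximate the preimage of two matching image points.

    c1, c2: pinhole (pupil center) positions in a common world frame.
    d1, d2: directions of the rays through the matched image points.
    Returns the midpoint of the shortest segment joining the two rays
    (an illustrative choice, assumed here rather than taken from the text).
    """
    w = tuple(a - b for a, b in zip(c1, c2))
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)

    # The denominator vanishes when the rays are parallel (or a direction
    # is zero), in which case no unique closest point exists.
    denom = a * c - b * b
    if denom <= eps * a * c:
        raise ValueError("rays are parallel or degenerate")

    # Parameters of the closest points c1 + s*d1 and c2 + t*d2.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom

    p1 = tuple(o + s * k for o, k in zip(c1, d1))
    p2 = tuple(o + t * k for o, k in zip(c2, d2))
    return tuple((u + v) / 2.0 for u, v in zip(p1, p2))


# Usage: two pinholes one unit apart, both observing the same scene point.
# With noise-free rays the scene point is recovered up to rounding.
scene_point = (0.5, 0.2, 4.0)
left_pinhole, right_pinhole = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
left_ray = tuple(p - o for p, o in zip(scene_point, left_pinhole))
right_ray = tuple(p - o for p, o in zip(scene_point, right_pinhole))
estimate = triangulate_midpoint(left_pinhole, left_ray, right_pinhole, right_ray)
```

                            The sketch presumes that the match is correct. If a feature in one image is paired
                            with the wrong feature in the other, the same computation still returns a point,
                            but at an erroneous depth, which is why establishing correspondences is the hard
                            part of the problem.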
                                 We start this chapter by examining in Section 7.1 the geometric epipolar
                            constraint associated with a pair of cameras, which is a key to controlling the cost
                            of the binocular fusion process. Next, we stay on the geometric side of things in
                            Section 7.2 as we present a number of methods for binocular reconstruction. After
