                                (Figure 1.7f) (Horn and Schunck 1981; Huang 1981; Lucas and Kanade 1981; Nagel 1986).
                                The early work in simultaneously recovering 3D structure and camera motion (see Chapter 7)
                                also began around this time (Ullman 1979; Longuet-Higgins 1981).
                                   A lot of the philosophy of how vision was believed to work at the time is summarized
                                in David Marr’s (1982) book.8 In particular, Marr introduced his notion of the three levels
                                of description of a (visual) information processing system. These three levels, very loosely
                                paraphrased according to my own interpretation, are:

                                   • Computational theory: What is the goal of the computation (task) and what are the
                                     constraints that are known or can be brought to bear on the problem?

                                   • Representations and algorithms: How are the input, output, and intermediate infor-
                                     mation represented and which algorithms are used to calculate the desired result?


                                   • Hardware implementation: How are the representations and algorithms mapped onto
                                     actual hardware, e.g., a biological vision system or a specialized piece of silicon? Con-
                                     versely, how can hardware constraints be used to guide the choice of representation
                                     and algorithm? With the increasing use of graphics chips (GPUs) and many-core ar-
                                     chitectures for computer vision (see Section C.2), this question is again becoming quite
                                     relevant.

                                As I mentioned earlier in this introduction, it is my conviction that a careful analysis of the
                                problem specification and known constraints from image formation and priors (the scientific
                                and statistical approaches) must be married with efficient and robust algorithms (the engineer-
                                ing approach) to design successful vision algorithms. Thus, it seems that Marr’s philosophy
                                is as good a guide to framing and solving problems in our field today as it was 25 years ago.



                                1980s. In the 1980s, a lot of attention was focused on more sophisticated mathematical
                                techniques for performing quantitative image and scene analysis.
                                   Image pyramids (see Section 3.5) started being widely used to perform tasks such as im-
                                age blending (Figure 1.8a) and coarse-to-fine correspondence search (Rosenfeld 1980; Burt
                                and Adelson 1983a,b; Rosenfeld 1984; Quam 1984; Anandan 1989). Continuous versions
                                of pyramids using the concept of scale-space processing were also developed (Witkin 1983;
                                Witkin, Terzopoulos, and Kass 1986; Lindeberg 1990). In the late 1980s, wavelets (see Sec-
                                tion 3.5.4) started displacing or augmenting regular image pyramids in some applications
                                (Adelson, Simoncelli, and Hingorani 1987; Mallat 1989; Simoncelli and Adelson 1990a,b;
                                Simoncelli, Freeman, Adelson et al. 1992).
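
                                    As a rough illustration of the pyramid idea mentioned above (not code from the book
                                 itself), the following minimal Python sketch builds a Gaussian pyramid by repeated
                                 blur-and-subsample and a simple Laplacian (band-pass) pyramid from it, the basic
                                 operations underlying pyramid blending and coarse-to-fine search; the kernel width,
                                 number of levels, and nearest-neighbor upsampling are illustrative assumptions.

                                    # Minimal sketch (illustrative only): Gaussian and Laplacian pyramids.
                                    import numpy as np
                                    from scipy.ndimage import gaussian_filter

                                    def gaussian_pyramid(image, levels=4, sigma=1.0):
                                        """List of images from full resolution down to the coarsest level."""
                                        pyramid = [image.astype(np.float32)]
                                        for _ in range(levels - 1):
                                            blurred = gaussian_filter(pyramid[-1], sigma)  # low-pass filter
                                            pyramid.append(blurred[::2, ::2])              # subsample by 2
                                        return pyramid

                                    def laplacian_pyramid(image, levels=4, sigma=1.0):
                                        """Band-pass pyramid: differences between successive Gaussian levels."""
                                        gp = gaussian_pyramid(image, levels, sigma)
                                        lp = []
                                        for fine, coarse in zip(gp[:-1], gp[1:]):
                                            # Upsample the coarse level (nearest-neighbor repeat, for simplicity)
                                            # back to the fine level's size and take the difference.
                                            up = np.repeat(np.repeat(coarse, 2, axis=0), 2, axis=1)
                                            lp.append(fine - up[:fine.shape[0], :fine.shape[1]])
                                        lp.append(gp[-1])  # keep the coarsest Gaussian level as the residual
                                        return lp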
                                   The use of stereo as a quantitative shape cue was extended by a wide variety of shape-
                                from-X techniques, including shape from shading (Figure 1.8b) (see Section 12.1.1 and Horn
                                1975; Pentland 1984; Blake, Zimmerman, and Knowles 1985; Horn and Brooks 1986, 1989),
                                photometric stereo (see Section 12.1.1 and Woodham 1981), shape from texture (see Sec-
                                tion 12.1.2 and Witkin 1981; Pentland 1984; Malik and Rosenholtz 1997), and shape from
                                focus (see Section 12.1.3 and Nayar, Watanabe, and Noguchi 1995). Horn (1986) has a nice
                                discussion of most of these techniques.

                                  8  More recent developments in visual perception theory are covered in (Palmer 1999; Livingstone 2008).