
[Figure 1.6: timeline with decade markers 1970, 1980, 1990, and 2000. Topics shown include: digital image processing; blocks world, line labeling; generalized cylinders; pictorial structures; stereo correspondence; intrinsic images; optical flow; structure from motion; image pyramids; scale-space processing; shape from shading, texture, and focus; physically-based modeling; regularization; Markov Random Fields; Kalman filters; 3D range data processing; projective invariants; factorization (list truncated).]

Figure 1.6 A rough timeline of some of the most active topics of research in computer vision.

1.2 A brief history

In this section, I provide a brief personal synopsis of the main developments in computer
vision over the last 30 years (Figure 1.6); at least, those that I find personally interesting
and which appear to have stood the test of time. Readers not interested in the provenance
of various ideas and the evolution of this field should skip ahead to the book overview in
Section 1.3.

1970s. When computer vision first started out in the early 1970s, it was viewed as the
visual perception component of an ambitious agenda to mimic human intelligence and to
endow robots with intelligent behavior. At the time, it was believed by some of the early
pioneers of artificial intelligence and robotics (at places such as MIT, Stanford, and CMU)
that solving the “visual input” problem would be an easy step along the path to solving more
difficult problems such as higher-level reasoning and planning. According to one well-known
story, in 1966, Marvin Minsky at MIT asked his undergraduate student Gerald Jay Sussman
to “spend the summer linking a camera to a computer and getting the computer to describe
what it saw” (Boden 2006, p. 781).⁵ We now know that the problem is slightly more difficult
than that.⁶

What distinguished computer vision from the already existing field of digital image
processing (Rosenfeld and Pfaltz 1966; Rosenfeld and Kak 1976) was a desire to recover the
three-dimensional structure of the world from images and to use this as a stepping stone
towards full scene understanding. Winston (1975) and Hanson and Riseman (1978) provide
two nice collections of classic papers from this early period.

Early attempts at scene understanding involved extracting edges and then inferring the
3D structure of an object or a “blocks world” from the topological structure of the 2D lines
5 Boden (2006) cites (Crevier 1993) as the original source. The actual Vision Memo was authored by Seymour
Papert (1966) and involved a whole cohort of students.
6 To see how far robotic vision has come in the last four decades, have a look at the towel-folding robot at
http://rll.eecs.berkeley.edu/pr/icra10/ (Maitin-Shepard, Cusumano-Towner, Lei et al. 2010).
