strictly necessary to read all of the material in sequence.
Figure 1.11 shows a rough layout of the contents of this book. Since computer vision
involves going from images to a structural description of the scene (and computer graphics
the converse), I have positioned the chapters horizontally in terms of which major component
they address, in addition to vertically according to their dependence.
Going from left to right, we see the major column headings as Images (which are 2D
in nature), Geometry (which encompasses 3D descriptions), and Photometry (which encom-
passes object appearance). (An alternative labeling for these latter two could also be shape
and appearance—see, e.g., Chapter 13 and Kang, Szeliski, and Anandan (2000).) Going
from top to bottom, we see increasing levels of modeling and abstraction, as well as tech-
niques that build on previously developed algorithms. Of course, this taxonomy should be
taken with a large grain of salt, as the processing and dependencies in this diagram are not
strictly sequential and subtle additional dependencies and relationships also exist (e.g., some
recognition techniques make use of 3D information). The placement of topics along the hor-
izontal axis should also be taken lightly, as most vision algorithms involve mapping between
at least two different representations.9
Interspersed throughout the book are sample applications, which relate the algorithms
and mathematical material being presented in various chapters to useful, real-world applica-
tions. Many of these applications also appear in the exercise sections, so that students
can implement their own versions.
At the end of each section, I provide a set of exercises that the students can use to imple-
ment, test, and refine the algorithms and techniques presented in each section. Some of the
exercises are suitable as written homework assignments, others as shorter one-week projects,
and still others as open-ended research problems that make for challenging final projects.
Motivated students who implement a reasonable subset of these exercises will, by the end of
the book, have a computer vision software library that can be used for a variety of interesting
tasks and projects.
As a reference book, I try wherever possible to discuss which techniques and algorithms
work well in practice, as well as to provide up-to-date pointers to the latest research results in
the areas that I cover. The exercises can be used to build up your own personal library of self-
tested and validated vision algorithms, which is more worthwhile in the long term (assuming
you have the time) than simply pulling algorithms out of a library whose performance you do
not really understand.
The book begins in Chapter 2 with a review of the image formation processes that create
the images that we see and capture. Understanding this process is fundamental if you want
to take a scientific (model-based) approach to computer vision. Students who are eager to
just start implementing algorithms (or courses that have limited time) can skip ahead to the
next chapter and dip into this material later. In Chapter 2, we break down image formation
into three major components. Geometric image formation (Section 2.1) deals with points,
lines, and planes, and how these are mapped onto images using projective geometry and other
models (including radial lens distortion). Photometric image formation (Section 2.2) covers
radiometry, which describes how light interacts with surfaces in the world, and optics, which
projects light onto the sensor plane. Finally, Section 2.3 covers how sensors work, including
9 For an interesting comparison with what is known about the human visual system, e.g., the largely parallel what
and where pathways, see some textbooks on human perception (Palmer 1999; Livingstone 2008).
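The geometric and photometric image formation components previewed above (Sections 2.1-2.2) can be illustrated with a minimal sketch: a pinhole perspective projection followed by a polynomial radial distortion. The focal length `f`, principal point `c`, and distortion coefficients `k1`, `k2` here are illustrative placeholders, not the book's notation; this is a simplified model of the kind Chapter 2 develops in full, not its actual derivation.

```python
def project_point(X, f=1.0, c=(0.0, 0.0), k1=0.0, k2=0.0):
    """Project a camera-frame 3D point onto the image plane.

    X      : (X, Y, Z) point in the camera frame, Z > 0
    f      : focal length (single-parameter intrinsics, for illustration)
    c      : principal point (cx, cy)
    k1, k2 : radial distortion coefficients (assumed polynomial model)
    """
    # Perspective division: map the 3D point to normalized image coordinates.
    x = X[0] / X[2]
    y = X[1] / X[2]
    # Radial lens distortion: scale by a polynomial in the squared radius.
    r2 = x * x + y * y
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    # Apply focal length and principal point (simplified intrinsic model).
    u = f * scale * x + c[0]
    v = f * scale * y + c[1]
    return u, v
```

With `k1 = k2 = 0` this reduces to the ideal pinhole model (image coordinates scale inversely with depth Z); nonzero coefficients bend points radially away from or toward the principal point, as real lenses do.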