1.1 What is computer vision? 9
problems to solutions is typical of an engineering approach to the study of vision and reflects
my own background in the field. First, I come up with a detailed problem definition and
decide on the constraints and specifications for the problem. Then, I try to find out which
techniques are known to work, implement a few of these, evaluate their performance, and
finally make a selection. In order for this process to work, it is important to have realistic test
data, both synthetic, which can be used to verify correctness and analyze noise sensitivity,
and real-world data typical of the way the system will finally be used.
However, this book is not just an engineering text (a source of recipes). It also takes a
scientific approach to basic vision problems. Here, I try to come up with the best possible
models of the physics of the system at hand: how the scene is created, how light interacts
with the scene and the atmosphere, and how the sensors work, including sources of noise
and uncertainty. The task is then to try to invert the acquisition process to come up with the
best possible description of the scene.
The book often uses a statistical approach to formulating and solving computer vision
problems. Where appropriate, probability distributions are used to model the scene and the
noisy image acquisition process. The association of prior distributions with unknowns is often
called Bayesian modeling (Appendix B). It is possible to associate a risk or loss function with
mis-estimating the answer (Section B.2) and to set up your inference algorithm to minimize
the expected risk. (Consider a robot trying to estimate the distance to an obstacle: it is
usually safer to underestimate than to overestimate.) With statistical techniques, it often helps
to gather lots of training data from which to learn probabilistic models. Finally, statistical
approaches enable you to use proven inference techniques to estimate the best answer (or
distribution of answers) and to quantify the uncertainty in the resulting estimates.
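The robot example above can be made concrete. The following sketch (the distance distribution, penalty values, and function names are all illustrative assumptions, not from the text) draws samples from a posterior over the distance to an obstacle and picks the estimate that minimizes the expected asymmetric loss, where overestimating the distance (risking a collision) is penalized more heavily than underestimating it:

```python
import numpy as np

# Hypothetical posterior over obstacle distance (meters), represented by samples.
rng = np.random.default_rng(0)
samples = rng.normal(loc=5.0, scale=0.5, size=10_000)

def expected_risk(estimate, samples, over_penalty=10.0, under_penalty=1.0):
    """Mean asymmetric absolute loss over the posterior samples."""
    err = estimate - samples
    # Overestimates (err > 0) cost more than underestimates.
    loss = np.where(err > 0, over_penalty * err, under_penalty * -err)
    return loss.mean()

# Minimize the expected risk over a grid of candidate estimates.
candidates = np.linspace(3.0, 7.0, 801)
risks = [expected_risk(c, samples) for c in candidates]
best = candidates[int(np.argmin(risks))]
# For this piecewise-linear loss, the minimizer is a low quantile of the
# posterior (the under/(over+under) = 1/11 quantile), so the robot
# deliberately underestimates the distance relative to the posterior mean.
```

Note how the chosen estimate falls below the posterior mean: the asymmetric loss, not the data alone, determines the "best" answer.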
Because so much of computer vision involves the solution of inverse problems or the esti-
mation of unknown quantities, my book also has a heavy emphasis on algorithms, especially
those that are known to work well in practice. For many vision problems, it is all too easy to
come up with a mathematical description of the problem that either does not match realistic
real-world conditions or does not lend itself to the stable estimation of the unknowns. What
we need are algorithms that are both robust to noise and deviation from our models and rea-
sonably efficient in terms of run-time resources and space. In this book, I go into these issues
in detail, using Bayesian techniques, where applicable, to ensure robustness, and efficient
search, minimization, and linear system solving algorithms to ensure efficiency. Most of the
algorithms described in this book are at a high level, being mostly a list of steps that have to
be filled in by students or by reading more detailed descriptions elsewhere. In fact, many of
the algorithms are sketched out in the exercises.
Now that I’ve described the goals of this book and the frameworks that I use, I devote the
rest of this chapter to two additional topics. Section 1.2 is a brief synopsis of the history of
computer vision. It can easily be skipped by those who want to get to “the meat” of the new
material in this book and do not care as much about who invented what when.
The second, Section 1.3, is an overview of the book's contents, which is useful reading for
everyone who intends to make a study of this topic (or to jump in partway, since it describes
the inter-dependencies between chapters). This outline is also useful for instructors looking to structure
one or more courses around this topic, as it provides sample curricula based on the book’s
contents.