processing algorithms for efficient evaluation, usually. Very often, lines of discontinuity are encountered in images; these should be treated with special methods differing essentially from those used in homogeneous parts. Object- and situation-dependent methods and parameters should be used, controlled from higher evaluation levels.
The question thus is whether any basic feature extraction should be applied uniformly over the entire image region. In biological vision systems this seems to be the case, for example, in the striate cortex (V1) of vertebrates, where oriented edge elements are detected with the help of corresponding receptive fields. However, vertebrate vision has nonhomogeneous resolution over the field of view: foveal vision with high resolution at the center of the retina is surrounded by receptive fields of increasing spread and a lower density of receptors per unit area in the radial direction.
Vision in highly developed biological systems seems to ask three questions, each of which is treated by a specific subsystem (a software sketch of this division follows the list):
1. Is there something of special interest in a wide field of view?
2. What precisely is it that attracted interest in question 1? Can the individual object be characterized and classified using background knowledge? What is its relative state “here and now”?
3. What is the situation around me, and how does it affect optimal behavioral decisions for achieving my goals? For this purpose, a relevant collection of objects should be recognized and tracked, and their likely future behavior should be predicted.
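One way to read this division is as three cooperating software subsystems. The skeleton below is purely illustrative; every class and function name is an assumption, not the author's terminology.

    from dataclasses import dataclass, field

    @dataclass
    class TrackedObject:
        label: str       # question 2: what is it?
        state: tuple     # its relative state "here and now" (e.g., position, velocity)

    @dataclass
    class Situation:
        objects: list = field(default_factory=list)   # question 3: relevant collection

    def detect_interest(wide_field_image):
        """Question 1: cheap, uniform, coarse-resolution search for
        'something of special interest' over the whole field of view."""
        ...

    def identify_and_track(feature_group, knowledge_base):
        """Question 2: foveal analysis of one feature group; classify the
        object against background knowledge and estimate its state."""
        ...

    def assess_situation(tracked_objects, goals):
        """Question 3: predict the likely behavior of all relevant objects
        and derive decisions for one's own behavior."""
        ...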
To initialize the vision process at the beginning, and to detect new objects later on, it is certainly an advantage to have a bottom-up detection component available over the entire wide field of view. Perhaps a few algorithms based on coarse resolution for detecting interesting groups of features will suffice to achieve this goal. The question is how much computing effort should be devoted to this bottom-up component, compared to more elaborate, model-based top-down components for objects already detected and being tracked. Usually, a single object covers only a small area in an image of coarse resolution.
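To make the trade-off concrete, a bottom-up pass might look like the following sketch: pool the image heavily, compute a cheap feature response (gradient magnitude), and flag coarse cells above a threshold as candidate feature groups. The pooling factor and threshold here are illustrative assumptions.

    import numpy as np

    def coarse_candidates(image, pool=8, thresh=10.0):
        """Bottom-up sketch: average-pool by 'pool', then flag coarse
        cells with strong gradient magnitude as candidate feature groups."""
        h, w = image.shape
        H, W = h - h % pool, w - w % pool              # crop to multiples of pool
        small = image[:H, :W].reshape(H // pool, pool, W // pool, pool).mean(axis=(1, 3))
        gy, gx = np.gradient(small)                    # cheap edge response
        response = np.hypot(gx, gy)
        # Coarse-grid coordinates of cells exceeding the threshold.
        return np.argwhere(response > thresh)

Because the work is done on the pooled grid, the cost of this pass is roughly pool-squared times smaller than full-resolution processing, leaving most of the computing budget for the top-down components.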
To answer question 2 above, biological vision systems direct the foveal area of high resolution to the group of features arousing the most interest by so-called saccades: very fast gaze-direction changes with angular rates of up to several hundred degrees per second. Humans are able to perform up to five saccades per second, with intermediate phases of smooth pursuit (tracking) of these features, indicating a very dynamic mode of perception (time-sliced parallel processing).
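This time-sliced mode can be caricatured in a few lines: attention cycles through the current feature groups, saccading to each and pursuing it for one slice. The callback names and the 0.2 s slice (five slices per second) are assumptions for illustration only.

    import itertools

    def time_sliced_gaze(feature_groups, saccade, pursue, slice_s=0.2, duration_s=2.0):
        """Sketch of time-sliced attention: saccade to each feature group
        in turn, then pursue it smoothly for one slice."""
        n_slices = int(duration_s / slice_s)
        for group in itertools.islice(itertools.cycle(feature_groups), n_slices):
            saccade(group)           # fast gaze shift (hundreds of deg/s)
            pursue(group, slice_s)   # smooth pursuit during the slice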
Tracking can be achieved much more efficiently with algorithms controlled by
prediction according to some model. Satisfactory solutions may be possible only in
special task domains for which experience is available from previous encounters.
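A minimal sketch of such model-controlled tracking, assuming a generic constant-velocity Kalman filter in one image coordinate (not the author's specific formulation): the predict step tells the feature extractor where to place a small search window in the next frame, and the update step corrects the model with the measurement found there. All noise values below are illustrative.

    import numpy as np

    dt = 1.0 / 25.0                     # assumed 25 Hz video cycle
    F = np.array([[1, dt], [0, 1]])     # constant-velocity state transition
    H = np.array([[1, 0]])              # only position is measured in the image
    Q = np.diag([1e-4, 1e-2])           # process noise (illustrative)
    R = np.array([[0.5]])               # measurement noise (illustrative)

    def predict(x, P):
        """Model-based prediction: where to search in the next frame."""
        return F @ x, F @ P @ F.T + Q

    def update(x, P, z):
        """Correct the prediction with the measured feature position z."""
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (z - H @ x)
        P = (np.eye(2) - K @ H) @ P
        return x, P

    x, P = np.array([0.0, 0.0]), np.eye(2)
    x, P = predict(x, P)                  # place the search window here
    x, P = update(x, P, np.array([0.4]))  # feed back the measured position

Because the predicted covariance P bounds the expected error, the search window can be sized to a few standard deviations around the predicted position, which is what makes prediction-controlled tracking so much cheaper than uniform search.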
Since prediction is a very powerful tool in a world with continuous processes, the question arises: What is the proper framework for formulating the continuity conditions? Is the image plane readily available as a plane of reference? It is known, however, that the depth dimension is lost completely in perspective mapping: all points on a ray are mapped into a single point in the image plane, irrespective of their distance (with focal length f, a point (X, Y, Z) in camera coordinates maps to (f·X/Z, f·Y/Z), so scaling the point along its ray leaves the image point unchanged). Would it be better to formulate all continuity conditions in 3-D physical space and time? The correspond-