Since images are only in two dimensions, the 2-D framework looks most natural
for image interpretation. This may be true for almost planar objects viewed ap-
proximately normal to their plane of appearance, like a landscape in a bird’s-eye
view. On the other hand, when a planar surface is viewed with the optical axis al-
most parallel to it from an elevation slightly above the ground, the situation is quite
different. In this case, each line in the image corresponds to a different distance on
the ground, and the same 3-D object on the surface looks quite different in size ac-
cording to where it appears in the image. This is why homogeneously distributed
image processing, by vector machines for example, has a hard time demonstrating
its efficiency here; locally adapted methods in image regions are much more
promising in this case and have proven their superiority. Interpreting
image sequences in 3-D space with corresponding knowledge bases right from the
beginning allows easy adaptation to range differences for single objects. Of course,
the analysis of situations encompassing several objects at various distances now
has to be done on a separate level, building on the results of all previous steps. This
has been one of the driving factors in designing the architecture of the third-
generation “expectation-based, multi-focal saccadic” (EMS) vision system de-
scribed in this book. This corresponds to recent findings in well-developed biologi-
cal systems, where different areas light up in magnetic resonance images for image
processing and for action planning based on the results of visual perception [Talati,
Hirsch 2005].
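The range dependence across image rows can be made concrete with a simple
flat-ground model. The following sketch is only an illustration (not code from the
EMS system); the pinhole camera, the flat ground plane, and all names and
numbers are assumptions chosen for the example. It computes the look-ahead
distance imaged by a row a given number of pixels below the horizon: near the
horizon a 10-pixel step spans tens of meters of ground, while the same step in the
lower image spans only a fraction of that.

    import math

    def ground_range(row_offset_px, f_px, cam_height_m, pitch_down_rad):
        # Distance along flat ground to the point imaged row_offset_px below
        # the principal point. The viewing ray is depressed below the horizon
        # by pitch + atan(row_offset / f); intersecting it with the ground
        # plane at camera height H gives d = H / tan(depression).
        depression = pitch_down_rad + math.atan2(row_offset_px, f_px)
        if depression <= 0.0:
            return math.inf  # ray at or above the horizon: no ground point
        return cam_height_m / math.tan(depression)

    # Example: camera 1.5 m above the ground, 750 px focal length, optical
    # axis horizontal. Equal 10-pixel steps in the image map to grossly
    # unequal steps in range:
    for dv in (10, 20, 50, 100):
        print(f"{dv:3d} px below horizon -> "
              f"{ground_range(dv, 750.0, 1.5, 0.0):6.1f} m")

With these (assumed) numbers, the rows 10 and 20 pixels below the horizon image
points roughly 112 m and 56 m ahead, whereas rows 50 and 100 pixels down lie
only about 22 m and 11 m ahead; any fixed-size image operator therefore covers
very different patches of ground depending on where it is applied.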
Understanding the motion processes of 3-D objects in 3-D space, while the body
carrying the cameras also moves in 3-D space, seems to be one of the most difficult
tasks in real-time vision. Without the help of inertial sensing for separating egomo-
tion from relative motion, this can hardly be done successfully, at least in dynamic
situations.
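One way to see what inertial sensing contributes is the classical decomposition of
optical flow into a rotational part, which is independent of scene depth and can
therefore be predicted directly from measured gyro rates, and a translational part,
which carries the range and relative-motion information. The sketch below is a
minimal illustration under a pinhole-camera assumption, using the well-known
Longuet-Higgins/Prazdny form of the flow equations; the function names, frame
conventions, and numbers are illustrative, not taken from the book.

    import numpy as np

    def rotational_flow(x, y, f, omega):
        # Depth-independent flow induced by camera rotation. x, y are pixel
        # coordinates relative to the principal point, f the focal length in
        # pixels, omega = (wx, wy, wz) the body rates in rad/s about the
        # camera axes. Signs depend on the frame conventions actually used
        # and would have to be verified against the real camera/IMU setup.
        wx, wy, wz = omega
        u = (x * y / f) * wx - (f + x * x / f) * wy + y * wz
        v = (f + y * y / f) * wx - (x * y / f) * wy - x * wz
        return np.array([u, v])

    def residual_flow(measured, x, y, f, omega):
        # Subtract the gyro-predicted rotational flow from the measured flow.
        # The residual stems from ego-translation and from independently
        # moving objects and, unlike the rotational part, scales with
        # inverse range.
        return np.asarray(measured, dtype=float) - rotational_flow(x, y, f, omega)

    # Example: measured flow at pixel (100, -50) with f = 750 px while the
    # vehicle yaws at 0.1 rad/s; most of the horizontal flow component is
    # explained by the rotation alone.
    print(residual_flow([-80.0, 4.0], 100.0, -50.0, 750.0, (0.0, 0.1, 0.0)))

Whatever flow remains after this subtraction is what actually informs about range
and relative motion; without the gyro measurements, both parts stay entangled in
the image data.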
Direct range measurement by special sensors such as radar or laser range finders
(LRF) would ease the vision task. Because of their relative simplicity and low
demand for computing power, these systems have found widespread application in
the automotive field. However, with respect to resolution and flexibility of data
exploitation, as well as to hardware cost and installation volume, they have much
less potential than passive cameras in the long run, once computing power is
available in abundance. For this reason, these systems are not included in
this book.
1.2 Why Perception and Action?
For technical systems that are intended to find their way on their own in an ever-
changing world, it is impossible to foresee every possible event and to program all
required capabilities for appropriate reactions into their software from the beginning.
To be flexible in dealing with situations actually encountered, the system should
have perceptual and behavioral capabilities which it may expand on its own in re-
sponse to new requirements. This means that the system should be capable of judg-
ing the value of control outputs in response to measured data; however, since
control outputs affect state variables over a certain amount of time, ensuing time