Page 23 - Dynamic Vision for Perception and Control of Motion
1.4 What are Appropriate Interpretation Spaces?
ing models are available from the natural sciences since Newton and Leibniz have
found that differential equations are the proper tools for representing these continu-
ity conditions in generic form; over the last decades, simulation technology has
provided the methods for dealing with these representations on digital computers.
In communication technology and in the field of pattern recognition, video
processing in the image plane may be the best way to go since no understanding of
the content of the scene is required. However, for orienting oneself in the real
world through image sequence analysis, early transition to the physical interpreta-
tion space is considered highly advantageous because it is in this space that occlu-
sions become easily understandable and motion continuity persists. Also, it is in
this space that inertial signals have to be interpreted and that integrals of accelera-
tions yield 3-D velocity components; integrals of these velocities yield the corre-
sponding positions and angular orientations for the rotational degrees of freedom.
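
The double integration mentioned above can be sketched numerically. This is only
an illustration under stated assumptions, not a method given in the text: the
function name integrate_twice is hypothetical, the accelerations are assumed to be
already expressed in a fixed 3-D frame with gravity removed, and the trapezoidal
rule is one choice among several. Only the translational part is shown; angular
rates would be integrated to orientations in a similar way.

```python
def integrate_twice(accels, dt, v0=(0.0, 0.0, 0.0), p0=(0.0, 0.0, 0.0)):
    """Integrate sampled 3-D accelerations to velocities and positions.

    accels: list of (ax, ay, az) samples in m/s^2 (assumed fixed frame)
    dt:     sampling interval in s
    Returns two lists of 3-tuples: velocities (m/s) and positions (m).
    """
    vel = [tuple(v0)]
    pos = [tuple(p0)]
    for a_prev, a_next in zip(accels, accels[1:]):
        v_prev = vel[-1]
        # First integral (trapezoidal rule): acceleration -> velocity.
        v_next = tuple(
            v + 0.5 * (a0 + a1) * dt
            for v, a0, a1 in zip(v_prev, a_prev, a_next)
        )
        # Second integral (trapezoidal rule): velocity -> position.
        p_next = tuple(
            p + 0.5 * (u0 + u1) * dt
            for p, u0, u1 in zip(pos[-1], v_prev, v_next)
        )
        vel.append(v_next)
        pos.append(p_next)
    return vel, pos


# Example: constant acceleration of 1 m/s^2 along x for 1 s, starting at rest.
samples = [(1.0, 0.0, 0.0)] * 101
velocities, positions = integrate_twice(samples, dt=0.01)
```

For constant acceleration a from rest, the result should agree with the closed
forms v = a t and p = a t^2 / 2 up to rounding.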
Therefore, for visual dynamic scene understanding, images are considered inter-
mediate carriers of data containing information about the spatiotemporal environ-
ment. To recover this information most efficiently, all internal modeling in the in-
terpretation process is done in 3-D space and time, and the transition to this
representation should take place as early as possible. Knowledge for achieving this
goal is specific to single objects and the generic classes to which they belong.
Therefore, to answer question 2 above, specialist processes geared to classes of ob-
jects and individuals of these classes observed in the image sequence should be de-
signed for direct interpretation in 3-D space and time.
Only these spatiotemporal representations then allow answering question 3 by
observing the data of all relevant objects in the near environment over a more
extended period of time. To understand the motion processes of objects in our
everyday environment more deeply, a distinction has to be made between classes
of objects. Those obeying simple laws of motion from physics are the ones most
easily handled (e.g., by some version of Newton’s second law). Light objects,
easily moved by stochastically appearing (even light) winds, become difficult to
grasp because of the variable properties of wind fields and gusts.
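
The contrast between the two classes can be made concrete with a small
simulation. This is a rough sketch under assumptions that are not from the text:
motion is restricted to one horizontal axis, the wind acts through a linear drag
force proportional to the difference between wind speed and object speed, gusts
are drawn as independent Gaussian samples, and the function name simulate_drift
and all parameter values are hypothetical. Newton's second law, m dv/dt = F, is
integrated with simple Euler steps.

```python
import random


def simulate_drift(mass, gust_std, seed, dt=0.01, steps=100, drag=0.05):
    """Horizontal drift (m) of a point mass exposed to random wind gusts.

    mass:     object mass in kg
    gust_std: standard deviation of the gust speed in m/s (assumed Gaussian)
    seed:     seed for the gust sequence, so runs are repeatable
    drag:     assumed linear drag coefficient in N per (m/s)
    """
    rng = random.Random(seed)
    x, vx = 0.0, 0.0
    for _ in range(steps):
        wind = rng.gauss(0.0, gust_std)
        force = drag * (wind - vx)   # assumed linear drag toward wind speed
        vx += force / mass * dt      # Newton's second law, Euler step
        x += vx * dt
    return x


# Same gust sequences, two very different masses.
heavy = [simulate_drift(1000.0, 2.0, seed) for seed in range(5)]
light = [simulate_drift(0.01, 2.0, seed) for seed in range(5)]
print("heavy spread:", max(heavy) - min(heavy))
print("light spread:", max(light) - min(light))
```

Under these assumptions, the heavy object barely responds to the gusts, so its
motion stays predictable from the deterministic law alone, while the light
object's path differs noticeably from one gust sequence to the next.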
Another large class of objects – with many different subclasses – is formed by
those able to sense properties of their environment and to initiate movements on
their own, based on a combination of the data sensed and background knowledge
internally stored. These special objects will be called subjects; all animals includ-
ing humans belong to this (super-) class as well as autonomous agents created by
technical means (like robots or autonomous vehicles). The corresponding sub-
classes are formed by combinations of perceptual and behavioral capabilities and,
of course, their shapes. Besides their shapes, individuals of subclasses may also
be recognized by stereotypical motion patterns (like a hopping kangaroo or a
winding snake).
Road vehicles (whether controlled by a human driver or a technical subsystem)
exhibit typical behaviors depending on the situation encountered. For example,
they follow lanes, drive in convoys, perform lane changes, pass other vehicles,
turn off onto a crossroad, or slow down for parking. All of the maneuvers
mentioned are well known to human drivers, and they recognize the intention of
performing one of those by its typical onset of motion over a short period of time.
For example, a car leaving the center of its lane and moving consistently toward