Page 23 - Dynamic Vision for Perception and Control of Motion

1.4  What are Appropriate Interpretation Spaces?


            ing models are available from the natural sciences since Newton and Leibniz have
            found that differential equations are the proper tools for representing these
            continuity conditions in generic form; over the last decades, simulation technology
            has provided the methods for dealing with these representations on digital computers.
              In communication technology and in the field of pattern recognition, video
            processing in the image plane may be the best way to go since no understanding of
            the content of the scene is required. However, for orienting oneself in the real
            world through image sequence analysis, early transition to the physical interpreta-
            tion space is considered highly advantageous because it is in this space that occlu-
            sions become easily understandable and motion continuity persists. Also, it is in
            this space that inertial signals have to be interpreted and that integrals of accelera-
            tions yield 3-D velocity components; integrals of these velocities yield the corre-
            sponding positions and angular orientations for the rotational degrees of freedom.
            Therefore, for visual dynamic scene understanding, images are considered
            intermediate carriers of data containing information about the spatiotemporal
            environment. To recover this information most efficiently, all internal modeling in
            the interpretation process is done in 3-D space and time, and the transition to this
            representation should take place as early as possible. Knowledge for achieving this
            goal is specific to single objects and the generic classes to which they belong.
            Therefore, to answer question 2 above, specialist processes geared to classes of ob-
            jects and individuals of these classes observed in the image sequence should be de-
            signed for direct interpretation in 3-D space and time.
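
            The statement above that integrals of accelerations yield velocity components,
            and integrals of these velocities yield positions, can be illustrated with a
            minimal sketch. It is not taken from the book: the function name
            integrate_motion, the trapezoidal rule, and the fixed sampling interval dt are
            assumptions chosen for illustration, and only one translational axis is shown.

```python
def integrate_motion(accels, dt, v0=0.0, p0=0.0):
    """Integrate sampled acceleration twice along one axis.

    Hypothetical helper for illustration only.
    accels: acceleration samples taken at a fixed interval dt (assumed).
    v0, p0: assumed initial velocity and position.
    Returns (velocities, positions), one value per input sample.
    """
    velocities = [v0]
    positions = [p0]
    for a_prev, a_next in zip(accels, accels[1:]):
        v_prev = velocities[-1]
        # First integral: acceleration -> velocity (trapezoidal rule).
        v_next = v_prev + 0.5 * (a_prev + a_next) * dt
        # Second integral: velocity -> position (trapezoidal rule).
        p_next = positions[-1] + 0.5 * (v_prev + v_next) * dt
        velocities.append(v_next)
        positions.append(p_next)
    return velocities, positions


if __name__ == "__main__":
    # Constant acceleration of 2 m/s^2 sampled every 0.1 s for 1 s.
    vel, pos = integrate_motion([2.0] * 11, dt=0.1)
    print(vel[-1], pos[-1])
```

            For motion in 3-D space, the same double integration would be applied to each
            acceleration component; the rotational degrees of freedom need an additional
            treatment of angular rates that this sketch does not cover.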
              Only these spatiotemporal representations then allow answering question 3 by
            looking at these data of all relevant objects in the near environment for a more ex-
            tended period of time. To be able to understand motion processes of objects more
            deeply in our everyday environment, a distinction has to be made between classes
            of objects. Those obeying simple laws of motion from physics are the ones most
            easily handled (e.g., by some version of Newton’s law). Light objects, easily
            moved by stochastically appearing (even light) winds, become difficult to grasp
            because of the variable properties of wind fields and gusts.
              Another large class of objects – with many different subclasses – is formed by
            those able to sense properties of their environment and to initiate movements on
            their own, based on a combination of the data sensed and background knowledge
            internally stored. These special objects will be called subjects; all animals includ-
            ing humans belong to this (super-) class as well as autonomous agents created by
            technical means (like robots or autonomous vehicles). The corresponding
            subclasses are formed by combinations of perceptual and behavioral capabilities and,
            of course, their shapes. Besides their shapes, individuals of subclasses may also be
            recognized by stereotypical motion patterns (like a hopping kangaroo or a winding
            snake).
              Road vehicles (independent of control by a human driver or a technical
            subsystem) exhibit typical behaviors depending on the situation encountered. For
            example, they follow lanes, drive in convoys, perform lane changes, pass other
            vehicles, turn off onto a crossroad, or slow down for parking. All of the maneuvers
            mentioned are well known to human drivers, and they recognize the intention of
            performing one of those by its typical onset of motion over a short period of time.
            For example, a car leaving the center of its lane and moving consistently toward