Dynamic Vision for Perception and Control of Motion



Since images are only two-dimensional, the 2-D framework looks most natural for image interpretation. This may be true for almost planar objects viewed approximately normal to their plane of appearance, like a landscape in a bird's-eye view. When a planar surface is viewed from an elevation slightly above the ground with the optical axis almost parallel to it, however, the situation is quite different. In this case, each line in the image corresponds to a different distance on the ground, and the same 3-D object on the surface looks quite different in size according to where it appears in the image. This is why homogeneously distributed image processing by vector machines, for example, has a hard time showing its efficiency; locally adapted methods in image regions are much more promising in this case and have proven their superiority. Interpreting image sequences in 3-D space with corresponding knowledge bases right from the beginning allows easy adaptation to range differences for single objects. Of course, the analysis of situations encompassing several objects at various distances now has to be done on a separate level, building on the results of all previous steps. This has been one of the driving factors in designing the architecture for the third-generation "expectation-based, multifocal saccadic" (EMS) vision system described in this book. It corresponds to recent findings in well-developed biological systems, where different brain areas light up in magnetic resonance images for image processing and for action planning based on the results of visual perception [Talati, Hirsch 2005].
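The row-to-distance relation described above can be sketched with a simple flat-ground pinhole-camera model. The camera height, focal length, and horizon row used here are illustrative assumptions, not values from the text; the point is only that pixel rows just below the horizon image ground many meters away, while rows near the image bottom image ground close to the vehicle.

```python
import math

def ground_distance(row, height_m=1.5, focal_px=800.0, horizon_row=240.0):
    """Distance along a flat ground plane imaged at a given pixel row.

    Assumes a pinhole camera mounted height_m above flat ground with
    its optical axis parallel to the ground, so the horizon projects
    to horizon_row.  For a row dy pixels below the horizon, similar
    triangles give  d = height_m * focal_px / dy.
    """
    dy = row - horizon_row            # pixels below the horizon line
    if dy <= 0:
        return math.inf               # at or above the horizon: no ground visible
    return height_m * focal_px / dy

# Distance (and hence apparent object size) changes rapidly with row:
for row in (241, 250, 300, 480):
    print(f"row {row}: {ground_distance(row):8.2f} m")
```

With these assumed parameters, the row one pixel below the horizon maps to 1200 m, while the bottom row of a 480-line image maps to only 5 m, which is why uniform processing of the whole image wastes effort and region-adapted methods pay off.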
Understanding motion processes of 3-D objects in 3-D space while the body carrying the cameras also moves in 3-D space seems to be one of the most difficult tasks in real-time vision. Without the help of inertial sensing for separating egomotion from relative motion, this can hardly be done successfully, at least in dynamic situations.
Direct range measurement by special sensors such as radar or laser range finders (LRF) would alleviate the vision task. Because of their relative simplicity and low demand for computing power, these systems have found relatively widespread application in the automotive field. However, with respect to resolution and flexibility of data exploitation, as well as hardware cost and installation volume required, they have much less long-term potential than passive cameras once computing power is available in abundance. For this reason, these systems are not included in this book.


            1.2  Why Perception and Action?



For technical systems intended to find their way on their own in an ever-changing world, it is impossible to foresee every possible event and to program all required capabilities for appropriate reactions into their software from the beginning. To be flexible in dealing with situations actually encountered, the system should have perceptual and behavioral capabilities which it may expand on its own in response to new requirements. This means that the system should be capable of judging the value of control outputs in response to measured data; however, since control outputs affect state variables over a certain amount of time, ensuing time