
1  Introduction

The field of “vision” is so diverse, and the approaches to its widespread realms of application so numerous, that it seems reasonable first to survey the field and to specify the area to which this book intends to contribute. Many approaches to machine vision have started from the paradigm that easy things should be tackled first, such as single-snapshot image interpretation with unlimited time; extensions to more complex applications may later build on the experience gained. Our approach, on the contrary, was to separate the field of dynamic vision from its (quasi-)static counterpart right from the beginning and to derive methods adequate for this specific domain. To prepare the ground for success, sufficiently capable methods and knowledge representations have to be introduced from the start.


1.1  Different Types of Vision Tasks and Systems



Figure 1.1 shows a juxtaposition of several vision tasks occurring in everyday life. For humans, snapshot interpretation generally seems easy when the domain in which the image has been taken is well known. We tend to imagine the temporal context and the moment at which the image was shot. From motion smear and unusual poses, we conclude how the snapshot is embedded in a well-known maneuver. So, in general, even single images require background knowledge about motion processes in space for more in-depth understanding; this is often overlooked in machine or computer vision. The approach discussed in this book (bold italic entries in Figure 1.1) takes motion processes in “3-D space and time” as the basic knowledge required for understanding image sequences, in a manner similar to our own way of interpreting images. This yields a natural framework for using language and terms in their common sense.

Another big difference in the methods and approaches required stems from the fact that the camera yielding the video stream is either stationary or itself moving. If it is moving, translational and/or rotational motion may require special treatment. Surveillance is usually done from a stationary position, while the camera may pan (rotate around a vertical axis, a motion often also called yaw) and tilt (rotate around a horizontal axis, also called pitch) to increase its total field of view. In this case, motion is introduced purposely and is well controlled, so that it can be taken into account during image evaluation. If egomotion is to be controlled based on vision, the body carrying the camera(s) may be subject to strong perturbations which, in general, cannot be predicted.
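
To make the remark about well-controlled camera motion concrete, the following short Python sketch shows how a viewing ray measured in the camera frame can be rotated back into the stationary platform frame when the commanded pan and tilt angles are known at each frame time. This is only an illustrative sketch under simplified assumptions (exact angle readout, a simple pan-then-tilt axis convention); the function and variable names are chosen here for illustration and are not taken from the book.

```python
import numpy as np

def rot_pan_tilt(pan_rad: float, tilt_rad: float) -> np.ndarray:
    """Rotation from the camera frame to the stationary platform frame
    for a camera tilted (pitch) about a horizontal axis and panned (yaw)
    about the vertical axis; simplified convention for illustration."""
    cp, sp = np.cos(pan_rad), np.sin(pan_rad)
    ct, st = np.cos(tilt_rad), np.sin(tilt_rad)
    # Yaw (pan) about the vertical z axis
    R_pan = np.array([[cp, -sp, 0.0],
                      [sp,  cp, 0.0],
                      [0.0, 0.0, 1.0]])
    # Pitch (tilt) about the horizontal y axis
    R_tilt = np.array([[ ct, 0.0, st],
                       [0.0, 1.0, 0.0],
                       [-st, 0.0, ct]])
    return R_pan @ R_tilt

# A feature seen along a unit viewing ray in the camera frame ...
ray_camera = np.array([0.0, 0.0, 1.0])   # straight along the optical axis
# ... is expressed in the stationary frame using the known pan/tilt readings,
# so the purposely introduced camera motion drops out of the interpretation.
ray_platform = rot_pan_tilt(np.radians(30.0), np.radians(-10.0)) @ ray_camera
print(ray_platform)
```

The point of the sketch is only that a commanded, measured rotation can be removed analytically before scene interpretation, whereas unpredictable perturbations of a moving body cannot be handled this way and call for the estimation methods developed later in the book.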