2.4 Spatiotemporal Embedding and First-order Approximations



            Therefore, the general task of real-time vision is to achieve a compact internal rep-
            resentation of motion processes of several objects observed in parallel by evaluat-
            ing feature flows in the image sequence. Since egomotion also enters the content of
            images, the state of the vehicle carrying the cameras has to be observed simultane-
ously. However, vision yields information only on the relative motion between objects; unfortunately, it does so with an appreciable time delay (several tenths of a second) and with no immediate reference to inertial space. Therefore, conventional sensors on the body that measure motion relative to the stationary environment (such as odometers), or inertial accelerations and rotational rates (from inertial sensors such as accelerometers and angular rate sensors), are very valuable for perceiving egomotion and for separating it from the visual effects of the motion of other objects. Inertial sensors have
            the additional advantage of picking up perturbation effects from the environment
            before they show up as unexpected deviations in the integrals (speed components
and pose changes). All these measurements, with their differing delay times and trust values, have to be interpreted jointly to arrive at a consistent assessment of the situation as a basis for decisions on appropriate behavior.
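As a minimal illustration of this complementarity, the following sketch dead-reckons the egomotion from odometric and inertial data and then refers a delayed, purely relative vision measurement back to the ego pose valid at the instant of image capture. All numerical values, the planar motion model, and the variable names are hypothetical choices for the example only.

    import numpy as np

    dt, delay = 0.02, 0.2          # cycle time and vision delay [s] (assumed)
    v, yaw_rate = 20.0, 0.05       # odometer speed [m/s], gyro yaw rate [rad/s]

    # Dead-reckon the ego pose (x, y, psi) from inertial/odometric data.
    ego = np.zeros(3)
    trajectory = [ego.copy()]
    for _ in range(int(1.0 / dt)):
        x, y, psi = ego
        ego = np.array([x + v * np.cos(psi) * dt,
                        y + v * np.sin(psi) * dt,
                        psi + yaw_rate * dt])
        trajectory.append(ego.copy())

    # A vision result arriving now was captured 'delay' seconds ago and gives
    # only the other object's pose RELATIVE to the camera at that earlier time.
    rel_meas = np.array([30.0, -1.5])              # [m] in ego coordinates (assumed)
    x, y, psi = trajectory[-1 - int(delay / dt)]   # ego pose at the capture instant
    R = np.array([[np.cos(psi), -np.sin(psi)],
                  [np.sin(psi),  np.cos(psi)]])
    obj_in_world = np.array([x, y]) + R @ rel_meas # object in the stationary frame
    print(obj_in_world)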
              Before this can be achieved, perceptual and behavioral capabilities have to be
defined and represented (Chapters 3 to 6). Road recognition while driving on the road, as indicated in Figures 2.7 and 2.9, will be the application area in Chapters 7 to 10. The approach is similar to the human one: Driven by the optical input from the image sequence, an internal animation process in 3-D space and time is started with members of generically known object and subject classes that are to duplicate the visual appearance of “the world” by prediction-error feedback. For the next measurement time (corrected for time-delay effects), the expected values in each measurement modality are predicted. The prediction errors are then used to improve the internal state representation, taking into account the Jacobian matrices and the confidence in the models of both the motion processes and the measurement processes involved (error covariance matrices).
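The following sketch shows one such prediction-error feedback cycle in generic extended-Kalman-filter form. The matrices F, Q, H, R and the measurement function h are placeholders for the specific dynamic and perspective-mapping models actually used; the constant-velocity example at the end is purely illustrative.

    import numpy as np

    def ekf_step(x, P, F, Q, z, h, H, R):
        """One cycle of prediction-error feedback.

        x, P : state estimate and its error covariance
        F, Q : linearized dynamic model and process-noise covariance
        z    : new measurement; h(x) predicts it, H is its Jacobian at x
        R    : measurement-noise covariance (confidence in the measurement model)
        """
        # Predict state and covariance to the next measurement time.
        x_pred = F @ x
        P_pred = F @ P @ F.T + Q
        # The prediction error (innovation) drives the state correction.
        nu = z - h(x_pred)
        S = H @ P_pred @ H.T + R
        K = P_pred @ H.T @ np.linalg.inv(S)        # gain from Jacobian and covariances
        x_new = x_pred + K @ nu
        P_new = (np.eye(len(x)) - K @ H) @ P_pred
        return x_new, P_new

    # Illustrative use: constant-velocity model, position-only measurement.
    dt = 0.04
    F = np.array([[1.0, dt], [0.0, 1.0]])
    Q = np.diag([1e-4, 1e-3]); R = np.array([[0.25]])
    H = np.array([[1.0, 0.0]]); h = lambda x: H @ x
    x, P = np.zeros(2), np.eye(2)
    x, P = ekf_step(x, P, F, Q, np.array([1.0]), h, H, R)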
For vision, the concatenation of HCTs for each object-sensor pair (Figure 2.7), as part of the physical world, provides the means for achieving our goal of understanding dynamic processes in an integrated approach. Since the analysis of the next image of a sequence should take advantage of all information collected up to this time, temporal prediction is performed based on the current best estimates available for all objects involved and on the dynamic models as discussed. Note that no storage of image data is required in this approach; only the parameters and state variables of the objects instantiated need be stored to represent the observed scene. This usually reduces storage requirements by several orders of magnitude.
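A minimal sketch of this state-only scene representation might look as follows; the point-mass dynamic model, the objects listed, and all parameter values are deliberately simple assumptions for illustration. Only state vectors and parameters are carried from one video cycle to the next, never image data.

    import numpy as np

    # Hypothetical scene store: per instantiated object only its state vector
    # and model parameters are kept.
    scene = {
        "ego":     {"state": np.array([0.0, 0.0, 0.0, 20.0]),   # x, y, heading, speed
                    "params": {"wheelbase": 2.8}},
        "vehicle": {"state": np.array([60.0, 3.5, 0.0, 18.0]),
                    "params": {"length": 4.5}},
    }

    def predict_object(state, dt):
        """Simple point-mass model: advance position along the current heading."""
        x, y, psi, v = state
        return np.array([x + v * np.cos(psi) * dt,
                         y + v * np.sin(psi) * dt, psi, v])

    def predict_scene(scene, dt):
        """Predict all instantiated objects to the next image time from their
        current best estimates; the predicted states then feed the feature
        predictions against which the next image is compared."""
        return {name: {**obj, "state": predict_object(obj["state"], dt)}
                for name, obj in scene.items()}

    scene_next = predict_scene(scene, dt=0.04)   # states at the next video cycle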
Figure 2.9 showed a road scene with one vehicle on a curved road (upper right) in the viewing range of the egovehicle (left); the connecting object is, in general, the curved road with several lanes. The mounting conditions for the camera on a platform in the vehicle (lower left) are shown in an exploded view on top for clarity.
            The coordinate systems define the different locations and aspect conditions for ob-
            ject mapping. The trouble in vision (as opposed to computer graphics) is that the
entries in most of the HCT matrices are the unknowns of the vision problem (relative distances and angles). In a tree representation of this arrangement of objects (Figure 2.7), each edge between circles represents an HCT and each node (circle) an object with a coordinate system of its own.
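The following sketch illustrates such a concatenation with simplified 4 x 4 HCTs (rotation about the vertical axis plus translation only, as on a flat road). The numerical entries are hypothetical; they stand for exactly the quantities that in vision have to be estimated rather than being known, as in computer graphics.

    import numpy as np

    def hct(yaw_deg=0.0, translation=(0.0, 0.0, 0.0)):
        """Homogeneous coordinate transformation: rotation about the vertical
        axis combined with a translation, as a 4 x 4 matrix."""
        c, s = np.cos(np.radians(yaw_deg)), np.sin(np.radians(yaw_deg))
        T = np.eye(4)
        T[:3, :3] = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
        T[:3, 3] = translation
        return T

    # One HCT per edge of the scene tree (all values assumed for illustration):
    T_ego_from_cam  = hct(translation=(1.5, 0.0, 1.3))                 # camera on the body
    T_road_from_ego = hct(yaw_deg=2.0,  translation=(120.0, 2.0, 0.0)) # egovehicle on the road
    T_road_from_veh = hct(yaw_deg=-3.0, translation=(180.0, -1.5, 0.0))# other vehicle on the road

    # Walking the tree from the other vehicle via the road to the camera
    # (edges traversed backwards enter with their inverse):
    T_cam_from_veh = (np.linalg.inv(T_ego_from_cam)
                      @ np.linalg.inv(T_road_from_ego)
                      @ T_road_from_veh)
    p_veh = np.array([0.0, 0.0, 0.5, 1.0])    # a point on the observed vehicle (homogeneous)
    p_cam = T_cam_from_veh @ p_veh             # the same point in camera coordinates
    print(p_cam[:3])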