Page 72 - Dynamic Vision for Perception and Control of Motion
56     2  Basic Relations: Image Sequences – “the World”


represents an object or sub-object as a movable or functionally separate part. Objects may be inserted into or deleted from the scene from one frame to the next (dynamic scene tree).
  This scene tree represents the process of mapping features on the surfaces of objects in the real world, up to hundreds of meters away, into the images of one or more cameras. These features finally have an extension of several pixels on the camera chip (a few dozen micrometers with today's technology). Their motion on the chip is to be interpreted as motion, in the real world, of the body carrying these features, taking the effects of body motion on the mapping process properly into account. Since body motions are, in general, smooth, spatiotemporal embedding and first-order approximations help make visual interpretation more efficient, especially at high image rates as in video sequences.
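The dynamic scene tree described above can be illustrated with a minimal sketch (an assumption for illustration, not the book's implementation): each node stands for an object or a functionally separate sub-object, and nodes may be inserted or removed between frames.

```python
# Minimal sketch of a dynamic scene tree (illustrative only, not the
# book's implementation): nodes are objects or movable sub-objects and
# can be inserted or deleted from one frame to the next.

class SceneNode:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def remove(self):
        """Delete this object (and implicitly its sub-objects) from the tree."""
        if self.parent is not None:
            self.parent.children.remove(self)
            self.parent = None

world = SceneNode("world")
car = SceneNode("car", parent=world)          # object enters the scene
wheel = SceneNode("front_wheel", parent=car)  # movable sub-object of the car
car.remove()                                  # object leaves in a later frame
print([c.name for c in world.children])       # the car subtree is gone
```

In a real system each node would additionally carry a homogeneous transformation relative to its parent; the point here is only the per-frame insertion and deletion of subtrees.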


            2.4.1 Gain by Multiple Images in Space and/or Time for Model Fitting

High-frequency temporal embedding alleviates the correspondence problem between features from one frame to the next, since they will have moved only by a small amount. This reduces the search range in a top-down feature extraction mode like the one used for tracking. Especially when there are stronger, unpredictable perturbations, their effect on feature position is minimized by frequent measurements. Doubling the sampling rate, for example, allows detecting the onset of a perturbation much earlier (on average). Since tracking in the image has to be done in two dimensions, the search area may be reduced by a quadratic effect, relative to the linear reduction in the time available for evaluation. As mentioned previously for reference, humans cannot tell the correct sequence of two events if they are less than 30 ms apart, even though they can perceive that there are two separate events [Pöppel, Schill 1995]. Experimental experience with technical vision systems has shown that using every frame of a 25 Hz image sequence (40 ms cycle time) allows object tracking of high quality if proper feature extraction algorithms with subpixel accuracy and well-tuned recursive estimation processes are applied. This tuning has to be adapted by knowledge components taking the situation of driving a vehicle and the lighting conditions into account.
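The quadratic gain from a higher frame rate can be made concrete with a small sketch (the bound on unpredictable feature velocity is a hypothetical value chosen for illustration): if a feature can move at most v_max pixels per second in an unpredictable way, the per-axis search half-width grows linearly with the cycle time, so the two-dimensional search area grows with its square.

```python
# Sketch (illustrative assumption, not from the book): effect of the
# cycle time T on the 2-D search window for frame-to-frame feature
# correspondence. With unpredictable feature motion bounded by
# v_max pixels/s, a feature moves at most v_max*T pixels per axis
# between frames, so the search area scales with T squared.

def search_area(v_max_px_s: float, cycle_time_s: float) -> float:
    """Area (in square pixels) of the square search window."""
    side = 2.0 * v_max_px_s * cycle_time_s  # +/- v_max*T per axis
    return side * side

V_MAX = 100.0                          # hypothetical bound, pixels/s
area_40 = search_area(V_MAX, 0.040)    # 25 Hz (40 ms cycle time)
area_20 = search_area(V_MAX, 0.020)    # 50 Hz (20 ms cycle time)
print(area_40 / area_20)
```

Halving the cycle time thus shrinks the search area by a factor of four; in a tracking system the window would in practice be centered on the position predicted by the recursive estimator, with its size tied to the prediction uncertainty.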
This does not imply, however, that all processing on the higher levels has to stick to this high rate. Maneuver recognition of other subjects, situation assessment, and behavior decision for locomotion can, in general, be performed on a (much) slower timescale without sacrificing quality of performance. This may partly be due to the biological nature of humans: it is almost impossible for humans to react with a response time of less than several hundred milliseconds. As mentioned before, the unit "second" may have been chosen as the basic timescale for this reason.
However, high image rates provide the opportunity both for early detection of events and for data smoothing on the timescale of the motion processes of interest. Human extremities such as arms or legs can hardly be activated at more than a 2 Hz corner frequency. Therefore, efficient vision systems should concentrate computing resources where information can be gained best (at expected feature locations of known objects/subjects of interest) and on regions where new objects may occur. Foveal–peripheral differentiation of spatial resolution, in connection with fast gaze control, may be considered an optimal vision system design found in