Page 21 - Dynamic Vision for Perception and Control of Motion



            derived from vision will hit the real world. For precise control of highly dynamic
            systems, this time delay has to be taken into account.
              Since perturbations should be counteracted as soon as possible, and since the
            visually measurable results of perturbations are the second integral of the
            accelerations, with corresponding delay times, it is advisable to have inertial
            sensors in the system for early detection of perturbations. Because long-term
            stabilization can be achieved using vision, it is not necessary to resort to
            expensive inertial sensors; on the contrary, when used jointly with vision,
            inexpensive inertial sensors with good properties in the medium- to high-frequency
            range are sufficient, as demonstrated by the vestibular systems of vertebrates.
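
              The division of labor described above, with inertial sensing covering the
            medium- to high-frequency range and vision providing long-term stabilization,
            can be illustrated with a complementary filter. This is a minimal sketch of that
            standard technique and not necessarily the method used in this book; the function
            name, the blending factor alpha, the sampling rate, and the simulated sensor bias
            are all assumptions chosen for illustration.

```python
def complementary_filter(gyro_rates, vision_angles, dt, alpha=0.98):
    """Fuse an inertial angular-rate signal with absolute angles from vision.

    alpha close to 1 trusts the integrated inertial rate over short time spans;
    the remaining (1 - alpha) share of the vision angle removes slow drift.
    """
    estimate = vision_angles[0]
    estimates = []
    for rate, vis in zip(gyro_rates, vision_angles):
        predicted = estimate + rate * dt                    # short term: integrate inertial rate
        estimate = alpha * predicted + (1.0 - alpha) * vis  # long term: pull toward vision
        estimates.append(estimate)
    return estimates


# Hypothetical scenario: platform at rest (true angle 0 rad), rate sensor with constant bias.
dt = 0.01          # assumed 100 Hz inertial sampling
n = 2000           # 20 s of data
bias = 0.05        # rad/s, assumed drift of an inexpensive sensor
gyro = [bias] * n
vision = [0.0] * n  # vision reports the true angle (noise and delay omitted)

fused = complementary_filter(gyro, vision, dt)

# Pure integration of the biased rate drifts without bound (1.0 rad after 20 s),
# while the fused estimate settles near alpha * bias * dt / (1 - alpha) = 0.0245 rad.
integrated_only = bias * dt * n
```

              In this sketch the inexpensive sensor's drift stays bounded only because the
            visual measurement anchors the estimate, which mirrors the argument made in the
            text.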
              Accelerometers measure the effects of most control outputs rather directly;
            this eases system identification and the search for control outputs suited to
            reflex-like counteraction of perturbations. Cross-correlation of inertial signals
            with visually determined signals allows a deeper temporal understanding of what
            the natural sciences call “time integrals” of input functions.
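
              One simple use of such a cross-correlation is estimating how far the visual
            signal lags behind the inertial one. The sketch below is an illustration under
            strong assumptions: both signals are taken to be already expressed as the same
            physical quantity, the visual signal is modeled as a pure delayed copy of the
            inertial one, and the names best_lag, true_delay, inertial, and visual are
            hypothetical.

```python
import random


def best_lag(reference, delayed, max_lag):
    """Return the lag (in samples) at which `delayed` best matches `reference`."""
    best, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Correlate reference[i] with delayed[i + lag] over the overlapping samples.
        n = len(reference) - lag
        score = sum(reference[i] * delayed[i + lag] for i in range(n)) / n
        if score > best_score:
            best, best_score = lag, score
    return best


# Hypothetical data: random perturbations sensed inertially, seen by vision 8 samples later.
rng = random.Random(0)
true_delay = 8
inertial = [rng.gauss(0.0, 1.0) for _ in range(400)]
visual = [0.0] * true_delay + inertial[:-true_delay]

estimated_delay = best_lag(inertial, visual, max_lag=30)
```

              The lag with the highest correlation score recovers the delay built into the
            simulated visual signal; real signals would additionally require filtering and
            conversion to a common quantity before such a comparison.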
              For all these reasons, the joint use of visual and inertial signals is considered
            mandatory for achieving efficient autonomous mobile platforms. Similarly, if
            specific velocity components can be measured easily by conventional devices, it
            does not make sense to try to recover them from vision in a “purist” approach.
            These conventional signals may ease perception of the environment considerably,
            since the corresponding sensors are mounted on the body in a fixed way, while in
            vision the measured feature values have to be assigned to some object in the
            environment on the basis of visual evidence alone. In vision, there is no
            permanently established link for each measurement value, as there is for
            conventional sensors.



            1.4  What are Appropriate Interpretation Spaces?

            Images are two-dimensional arrays of data; typical array sizes today range from
            about 64 × 64 for special “vision” chips to about 770 × 580 for video cameras
            (larger sizes are available, but only at much higher cost, e.g., for space or
            military applications). A digitized video data stream is a fast sequence of these
            images with data rates of up to ~11 MB/s for black and white and up to three
            times this amount for color.
              Frequently, only fields of 320 × 240 pixels (either only the odd or only the even
            lines, with a corresponding reduction of the resolution within the lines) are
            evaluated because of insufficient computing power. This results in a data stream
            per camera of about 2 MB/s. Even at this reduced data rate, the processing power
            of a single microprocessor available today is not yet sufficient for interpreting
            several video signals in parallel in real time. High-definition TV signals of the
            future may have up to 1080 lines and 1920 pixels in each line at frame rates of up
            to 75 Hz; this corresponds to data rates of more than 155 MB/s. Machine vision at
            this resolution still lies well in the future.
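
              The data rates quoted above can be reproduced with simple arithmetic. The
            sketch below assumes one byte per pixel for black and white, three bytes per
            pixel for color, 1 MB = 10^6 bytes, and an image rate of 25 Hz for the video and
            reduced-resolution cases; the 25 Hz value is an assumption that matches the
            quoted figures and is not stated in the text. The function name is hypothetical.

```python
def data_rate_mb_per_s(width, height, rate_hz, bytes_per_pixel=1):
    """Raw image data rate in MB/s, with 1 MB taken as 10**6 bytes."""
    return width * height * rate_hz * bytes_per_pixel / 1e6


# Assumed 25 Hz image rate for standard video (not stated in the text).
full_video = data_rate_mb_per_s(770, 580, 25)                      # 11.165, i.e. ~11 MB/s
color_video = data_rate_mb_per_s(770, 580, 25, bytes_per_pixel=3)  # three times the above
reduced = data_rate_mb_per_s(320, 240, 25)                         # 1.92, i.e. about 2 MB/s

# HDTV figures as given in the text: 1920 x 1080 pixels at 75 Hz.
hdtv = data_rate_mb_per_s(1920, 1080, 75)                          # 155.52, i.e. > 155 MB/s
```

              Under these assumptions, the HDTV stream carries roughly 80 times the data of
            the reduced 320 × 240 stream, which gives a sense of the gap in required
            processing power.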
              Perhaps uniform processing of entire images is not desirable at all, since
            different objects will be seen in different parts of the images, requiring specific image