    Looking at 2-D data arrays generated by several hundred thousand sensor elements, derive a distribution of objects in the real world and of their relative motion. The sensor elements are usually arranged in a uniform array on the chip. Onboard vehicles, the sensor orientation cannot be assumed to be known beforehand or even to be stationary. However, inertial sensors for linear acceleration components and rotational rates are available for sensing ego-motion.

It is immediately clear that knowledge about object classes, and about the way their visible features are mapped into the image plane, is of great importance for image sequence understanding. These objects may be grouped in classes with similar functionality and/or appearance. The body of the vehicle carrying the sensors and providing the means for locomotion is, of course, of utmost importance; this lengthy description will be abbreviated by the term the "own" body. To understand its motion directly and independently of vision, signals from other sensors such as odometers, inertial angular rate sensors, and linear accelerometers, as well as GPS (the "Global Positioning System", providing geographic coordinates), are widely used.
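As an illustration of how such signals allow understanding ego-motion independently of vision, the following minimal sketch dead-reckons a planar ego pose from a yaw-rate gyro and an odometer. It assumes flat ground and measurements held constant over one sampling interval; the function and variable names are chosen for illustration and do not come from the text.

```python
import math

def propagate_ego_pose(pose, yaw_rate, speed, dt):
    """Dead-reckon a planar ego pose (x, y, heading) one time step ahead.

    pose     : (x [m], y [m], psi [rad]) in a ground-fixed frame
    yaw_rate : inertial yaw-rate measurement [rad/s]
    speed    : odometer speed measurement [m/s]
    dt       : sampling interval [s]
    """
    x, y, psi = pose
    psi_new = psi + yaw_rate * dt           # integrate rotational rate
    x_new = x + speed * math.cos(psi) * dt  # integrate speed along heading
    y_new = y + speed * math.sin(psi) * dt
    return (x_new, y_new, psi_new)

# Example: 1 s of driving at 10 m/s with a gentle turn (10 steps of 0.1 s)
pose = (0.0, 0.0, 0.0)
for _ in range(10):
    pose = propagate_ego_pose(pose, yaw_rate=0.05, speed=10.0, dt=0.1)
print(pose)
```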
Image data points carry no direct information on the distance to the light sources in the real world that stimulated the sensor signals; the third dimension (range) is completely lost in a single image (except, perhaps, for intensity attenuation over longer distances). In addition, since perturbations may invalidate the information content of a single pixel almost completely, useful image features consist of signals from groups of sensor elements, within which local perturbations tend to be leveled out. In biological systems, these are the receptive fields; in technical systems, they are evaluation masks of various sizes (a minimal sketch of such a mask is given after the task statement below). This now allows a more precise statement of the vision task:
    By looking at the responses of feature extraction algorithms, try to find objects and subjects in the real world and their state relative to the own body. When knowledge about motion characteristics or typical behaviors is available, exploit it to achieve better results and deeper understanding by filtering the measurement data over time.
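To make the notion of an evaluation mask concrete, here is a minimal sketch of correlating a small averaging mask with an image; it shows how grouping sensor elements attenuates a perturbation affecting a single pixel. The mask size and values are illustrative assumptions, not prescribed by the text.

```python
import numpy as np

def mask_response(image, mask):
    """Correlate a small evaluation mask with the image (valid region only).

    Grouping sensor elements this way levels out perturbations that
    would invalidate the reading of an individual pixel.
    """
    mh, mw = mask.shape
    h, w = image.shape
    out = np.empty((h - mh + 1, w - mw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + mh, c:c + mw] * mask)
    return out

# A 3x3 averaging mask: one corrupted pixel shifts the response by only
# 1/9 of its error, instead of 100 % for a single-pixel reading.
mask = np.full((3, 3), 1.0 / 9.0)
image = np.full((5, 5), 100.0)
image[2, 2] = 255.0            # single-pixel perturbation
print(mask_response(image, mask))
```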

For simple massive objects (e.g., a stone, or sun and moon) and for man-made vehicles, good "dynamic models" describing motion constraints are very often known; a minimal filtering sketch exploiting such a model follows the list below. To describe relative or absolute motion of objects precisely, suitable reference coordinate systems have to be introduced. Given the wide range of space accessible to vision, certain scales of representation are advantageous:
- Sensor elements have dimensions in the micrometer range (µm).
- Humans operate directly in the meter (m) range: reaching space, a single step (body size).
- For projectiles and fast vehicles, the range of immediate reactions extends to several hundred meters or kilometers (km).
- Missions may span several hundred to thousands of kilometers, even one-third to one-half of the way around the globe in direct flight.
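The following sketch shows one common way of filtering measurement data over time with a dynamic model, as suggested above: a linear Kalman filter with a constant-velocity model for the relative range to an object ahead. All numerical values (cycle time, noise covariances, measurements) are illustrative assumptions, and this is only a generic sketch, not the specific recursive estimation scheme of the text.

```python
import numpy as np

dt = 0.1                                   # video cycle time [s]
F = np.array([[1.0, dt], [0.0, 1.0]])      # constant-velocity dynamic model
H = np.array([[1.0, 0.0]])                 # vision measures relative range only
Q = np.diag([1e-4, 1e-2])                  # process noise (model uncertainty)
R = np.array([[0.5]])                      # measurement noise (feature jitter)

x = np.array([[20.0], [0.0]])              # initial state: range [m], rate [m/s]
P = np.diag([4.0, 4.0])                    # initial state covariance

def kalman_step(x, P, z):
    """One predict/update cycle of a linear Kalman filter."""
    # Prediction with the dynamic model
    x = F @ x
    P = F @ P @ F.T + Q
    # Update with the vision measurement z
    y = z - H @ x                          # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P
    return x, P

for z in [19.8, 19.5, 19.3, 18.9, 18.6]:   # noisy range readings [m]
    x, P = kalman_step(x, P, np.array([[z]]))
print(x.ravel())                           # smoothed range and range rate
```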