Page 21 - Dynamic Vision for Perception and Control of Motion
derived from vision will hit the real world. For precise control of highly dynamic
systems, this time delay has to be taken into account.
Since perturbations should be counteracted as soon as possible, and since the
visually measurable results of perturbations are the second integral of
accelerations, with corresponding delay times, it is advisable to have inertial
sensors in the system for early pickup of perturbations. Because long-term
stabilization may be achieved using vision, it is not necessary to resort to
expensive inertial sensors; on the contrary, when used jointly with vision,
inexpensive inertial sensors with good properties in the medium- to
high-frequency range are sufficient, as demonstrated by the vestibular systems
of vertebrates.
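The "second integral" argument can be made concrete with a small calculation.
The following is a minimal sketch, not taken from the text: it assumes a
constant perturbation acceleration acting on a body initially at rest, and the
function name, the value of 1 m/s^2, and the elapsed times (40 ms is assumed to
be one video cycle at 25 Hz) are all illustrative choices.

```python
def displacement(accel_m_s2: float, elapsed_s: float) -> float:
    """Position offset of a body starting at rest under constant acceleration.

    This is the second time integral of the acceleration: 0.5 * a * t**2.
    """
    return 0.5 * accel_m_s2 * elapsed_s ** 2


# Assumed perturbation: 1 m/s^2 (illustrative value only).
PERTURBATION_M_S2 = 1.0

# Assumed observation times: one, two, and five video cycles of 40 ms each.
for elapsed_s in (0.04, 0.08, 0.2):
    offset_mm = displacement(PERTURBATION_M_S2, elapsed_s) * 1000.0
    print(f"after {elapsed_s * 1000:.0f} ms: offset of {offset_mm:.1f} mm")
```

Under these assumptions, the position offset after one video cycle is below one
millimeter, whereas an accelerometer registers the full 1 m/s^2 immediately;
this is the sense in which inertial sensors pick up perturbations early.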
Accelerometers can measure the effects of most control outputs rather directly;
this eases system identification and the search for control outputs for
reflex-like counteraction of perturbations. Cross-correlation of inertial
signals with visually determined signals allows a temporally deeper
understanding of what in the natural sciences is called “time integrals” of
input functions.
For all these reasons, the joint use of visual and inertial signals is considered
mandatory for achieving efficient autonomously mobile platforms. Similarly, if
specific velocity components can be measured easily by conventional devices, it
does not make sense to try to recover them from vision in a “purist” approach.
These conventional signals may ease perception of the environment considerably,
since the corresponding sensors are mounted on the body in a fixed way, while in
vision the measured feature values have to be assigned to some object in the
environment on the basis of visual evidence alone. In vision, there is no
permanently established link for each measurement value, as there is for
conventional sensors.
1.4 What are Appropriate Interpretation Spaces?
Images are two-dimensional arrays of data; the usual array size today ranges from
about 64 × 64 for special “vision” chips to about 770 × 580 for video cameras
(larger sizes are available, but only at much higher cost, e.g., for space or
military applications). A digitized video data stream is a fast sequence of these
images, with data rates of up to ~11 MB/s for black and white and up to three
times this amount for color.
Frequently, only fields of 320 × 240 pixels (either only the odd or only the even
lines, with a corresponding reduction of the resolution within the lines) are
evaluated because of insufficient computing power. This results in a data stream
of about 2 MB/s per camera. Even at this reduced data rate, the processing power
of a single microprocessor available today is not yet sufficient for interpreting
several video signals in parallel in real time. High-definition TV signals of the
future may have up to 1080 lines and 1920 pixels in each line at frame rates of
up to 75 Hz; this corresponds to data rates of more than 155 MB/s. Machine vision
at this resolution is still far in the future.
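The data rates quoted above can be reproduced with simple arithmetic. The sketch
below rests on assumptions not stated in the text: one byte per pixel for black
and white, 1 MB taken as 10^6 bytes, and an image rate of 25 Hz for the two
video formats (only the 75 Hz HDTV rate is given explicitly). The function name
is hypothetical.

```python
def data_rate_mb_per_s(width: int, height: int, images_per_s: float,
                       bytes_per_pixel: int = 1) -> float:
    """Raw image data rate in MB/s, with 1 MB taken as 10**6 bytes."""
    return width * height * images_per_s * bytes_per_pixel / 1e6


# Assumed: 25 images per second and 1 byte per pixel (black and white).
full_frame = data_rate_mb_per_s(770, 580, 25)
reduced_field = data_rate_mb_per_s(320, 240, 25)

# Frame rate of 75 Hz as stated in the text; 1 byte per pixel is assumed.
hdtv = data_rate_mb_per_s(1920, 1080, 75)

# Color at three bytes per pixel triples the black-and-white rate.
full_frame_color = data_rate_mb_per_s(770, 580, 25, bytes_per_pixel=3)

print(f"770 x 580 at 25 Hz:   {full_frame:.2f} MB/s")
print(f"320 x 240 at 25 Hz:   {reduced_field:.2f} MB/s")
print(f"1920 x 1080 at 75 Hz: {hdtv:.2f} MB/s")
print(f"770 x 580 color:      {full_frame_color:.2f} MB/s")
```

With these assumptions, the three results come out close to the quoted figures
of about 11 MB/s, about 2 MB/s, and more than 155 MB/s.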
Perhaps uniform processing of entire images is not desirable at all, since
different objects will be seen in different parts of the images, requiring specific image