Page 58 - Dynamic Vision for Perception and Control of Motion

42 2 Basic Relations: Image Sequences – “the World”

thus is about 18 orders of magnitude. However, the different scales are not of equal
and simultaneous interest.
Looking at the mapping conditions for perspective imaging, the $10^{-5}$ m range
has immediate importance as the basic grid size of the sensor. For remote sensing
of the environment, when the characteristic speed is in the order of magnitude of
tens of m/s, several hundred meters may be considered a reasonable viewing range
yielding about 3 to 10 seconds reaction time until the vehicle may reach the
location inspected. If objects with a characteristic dimension of about 5 cm = 0.05 m
should just fill a single pixel in the image when seen at the maximum distance of,
say, 200 m, the focal length f required follows from $f / 10^{-5} = 200 / 0.05$,
giving f = 0.04 m or 40 mm.
If one would like to have a 1 cm wide line mapped onto 2 pixels at 10 m distance
(e.g., to be able to read the letters on a license plate), the focal length needed is
f = 20 mm. To recognize lane markings 12 cm wide at 6 m distance with 6 pixels on
this width, a focal length of 3 mm would be sufficient. This shows that for practical
purposes, focal lengths in the millimeter to decimeter range are adequate. This also
happens to be the physical dimension of modern CCD-TV cameras.
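
The three focal lengths quoted above can be checked with a short script. This is only a sketch under the ideal pinhole relation (image size / f = object size / distance) with the $10^{-5}$ m pixel grid mentioned in the text; the function name `focal_length_m` and its parameters are illustrative and not taken from the book.

```python
# Assumption: ideal pinhole camera, square pixels on a 1e-5 m grid (value from the text).
PIXEL_PITCH_M = 1e-5


def focal_length_m(object_size_m, distance_m, pixels_on_object,
                   pixel_pitch_m=PIXEL_PITCH_M):
    """Focal length that maps an object onto the given number of pixels.

    Pinhole relation: image_size / f = object_size / distance.
    """
    image_size_m = pixels_on_object * pixel_pitch_m
    return image_size_m * distance_m / object_size_m


# The three cases discussed in the paragraph above.
cases = {
    "5 cm object on 1 pixel at 200 m": focal_length_m(0.05, 200.0, 1),
    "1 cm line on 2 pixels at 10 m": focal_length_m(0.01, 10.0, 2),
    "12 cm lane marking on 6 pixels at 6 m": focal_length_m(0.12, 6.0, 6),
}

for label, f in cases.items():
    print(f"{label}: f = {f * 1000:.0f} mm")
```

Under these assumptions the script reproduces the 40 mm, 20 mm, and 3 mm values given in the text.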
Because the wheel diameters of typical vehicles are of the order of magnitude of
about 1 m, objects become serious obstacles if their height exceeds about 0.1 m.
Therefore, the 0.1 to 100 m range (typical look-ahead distance) is the most impor-
tant and most frequently used one for ground vehicles. Entire missions, usually,
measure 1 to 100 km in range. For air vehicles, several thousand km are typical
travel distances since the Earth radius is about 6 371 km.
It may be interesting to note that the basic scale “1 m” was defined initially as
$10^{-7}$ of one quarter of the circumference around the globe via both poles about 2
centuries ago.
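
As a rough consistency check, this definition can be compared with the Earth radius of about 6 371 km quoted above. The estimate assumes a perfectly spherical Earth with that mean radius, so it only approximates the true pole-to-equator arc:

```latex
\frac{1}{4}\,(2\pi R) = \frac{\pi R}{2}
  \approx \frac{\pi \cdot 6371\ \text{km}}{2}
  \approx 1.0008 \times 10^{4}\ \text{km}
  \approx 10^{7}\ \text{m}
```

Under this simplification, $10^{-7}$ of the quarter circumference comes out within about 0.1 % of 1 m.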

Inverse use of multiple space scales in vision: In a visual scene, the same object
may be a few meters or a few hundred meters away; the system should be able to
recognize the object as the same unit independent of the distance viewed. To
achieve this more easily, multiple focal lengths for a set of cameras will help since
a larger focal length directly counteracts the downscaling of the image size due to
increased range. This is the main reason for using multifocal camera arrangements
in EMS vision. With a spacing of focal lengths by a factor of 4 (corresponding to
the second pyramid stage each time), a numerical range of 16 may be bridged with
three cameras.
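
The compensation of range by focal length can be illustrated numerically. This is a minimal sketch under the ideal pinhole model; the 4 mm wide-angle focal length, the 2 m object, the 10 m near distance, and both function names are hypothetical values chosen for illustration, not parameters from the text.

```python
import math


def focal_lengths(f_wide_m, ratio=4, n_cameras=3):
    """Focal lengths of a camera set spaced by a constant factor."""
    return [f_wide_m * ratio ** k for k in range(n_cameras)]


def image_size_m(f_m, object_size_m, distance_m):
    """Pinhole projection: image size = f * object size / distance."""
    return f_m * object_size_m / distance_m


# Hypothetical trifocal set starting at a 4 mm wide-angle lens.
lenses = focal_lengths(0.004)

# Ratio between the longest and the shortest focal length.
bridged = lenses[-1] / lenses[0]

# A 2 m wide object seen by the wide-angle camera at 10 m ...
near = image_size_m(lenses[0], 2.0, 10.0)
# ... and by the tele camera at a distance larger by the bridged factor.
far = image_size_m(lenses[-1], 2.0, 10.0 * bridged)

print(f"focal lengths [mm]: {[round(f * 1000, 1) for f in lenses]}")
print(f"bridged range factor: {bridged:.0f}")
print(f"same image size in both cameras: {math.isclose(near, far)}")
```

With a factor of 4 between neighboring cameras, the tele camera sees the object at 16 times the distance at the same image size as the wide-angle camera sees it nearby, which is the factor of 16 stated above.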
In practical applications, a new object is most likely picked up in the wide field
of view, which has the lowest resolution. As few as four pixels (2 × 2) may be sufficient
for detecting a new object reliably without being able to classify it. Performing a
saccade to bring the object into the field of view of a camera with higher resolution
would suddenly make many additional pixels available. In a bifocal
system with a focal length ratio of 4, the resolution would increase to 8 × 8 (64
pixels); in a trifocal system it would even go up to 32 × 32 (i.e., 1K pixels) on the
same area in the real world. Now the object may be analyzed on these scales in
parallel. The coarse space scale may be sufficient for tracking the object with high
temporal resolution up to video rate. On the high-resolution space scale, the object
may then be analyzed with respect to its detailed shape, possibly on a lower time-
