such as spatial or temporal change rates, spatial gradients, or directions of extreme values (e.g., intensity gradients) are typical examples.
These differentials have proven to be powerful concepts for representing knowledge about physical properties of classes of objects. Differential equations are the natural mathematical element for coding knowledge about motion processes in the real world. With the advent of the Kalman filter [Kalman 1960], they have become the key element for obtaining the best estimate of the state variables describing a system, based on recursive methods implementing a least-squares model fit. Real-time visual perception of moving objects is hardly possible without this very efficient approach.
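To make the recursive least-squares character of this estimation concrete, the following is a minimal sketch of a scalar Kalman filter; the random-walk state model, the measurement setup, and all noise variances are illustrative assumptions, not values from the text.

```python
import random

# Minimal scalar Kalman filter, assuming a random-walk state model
# x_k = x_{k-1} + w_k and a direct noisy measurement z_k = x_k + v_k.
def kalman_step(x_est, p_est, z, q, r):
    """One predict/update cycle of the recursive least-squares estimator."""
    # Predict: propagate the estimate; uncertainty grows by process noise q.
    x_pred = x_est          # random walk: state expected unchanged
    p_pred = p_est + q

    # Update: blend prediction and measurement via the Kalman gain.
    k = p_pred / (p_pred + r)           # gain weights measurement vs. prediction
    x_new = x_pred + k * (z - x_pred)   # correct with the innovation
    p_new = (1.0 - k) * p_pred          # uncertainty shrinks after the update
    return x_new, p_new

# Example: estimate a constant true value 5.0 from noisy measurements.
random.seed(0)
x_est, p_est = 0.0, 1.0                 # initial guess and its variance
for _ in range(50):
    z = 5.0 + random.gauss(0.0, 0.5)    # simulated noisy measurement
    x_est, p_est = kalman_step(x_est, p_est, z, q=1e-4, r=0.25)
print(f"estimate after 50 steps: {x_est:.3f}")
```

Each cycle predicts the state from the model and then corrects it with the weighted innovation; the gain k is precisely the weight that minimizes the expected squared estimation error, which is what makes the recursion a least-squares model fit.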
1.4.2 Local Integrals as Central Elements for Perception
Note that the precise definition of what is local depends on the problem domain investigated and may vary over a wide range. The third column and row in Figure 1.2 are devoted to “local integrals”; this term, again, is rather fuzzy and will be defined more precisely in the task context. On the timescale, it means the transition from analog (continuous, differential) to digital (sampled, discrete) representations. In the spatial domain, typical local integrals are rigid bodies, which may move as a unit without changing their 3-D shape.
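The transition from a continuous (differential) to a sampled (discrete) representation can be illustrated for a linear system; the scalar plant and the numerical values below are hypothetical, chosen only to show the step from a differential equation to a discrete-time recursion.

```python
import math

# Hypothetical scalar plant dx/dt = a*x + b*u, sampled with period T.
a, b, T = -1.0, 2.0, 0.1        # illustrative values

# Exact zero-order-hold discretization: x[k+1] = ad*x[k] + bd*u[k].
ad = math.exp(a * T)
bd = (ad - 1.0) / a * b

x = 1.0                          # initial state
for k in range(10):
    u = 0.5                      # input held constant over each sampling interval
    x = ad * x + bd * u          # discrete-time (sampled) state recursion
print(f"x after 10 samples: {x:.4f}")
```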
These elements are defined such that the intersection in field (3, 3) of Figure 1.2 becomes the central hub for data interpretation and data fusion: it contains the individual objects as units, to which humans attach most of their knowledge about the real world. Abstraction of properties has led to generic classes that allow subsuming a large variety of single cases under one generic concept, thereby leading to representational efficiency.
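As an illustration of such representational efficiency, the following sketch shows a hypothetical generic object class whose shape parameters subsume many single cases; the class name, parameters, and instances are invented for this example.

```python
from dataclasses import dataclass

# Hypothetical generic class: one parameterized 3-D box shape subsumes
# many individual cases (car, van, truck) through its parameter values.
@dataclass
class BoxObject3D:
    length_m: float
    width_m: float
    height_m: float

    def volume_m3(self) -> float:
        return self.length_m * self.width_m * self.height_m

# Single cases are just parameter instantiations of the generic concept.
car = BoxObject3D(length_m=4.5, width_m=1.8, height_m=1.4)
truck = BoxObject3D(length_m=12.0, width_m=2.5, height_m=3.8)
print(car.volume_m3(), truck.volume_m3())
```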
1.4.2.1 Where is the Information in an Image?
It is well known that the information in an image is contained in local intensity changes: a uniformly gray image carries only a few bits of information, namely, (1) the gray value and (2) the fact that this value is distributed uniformly over the entire image. The image may be completely described by three bytes, even though the amount of data may be about 400,000 bytes in a TV frame or even 4 MB (2k × 2k pixels). If there are certain areas of uniform gray values, the boundary lines of these areas plus the internal gray values contain all the information in the image. Such an object in the image plane may be described with much less data than the pixel values it encompasses.
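The following sketch illustrates this point for a single image row using run-length coding, a one-dimensional stand-in for the boundary-plus-interior description; the pixel values are invented for the example.

```python
# A piecewise-uniform image row is fully described by its region
# boundaries plus one gray value per region (here: run-length code).
row = [80] * 300 + [200] * 100 + [80] * 240    # 640 raw pixel values

def run_length(pixels):
    """Collapse constant runs into [gray value, run length] pairs."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1                   # extend the current run
        else:
            runs.append([p, 1])                # boundary: start a new run
    return runs

runs = run_length(row)
print(runs)                                    # [[80, 300], [200, 100], [80, 240]]
print(len(row), "raw pixel values vs.", 2 * len(runs), "numbers in the run code")
```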
In a more general form, image areas defined by a set of properties (shape, texture, color, joint motion, etc.) may be considered image objects, which originate from 3-D objects by perspective mapping. Because of the numerous aspect conditions such an object may adopt relative to the camera, its potential appearances in the image plane are very diverse; an exhaustive description of these appearances requires orders of magnitude more data than a representation in 3-D space plus the laws of perspective mapping, which are the same for all objects.
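A minimal sketch of this mapping law follows, assuming an ideal pinhole camera with the optical axis along z; the focal length and the test points are illustrative.

```python
def project(point_3d, focal_length=0.01):
    """Map a 3-D point (meters, camera frame) to image-plane coordinates."""
    x, y, z = point_3d
    if z <= 0.0:
        raise ValueError("point must lie in front of the camera")
    u = focal_length * x / z     # horizontal image coordinate
    v = focal_length * y / z     # vertical image coordinate
    return u, v

# The same lateral offset appears smaller as depth z grows; the mapping
# law is identical for every object, only the 3-D geometry differs.
print(project((1.0, 0.5, 10.0)))   # near point
print(project((1.0, 0.5, 40.0)))   # same offset, four times farther away
```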
Therefore, an object is defined by its 3-D shape, which may be considered a local spatial integral