Page 56 - Dynamic Vision for Perception and Control of Motion


40 2 Basic Relations: Image Sequences – “the World”

depends on the application area; with humans as the main partner in dealing with
the real world, their characteristic timescale will also predominate for the technical
systems under investigation here.
Because humans need at least 30 ms between two sensed signals to be able to tell
their correct sequence, independent of the sensory modality (tactile, auditory, or
visual) [Pöppel et al. 1991; Pöppel, Schill 1995], this time window of 30 ms is
considered the “window of simultaneity”. It is the basic temporal unit within
which all signals are treated as simultaneous [Ruhnau 1994a, b]. This fact was
also the decisive factor in fixing the video frame rate. (More precisely, the
subdivision of each full image into two fields of interleaved odd and even lines,
which halves the rate of full images relative to the field rate, was introduced to
cheat human perception, because the technology available when the standard was
defined in the 1930s could not achieve higher performance levels.) The goal was to
give the observer the impression of smooth analog motion, even though the fields
are discrete and do represent jumps. When a camera pans rapidly across a static
scene in the direction of the image lines, a noticeable shift between successive
fields can be observed in the video signal. For precise interpretation and early
detection of an onset of motion, therefore, the alternating fields at twice the
frame rate (50 Hz in Europe, 60 Hz in the US) should be analyzed.
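To give a feeling for the size of this effect, the following minimal sketch computes how far a scene point on the optical axis moves in the image between two exposures while the camera pans at a constant rate. It assumes an ideal pinhole camera rotating purely about its vertical axis; the focal length of 750 pixels and the pan rate of 0.5 rad/s are hypothetical values chosen only for illustration, and the function name `image_shift_px` is likewise our own.

```python
import math

def image_shift_px(focal_px: float, pan_rate_rad_s: float, image_rate_hz: float) -> float:
    """Horizontal shift, in pixels, of a scene point initially on the optical axis
    between two consecutive images taken at image_rate_hz by a pinhole camera
    panning at a constant rate about its vertical axis."""
    dt = 1.0 / image_rate_hz                      # time between the two exposures
    return focal_px * math.tan(pan_rate_rad_s * dt)

FOCAL_PX = 750.0   # hypothetical focal length expressed in pixels
PAN_RATE = 0.5     # hypothetical pan rate in rad/s (about 29 deg/s)

for field_rate_hz in (50.0, 60.0):
    per_field = image_shift_px(FOCAL_PX, PAN_RATE, field_rate_hz)
    per_frame = image_shift_px(FOCAL_PX, PAN_RATE, field_rate_hz / 2)  # full images
    print(f"{field_rate_hz:.0f} Hz fields: {per_field:.2f} px between fields, "
          f"{per_frame:.2f} px between full images")
```

With these assumed values, the odd and even fields of one full image are displaced by roughly 6 to 8 pixels relative to each other, which is why treating a full image as a single snapshot blurs the onset of motion.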
Since today’s machine vision very often relies on old standard video equipment,
the basic cycle time for full images is adopted for dynamic machine vision and for
control output. The sampling periods are 16 2/3 ms in the US and 20 ms in Europe;
full images, i.e., each odd or each even field, recur every 33 1/3 ms and 40 ms,
respectively. This choice is justified by the fact that the corner frequency of
human extremities (arms and legs) for control actuation is about 2 Hz, i.e., a
period of 500 ms. In sampled-data control theory, a dozen samples per period are
considered sufficient to achieve analog-like overall behavior, and even the 40 ms
cycle yields 12.5 samples per 500 ms period. Therefore, from this point of view,
control outputs held constant over one video period are acceptable. Note that the
transition to fully digital image sensors in the near future will allow more
freedom in the choice of frame rates.
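The sampling argument can be checked with a few lines of arithmetic. This sketch simply divides the period of the 2 Hz corner frequency quoted in the text by each of the video sampling periods; the nominal rates of 60 Hz and 50 Hz (rather than the exact NTSC value of 59.94 Hz) follow the periods given above, and the names used are illustrative only.

```python
CORNER_FREQUENCY_HZ = 2.0   # corner frequency of human limb actuation (from the text)

# Nominal sampling periods of the two video standards, in seconds.
SAMPLING_PERIODS_S = {
    "US field (60 Hz)": 1 / 60,
    "US full image (30 Hz)": 1 / 30,
    "Europe field (50 Hz)": 1 / 50,
    "Europe full image (25 Hz)": 1 / 25,
}

def samples_per_period(corner_hz: float, sampling_period_s: float) -> float:
    """Number of samples falling within one period of a signal at corner_hz."""
    return (1.0 / corner_hz) / sampling_period_s

for name, period in SAMPLING_PERIODS_S.items():
    n = samples_per_period(CORNER_FREQUENCY_HZ, period)
    print(f"{name}: T = {period * 1000:.2f} ms -> {n:.1f} samples per period")
```

Every entry meets the "dozen samples per period" rule of thumb; the European full-image rate is the tightest case, and field-rate sampling doubles the margin.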
Processes in the real world are described most compactly by relating the temporal
rates of change of the state variables to the values of the state variables, to the
control variables involved, and to additional perturbations that can hardly be
modeled; these relations are called differential equations.
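In common state-space notation, such relations can be written as shown below. The symbols are our own choice, not taken from the text: x is the state vector, u the control vector, v the perturbation vector, and F and G are the Jacobians of f with respect to x and u along a reference trajectory. The linearized form assumes, for simplicity, that the perturbations enter additively.

```latex
\dot{\mathbf{x}}(t) = \mathbf{f}\big(\mathbf{x}(t),\,\mathbf{u}(t),\,\mathbf{v}(t)\big),
\qquad
\dot{\mathbf{x}}(t) \approx F\,\mathbf{x}(t) + G\,\mathbf{u}(t) + \mathbf{v}(t),
\qquad
F = \frac{\partial \mathbf{f}}{\partial \mathbf{x}},\;
G = \frac{\partial \mathbf{f}}{\partial \mathbf{u}} .
```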
Following sampled-data theory, with the control output held constant over each
sampling period, they can be transformed into difference equations by numerical
(or analytical) integration; perturbations then show up as additive accumulated
values with statistical properties similar to those in the analog case. The
standard forms for (linearized) state transitions over a time period T are the
state transition matrix A(T) and the control effect matrix B(T). Multiplying A(T)
by the old state vector yields the homogeneous part of the new state vector; B(T)
describes the effect of constant unit control inputs on the new state. Multiplying
B(T) by the actual control output and adding the result to the homogeneous part
yields the new state.
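A minimal sketch of this transition, assuming a linear continuous model dx/dt = F x + G u (F and G are our notation, not the text's) and a control output held constant over T: then A(T) = exp(F T) and B(T) = (integral from 0 to T of exp(F s) ds) G, which is the standard zero-order-hold discretization. The helper names `discretize` and `step` are hypothetical, the perturbation term is omitted, and the truncated power series used here is adequate only when F T is small, as it is for video-rate sampling; in practice a library routine such as SciPy's `scipy.linalg.expm` would normally be preferred.

```python
import math

Matrix = list[list[float]]  # dense row-major matrix

def matmul(X: Matrix, Y: Matrix) -> Matrix:
    """Plain matrix product X @ Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def discretize(F: Matrix, G: Matrix, T: float, terms: int = 20) -> tuple[Matrix, Matrix]:
    """Return A(T) = exp(F T) and B(T) = (integral_0^T exp(F s) ds) G
    using truncated power series (zero-order hold on the control)."""
    n = len(F)
    A = [[0.0] * n for _ in range(n)]
    S = [[0.0] * n for _ in range(n)]   # accumulates integral_0^T exp(F s) ds
    Fk = identity(n)                    # current power F**k
    for k in range(terms):
        a_coef = T ** k / math.factorial(k)            # series term of exp(F T)
        s_coef = T ** (k + 1) / math.factorial(k + 1)  # series term of the integral
        for i in range(n):
            for j in range(n):
                A[i][j] += a_coef * Fk[i][j]
                S[i][j] += s_coef * Fk[i][j]
        Fk = matmul(Fk, F)
    return A, matmul(S, G)

def step(A: Matrix, B: Matrix, x: list[float], u: list[float]) -> list[float]:
    """New state = homogeneous part A x plus control effect B u (noise omitted)."""
    Ax = matmul(A, [[xi] for xi in x])
    Bu = matmul(B, [[ui] for ui in u])
    return [Ax[i][0] + Bu[i][0] for i in range(len(x))]

# Example: double integrator (position, speed) driven by an acceleration input,
# sampled with the 40 ms full-image period of the European standard.
F = [[0.0, 1.0], [0.0, 0.0]]
G = [[0.0], [1.0]]
A, B = discretize(F, G, 0.04)
x_next = step(A, B, [0.0, 1.0], [2.0])
```

For the double integrator the series terminates, giving A(T) = [[1, T], [0, 1]] and B(T) = [[T²/2], [T]]; starting at position 0 with speed 1 and applying an acceleration of 2 for 40 ms moves the state to position 0.0416 and speed 1.08.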
Using this knowledge about motion processes of 3-D objects in 3-D space for
image sequence interpretation is the core of the 4-D approach to dynamic vision
developed by [Dickmanns, Wuensche 1987, 1999]. Combining temporal prediction
with the first-order derivative matrix of perspective projection (the “Jacobian ma-
trix” of spatial vision discussed in previous sections) allows bypassing perspective
