Page 56 - Dynamic Vision for Perception and Control of Motion

40     2  Basic Relations: Image Sequences – “the World”


            depends on the application area; with humans as the main partners in dealing with
            the real world, their characteristic timescale will also predominate for the technical
            systems under investigation here.
              Because humans need at least 30 ms between two sensed signals to be able to tell
            their correct order, independent of the sensory modality (tactile, auditory, or
            visual) [Pöppel et al. 1991; Pöppel, Schill 1995], this 30 ms time window is
            considered the “window of simultaneity”: the basic temporal unit within which all
            signals are treated as simultaneous [Ruhnau 1994a, b]. This fact was also the
            decisive factor in fixing the video frame rate. (More precisely, the subdivision of
            each frame into two fields of interleaved odd and even lines, with the full-frame
            rate reduced to half the field rate, was introduced to cheat human perception,
            because the technology available when the standard was defined in the 1930s lacked
            the required performance.) This was done to give the observer the impression of
            smooth analog motion, even though the fields are discrete and do represent jumps.
            In field sequences of video signals from a static scene, recorded while the camera
            rotates at a large angular rate in the direction of the image lines, a noticeable
            shift between successive fields can be observed. Therefore, for precise
            interpretation and early detection of the onset of motion, the alternating fields
            should be analyzed at twice the frame rate (50 Hz or 60 Hz, respectively).
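The size of this shift can be estimated with a pinhole camera model. The sketch below assumes a pure rotation about the camera's vertical axis, a scene point near the image center, and a hypothetical focal length of 750 pixels and pan rate of 1 rad/s; none of these values come from the text. The shift accumulated during one field period Δt is then roughly focal_px · tan(ω · Δt). The helper name field_shift_pixels is hypothetical.

```python
import math


def field_shift_pixels(focal_px: float, yaw_rate_rad_s: float, field_period_s: float) -> float:
    """Horizontal image shift (pixels) of a point near the image center between
    two successive fields, for a pinhole camera panning about its vertical axis
    while viewing a static scene (hypothetical helper for illustration)."""
    # Angle the camera turns during one field period.
    angle = yaw_rate_rad_s * field_period_s
    # Pinhole projection: a ray at `angle` off the optical axis lands
    # focal_px * tan(angle) pixels from the image center.
    return focal_px * math.tan(angle)


# Hypothetical values: 750-pixel focal length, 1 rad/s pan (about 57 deg/s).
for name, period in [("Europe, 50 Hz fields", 1 / 50), ("US, 60 Hz fields", 1 / 60)]:
    print(f"{name}: {field_shift_pixels(750.0, 1.0, period):.1f} px")
```

Under these assumptions the shift is about 15 pixels per 20 ms field (12.5 pixels per 16 2/3 ms field) and twice that between full frames, so the two fields of one frame cannot be treated as a single image taken at one instant.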
              Since today’s machine vision very often relies on old standard video equipment,
            the basic cycle time for full images is adopted for dynamic machine vision and for
            control output. In the US, fields are sampled every 16 2/3 ms and full images (or
            fields with only odd or only even lines) every 33 1/3 ms; in Europe, the
            corresponding periods are 20 ms and 40 ms. This choice is justified by the fact that
            the corner frequency of human extremities (arms and legs) for control actuation is
            about 2 Hz. In sampled-data control theory, about a dozen samples per period are
            considered sufficient to achieve analog-like overall behavior. Therefore, from this
            point of view, holding control outputs constant over one video period is acceptable.
            Note that the transition to fully digital image sensors in the near future will allow
            more freedom in the choice of frame rates.
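The “dozen samples per period” argument can be checked with a few lines of exact arithmetic: a 2 Hz corner frequency corresponds to a 500 ms period, which is divided by the full-image cycle times of 33 1/3 ms (US) and 40 ms (Europe). The helper name samples_per_period is hypothetical.

```python
from fractions import Fraction

# Corner frequency of human arm/leg actuation quoted in the text: about 2 Hz.
CORNER_FREQ_HZ = Fraction(2)


def samples_per_period(sample_period_s: Fraction, corner_freq_hz: Fraction = CORNER_FREQ_HZ) -> Fraction:
    """Number of control samples that fit into one period of the corner frequency."""
    return (1 / corner_freq_hz) / sample_period_s


# Full-image cycle times: 1/30 s = 33 1/3 ms (US) and 1/25 s = 40 ms (Europe).
for name, period in [("US", Fraction(1, 30)), ("Europe", Fraction(1, 25))]:
    n = samples_per_period(period)
    print(f"{name}: {float(n):.1f} samples per 0.5 s period")
```

This prints 15.0 samples for the US and 12.5 for Europe, both in the range of “a dozen” samples that the text considers sufficient.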
              Processes in the real world are described most compactly by relating the temporal
            rates of change of the state variables to the values of the state variables, to the
            control variables involved, and to additional perturbations that can hardly be
            modeled; these relations are called differential equations.
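In state-space notation, such a model and its linearization about a nominal operating point can be sketched as follows. The symbols are assumptions chosen here, not taken from the text: x is the state vector, u the control vector, z the perturbations, F and G the Jacobians of f with respect to x and u, and v the linearized perturbation term; in the linearized form, x and u denote deviations from their nominal values.

```latex
\dot{\mathbf{x}}(t) = \mathbf{f}\bigl(\mathbf{x}(t), \mathbf{u}(t), \mathbf{z}(t)\bigr),
\qquad
\dot{\mathbf{x}}(t) \approx F\,\mathbf{x}(t) + G\,\mathbf{u}(t) + \mathbf{v}(t),
\qquad
F = \frac{\partial \mathbf{f}}{\partial \mathbf{x}}, \;
G = \frac{\partial \mathbf{f}}{\partial \mathbf{u}}
```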
              Following sampled-data theory, with the control output held constant over the
            sampling period, they can be transformed into difference equations by numerical (or
            analytical) integration; perturbations then show up as added accumulated values
            with statistical properties similar to those in the analog case. The standard forms
            for (linearized) state transitions over a time period T are the state transition
            matrix A(T) and the control effect matrix B(T). Multiplying A(T) by the old state
            vector yields the homogeneous part of the new state vector; B(T) describes the
            effect of constant unit control inputs on the new state. Multiplying B(T) by the
            actual control output and adding the result to the homogeneous part yields the new
            state.
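For a linear time-invariant model dx/dt = F x + G u with u held constant over T, sampled-data theory gives A(T) = exp(F T) and B(T) = (∫₀ᵀ exp(F τ) dτ) G, and the new state is x[k+1] = A(T) x[k] + B(T) u[k]. The pure-Python sketch below obtains both matrices from the matrix exponential of the augmented matrix [[F, G], [0, 0]] · T, a standard alternative to the numerical integration mentioned in the text. The names F, G, discretize, and step are hypothetical, and the truncated Taylor series with scaling and squaring is a simple choice for small, well-scaled matrices, not a production-grade matrix exponential.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def mat_add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_scale(a: Matrix, s: float) -> Matrix:
    return [[s * x for x in row] for row in a]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def expm(m: Matrix, terms: int = 20, squarings: int = 8) -> Matrix:
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    scaled = mat_scale(m, 1.0 / 2 ** squarings)
    result, term = identity(len(m)), identity(len(m))
    for k in range(1, terms + 1):
        term = mat_scale(mat_mul(term, scaled), 1.0 / k)  # term = scaled^k / k!
        result = mat_add(result, term)
    for _ in range(squarings):
        result = mat_mul(result, result)  # e^M = (e^(M / 2^s))^(2^s)
    return result


def discretize(f: Matrix, g: Matrix, t: float) -> Tuple[Matrix, Matrix]:
    """Return A(T), B(T) for dx/dt = F x + G u with u held constant over T."""
    n, m = len(f), len(g[0])
    # Augmented matrix [[F, G], [0, 0]]; the upper-left block of exp(aug * T)
    # is A(T) and the upper-right block is B(T).
    aug = [f[i] + g[i] for i in range(n)] + [[0.0] * (n + m) for _ in range(m)]
    e = expm(mat_scale(aug, t))
    a = [row[:n] for row in e[:n]]
    b = [row[n:] for row in e[:n]]
    return a, b


def step(a: Matrix, b: Matrix, x: List[float], u: List[float]) -> List[float]:
    """New state = homogeneous part A(T) x plus control effect B(T) u."""
    ax = [sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(a))]
    bu = [sum(b[i][j] * u[j] for j in range(len(u))) for i in range(len(b))]
    return [p + q for p, q in zip(ax, bu)]


# Hypothetical example: double integrator (position, velocity) driven by an
# acceleration input, sampled with the European full-image cycle T = 40 ms.
F = [[0.0, 1.0], [0.0, 0.0]]
G = [[0.0], [1.0]]
A, B = discretize(F, G, 0.04)
print(A)  # approximately [[1, 0.04], [0, 1]]
print(B)  # approximately [[0.0008], [0.04]]
print(step(A, B, [0.0, 1.0], [2.0]))  # approximately [0.0416, 1.08]
```

For the double integrator the result matches the analytical solution A(T) = [[1, T], [0, 1]], B(T) = [[T²/2], [T]]. In practice, a library routine such as scipy.linalg.expm would replace the hand-written series.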
              Exploiting this knowledge about the motion processes of 3-D objects in 3-D space
            for interpreting image sequences is the core of the 4-D approach to dynamic vision
            developed by [Dickmanns, Wuensche 1987, 1999]. Combining temporal prediction
            with the first-order derivative matrix of perspective projection (the “Jacobian
            matrix” of spatial vision discussed in previous sections) allows bypassing perspective