Page 101 - Biomimetics : Biologically Inspired Technologies
Bar-Cohen : Biomimetics: Biologically Inspired Technologies DK3163_c003 Final Proof page 87 21.9.2005 11:40pm
can also produce it (in some way). Undoubtedly, this will hold for systems of the type considered in
this section. Thus, to solve the whole speech language understanding problem, we must also solve
the speech language production problem.
In summary, the confabulation theory of vertebrate cognition seems to provide the basis for
mechanizing sound cognition in a manner that has the familiar characteristics of human sound
cognition.
3.5 VISUAL COGNITION
As with sound, the key challenge of vision is to usefully transduce incoming image information into
symbolic form. Another key part of vision is to build symbolic representations of individual visual
objects that are invariant to useful combinations of selected visual attributes such as pose, lighting,
color, and form. These are the main topics of this section. Ancillary subjects, such as the highly
specialized visual human face recognition system and binocular vision, are not discussed. Readers
are expected to have a solid understanding of traditional machine vision.
3.5.1 Building an Eyeball Vision Sensor and its Gaze Controller
Vertebrate vision is characterized by the use of eyeballs. A gaze controller is used to direct the
eye(s) to (roughly repeatable) key points on objects of interest. In this section, we will consider only
monocular, panchromatic, visual cognition in detail.
Figure 3.9 illustrates the basic elements of the confabulation-based vision architecture that
will be discussed in this section. For simplicity, we ignore the question of how the pointing of the
video camera sensor is controlled. It is assumed that the wide-angle, large-image camera is fixed
and that everything we want to see and visually analyze is within this sensor’s fixed visual field of
view and is of sufficient size (number of pixels) to make its attributes visible at the sensor’s
resolution. For example, imagine a wide-angle, high-resolution video camera positioned about 8 ft
above the pavement at a busy downtown street intersection, pointed diagonally across the
intersection, viewing the people on the sidewalks and the vehicles on the streets.
Assume that the visual sensor (i.e., video camera) gathers digital image frames, each with many
millions of pixels, at a rate of 30 frames per second. For simplicity, each pixel will be assumed to
have its panchromatic (grayscale) brightness measured on a 16-bit linear digital scale.
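To give a feel for the volume of raw input implied by these assumptions, the arithmetic can be sketched as follows. The text says only "many millions of pixels", so the 4000 x 3000 frame size used here is a hypothetical value chosen for illustration, as is the function name.

```python
def raw_data_rate_bytes_per_s(width, height, bits_per_pixel=16, fps=30):
    """Uncompressed data rate of a grayscale video sensor, in bytes per second."""
    # 16 bits per pixel and 30 frames per second come from the text;
    # width and height are supplied by the caller (assumed values).
    bytes_per_frame = width * height * bits_per_pixel // 8
    return bytes_per_frame * fps


# Hypothetical 12-megapixel sensor (4000 x 3000); not a figure from the text.
rate = raw_data_rate_bytes_per_s(4000, 3000)
print(f"{rate / 1e6:.0f} MB/s")  # 720 MB/s
```

Under these assumed dimensions the gaze controller must examine roughly 24 MB per frame, which is one reason it is useful to commit to at most one fixation point per frame.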
The gaze controller of this visual system (see Figure 3.9) is provided with all of the pixels
of each individual frame of imagery. Using this input, it decides whether to select a fixation
point (a particular pixel of the frame) for that frame (it can select at most one). The manner
in which a gaze controller can be built (my laboratory has built one [Hecht-Nielsen and Zhou,
1995], as have a number of others) is described next. To make the discussion that
follows concrete, consider a situation where our video camera sensor is monitoring a street scene
in a busy downtown area. Each still frame of video contains tens of people and a number of cars
driving by.
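The input and output contract of the gaze controller stated above (a whole frame in, at most one fixation pixel out) can be sketched as a single function. This is only an illustration of the interface. The interest score used here (deviation of a pixel from the frame mean) and the threshold `min_score` are placeholder assumptions, and are not the method of Hecht-Nielsen and Zhou [1995] or of any other published gaze controller. The names `Frame` and `select_fixation` are likewise hypothetical.

```python
from typing import List, Optional, Tuple

# A frame is a list of rows of 16-bit grayscale brightness values (0..65535).
Frame = List[List[int]]


def select_fixation(frame: Frame, min_score: float = 5000.0) -> Optional[Tuple[int, int]]:
    """Return the (row, col) of at most one fixation point, or None.

    Placeholder logic (an assumption, not the author's method): score each
    pixel by how far its brightness lies from the frame mean, and select the
    highest-scoring pixel only if its score reaches min_score.
    """
    n_pixels = sum(len(row) for row in frame)
    if n_pixels == 0:
        return None
    mean = sum(sum(row) for row in frame) / n_pixels

    best_score, best_rc = -1.0, None
    for r, row in enumerate(frame):
        for c, value in enumerate(row):
            score = abs(value - mean)  # placeholder interest score
            if score > best_score:
                best_score, best_rc = score, (r, c)

    # The controller may decline to fixate on an uninteresting frame.
    return best_rc if best_score >= min_score else None


flat = [[100, 100, 100] for _ in range(3)]
spot = [[100, 100, 100], [100, 60000, 100], [100, 100, 100]]
print(select_fixation(flat))  # None
print(select_fixation(spot))  # (1, 1)
```

Returning `None` rather than a sentinel pixel makes the "at most one fixation point per frame" rule explicit in the type. A real controller would replace the placeholder score with whatever selection function it has learned or been given.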
The basic idea of designing a gaze controller is to mimic human performance. Let an attentive
human visual observer watch the output of the video sensor on a computer screen. Attach an eye
tracker to the screen to monitor the human’s eye movements. These movements will typically be
saccades, that is, jumps of the eye position between one fixation point and the next. At each fixation
point, the human eye gathers image data from a region surrounding the fixation point. This can
be viewed as taking a “snapshot” or “eyeball” image centered at that fixation point. The human
visual system then processes that eyeball image and jumps to the next fixation point selected by
its gaze director (a function which is implemented, in part, by the superior colliculus of the
brainstem). Visual processing is not carried out during these eyeball jumps. While the human