can also produce it (in some way). Undoubtedly, this will hold for systems of the type considered in this section. Thus, to solve the whole speech language understanding problem, we must also solve the speech language production problem.

In summary, the confabulation theory of vertebrate cognition seems to provide the basis for mechanizing sound cognition in a manner that has the familiar characteristics of human sound cognition.



3.5 VISUAL COGNITION

As with sound, the key challenge of vision is to usefully transduce incoming image information into symbolic form. Another key part of vision is to build symbolic representations of individual visual objects that are invariant to useful combinations of selected visual attributes such as pose, lighting, color, and form. These are the main topics of this section. Ancillary subjects, such as the highly specialized human face recognition system and binocular vision, are not discussed. Readers are expected to have a solid understanding of traditional machine vision.


3.5.1 Building an Eyeball Vision Sensor and Its Gaze Controller

Vertebrate vision is characterized by the use of eyeballs. A gaze controller is used to direct the eye(s) to (roughly repeatable) key points on objects of interest. In this section, we will consider only monocular, panchromatic visual cognition in detail.
Figure 3.9 illustrates the basic elements of the confabulation-based vision architecture that will be discussed in this section. For simplicity, the question of how the pointing of the video camera sensor is controlled is ignored. It is assumed that the wide-angle, large-image camera is fixed and that everything we want to see and visually analyze is within this sensor's fixed visual field of view and is of sufficient size (number of pixels) to make its attributes visible at the sensor's resolution. For example, imagine a wide-angle, high-resolution video camera positioned about 8 ft above the pavement at a busy downtown street intersection, pointed diagonally across the intersection, viewing the people on the sidewalks and the vehicles on the streets.
Assume that the visual sensor (i.e., video camera) gathers digital image frames, each with many millions of pixels, at a rate of 30 frames per second. For simplicity, each pixel will be assumed to have its panchromatic (grayscale) brightness measured on a 16-bit linear digital scale.
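To make the data volume concrete, here is a back-of-the-envelope calculation. The 4-megapixel frame size is an assumed placeholder (the text says only "many millions of pixels"); the 2 bytes per pixel follow directly from the 16-bit scale:

    # Raw throughput of the assumed sensor. PIXELS_PER_FRAME is an
    # illustrative placeholder; the text says only "many millions of pixels."
    PIXELS_PER_FRAME = 4_000_000
    BYTES_PER_PIXEL = 2            # 16-bit panchromatic brightness
    FRAMES_PER_SECOND = 30

    bytes_per_frame = PIXELS_PER_FRAME * BYTES_PER_PIXEL      # 8,000,000
    bytes_per_second = bytes_per_frame * FRAMES_PER_SECOND    # 240,000,000

    print(f"{bytes_per_frame / 1e6:.0f} MB per frame")    # 8 MB per frame
    print(f"{bytes_per_second / 1e6:.0f} MB per second")  # 240 MB per second

So even before any cognition takes place, the gaze controller must cope with on the order of hundreds of megabytes of raw imagery per second.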
The gaze controller of this visual system (see Figure 3.9) is provided with all of the pixels of each individual frame of imagery. Using this input, it decides whether to select a fixation point (a particular pixel of the frame) for that frame (it can select at most one). The manner in which a gaze controller can be built (my laboratory has built one [Hecht-Nielsen and Zhou, 1995], and so have a number of others) is described next. To make the discussion that follows concrete, consider a situation where our video camera sensor is monitoring a street scene in a busy downtown area. Each still frame of video contains dozens of people and a number of cars driving by.
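As a concrete rendering of this input-output contract, the sketch below frames the gaze controller as a function mapping a full frame to at most one fixation pixel. The center-surround saliency score used to rank pixels is purely a stand-in, as is the threshold; the text does not commit to any particular selection rule:

    import numpy as np
    from scipy.ndimage import uniform_filter

    def gaze_controller(frame, threshold=0.1):
        """Return at most one fixation point (row, col) for this frame,
        or None if no pixel scores high enough to warrant a fixation.

        frame: 2-D uint16 grayscale image. The saliency measure (deviation
        from a local mean) and the threshold are illustrative choices only.
        """
        img = frame.astype(np.float32) / 65535.0   # normalize 16-bit input
        surround = uniform_filter(img, size=31)    # local mean brightness
        saliency = np.abs(img - surround)          # center-surround contrast
        r, c = np.unravel_index(np.argmax(saliency), saliency.shape)
        if saliency[r, c] < threshold:
            return None                            # may decline to fixate
        return (int(r), int(c))

In practice, such a hand-crafted score would be replaced by a selection function fitted to human eye-tracking data, which is exactly the design route described next.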
The basic idea of designing a gaze controller is to mimic human performance. Let an attentive human observer watch the output of the video sensor on a computer screen. Attach an eye tracker to the screen to monitor the human's eye movements. These movements will typically be saccades: jumps of the eye position between one fixation point and the next. At each fixation point, the human eye gathers image data from a region surrounding the fixation point. This can be viewed as taking a "snapshot" or "eyeball" image centered at that fixation point.
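A minimal sketch of this snapshot step follows, assuming a fixed square eyeball image and simple clamping at the frame edges (neither the window size nor the edge policy is specified in the text):

    import numpy as np

    def eyeball_image(frame, fixation, half_size=64):
        """Extract a square "eyeball" snapshot centered at the fixation pixel.

        The 129 x 129 window and the clamping behavior near frame edges are
        illustrative assumptions, not part of the architecture as described.
        """
        r, c = fixation
        rows, cols = frame.shape
        side = 2 * half_size + 1
        top = min(max(r - half_size, 0), rows - side)   # keep window in frame
        left = min(max(c - half_size, 0), cols - side)
        return frame[top:top + side, left:left + side]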
The human visual system then processes that eyeball image and jumps to the next fixation point selected by its gaze director (a function which is implemented, in part, by the superior colliculus of the brainstem). Visual processing is not carried out during these eyeball jumps. While the human