can also produce it (in some way). Undoubtedly, this will hold for systems of the type considered in this section. Thus, to solve the whole speech language understanding problem, we must also solve the speech language production problem.

In summary, the confabulation theory of vertebrate cognition seems to provide the basis for mechanizing sound cognition in a manner that has the familiar characteristics of human sound cognition.



3.5 VISUAL COGNITION

As with sound, the key challenge of vision is to usefully transduce incoming image information into symbolic form. Another key part of vision is to build symbolic representations of individual visual objects that are invariant to useful combinations of selected visual attributes such as pose, lighting, color, and form. These are the main topics of this section. Ancillary subjects, such as the highly specialized human face recognition system and binocular vision, are not discussed. Readers are expected to have a solid understanding of traditional machine vision.


3.5.1 Building an Eyeball Vision Sensor and Its Gaze Controller

Vertebrate vision is characterized by the use of eyeballs. A gaze controller is used to direct the eye(s) to (roughly repeatable) key points on objects of interest. In this section, we will consider only monocular, panchromatic visual cognition in detail.
Figure 3.9 illustrates the basic elements of the confabulation-based vision architecture that will be discussed in this section. For simplicity, the question of how the pointing of the video camera sensor is controlled is ignored. It is assumed that the wide-angle, large-image camera is fixed and that everything we want to see and visually analyze is within this sensor's fixed visual field of view and is of sufficient size (number of pixels) to make its attributes visible at the sensor's resolution. For example, imagine a wide-angle, high-resolution video camera positioned about 8 ft above the pavement at a busy downtown street intersection, pointed diagonally across the intersection, viewing the people on the sidewalks and the vehicles on the streets.
Assume that the visual sensor (i.e., video camera) gathers digital image frames, each with many millions of pixels, at a rate of 30 frames per second. For simplicity, each pixel will be assumed to have its panchromatic (grayscale) brightness measured on a 16-bit linear digital scale.
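To make the data volume concrete, here is a back-of-the-envelope calculation. The 4-megapixel frame size is an assumed placeholder (the text says only "many millions of pixels"); the 2 bytes per pixel follow directly from the 16-bit scale:

    # Raw throughput of the assumed sensor. PIXELS_PER_FRAME is an
    # illustrative placeholder; the text says only "many millions of pixels."
    PIXELS_PER_FRAME = 4_000_000
    BYTES_PER_PIXEL = 2            # 16-bit panchromatic brightness
    FRAMES_PER_SECOND = 30

    bytes_per_frame = PIXELS_PER_FRAME * BYTES_PER_PIXEL      # 8,000,000
    bytes_per_second = bytes_per_frame * FRAMES_PER_SECOND    # 240,000,000

    print(f"{bytes_per_frame / 1e6:.0f} MB per frame")    # 8 MB per frame
    print(f"{bytes_per_second / 1e6:.0f} MB per second")  # 240 MB per second

So even before any cognition takes place, the gaze controller must cope with on the order of hundreds of megabytes of raw imagery per second.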
The gaze controller of this visual system (see Figure 3.9) is provided with all of the pixels of each individual frame of imagery. Using this input, it decides whether to select a fixation point (a particular pixel of the frame) for that frame (it can select at most one). The manner in which a gaze controller can be built (my laboratory has built one [Hecht-Nielsen and Zhou, 1995], and so have a number of others) is described next. To make the discussion that follows concrete, consider a situation where our video camera sensor is monitoring a street scene in a busy downtown area. Each still frame of video contains dozens of people and a number of cars driving by.
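As a concrete rendering of this input-output contract, the sketch below frames the gaze controller as a function mapping a full frame to at most one fixation pixel. The center-surround saliency score used to rank pixels is purely a stand-in, as is the threshold; the text does not commit to any particular selection rule:

    import numpy as np
    from scipy.ndimage import uniform_filter

    def gaze_controller(frame, threshold=0.1):
        """Return at most one fixation point (row, col) for this frame,
        or None if no pixel scores high enough to warrant a fixation.

        frame: 2-D uint16 grayscale image. The saliency measure (deviation
        from a local mean) and the threshold are illustrative choices only.
        """
        img = frame.astype(np.float32) / 65535.0   # normalize 16-bit input
        surround = uniform_filter(img, size=31)    # local mean brightness
        saliency = np.abs(img - surround)          # center-surround contrast
        r, c = np.unravel_index(np.argmax(saliency), saliency.shape)
        if saliency[r, c] < threshold:
            return None                            # may decline to fixate
        return (int(r), int(c))

In practice, such a hand-crafted score would be replaced by a selection function fitted to human eye-tracking data, which is exactly the design route described next.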
The basic idea of designing a gaze controller is to mimic human performance. Let an attentive human observer watch the output of the video sensor on a computer screen. Attach an eye tracker to the screen to monitor the human's eye movements. These movements will typically be saccades: jumps of the eye position between one fixation point and the next. At each fixation point, the human eye gathers image data from a region surrounding the fixation point. This can be viewed as taking a "snapshot" or "eyeball" image centered at that fixation point.
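A minimal sketch of this snapshot step follows, assuming a fixed square eyeball image and simple clamping at the frame edges (neither the window size nor the edge policy is specified in the text):

    import numpy as np

    def eyeball_image(frame, fixation, half_size=64):
        """Extract a square "eyeball" snapshot centered at the fixation pixel.

        The 129 x 129 window and the clamping behavior near frame edges are
        illustrative assumptions, not part of the architecture as described.
        """
        r, c = fixation
        rows, cols = frame.shape
        side = 2 * half_size + 1
        top = min(max(r - half_size, 0), rows - side)   # keep window in frame
        left = min(max(c - half_size, 0), cols - side)
        return frame[top:top + side, left:left + side]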
The human visual system then processes that eyeball image and jumps to the next fixation point selected by its gaze director (a function which is implemented, in part, by the superior colliculus of the brainstem). Visual processing is not carried out during these eyeball jumps. While the human