Training is continued until the perceptron learning curve (calculated from the performance of the perceptron when tested on, say, the last 1000 training examples) reaches a sufficiently high value (say, 80% of the training example pairs correctly classified as fixation points and non-fixation points, respectively). Final testing is carried out on hundreds of fresh examples not used in training. If, say, 70% of the final testing examples are classified correctly, then the gaze
controller is frozen and ready for service. If not, then additional training is called for. After training,
the outputs are scaled to reflect operational class a priori probabilities (Hecht-Nielsen, 2004).
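To make the stopping and acceptance criteria concrete, the sketch below shows one way such a training loop might be organized. It assumes a simple binary perceptron (one output: fixation point or not); the class name, the learning rule, and all thresholds are illustrative assumptions rather than the procedure actually used.

```python
import numpy as np
from collections import deque

# Sketch of the stopping rule described above: keep training until the running
# accuracy over the last 1000 training examples reaches a target (here 80%), then
# confirm on fresh, held-out examples (target 70%). The single linear threshold
# unit, the learning rule, and all names below are illustrative assumptions.

class GazePerceptron:
    def __init__(self, n_inputs, lr=0.01):
        self.w = np.zeros(n_inputs)
        self.b = 0.0
        self.lr = lr

    def classify(self, x):
        # 1 = "declare this gridpoint a fixation point", 0 = "not a fixation point"
        return 1 if np.dot(self.w, x) + self.b > 0 else 0

    def train_step(self, x, label):
        pred = self.classify(x)
        err = label - pred              # -1, 0, or +1 (classic perceptron rule)
        self.w += self.lr * err * x
        self.b += self.lr * err
        return pred == label

def train_gaze_controller(perceptron, training_stream, test_set,
                          window=1000, train_target=0.80, test_target=0.70):
    recent = deque(maxlen=window)       # learning curve over the last `window` examples
    for x, label in training_stream:
        recent.append(perceptron.train_step(x, label))
        if len(recent) == window and np.mean(recent) >= train_target:
            break                       # training-phase criterion met
    # Final testing on fresh examples never used in training.
    accuracy = np.mean([perceptron.classify(x) == y for x, y in test_set])
    return accuracy >= test_target      # True: freeze the controller; False: train more
```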
It is natural to doubt that the above procedure would produce a functional gaze controller that
would mimic the performance of a skilled human. But it can! The reason is probably that the human
superior colliculus is essentially a fixed neuronal machine (at least in autonomous operational mode
where no external control is exerted; there are several brain nuclei that can send "commands" to the superior colliculus that override its indigenous decisions) that is not all that "smart" (it operates very fast, in what looks like a "flow through" processing mode). Thus, its natural internal
function is capable of being fairly accurately mimicked by a perceptron.
3.5.2 Building the Primary Visual Lexicons and Knowledge Bases
After the gaze controller has finished its training, it is time to build the rest of the visual system (and
link it up with the language module). The first step is to set up the camera and start feeding frames to
the gaze controller. Every time it chooses a fixation point (which is, of necessity, a gridpoint), the V components of the gridpoints lying within the eyeball image centered at that fixation gridpoint (Figure 3.9) are gathered to form the eyeball image description vector (or just eyeball vector) U.
Just as in the design of mammalian primary visual cortex, each of the primary visual lexicons is
responsible for monitoring a small local neighborhood of the eyeball image (these neighborhoods
are all regularly spaced, they overlap somewhat, and they completely cover the eyeball image). For
example, using the example numbers provided above, each primary visual lexicon (of which, for illustration purposes, Figure 3.9 shows 36, but there might actually be, say, 81) would monitor the U components from, say, 4900 gridpoints within and adjacent to its neighborhood of the eyeball
image. The vector formed by these selected U components constitutes the input vector to that
lexicon.
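The assembly of these vectors is straightforward to express in code. The sketch below shows one plausible arrangement, assuming the frame's V components are stored per gridpoint and that each lexicon's (overlapping) set of component indices into the eyeball vector U has been precomputed from the fixed geometry of Figure 3.9; all function names and container choices are assumptions made for illustration.

```python
import numpy as np

# Sketch of assembling lexicon input vectors from one eyeball image. `V` is assumed
# to be a dict mapping each frame gridpoint (row, col) to its feature components;
# `eyeball_offsets` and `lexicon_index_sets` are precomputed once from the fixed
# eyeball-image geometry (Figure 3.9). The example counts (81 lexicons, ~4900
# gridpoints per lexicon) come from the illustrative numbers in the text.

def eyeball_vector(V, fixation_gridpoint, eyeball_offsets):
    """Gather the V components of all gridpoints lying within the eyeball image
    centered at the chosen fixation gridpoint, forming the eyeball vector U."""
    fy, fx = fixation_gridpoint
    return np.concatenate([V[(fy + dy, fx + dx)] for dy, dx in eyeball_offsets])

def lexicon_input_vectors(U, lexicon_index_sets):
    """Each primary visual lexicon monitors the U components of the gridpoints
    within and adjacent to its small neighborhood of the eyeball image; the
    neighborhoods overlap and together cover the whole eyeball image."""
    return [U[idx] for idx in lexicon_index_sets]
```

In practice the offset and index sets would be computed once, when the eyeball-image geometry and the lexicon neighborhoods are laid out, and then reused for every fixation.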
Now comes the tricky part! In order to train the primary visual lexicons, it is essential that, while
this training is underway, the images being gathered by the high-resolution video camera have only
ONE VISUAL OBJECT (an object of operational interest) in them, and nothing else. Further, all
visual objects that will ever be of interest to the system must be presented in this manner during this
training phase. As mentioned in the Appendix, in humans, this requirement is met by physically
altering the characteristics of the baby’s eyes after it passes through this stage (during which its
vision is limited in range out to about arm length). The same is true of other mammals. For artificial visual cognition, a way must be found to meet this critical requirement. For many applications, motion segmentation, combined with rejection of eyeball images containing more than one object fragment (as determined by a human educator supervising visual knowledge acquisition), will work.
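One simple realization of that filter, offered only as a rough sketch, is frame-difference motion segmentation followed by rejection of any frame whose motion mask contains more than one sizeable connected fragment. The thresholds, and the choice of such a basic segmentation method, are assumptions; a human educator can still confirm the accept/reject decisions.

```python
import numpy as np
from scipy import ndimage

# Rough sketch of the single-object filter: frame-difference motion segmentation,
# rejecting any frame whose motion mask contains more than one sizeable connected
# fragment. Grayscale frames are assumed, and the thresholds are placeholders.

def has_single_moving_object(prev_frame, frame, diff_thresh=25, min_pixels=200):
    motion = np.abs(frame.astype(int) - prev_frame.astype(int)) > diff_thresh
    labels, n_regions = ndimage.label(motion)      # connected motion fragments
    sizes = np.bincount(labels.ravel())[1:]        # pixel count of each fragment
    significant = np.count_nonzero(sizes >= min_pixels)
    return significant == 1   # accept only frames with exactly one object fragment
```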
The symbols of each lexicon are built by collecting input vectors from a huge collection of
eyeball images (selected by the gaze controller from images gathered in the operational visual
environment), where each eyeball image contains only one object (as described in the previous paragraph). These input vectors for each lexicon are then used to build a VQ codebook for that lexicon (Zador, 1963), which is made sufficiently large that, as training progresses, very few input vectors lie farther from their nearest codebook vector than the local intra-codebook vector distance. Once this criterion is met, the codebook is frozen and one symbol is created for, and
uniquely associated with, each codebook vector. This is how the primary visual lexicon symbol sets
are developed. As discussed in the Appendix, it can also be useful (but it is not essential) to develop
"complex feature detector" symbols and invoke the precedence principle, as in mammalian
primary visual cortex. However, this possibility will be largely ignored here.
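A minimal sketch of such a codebook-building step is given below. It grows the codebook whenever an input vector falls farther than a chosen radius from every existing codebook vector, freezes the codebook once such "far" inputs become rare, and then assigns one symbol per codebook vector. This greedy online quantizer and its parameters are illustrative assumptions standing in for a Zador-style codebook design, not the author's algorithm.

```python
import numpy as np

# Sketch of building one lexicon's VQ codebook and symbol set. A new codebook vector
# is added whenever an input vector falls farther than `radius` from every existing
# codebook vector; construction stops once such "far" inputs become rare over a recent
# window. This greedy online quantizer and its parameters are illustrative assumptions.

def build_lexicon_codebook(input_vectors, radius, far_fraction=0.01, window=5000):
    codebook, recent_far = [], []
    for v in input_vectors:                         # stream of lexicon input vectors
        far = (not codebook or
               np.linalg.norm(np.array(codebook) - v, axis=1).min() > radius)
        if far:
            codebook.append(v.copy())               # grow the codebook to cover this region
        recent_far.append(far)
        if len(recent_far) > window:
            recent_far.pop(0)
            if np.mean(recent_far) < far_fraction:
                break                               # criterion met: freeze the codebook
    # One symbol is created for, and uniquely associated with, each codebook vector.
    symbols = {k: vec for k, vec in enumerate(codebook)}
    return np.array(codebook), symbols
```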