
Bar-Cohen : Biomimetics: Biologically Inspired Technologies DK3163_c003 Final Proof page 92 21.9.2005 11:40pm





                       Once the primary lexicon symbol sets are developed, the next step is to develop the knowledge
                    bases between these lexicons. For simplicity, we can assume that every primary visual lexicon is
                    connected to every other by a knowledge base.
   The knowledge bases of the primary visual layer (i.e., the primary visual lexicons and the
knowledge bases linking them) are trained using large quantities of new video gathered from the
operational source, with the gaze controller selecting fixation points. Again, it is somehow arranged that
                    each eyeball image contains only an object of operational interest at the fixation point and no visual
                    elements of other objects (i.e., the rest of the eyeball image is blank).
                       As each eyeball image vector U is created and its selected subsidiary components (making up
                    the 81 primary visual lexicon input vectors) are sent to the primary visual lexicons, each lexicon
                    expresses an expectation with the, say, 10 symbols whose associated codebook vectors lie closest to
                    its input vector. Count accumulation then takes place for all (unidirectional) links between pairs of
                    these expectation symbols lying on different lexicons.
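   This expectation-and-counting step can be sketched as follows. The sketch is a minimal
reading of the text, not the chapter's implementation: the function names, the dictionary of
per-pair count matrices, and the array shapes are all illustrative assumptions. Each lexicon's
expectation is the set of k symbols whose codebook vectors lie closest (in the Euclidean metric)
to that lexicon's input vector, and counts are then accumulated for every ordered pair of
expectation symbols on distinct lexicons:

```python
import numpy as np

def expectation(codebook, x, k=10):
    """Indices of the k codebook vectors nearest to input x (Euclidean metric).

    codebook: (n_symbols, dim) array of a lexicon's codebook vectors.
    x: (dim,) input vector for that lexicon.
    """
    dists = np.linalg.norm(codebook - x, axis=1)
    return np.argsort(dists)[:k]

def accumulate_counts(lexicon_codebooks, lexicon_inputs, counts, k=10):
    """Update ordered-pair co-occurrence counts for one eyeball image.

    counts[(a, b)] is the count matrix for the unidirectional knowledge base
    from lexicon a to lexicon b; entry [s, t] counts how often symbol s on a
    and symbol t on b appeared together in an expectation.
    """
    exps = [expectation(cb, x, k)
            for cb, x in zip(lexicon_codebooks, lexicon_inputs)]
    for a, ea in enumerate(exps):
        for b, eb in enumerate(exps):
            if a == b:
                continue  # links exist only between symbols on different lexicons
            for s in ea:
                for t in eb:
                    counts[(a, b)][s, t] += 1
    return counts
```

For the 81 primary visual lexicons of the text, `lexicon_codebooks` would hold 81 codebooks
and `counts` one matrix per ordered lexicon pair; the toy sizes here are purely for illustration.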
                       The idea of using the ten closest symbols is based upon the discovery (Caid and Hecht-
                    Nielsen, 2001, 2004) that jet correlation vectors which are near to one another in the Euclidean
                    metric (i.e., in the VQ space of a lexicon) represent local visual appearances that are (to a human
                    observer) visually similar to each other; AND VICE VERSA. This valuable fact was pointed out
                    in the 1980s by John Daugman (Daugman, 1985, 1987, 1988a,b; Daugman and Kammen, 1987)
                    (Daugman also invented the iris scan biometric signature). This way, symbols which could
                    reasonably occur together meaningfully within the same object become linked. This is much
                    more efficient and effective than if each lexicon simply expressed the one closest symbol;
                    and yet, because of Daugman’s important principle, no harm can come of this expansion to
                    multiple symbols. The key point is that counts are kept between each of the combinatorially
                    many ordered excited symbol pairs (of symbols on different lexicons) involved. The process of
deriving the p(c|λ) knowledge link strengths ensures that only the meaningful links are retained
                    in the end.
   As training progresses, the p(c|λ) knowledge link strengths are periodically calculated from the
symbol co-occurrence count matrices (of which there is one for each knowledge base). When the
meaningful p(c|λ) values stop changing much, training is ended. The primary visual layer is now
                    complete.

                    3.5.3 Building the Secondary and Tertiary Visual Layers

                    After completion of the primary visual layer, it is time to build the secondary and tertiary visual
                    layers. However, this process again requires that the primary visual layer representation of each
                    eyeball image pertain to only one object — which can now be accomplished using the primary
                    layer’s knowledge bases, as described next.
                       Figure 3.11 shows a portion of a frame from the wide-angle high-resolution panchromatic video
                    camera containing an eyeball image that has been selected by the gaze controller. Each of the 81
                    primary visual lexicons shown is receiving its input vector from this eyeball image. The first thing
                    that happens is that each lexicon expresses an expectation consisting of those (again, say, 10)
                    symbols which were closest to that lexicon’s input vector. (Note: This is similar to a C1F effect,
except that the inputs are not coming from knowledge links, but from "extra-cortical sensory
afferents." This illustrates, as does the handling of the S vector by primary sound lexicons discussed
                    in Section 3.4, how the handling of these special external sensory inputs is very similar to the
                    handling of knowledge link inputs.)
                       Once the primary visual lexicon expectations are established, knowledge links proceeding from
                    the central lexicon of the primary layer, and its immediate neighboring lexicons, outward are
                    enabled (allowing all symbols of all expectations of those lexicons to transmit) and the distal
                    lexicons that these links target receive C1F commands. Those distal lexicons that do not receive
                    links to symbols of their (previously established and frozen) expectations describing their portion of