Page 108 - Biomimetics : Biologically Inspired Technologies

Bar-Cohen : Biomimetics: Biologically Inspired Technologies DK3163_c003 Final Proof page 94 21.9.2005 11:40pm





                    process will work. Thus, segmentation might work even better if we had 625 primary visual
                    lexicons (a 25 × 25 array). The use of more “complex” features, based upon more localized
                    “simple” features (see Appendix), and the precedence principle is one design approach to
                    achieving some of the benefits of more, smaller, lexicons without actually building them.
                       Multiple natural questions arise at this point. First, how well does this design actually work in
                    practice? In other words, how thoroughly does this segmentation process null lexicons coding other
                    objects and how reliably are the lexicons that code the attended object retained? The short answer is
                    that I don’t know. The only evidence I have is based upon experiments done in my lab with a very
                    simple image environment (images of capital Latin alphabetical characters moving about, with
                    slowly randomly changing spatial and angular velocities, on a plane). In this case, a segmentation
                    scheme of this basic type worked very well.
                       In reality, probably not all fixation point objects will segment cleanly. Sometimes irrelevant
                    lexicons will not be nulled, and relevant lexicons will be. However, because of the nature of
                    the development process for the secondary and tertiary visual lexicon symbol sets, which is
                    described next, such errors will not matter as long as these lapses occur randomly and the
                    general quality of the segmentation is fairly good. We will proceed with the assumption that these
                    conditions are satisfied.
                       The goal for the secondary lexicon symbols is twofold. First, each such symbol should be
                    somewhat pose insensitive (i.e., if it responds strongly to an object at one pose it will respond
                    strongly to the same object at nearby poses). Also, each secondary lexicon symbol should represent
                    a larger “chunk” of an object than any primary symbol. Such symbols are said to be more holistic
                    than primary lexicon symbols. Tertiary layer lexicon symbols are to be even more holistic than
                    secondary layer symbols.
                       For secondary and tertiary layer development, sequences of camera images containing the same
                    (operationally relevant) visual object are used. At the beginning of each sequence, we assume that
                    the gaze director has selected a fixation point on the object. In the subsequent frames of the
                    sequence, we check to see that in each, one point near the initial fixation point is also given a
                    high score by the gaze director. If this is true for a significant sequence of frames (say, 10 to 20 or
                    more), then these nearby points on the subsequent frames are designated as the fixation points for
                    those frames and this sequence of eyeball images is added to our training set for layers two and
                    three. It is assumed that this set of sequences provides good statistical coverage of the set of all
                    operationally relevant objects, and that each object is seen in many different, operationally
                    characteristic poses in the sequences. It is also assumed that the poses of the fixated object in
                    each sequence are dynamically changing. (Note: This dynamic variation in pose is needed for
                    training but is not a requirement for operational use, where objects can be stationary and yet
                    can still usually be described with a single look.)
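
                    The sequence selection rule above can be sketched as follows. This is only a minimal
                    illustration under stated assumptions: the score-map representation, the function name
                    track_fixation, and the radius, threshold, and min_frames values are all hypothetical and
                    are not taken from the text. The sketch also assumes that a sequence which loses the object
                    after enough frames is kept up to that point.

```python
from math import hypot


def track_fixation(score_maps, start, radius=3.0, threshold=0.8, min_frames=10):
    """Return one fixation point per frame, or None if the sequence is too short.

    score_maps: list of dicts mapping (x, y) -> gaze director score, one dict
                per frame (a hypothetical representation of the gaze director).
    start:      fixation point selected by the gaze director on the first frame.
    """
    fixations = [start]
    for scores in score_maps[1:]:
        # Candidates: points given a high score that lie near the INITIAL fixation point.
        near = [
            p for p, s in scores.items()
            if s >= threshold and hypot(p[0] - start[0], p[1] - start[1]) <= radius
        ]
        if not near:
            break  # the object was lost; stop extending this sequence
        # Designate the best-scoring nearby point as this frame's fixation point.
        fixations.append(max(near, key=lambda p: scores[p]))
    # Only sufficiently long runs are added to the layer two and three training set.
    return fixations if len(fixations) >= min_frames else None
```

                    A caller would append each non-None result, together with its frames, to the training set
                    and discard the rest.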
                       As shown in Figure 3.9, the secondary layer lexicons receive knowledge links from primary
                    layer lexicons. These links are arranged so that a secondary lexicon symbol can only receive a
                    link from symbols lying on primary lexicons that surround the position of the secondary lexicon
                    in the second layer lexicon array. (Like the primary lexicon array, the secondary array is
                    envisioned as representing, with a regular “tiling,” the eyeball image content of the attended
                    object, but with each secondary lexicon representing a larger “chunk” of this object than a
                    primary layer lexicon, since the secondary layer has fewer lexicons than the primary layer.)
                    These knowledge links connect every symbol belonging to each primary lexicon within the
                    “field of view” of a secondary lexicon to every symbol of that secondary lexicon. For each such
                    forward knowledge link, a link between the same two symbols in the reverse (secondary to
                    primary) direction is also created. All of these links start out with zero strength.
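
                    The link arrangement just described can be sketched in code. This is an illustrative sketch
                    only: the function name build_links, the use of (x, y) lexicon positions, and the square
                    field-of-view test with half-width fov are assumptions made for the example, not details
                    given in the text.

```python
def build_links(primary_pos, secondary_pos, n_primary_syms, n_secondary_syms, fov):
    """Create zero-strength forward and reverse links between two lexicon layers.

    primary_pos, secondary_pos: dicts mapping lexicon id -> (x, y) position in a
                                shared image coordinate system (hypothetical layout).
    fov: half-width of the square "field of view" of a secondary lexicon (assumed shape).
    """
    forward, reverse = {}, {}
    for s_id, (sx, sy) in secondary_pos.items():
        for p_id, (px, py) in primary_pos.items():
            # Skip primary lexicons outside this secondary lexicon's field of view.
            if max(abs(px - sx), abs(py - sy)) > fov:
                continue
            # Connect every primary symbol to every secondary symbol, in both directions.
            for i in range(n_primary_syms):
                for j in range(n_secondary_syms):
                    forward[((p_id, i), (s_id, j))] = 0  # primary -> secondary
                    reverse[((s_id, j), (p_id, i))] = 0  # secondary -> primary
    return forward, reverse


# Example: a 4 x 4 primary array tiled by two secondary lexicons.
primary = {(x, y): (x, y) for x in range(4) for y in range(4)}
secondary = {"A": (0.5, 0.5), "B": (2.5, 2.5)}
fwd, rev = build_links(primary, secondary, n_primary_syms=2, n_secondary_syms=3, fov=0.5)
```

                    In this toy layout each secondary lexicon sees four primary lexicons, so each direction
                    holds 2 × 4 × 2 × 3 = 48 links, all with strength zero.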
                       As mentioned in the Appendix, not all knowledge bases need to have graded p(ψ|λ) strengths.
                    For many purposes in cognition, it is sufficient for knowledge links to simply be present (essentially
                    with strength 1) or absent (with effective “strength” 0). These inter-visual-level knowledge links
                    are of this binary character.