Page 80 - Biomimetics : Biologically Inspired Technologies

Bar-Cohen : Biomimetics: Biologically Inspired Technologies DK3163_c003 Final Proof page 66 21.9.2005 11:40pm





                    3.3.1 Phrase Completion and Sentence Continuation

                    This discussion of language cognition begins with consideration of a class of confabulation
                    architectures for dealing with single English sentences. These architectures address the problems
                    of phrase completion and sentence continuation, which are simple subcases of language generation.
                    This subsection expands on the brief introduction to phrase completion provided in Hecht-Nielsen
                    (2005). These architectures provide a good introduction to the "look and feel" of cognitive
                    information processing, which is completely different from the familiar computer paradigm.
                       Figure 3.1 illustrates a confabulation architecture for phrase completion and sentence
                    continuation in a single sentence of up to 20 words. Each lexicon has about 63,000 symbols, including
                    symbols for the 63,000 most common words in English (as reflected in the training corpus) and
                    eight punctuation marks (period, comma, semicolon, etc.), which are treated as separate words.
                    Capitalization is kept exactly as it appears in the training-corpus words selected for representation
                    within the word lexicons (i.e., mark and Mark are different words with different symbols). Thus, many
                    of the words in the lexicon are represented twice, once capitalized and once not; some have
                    even more than two representations, e.g., EXIT, Exit, and exit; and some, such as "e.g." and the
                    punctuation marks, are never capitalized and have only one representation.
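
The lexicon construction described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure actually used for Figure 3.1: the names `tokenize`, `build_lexicon`, `PUNCTUATION`, and `TOKEN_RE` are hypothetical, the six punctuation marks stand in for the eight mentioned in the text (only three of which are named), and the regular-expression tokenizer is a simplification that would split a token such as "e.g." rather than keep it whole.

```python
import re
from collections import Counter

# Assumption: six punctuation marks stand in for the eight the text mentions
# (only period, comma, and semicolon are named there).
PUNCTUATION = {".", ",", ";", ":", "!", "?"}

# Assumption: a word is a run of letters, optionally with internal apostrophes.
# A tokenizer for a real corpus would also have to keep tokens like "e.g." whole.
TOKEN_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*|[.,;:!?]")


def tokenize(sentence):
    """Split a sentence into words and punctuation marks, preserving case."""
    return TOKEN_RE.findall(sentence)


def build_lexicon(sentences, max_words=63_000):
    """Map each punctuation mark and each of the most common words to a symbol index."""
    counts = Counter(tok for s in sentences for tok in tokenize(s))
    # most_common() with no argument lists every token, most frequent first.
    words = [tok for tok, _ in counts.most_common() if tok not in PUNCTUATION]
    symbols = sorted(PUNCTUATION) + words[:max_words]
    return {tok: index for index, tok in enumerate(symbols)}


corpus = ["Mark made a mark.", "Take the exit marked EXIT; the Exit sign is lit."]
lexicon = build_lexicon(corpus)
print(tokenize(corpus[0]))
print(lexicon["Mark"], lexicon["mark"])  # two distinct symbols for one spelling
```

Because the tokenizer never changes case, Mark and mark (and EXIT, Exit, and exit) receive separate symbols, while each punctuation mark receives exactly one.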
                       Once a suitably "clean" and very large training corpus of proper English text (typically
                    containing billions of words) has been created, each successive sentence in the corpus is entered,
                    in sequence, into the architecture of Figure 3.1. The first word of the sentence is entered into the
                    leftmost lexicon (i.e., the symbol representing this word is made active) and the remaining words of
                    the sentence (or punctuation marks, which, again, are treated as separate words) are entered
                    successively, one per lexicon from left to right, until the ending period. If the sentence has more
                    than 20 words, those words beyond the first 20 are discarded. Because each word is assigned to a
                    lexicon according to its position in the sentence, this architecture is
                    termed position-dependent.
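
The position-dependent loading step can be sketched as follows. This is a minimal illustration under stated assumptions: `load_sentence` and `NUM_LEXICONS` are hypothetical names, the lexicon is a plain dictionary from token to symbol index, and leaving a lexicon inactive (`None`) for a word that has no symbol is an assumption, since the text does not say how such words are handled.

```python
NUM_LEXICONS = 20  # the architecture of Figure 3.1 has 20 word lexicons


def load_sentence(tokens, lexicon, num_lexicons=NUM_LEXICONS):
    """Return the active symbol for each lexicon, left to right (None = no active symbol)."""
    active = [None] * num_lexicons
    # Tokens beyond the first num_lexicons are discarded, as described in the text.
    for position, token in enumerate(tokens[:num_lexicons]):
        # Assumption: a token with no symbol leaves its lexicon inactive.
        active[position] = lexicon.get(token)
    return active


# Toy lexicon for illustration only.
toy_lexicon = {"The": 0, "cat": 1, "sat": 2, ".": 3}
print(load_sentence(["The", "cat", "sat", "."], toy_lexicon)[:5])
```

The first token always lands in the leftmost lexicon, so the same word occurring at two different positions activates symbols in two different lexicons.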
                       It is also possible to use hierarchical ring architectures for representing strings of words,
                    which I believe is probably how the human cortical language architecture is organized. As the words
                    are loaded into the ring of lexicons, they are quickly removed in groups (phrases) and re-represented
                    in lexicons at a higher conceptual level, leaving the lower-level lexicons free for capturing
                    additional words. I believe that this is why humans can only instantly remember "about 7 things
                    ± 2" (Miller, 1956): we physically have only about seven lexicons at the word level. When
                    required to remember a sequence of things, we repeatedly rehearse the sequence (to firmly store it in
                    short-term memory) by traversing the ring from the beginning lexicon (which is always the same
                    one for each sentence or word sequence) to the last item and then back to the beginning. However,
                    because computer implementations of confabulation architectures are not subject to such limits (at
                    least conceptually), there is no need for us to use these more complicated ring architectures for this
                    chapter's introductory discussion.
                       The knowledge bases of the architecture of Figure 3.1 are all causal, meaning that the symbols
                    of each lexicon are only linked to symbols of later lexicons (i.e., those that lie to the right of it);













                    Figure 3.1  Naïve single-sentence confabulation architecture for proper English phrase completion or sentence
                    continuation. Knowledge bases link each of the first 19 of the 20 lexicons to all of the lexicons to their right.
                    Sentences are represented with the first word in the first lexicon on the left, and so on in sequence. This architecture
                    has a total of 19 + 18 + ... + 1 = 190 knowledge bases.
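
The count of 190 knowledge bases can be checked by enumerating every ordered pair of lexicons in which the source lies to the left of the target. The sketch below assumes lexicons are numbered 1 to 20 from left to right; `causal_knowledge_bases` is a hypothetical helper name used only for this illustration.

```python
from itertools import combinations


def causal_knowledge_bases(num_lexicons=20):
    """List (source, target) lexicon pairs with the source to the left of the target."""
    # combinations() yields each pair once, in increasing order, so source < target.
    return list(combinations(range(1, num_lexicons + 1), 2))


knowledge_bases = causal_knowledge_bases()
print(len(knowledge_bases))  # 19 + 18 + ... + 1 = 20 * 19 / 2 = 190
```

Lexicon 1 is the source of 19 knowledge bases, lexicon 2 of 18, and so on down to lexicon 19, which is the source of one; lexicon 20 is the source of none.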