Page 75 - Biomimetics : Biologically Inspired Technologies

Bar-Cohen : Biomimetics: Biologically Inspired Technologies DK3163_c003 Final Proof page 61 21.9.2005 11:40pm




                    Mechanization of Cognition                                                   61

                    However, before discussing language cognition, the next section discusses the currently available
                    general methods of antecedent support knowledge acquisition: training and education.


                                           3.2  TRAINING AND EDUCATION

                    As discussed above, current confabulation technology is limited to development of knowledge
                    using some externally guided process, not via the dynamic, autonomous, goal- and drive-satisfaction-driven
                    memory formation found in brains. This section discusses the two main processes currently
                    used in knowledge development: training and education. When dynamic memory formation
                    eventually arrives, training and education will still be important learning processes (but no longer
                    the only ones).

                    3.2.1 Training

                    Training is a knowledge acquisition process that is carried out in a batch mode without any
                    significant active supervision or conditional intervention. It is a learning mode that can only be
                    applied when the data set to be used has been carefully prepared. For example, in learning proper
                    English language structure it is possible to take a huge (multi-gigaword) proper text corpus and
                    train knowledge bases between lexicons representing the words in English (Hecht-Nielsen,
                    2005, presents an example of this). The corpus used must be near-perfect. It must be purged of
                    words, punctuation, and characters that are not within the selected word list and must not have any
                    strange annotation text, embedded tables, or markup headers that will be inadvertently used for
                    learning. Achieving this level of cleanliness in a huge training corpus, which for the sake of
                    diversity is necessarily drawn from many sources, is expensive and time consuming.
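                    The purging step described above might be sketched as follows. This is only a minimal
                    illustration under stated assumptions, not the procedure used in the cited work: the
                    tokenizer pattern, the function names, and the policy of dropping an entire sentence
                    when it contains any out-of-list token or stray markup character are all hypothetical.

```python
import re

# Hypothetical tokenizer: words (with an optional internal apostrophe)
# and a small set of punctuation marks. The real word list and
# tokenization rules are not specified in the text.
TOKEN_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?|[.,;:!?]")


def tokenize(sentence):
    """Split a sentence into lowercase word and punctuation tokens."""
    return TOKEN_RE.findall(sentence.lower())


def is_clean(sentence, word_list):
    """Return True if the sentence contains only tokens from word_list.

    Any character that the tokenizer cannot account for (markup
    brackets, digits, table debris) causes the sentence to be rejected.
    """
    residue = TOKEN_RE.sub("", sentence)
    if residue.strip():
        return False
    tokens = tokenize(sentence)
    return bool(tokens) and all(t in word_list for t in tokens)


def clean_corpus(sentences, word_list):
    """Keep only sentences that pass the cleanliness check (assumed policy)."""
    return [s for s in sentences if is_clean(s, word_list)]


if __name__ == "__main__":
    word_list = {"the", "cat", "sat", "."}
    raw = ["The cat sat.", "<td>The cat</td>", "The dog sat."]
    print(clean_corpus(raw, word_list))
```

                    Dropping whole sentences is the simplest policy; a production pipeline could instead
                    repair or partially retain sentences, at the cost of more manual review.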
                      Once a suitably clean text corpus has been created, each sentence is considered as a whole item
                    (up to a chosen maximum number of words, e.g., 20, beyond which the sentence is
                    simply truncated). The confabulation architecture to be trained has as many word lexicons (in a
                    linear sequence) as the maximum number of allowed words in a sentence. The words of the
                    sentence are represented by active symbols on the corresponding lexicons of the architecture (see
                    Section 3.3 for more details). Co-occurrence counts are then recorded for each causal pair of
                    symbols (i.e., between each symbol and each of the symbols on lexicons further down the temporal
                    sequence of lexicons). Once these counts are recorded, the process moves on to the next sentence of
                    the training corpus.
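                    The counting procedure just described can be sketched in a few lines. This is a
                    minimal illustration, assuming sentences are already tokenized into word lists; the
                    function name train, the constant MAX_WORDS, and the choice to key each count by
                    (source lexicon position, source word, target lexicon position, target word) are
                    assumptions made for the example, not details given in the text.

```python
from collections import Counter

MAX_WORDS = 20  # example maximum sentence length, as suggested in the text


def train(sentences, max_words=MAX_WORDS):
    """Accumulate co-occurrence counts over a corpus of tokenized sentences.

    Each word position corresponds to one lexicon in the linear sequence.
    A count is recorded for every pair made of a symbol and a symbol on a
    lexicon further down the sequence (forward direction only).
    """
    counts = Counter()
    for words in sentences:
        words = words[:max_words]  # truncate overlong sentences
        for i, source in enumerate(words):
            for j in range(i + 1, len(words)):
                # Key layout is an assumption of this sketch.
                counts[(i, source, j, words[j])] += 1
    return counts


if __name__ == "__main__":
    corpus = [["the", "cat", "sat"], ["the", "cat", "ran"]]
    counts = train(corpus)
    for key, n in sorted(counts.items()):
        print(key, n)
```

                    Keeping the lexicon positions in the key reflects the idea of separate knowledge bases
                    between pairs of lexicons; a single Counter is used here only for brevity.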
                      A beautiful thing about training is that the result is knowledge that presumably has the same
                    origin and legal standing as knowledge obtained from material that a person has read but
                    does not remember in detail. Namely, this knowledge is presumably not subject to source
                    copyright restrictions or other source intellectual property restrictions. Use of raw data for training
                    probably falls under the category of “fair use,” which eliminates any need to pay royalties.
                    Confabulation-based systems may thus be able to absorb whole libraries of knowledge without
                    cost. This is fair use because the content of the work is not stored and cannot be recalled. (How
                    much does your library charge you in royalties for reading a book? Answer: Absolutely nothing,
                    because reading a library book is fair use.) This fortuitous loophole may allow cognitive machines
                    to rapidly and efficiently accumulate almost all human knowledge, without having to pay any
                    royalties and without the delays associated with working through legal and bureaucratic objections.
                    Mechanizers of cognition may want to expose their systems to the available libraries of written
                    knowledge at the first possible opportunity, before legal innovators find ways of closing this
                    loophole. It may not be long before intelligent machines are as unwelcome at libraries as blackjack
                    card counters are at casinos.
                      In the near term, early confabulation entrepreneurs will probably use libraries, web scrapers, or
                    informally obtained e-mail message examples (for text knowledge), informal public volunteer web