Before discussing language cognition, however, the next section reviews the currently available
general methods of antecedent support knowledge acquisition: training and education.
3.2 TRAINING AND EDUCATION
As discussed above, current confabulation technology is limited to development of knowledge
using some externally guided process, not via the dynamic, autonomous, goal- and drive-satisfaction-
driven memory formation seen in brains. This section discusses the two main processes currently
used in knowledge development: training and education. When dynamic memory formation
eventually arrives, training and education will still be important learning processes (but no longer
the only ones).
3.2.1 Training
Training is a knowledge acquisition process that is carried out in a batch mode without any
significant active supervision or conditional intervention. It is a learning mode that can only be
applied when the data set to be used has been carefully prepared. For example, in learning proper
English language structure, it is possible to take a huge (multi-gigaword) proper text corpus and
train knowledge bases between lexicons representing the words in English (e.g., Hecht-Nielsen,
2005, presents an example of this). The corpus used must be near-perfect. It must be purged of
words, punctuation, and characters that are not within the selected word list and must not have any
strange annotation text, embedded tables, or markup headers that will be inadvertently used for
learning. Achieving this level of cleanliness in a huge training corpus, which must necessarily be
drawn from many sources for the sake of diversity, is expensive and time-consuming.
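By way of illustration only (no code appears in the original text), a cleaning pass along these lines might be sketched as follows; the tokenization, the markup heuristic, and all names here are assumptions, not part of the chapter.

```python
import re

def clean_corpus(sentences, word_list):
    """Yield tokenized sentences whose every token is in the selected word list.

    `sentences` is assumed to be an iterable of raw text lines, and `word_list`
    the chosen vocabulary (lowercase words). Lines containing markup or any
    out-of-vocabulary token are discarded rather than repaired.
    """
    vocab = set(word_list)
    for line in sentences:
        # Crude, assumed heuristic: skip lines that look like markup or tables.
        if re.search(r"[<>{}|]", line):
            continue
        tokens = re.findall(r"[a-z']+", line.lower())
        if tokens and all(t in vocab for t in tokens):
            yield tokens
```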
Once a suitably clean text corpus has been created, each sentence is considered as a whole item
(up to a chosen maximum allowed number of words — e.g., 20 — after which the sentence is
simply truncated). The confabulation architecture to be trained has as many word lexicons (in a
linear sequence) as the maximum number of allowed words in a sentence. The words of the
sentence are represented by active symbols on the corresponding lexicons of the architecture (see
Section 3.3 for more details). Co-occurrence counts are then recorded for each causal pair of
symbols (i.e., between each symbol and each of the symbols on lexicons further down the temporal
sequence of lexicons). Once these counts are recorded, the process moves on to the next sentence of
the training corpus.
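As a concrete sketch of this counting step, the fragment below accumulates causal-pair co-occurrence counts under the assumptions stated above (a 20-word cap and one word lexicon per sentence position); the data structures and function names are illustrative, not the chapter's.

```python
from collections import defaultdict

MAX_WORDS = 20  # example cap on sentence length, as in the text

# co_counts[(i, j)][(a, b)] counts how often symbol `a` on lexicon i appears
# together with symbol `b` on lexicon j, for every causal pair i < j.
co_counts = defaultdict(lambda: defaultdict(int))

def train_on_sentence(words):
    """Record causal-pair co-occurrence counts for one cleaned sentence."""
    words = words[:MAX_WORDS]  # longer sentences are simply truncated
    for i, a in enumerate(words):
        for j in range(i + 1, len(words)):
            co_counts[(i, j)][(a, words[j])] += 1

def train_corpus(clean_sentences):
    """Accumulate counts over an entire pre-cleaned corpus, one sentence at a time."""
    for sentence in clean_sentences:
        train_on_sentence(sentence)
```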
A beautiful thing about training is that the resulting knowledge presumably has the same
origin and legal standing as knowledge obtained from material that a person has read but
does not remember in detail. Namely, this knowledge is presumably not subject to source
copyright restrictions or other source intellectual property restrictions. Use of raw data for training
probably falls under the category of “fair use,” which eliminates any need to pay royalties.
Confabulation-based systems may thus be able to absorb whole libraries of knowledge without
cost. This is fair use because the content of the work is not stored and cannot be recalled. (How
much does your library charge you in royalties for reading a book? Answer: Absolutely nothing,
because reading a library book is fair use.) This fortuitous loophole may allow cognitive machines
to rapidly and efficiently accumulate almost all human knowledge, without having to pay any
royalties and without the delays associated with working through legal and bureaucratic objections.
Mechanizers of cognition may want to expose their systems to the available libraries of written
knowledge at the first possible opportunity, before legal innovators find ways of closing this
loophole. It may not be long before intelligent machines are as unwelcome at libraries as blackjack
card counters are at casinos.
In the near term, early confabulation entrepreneurs will probably use libraries, web scrapers, or
informally obtained e-mail message examples (for text knowledge), informal public volunteer web