Bar-Cohen : Biomimetics: Biologically Inspired Technologies DK3163_c003 Final Proof page 68 21.9.2005 11:40pm
the basis that such weak links reflect random and meaningless symbol co-occurrences. It is
important to state that this policy (and the policy of zeroing out co-occurrence counts below
some set number) is arbitrary and definitely subject to refinement (e.g., in the case of high-
frequency target symbols, we sometimes accept values below 0.0001 because these low-probability
links can still be quite meaningful). The final result of this training process is the formation
of 190 knowledge bases, each containing an average of a million or so individual items of
knowledge.
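The thresholding policy described above can be sketched in code. This is a minimal illustration only, assuming a simple conditional-probability link strength and the hypothetical function name `build_knowledge_base`; the counts, threshold values, and data layout here are placeholders, not the authors' actual implementation.

```python
from collections import Counter, defaultdict

def build_knowledge_base(pairs, min_count=2, min_prob=0.0001):
    """Estimate link strengths from symbol co-occurrence counts, zeroing out
    links below the (arbitrary, tunable) count and probability thresholds."""
    cooc = Counter(pairs)                         # (source, target) -> count
    target_totals = Counter(t for _, t in pairs)  # target -> total count
    kb = defaultdict(dict)
    for (source, target), n in cooc.items():
        if n < min_count:           # zero out rare, likely meaningless co-occurrences
            continue
        p = n / target_totals[target]
        if p >= min_prob:           # weak-link cutoff (the text notes this is
            kb[source][target] = p  # sometimes relaxed for frequent targets)
    return dict(kb)
```

In this sketch, raising `min_count` or `min_prob` prunes more of the weak links that the text treats as random and meaningless co-occurrences.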
Given this architecture, with its 20 lexicons and 190 knowledge bases, we can now consider
some thought processes using it. The simplest is phrase completion. First, we take a coherent,
meaningful, contiguous string of fewer than 20 words and represent them on the lexicons of the
architecture, beginning with the first lexicon. The goal is to use these words as context for selecting
the next word in the string (which might be a punctuation mark, since punctuation symbols are
represented in each of the lexicons). To be concrete, consider a situation where the first three words
of a sentence are provided.
The three words are considered to be assumed facts (Hecht-Nielsen, 2005). They must be
coherent and "make sense," or else the confabulation process will yield no answer. To find the
phrase completion, we use the knowledge bases from the first, second, and third lexicons to
the fourth. The completion is obtained by carrying out confabulation on the fourth lexicon using
a W3. The answer, if there is one, is then the symbol expressed on the fourth lexicon after
confabulation.
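One plausible reading of the W3 operation is "the winning symbol among those receiving knowledge links from all three assumed facts." The sketch below works from that assumption; the function name, the dictionary-of-dictionaries knowledge-base layout, and the `required_links` parameter are all hypothetical conveniences, not the chapter's actual machinery.

```python
from collections import defaultdict

def confabulate(assumed_facts, kbs, required_links):
    """Confabulation with a minimum-link requirement: only symbols receiving
    knowledge links from at least `required_links` assumed facts are viable;
    among those, the symbol with the highest total input intensity wins."""
    intensity = defaultdict(float)   # summed link strengths per target symbol
    n_links = defaultdict(int)       # how many assumed facts link to the symbol
    for fact, kb in zip(assumed_facts, kbs):
        for target, weight in kb.get(fact, {}).items():
            intensity[target] += weight
            n_links[target] += 1
    viable = [t for t in n_links if n_links[t] >= required_links]
    if not viable:
        return None                  # no answer: the facts were not coherent
    return max(viable, key=lambda t: intensity[t])
```

With `required_links=3` this behaves like the W3 described in the text: incoherent assumed facts leave no symbol with three links, and the procedure correctly yields no answer.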
With only three words of context (e.g., The only acceptable), the answer obtained will
often be one of a huge number of viable possibilities (alternative, person, solution, flight, car,
seasoning, and so on), which can be obtained as an expectation by simply performing a
C3. Language generation usually involves invoking longer-range or abstract context (expressed in
some manner as one or more sets of assumed facts that act as constraints on the completions or
continuations) to focus the meaning content of the language construction more precisely (which, by
the inherent nature of confabulation, is generally automatically grammatical and syntactically
consistent). This context can arise from the same sentence (e.g., by supplying more, or more
specific, words as assumed facts) or from external bodies of language (e.g., from previous
sentences, as considered in Section 3.3.3).
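Under the same assumptions as before, a C3-style expectation can be sketched as returning the whole set of viable completions rather than a single winner. The function name `expectation` and the knowledge-base layout are hypothetical.

```python
from collections import defaultdict

def expectation(assumed_facts, kbs, required_links):
    """Return the set of all viable completions (the expectation),
    rather than committing to one winning symbol."""
    n_links = defaultdict(int)
    for fact, kb in zip(assumed_facts, kbs):
        for target in kb.get(fact, {}):
            n_links[target] += 1
    return {t for t, n in n_links.items() if n >= required_links}
```

Supplying more, or more narrowly specific, assumed facts shrinks this set, which is how added context restricts the expectation down toward a single best answer.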
If we supply more assumed facts or more narrowly specific assumed facts, confabulation can
then supply the best answer from a much more restricted expectation. For example, Mickey and
Minnie will yield only one answer: mouse.
However, using more words in phrase completion (or in sentence continuation; where multiple
successive words are added onto a starting string) introduces some new dilemmas. In particular,
beyond a range of two or three words, the string of words that emerges is likely to be novel in the
sense that some of the early assumed facts may not have knowledge links to distant, newly selected
words in the word string. The design of confabulation architectures and thought processes to handle
this common situation is a key problem that my research group has solved, at least in a preliminary
way. As always, there is no software involved, just proper sequences of thought actions (lexicon
confabulations and knowledge base enablements) that are invoked by the conclusions of previous
confabulations.
For example, consider the assumed facts The canoe trip was going smoothly when all of a
sudden. Such partial sentences will almost certainly not have a next-word symbol that receives
knowledge links from all of the preceding assumed fact symbols. So what procedure shall we use to
select the next word? One answer is simply to go on the preponderance of evidence: select the 12th-
lexicon symbol that has the highest input intensity among those symbols that have the maximum
available number of knowledge links. This is accomplished by W. This approach can yield
acceptable answers some of the time, but it does not work as well as one would like. If we were
to attempt sentence continuation with this approach (i.e., adding multiple words), the results would
be awful.
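The preponderance-of-evidence rule (the W operation as described above) differs from the earlier sketch in that it does not demand links from every assumed fact; it takes whatever maximum link count is actually attained. Again a hedged sketch, with hypothetical names and data layout:

```python
from collections import defaultdict

def confabulate_w(assumed_facts, kbs):
    """Preponderance-of-evidence rule: among symbols receiving the maximum
    *available* number of knowledge links (possibly fewer than the number of
    assumed facts), pick the one with the highest input intensity."""
    intensity = defaultdict(float)
    n_links = defaultdict(int)
    for fact, kb in zip(assumed_facts, kbs):
        for target, weight in kb.get(fact, {}).items():
            intensity[target] += weight
            n_links[target] += 1
    if not n_links:
        return None
    max_links = max(n_links.values())    # best link count actually attained
    front_runners = [t for t, n in n_links.items() if n == max_links]
    return max(front_runners, key=lambda t: intensity[t])
```

Note that a symbol with many moderate links beats a symbol with one strong link, which is exactly the "some of the time acceptable" behavior the text describes: plausible for a single next word, but prone to drift when chained over multiple words.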