object. These sentences are designed to convey useful information to the blind person about the
nature of the object and its visual attributes (information that the human educator can extract
just by looking at the visual representation of the object).
To train the links from the vision module to the language module (every visual lexicon is
afforded a knowledge base to every phrase lexicon), the educator’s sentences are entered, in order,
into the word lexicons of the sentence modules (each of which represents one sentence — see
Figure 3.12); each sentence is parsed into phrases (see Section 3.4); and these phrases are
represented on the sentence summary lexicon of each sentence. Counts are accumulated between
the symbols active on the visual module’s tertiary lexicons and those active on the summary
lexicons. If the educator wishes to describe specific visual subcomponents of the object, they
may designate a local window in the eyeball image for each subcomponent and supply the
sentence(s) describing each such subcomponent. The secondary and tertiary lexicon symbols
representing the subcomponents within each image are then linked to the summary lexicons of
the associated sentences. Before being used in this application, all of the internal knowledge bases
of the language module have already been trained using a huge text training corpus.
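To make the bookkeeping concrete, here is a minimal Python sketch of this count-accumulation step. The class name, the symbol identifiers, and the nested-dictionary storage are illustrative assumptions, not the implementation the chapter describes.

    from collections import defaultdict

    class KnowledgeBase:
        """Link-use counts from one source lexicon to one target lexicon
        (here, a visual tertiary lexicon to a sentence summary lexicon)."""

        def __init__(self):
            # counts[c][l]: how often visual symbol c and summary symbol l
            # were co-active across education examples
            self.counts = defaultdict(lambda: defaultdict(int))
            # target_totals[l]: how often summary symbol l was active at all
            self.target_totals = defaultdict(int)

        def accumulate(self, visual_symbols, summary_symbols):
            """Record one education example: increment the use count of
            every link between a co-active (visual, summary) symbol pair."""
            for l in summary_symbols:
                self.target_totals[l] += 1
                for c in visual_symbols:
                    self.counts[c][l] += 1

For each education example, accumulate() would be called once with the symbols active on the visual module's tertiary lexicons and the phrase symbols the parser placed on the sentence's summary lexicon; the local-window subcomponent case is the same call restricted to the subcomponent's symbols.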
After a sufficient number of education examples have been accumulated (as determined by final
performance, described below), the link use counts are converted into p(c|l) probabilities and
frozen. The knowledge bases from the visual module’s lexicons to all of the sentence summary
lexicons are then combined (so that the available long-range context can be exploited by a sentence
in any position in the sequence of sentences to be generated). The annotation system is now ready
for testing.
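A plausible Python reading of this conversion step follows, under the assumption that p(c|l) denotes the probability of visual symbol c given co-active summary symbol l (counts normalized by how often l was active); the pruning threshold and the max-merge rule are assumptions, not details given in the text.

    def freeze(kb, min_count=2):
        """Convert accumulated link-use counts into p(c|l) estimates and
        return them as a frozen mapping. min_count is an assumed pruning
        threshold."""
        return {(c, l): n / kb.target_totals[l]
                for c, targets in kb.counts.items()
                for l, n in targets.items()
                if n >= min_count}

    def combine(frozen_kbs):
        """Merge the frozen per-sentence knowledge bases into one, so that
        long-range context is available to a sentence in any position of
        the generated sequence. Taking the max over duplicate links is an
        assumed merge rule."""
        combined = {}
        for probs in frozen_kbs:
            for link, p in probs.items():
                combined[link] = max(combined.get(link, 0.0), p)
        return combined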
The testing phase is carried out by having a sighted evaluator walk down the street wearing the
system (yes, the idea is that the entire system is in the form of a pair of glasses!). As the visual
module selects and describes each object, knowledge link inputs are sent to the language module.
These inputs are used, much as in the example of Section 3.3, as context that drives the formation
of a sentence (only now there is no starter). Using consensus building (and separate sentence starter
generator and sentence terminator subsystems — not shown in Figure 3.12 and not discussed here
Figure 3.12 Image text annotation. A simple example of linking a visual module with a (text) language module.
See text for description.
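The selection step at the heart of consensus building can be sketched as follows, assuming a confabulation-style rule that maximizes a sum of log link probabilities over the active visual context symbols; the iterative multi-lexicon consensus loop and the starter and terminator subsystems are omitted, and the floor probability p0 for absent links is an assumption.

    import math

    def confabulate(visual_symbols, link_probs, candidates, p0=1e-4):
        """Select the candidate symbol on a language lexicon that best
        agrees with the active visual symbols, scoring each candidate by
        its summed log link probabilities. p0, an assumed floor for
        absent links, keeps the logarithm finite."""
        def score(l):
            return sum(math.log(link_probs.get((c, l), p0))
                       for c in visual_symbols)
        return max(candidates, key=score)

Applied repeatedly, once per phrase position and together with the language module's pre-trained internal knowledge bases, a selection rule of this kind yields the words of the descriptive sentence.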