Page 98 - Biomimetics : Biologically Inspired Technologies


Bar-Cohen : Biomimetics: Biologically Inspired Technologies DK3163_c003 Final Proof page 84 21.9.2005 11:40pm


As each S vector arrives at the architecture of Figure 3.7, it is sent to the proper lexicon in
sequence. For simplicity, let us assume that the first S vector associated with the initial sound
content of the next word is sent to the first primary sound lexicon (if it goes to the “wrong” lexicon
or is missed altogether, it does not matter much, as will be explained below). The first
primary sound lexicon has an expectation, and the only symbols in this expectation are those
that represent sounds that a speaker of this type would issue when speaking early parts of one of
the words we are expecting (we each have hundreds of “canonical models” of speakers having
different accents and vocal apparatuses, and most of us add to this store throughout life). Again note that,
because of the orthogonalized nature of the S vector and the pure-signal nature of the primary
feature symbols, each of the symbols in this expectation will typically represent sounds having only
a tiny number of S vector components that are nonzero. Each symbol in a primary sound lexicon is
expressed as a unit vector having this small number of components with coefficients near 1, and
all other components at zero. The lexicon takes the inner product of each symbol’s vector
expression with S, and this is then used as that symbol’s initial input excitation (this is how symbols
get excited by sensory input signals, in contrast to how symbols get excited by knowledge
links from other symbols, which was discussed in Section 3.1). We have now completed the
transition from acoustic space to symbol space.
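The inner-product step just described can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration: the eight-component S vector, the symbol names, and the component weights are hypothetical and are not taken from the chapter.

```python
import math

def unit_vector(weights, dim):
    """Build a unit-length vector from a sparse {index: weight} mapping."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return [weights.get(i, 0.0) / norm for i in range(dim)]

def inner(u, v):
    """Plain inner (dot) product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

DIM = 8  # assumed number of S vector components (hypothetical)

# Hypothetical symbols of one primary sound lexicon: each has only a few
# nonzero components, with the dominant coefficient close to 1.
lexicon = {
    "sym_a": unit_vector({0: 1.0, 1: 0.2}, DIM),
    "sym_b": unit_vector({4: 1.0}, DIM),
    "sym_c": unit_vector({6: 1.0, 7: 0.2}, DIM),
}

# Hypothetical S vector for the current sound sample.
S = [0.9, 0.8, 0.0, 0.1, 0.0, 0.0, 0.2, 0.0]

# Initial input excitation of each symbol = inner product with S.
excitation = {name: inner(vec, S) for name, vec in lexicon.items()}
```

In this toy case `sym_a` receives the largest excitation because its nonzero components coincide with the large components of S, while `sym_b` receives none.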
Notice that the issue of signal level of the attended source has not been discussed. As described
in Section 3.3.1, each S vector component has its amplitude expressed on a logarithmic scale
(based on “sound power amplitudes” ranging across many orders of magnitude). Thus, on this
scale, the inner product of S with a particular symbol’s unit vector will still (because of the linear
nature of the inner product) be substantial, even if the attended source sounds are tens of dB below
those of some individual interferers. Thus, with this design, attending to weak, but distinct, sources
is generally possible. These are, of course, the characteristics we as humans experience in our own
hearing. Further, in auditory neuroscience, such logarithmic coding of sound feature response
signals (in particular, those from the brainstem auditory nuclei to the medial geniculate nucleus,
which are the auditory signals analogous to the components of S) is well established (Oertel
et al., 2002).
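A small numerical sketch may make the effect of the logarithmic scale concrete. It assumes a decibel-style mapping with an arbitrary reference floor, a four-component S vector, and an attended source 30 dB below a single interferer; none of these specifics come from the chapter, whose actual coding is described in Section 3.3.1.

```python
import math

def to_log_scale(power, floor=1e-12):
    """Assumed log coding: dB above a hypothetical reference floor."""
    return 10.0 * math.log10(max(power, floor) / floor)

def inner(u, v):
    """Plain inner (dot) product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

r = 1.0 / math.sqrt(2.0)
attended_symbol = [r, r, 0.0, 0.0]    # nonzero on components 0 and 1
interferer_symbol = [0.0, 0.0, r, r]  # nonzero on components 2 and 3

# Hypothetical sound powers: the attended source is 30 dB below the interferer.
power = [1e-6, 1e-6, 1e-3, 1e-3]

# With linear power, the attended symbol's excitation is negligible by comparison.
linear_ratio = inner(attended_symbol, power) / inner(interferer_symbol, power)

# With log-coded components, it remains a substantial fraction.
S = [to_log_scale(p) for p in power]
log_ratio = inner(attended_symbol, S) / inner(interferer_symbol, S)
```

Under these assumptions the attended symbol's excitation is about one thousandth of the interferer's on a linear power scale, but about two thirds of it on the logarithmic scale.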
During the entire time of the word detection processes, all of the lexicons of the Figure 3.7
architecture are operated in a consensus building mode. Thus, as soon as the S-input excitations are
established on the expectation element symbols of the first primary sound lexicon, only those
symbols that received these excitations remain in the expectation (consensus building is run
fastest on the primary sound lexicons, somewhat slower on the sound phrase lexicons, and even
slower on the next-word acoustic lexicon). This process of expectation refinement that occurs
during consensus building is termed honing.
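Read in its simplest form, honing acts as a filter on the expectation. The sketch below is a deliberate simplification and not the chapter's consensus building procedure: the function name `hone`, the `threshold` parameter, and the symbol names and values are all hypothetical.

```python
def hone(expectation, excitation, threshold=0.0):
    """Keep only the expectation symbols whose input excitation exceeds threshold.

    Symbols outside the current expectation are never added, however
    strongly the input excites them.
    """
    return {sym for sym in expectation if excitation.get(sym, 0.0) > threshold}

# Hypothetical expectation of one primary sound lexicon.
expectation = {"sym_a", "sym_b", "sym_c"}

# Hypothetical S-input excitations; sym_d is excited but was never expected.
excitation = {"sym_a": 1.04, "sym_b": 0.0, "sym_c": 0.2, "sym_d": 0.9}

honed = hone(expectation, excitation)                  # drops sym_b
strict = hone(expectation, excitation, threshold=0.5)  # also drops sym_c
```

The threshold is an assumption introduced here to show that honing can be more or less selective. The text itself only states that symbols receiving the S-input excitations remain in the expectation.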
After acoustic input has arrived at each subsequent primary sound lexicon, that lexicon’s
expectation is thereby honed, and this revised expectation is then automatically transferred to all
of the sound phrase regions that are not on its right (during consensus building, all of the involved
knowledge bases remain operational). The pace of this switching is set by a separate part of the
auditory system, which will not be discussed further here; it synchronizes the pace of S vector
formation (which is not always exactly every 10 ms) to the recent pace of speech production of the
attended speaker. The transfer has the effect of honing some of the sound phrase
lexicon expectations, which then are transferred to the next-word acoustic lexicon, honing its
expectation.
This process works in reverse also. As higher-level lexicon expectations are honed, these are
transferred to lower levels, thereby refining those lower-level expectations. Note that if occasional
erroneous symbols are transferred up to the sound phrase lexicons, or even from the phrase lexicons
to the next-word acoustic lexicon, this will not have much effect. That is because the process of
consensus building effectively “integrates” the impact of all of the incoming transfers on the
symbols of the original expectation. Only when a phrase region has honed its symbol list down to
