Page 98 - Biomimetics : Biologically Inspired Technologies

Bar-Cohen : Biomimetics: Biologically Inspired Technologies DK3163_c003 Final Proof page 84 21.9.2005 11:40pm





                       As each S vector arrives at the architecture of Figure 3.7, it is sent to the proper lexicon in
                    sequence. For simplicity, let us assume that the first S vector associated with the initial sound
                    content of the next word is sent to the first primary sound lexicon (if it goes to the ‘‘wrong’’ lexicon
                    or is missed altogether, it does not matter much, as will be explained below). The first
                    primary sound lexicon has an expectation, and the only symbols in this expectation are those
                    that represent sounds that a speaker of this type would issue when speaking early parts of one
                    of the words we are expecting (we each have hundreds of ‘‘canonical models’’ of speakers having
                    different accents and vocal apparatus, and most of us add to this store throughout life). Again note that,
                    because of the orthogonalized nature of the S vector and the pure-signal nature of the primary
                    feature symbols, each of the symbols in this expectation will typically represent sounds having only
                    a tiny number of S vector components that are nonzero. Each symbol in a primary sound lexicon is
                    expressed as a unit vector having this small number of components with coefficients near 1, and
                    all other components at zero. The lexicon takes the inner product of each symbol’s vector
                    expression with S, and this is then used as that symbol’s initial input excitation (this is how symbols
                    get excited by sensory input signals, in contrast to how symbols get excited by knowledge
                    links from other symbols, which was discussed in Section 3.1). We have now completed the
                    transition from acoustic space to symbol space.
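The inner-product step described above can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the 8-component S vector, the symbol names `sym_a` and `sym_b`, and the choice to give each symbol equal weights normalized to unit length (the text only says the few nonzero coefficients are near 1).

```python
import math

def unit_vector(active, dim):
    """Unit-length vector with equal weight on the indices in `active`, zero elsewhere."""
    weight = 1.0 / math.sqrt(len(active))
    return [weight if i in active else 0.0 for i in range(dim)]

def input_excitation(symbol_vec, s_vec):
    """Inner product of a symbol's vector expression with the S vector."""
    return sum(a * b for a, b in zip(symbol_vec, s_vec))

# Hypothetical S vector: only a few components carry signal.
s = [0.0, 0.9, 0.0, 0.0, 0.8, 0.0, 0.1, 0.0]

# Hypothetical expectation of the first primary sound lexicon.
expectation = {
    "sym_a": unit_vector({1, 4}, 8),  # matches the strong components of s
    "sym_b": unit_vector({2, 6}, 8),  # overlaps s only weakly
}

# Each symbol's initial input excitation.
excitations = {name: input_excitation(vec, s) for name, vec in expectation.items()}
```

In this toy case `sym_a` receives a much larger excitation than `sym_b`, because its nonzero components coincide with the nonzero components of S.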
                       Notice that the issue of signal level of the attended source has not been discussed. As described
                    in Section 3.3.1, each S vector component has its amplitude expressed on a logarithmic scale
                    (based on ‘‘sound power amplitudes’’ ranging across many orders of magnitude). Thus, on this
                    scale, the inner product of S with a particular symbol’s unit vector will still (because of the linear
                    nature of the inner product) be substantial, even if the attended source sounds are tens of dB below
                    those of some individual interferers. Thus, with this design, attending to weak, but distinct, sources
                    is generally possible. These are, of course, the characteristics we as humans experience in our own
                    hearing. Further, in auditory neuroscience, such logarithmic coding of sound feature response
                    signals (in particular, those from the brainstem auditory nuclei to the medial geniculate nucleus,
                    which are the auditory signals analogous to the components of S) is well established (Oertel
                    et al., 2002).
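A small numerical sketch may help show why logarithmic coding keeps weak sources visible. The power values, the reference power, and the decibel mapping below are assumptions chosen for illustration; the actual encoding is the one described in Section 3.3.1, which is not reproduced here.

```python
import math

def level_db(power, reference=1.0):
    """Sound power expressed in decibels relative to an assumed reference power."""
    return 10.0 * math.log10(power / reference)

# Hypothetical powers: the attended source is 30 dB below one interferer.
interferer_power = 1e7
source_power = 1e4

# On a linear power scale the attended source is nearly invisible.
linear_ratio = source_power / interferer_power

# On the logarithmic scale its component is still a sizeable fraction of the interferer's.
log_ratio = level_db(source_power) / level_db(interferer_power)
```

Here the linear ratio is one part in a thousand, while the ratio of the log-scale amplitudes is roughly 0.57. Because the S vector components are orthogonalized, the attended source's symbol takes its inner product mainly with the source's own components, so its excitation stays substantial despite the louder interferer.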
                       During the entire time of the word detection processes, all of the lexicons of the Figure 3.7
                    architecture are operated in a consensus building mode. Thus, as soon as the S-input excitations are
                    established on the expectation element symbols of the first primary sound lexicon, only those
                    symbols which received these excitations remain in the expectation (the consensus building is run
                    faster on the primary sound lexicons, somewhat slower on the sound phrase lexicons, and even
                    slower on the next-word acoustic lexicon). This process of expectation refinement that occurs
                    during consensus building is termed honing.
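Honing at a single lexicon can be sketched as a simple filter. This is only an illustration under stated assumptions: the function name `hone`, the set-based representation of an expectation, and the explicit `threshold` parameter are hypothetical, and real consensus building is a process running over time rather than a single pass.

```python
def hone(expectation, excitations, threshold=0.0):
    """Keep only the expected symbols whose input excitation exceeds the threshold."""
    return {sym for sym in expectation if excitations.get(sym, 0.0) > threshold}

# Hypothetical expectation and S-input excitations.
expected_symbols = {"sym_a", "sym_b", "sym_c"}
s_excitations = {"sym_a": 1.2, "sym_b": 0.0}  # sym_c received no excitation at all

honed = hone(expected_symbols, s_excitations)
```

With these values only `sym_a` survives: `sym_b` received zero excitation and `sym_c` received none, so both drop out of the expectation.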
                       After acoustic input has arrived at each subsequent primary sound lexicon, that lexicon’s
                    expectation is thereby honed, and this revised expectation is then automatically transferred to
                    all of the sound phrase regions that are not on its right (during consensus building, all of the
                    involved knowledge bases remain operational). The pace of the switching between lexicons is set
                    by a separate part of the auditory system, not discussed further here, which synchronizes the
                    pace of S vector formation (which is not always exactly every 10 ms) to the recent pace of speech
                    production of the attended speaker. The transfer has the effect of honing some of the sound
                    phrase lexicon expectations, which are then transferred to the next-word acoustic lexicon, honing
                    its expectation.
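The upward transfer can be sketched as accumulating support over knowledge links. The representation is an assumption made for illustration: links are a plain dictionary from primary sound symbols to the phrase symbols they connect to, each reporting lexicon contributes at most one vote per phrase symbol, and keeping the best-supported symbols is a stand-in for consensus building, not the chapter's actual mechanism. All symbol names are hypothetical.

```python
from collections import Counter

def hone_phrase_expectation(phrase_expectation, honed_primaries, links):
    """Hone a phrase lexicon's expectation using the honed primary expectations.

    honed_primaries: list of sets, one honed expectation per reporting primary lexicon.
    links: dict mapping a primary symbol to the set of phrase symbols it links to.
    """
    support = Counter()
    for honed in honed_primaries:
        # Collect every phrase symbol reachable from this lexicon's honed symbols.
        reached = set()
        for sym in honed:
            reached |= links.get(sym, set())
        # One vote per lexicon for each expected phrase symbol it reaches.
        for phrase in reached & phrase_expectation:
            support[phrase] += 1
    if not support:
        return set(phrase_expectation)  # nothing transferred yet, so nothing to hone
    best = max(support.values())
    return {phrase for phrase, votes in support.items() if votes == best}

# Hypothetical links and honed primary expectations.
links = {"k": {"cat", "kit"}, "ae": {"cat", "bat"}, "t": {"cat", "kit", "bat"}}
phrase_expectation = {"cat", "kit", "bat", "dog"}
honed_phrases = hone_phrase_expectation(phrase_expectation, [{"k"}, {"ae"}, {"t"}], links)
```

In this example `cat` is supported by all three primary lexicons while `kit` and `bat` are supported by two each, so the phrase expectation is honed down to `cat`.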
                       This process works in reverse also. As higher-level lexicon expectations are honed, these are
                    transferred to lower levels, thereby refining those lower-level expectations. Note that if occasional
                    erroneous symbols are transferred up to the sound phrase lexicons, or even from the phrase lexicons
                    to the next-word acoustic lexicon, this will not have much effect. That is because the process of
                    consensus building effectively ‘‘integrates’’ the impact of all of the incoming transfers on the
                    symbols of the original expectation. Only when a phrase region has honed its symbol list down to