Page 311 - Concise Encyclopedia of Robotics

Speech Recognition
base with the incoming audio signals. This system has two parts: a memory, in which various speech patterns are stored; and a comparator, which compares these stored patterns with the incoming data. For each syllable or word, the circuit checks through its vocabulary until it finds a match. This is done so quickly that the delay is not noticeable. The size of the computer’s vocabulary is directly related to its memory capacity, so an advanced speech-recognition system requires a large amount of memory.
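The memory-plus-comparator arrangement can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the method of any particular product: each vocabulary word is stored as a short sequence of numeric feature values (the `VOCABULARY` table and its numbers are hypothetical), and the comparator uses dynamic time warping (DTW), a classic template-matching distance that tolerates words spoken faster or slower than the stored pattern. Rather than stopping at the first match, this version scans the whole vocabulary and rejects the input if even the closest template lies beyond an assumed threshold.

```python
import math

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = lowest cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch the template
                                 cost[i][j - 1],      # stretch the input
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

def recognize(frames, vocabulary, threshold):
    """Compare the input with every stored template; return the best word or None."""
    best_word, best_dist = None, math.inf
    for word, template in vocabulary.items():
        dist = dtw_distance(frames, template)
        if dist < best_dist:
            best_word, best_dist = word, dist
    return best_word if best_dist <= threshold else None

# Hypothetical "memory": one toy feature template per vocabulary word.
VOCABULARY = {
    "stop": [1, 3, 5, 3, 1],
    "go": [5, 5, 1, 1],
    "left": [2, 4, 4, 2, 0],
}

if __name__ == "__main__":
    spoken = [1, 1, 3, 5, 5, 3, 1]  # "stop", spoken more slowly
    print(recognize(spoken, VOCABULARY, threshold=2.0))  # stop
```

The slowly spoken input aligns exactly with the stored “stop” template, so the recognizer returns “stop”; an input that resembles no template comes back as `None`. Because every template must be stored and compared in turn, both memory use and matching time grow with the size of the vocabulary.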
The output of the comparator must be processed further so that the machine can distinguish between words or syllables that sound alike. Examples are “two/too,” “way/weigh,” and “not/knot.” For this to be possible, the context and syntax must be examined. There must also be some way for the computer to tell whether a group of syllables constitutes one word, two words, three words, or more. The more complicated the voice input, the greater the chance of confusion. Even the most advanced speech-recognition system makes mistakes, just as people sometimes misinterpret what you say. Such errors will become less frequent as computer memory capacity and operating speed increase.
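The two post-processing problems named above, choosing among homophones and deciding how syllables group into words, can be illustrated with a toy Python sketch. It assumes the recognizer has already produced syllable strings. The lexicon and the word-pair (bigram) counts standing in for “context” are invented for illustration; a real system would use a statistical language model trained on large amounts of text rather than a hand-written table.

```python
# Hypothetical counts of how often word B follows word A; a stand-in for "context".
BIGRAMS = {
    ("buy", "two"): 8, ("buy", "too"): 1,
    ("me", "too"): 9, ("me", "two"): 1,
    ("the", "way"): 9, ("the", "weigh"): 0,
    ("to", "weigh"): 6, ("to", "way"): 2,
    ("do", "not"): 9, ("a", "knot"): 4,
}

# Hypothetical lexicon of known words, spelled as concatenated syllables.
LEXICON = {"car", "pet", "carpet", "for", "get", "forget"}

def choose_spelling(previous_word, candidates, bigrams=BIGRAMS):
    """Pick the homophone spelling seen most often after previous_word."""
    return max(candidates, key=lambda w: bigrams.get((previous_word, w), 0))

def segmentations(syllables, lexicon=LEXICON):
    """Return every way to group the syllables into words found in the lexicon."""
    if not syllables:
        return [[]]
    results = []
    for k in range(1, len(syllables) + 1):
        word = "".join(syllables[:k])  # try the first k syllables as one word
        if word in lexicon:
            for rest in segmentations(syllables[k:], lexicon):
                results.append([word] + rest)
    return results

if __name__ == "__main__":
    print(choose_spelling("me", ["two", "too"]))    # too
    print(choose_spelling("to", ["way", "weigh"]))  # weigh
    print(segmentations(["car", "pet"]))            # [['car', 'pet'], ['carpet']]
```

Both groupings of “car” + “pet” are valid words, so segmentation alone cannot decide between them; the system would still need context, such as the kind of word-pair scores used by `choose_spelling`, to rank the alternatives.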
                            Insinuations and emotions
The analog-to-digital converter (ADC) in a speech-recognition system removes some of the inflections from a voice. In the extreme, all of the tonal changes are lost, and the voice is reduced to “audible text.” For most robot-control purposes, this is adequate. If a system could be 100 percent reliable in just getting each word right, speech-recognition engineers would be very pleased. However, as accuracy approaches 100 percent, there is increasing interest in conveying some of the subtler meanings, too. Consider the sentence “You will go to the store after midnight,” and say it with the emphasis on each word in turn (eight different ways). The meaning changes dramatically depending on the prosodic features of your voice: which word or words you emphasize. Tone is important for another reason, too: a sentence might be a statement or a question. Thus, “You will go to the store after midnight?” means something completely different from “You will go to the store after midnight!” Even if all the tones are the same, the meaning can vary depending on how quickly something is said. The timing of breaths can also make a difference.
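To show what a system would need in order to keep some of this prosodic information, here is a deliberately crude Python sketch. It assumes that earlier stages have already split the utterance into words and measured an average pitch (in hertz) and loudness (energy, in arbitrary units) for each word. The measurements below are invented, and both the scoring rule and the 15 percent rise threshold are arbitrary assumptions rather than an established method; practical prosody analysis uses much richer pitch-contour, duration, and pause models.

```python
from statistics import mean

def emphasized_word(words, pitch_hz, energy):
    """Return the word whose pitch and energy stand out most relative to the sentence average."""
    pitch_avg, energy_avg = mean(pitch_hz), mean(energy)
    scores = [p / pitch_avg + e / energy_avg for p, e in zip(pitch_hz, energy)]
    return words[scores.index(max(scores))]

def sentence_type(pitch_hz, rise_ratio=1.15):
    """Crude heuristic: a sharp pitch rise on the last word suggests a question."""
    return "question" if pitch_hz[-1] >= rise_ratio * pitch_hz[-2] else "statement"

if __name__ == "__main__":
    words = "You will go to the store after midnight".split()
    # Invented per-word measurements, with extra stress on "store".
    pitch = [120, 118, 115, 117, 112, 160, 118, 125]
    energy = [60, 58, 57, 55, 54, 75, 56, 58]
    print(emphasized_word(words, pitch, energy))  # store
    print(sentence_type(pitch))                   # statement
    rising = pitch[:-1] + [170]                   # same words, pitch rising at the end
    print(sentence_type(rising))                  # question
```

Moving the extra pitch and energy to a different word changes what `emphasized_word` reports, mirroring the eight readings of the example sentence. The speaking rate and breath timing mentioned above are not captured here; they would require word durations and pause lengths as additional inputs.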
                            For further information
Speech recognition is a rapidly advancing technology. The best source of up-to-date information is a good college library. Ask the librarian for reference books, and for articles in engineering journals, concerning the most recent developments. A search on the phrases “speech recognition” and “voice recognition” can be conducted on the Web using Google