Page 311 - Concise Encyclopedia of Robotics

Speech Recognition
base with the incoming audio signals. This system has two parts: a memory,
in which various speech patterns are stored; and a comparator, which
compares these stored patterns with the data coming in. For each syllable
or word, the circuit checks through its vocabulary until a match is found.
This is done very quickly, so the delay is not noticeable. The size of the
computer’s vocabulary is related directly to its memory capacity. An
advanced speech-recognition system requires a large amount of memory.
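
The memory-and-comparator arrangement described above can be illustrated with a minimal sketch. It assumes each stored speech pattern is a short list of numbers and that the comparator picks the stored pattern nearest to the incoming data. The names MEMORY, distance, and compare, the feature values, and the threshold are all hypothetical, chosen only for illustration; a practical system would use far richer acoustic features.

```python
import math

# Hypothetical "memory": each vocabulary word maps to a stored pattern.
# The numbers are invented for illustration only.
MEMORY = {
    "stop": [0.9, 0.1, 0.3],
    "go": [0.2, 0.8, 0.5],
    "left": [0.4, 0.4, 0.9],
}


def distance(a, b):
    """Euclidean distance between two equal-length patterns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def compare(incoming, memory=MEMORY, threshold=0.5):
    """Check the incoming pattern against every stored pattern.

    Returns the closest word, or None if nothing in the vocabulary
    is within the (assumed) threshold.
    """
    best_word, best_dist = None, float("inf")
    for word, pattern in memory.items():
        d = distance(incoming, pattern)
        if d < best_dist:
            best_word, best_dist = word, d
    return best_word if best_dist <= threshold else None


match = compare([0.85, 0.15, 0.25])
no_match = compare([5.0, 5.0, 5.0])
```

In this sketch the comparator visits every entry in memory, so a larger vocabulary means both more stored patterns and more comparisons per word.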
The output of the comparator must be processed in some way, so that
the machine knows the difference between words or syllables that sound
alike. Examples are “two/too,” “way/weigh,” and “not/knot.” For this to be
possible, the context and syntax must be examined. There must also be some
way for the computer to tell whether a group of syllables constitutes one
word, two words, three words, or more. The more complicated the voice
input, the greater is the chance for confusion. Even the most advanced
speech-recognition system makes mistakes, just as people sometimes
misinterpret what you say. Such errors will become less frequent as computer
memory capacity and operating speed increase.
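
One simple way to picture the use of context for sound-alike words is sketched below. It assumes the comparator has already reduced the sound to a key, and it chooses the spelling whose cue words overlap most with the neighboring words. The table CANDIDATES, its sound keys, its cue words, and the function disambiguate are hypothetical and purely illustrative; real systems rely on much more elaborate models of context and syntax.

```python
# Hypothetical table: sound key -> possible spellings -> cue words that
# tend to appear nearby. All entries are invented for illustration.
CANDIDATES = {
    "tu": {
        "two": {"robots", "meters", "steps"},
        "too": {"far", "fast", "much"},
    },
    "not": {
        "not": {"do", "will", "is"},
        "knot": {"rope", "tie", "cable"},
    },
}


def disambiguate(sound, neighbors):
    """Pick the spelling whose cue words best match the neighboring words.

    On a tie, the first spelling listed in the table is returned.
    """
    options = CANDIDATES[sound]
    nearby = set(neighbors)
    return max(options, key=lambda word: len(options[word] & nearby))


first = disambiguate("tu", ["move", "steps", "forward"])
second = disambiguate("tu", ["you", "went", "far"])
third = disambiguate("not", ["tie", "a", "in", "the", "rope"])
```

Here the first call yields "two", the second "too", and the third "knot", because those spellings share the most cue words with their neighbors.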
Insinuations and emotions
The ADC in a speech-recognition system removes some of the inflections
from a voice. In the extreme, all of the tonal changes are lost, and the
voice is reduced to “audible text.” For most robot-control purposes, this
is adequate. If a system could be 100 percent reliable in just getting each
word right, speech-recognition engineers would be very pleased. However,
when accuracy does approach 100 percent, there is increasing interest in
getting some of the subtler meanings across, too. Consider the sentence,
“You will go to the store after midnight,” and say it with the emphasis on
each word in turn (eight different ways). The meaning changes dramatically
depending on the prosodic features of your voice: which word or words
you emphasize. Tone is important for another reason, too: a sentence
might be a statement or a question. Thus, “You will go to the store after
midnight?” represents something completely different from “You will go
to the store after midnight!” Even if all the tones are the same, the meaning
can vary depending on how quickly something is said. Even the timing of
breaths can make a difference.
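
The eight readings of the example sentence can be listed with a short sketch. It marks the stressed word in capital letters, which is only a text convention assumed here for illustration; the function name emphasis_variants is likewise hypothetical. It shows nothing about how a machine would detect emphasis in an actual voice signal.

```python
def emphasis_variants(sentence):
    """Return one copy of the sentence per word, with that word capitalized."""
    words = sentence.split()
    variants = []
    for i in range(len(words)):
        marked = [w.upper() if j == i else w for j, w in enumerate(words)]
        variants.append(" ".join(marked))
    return variants


variants = emphasis_variants("You will go to the store after midnight")
for line in variants:
    print(line)
```

A recognizer that reduces speech to plain text would report all eight of these variants as the same string.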
For further information
Speech recognition is a rapidly advancing technology. The best source of
up-to-date information is a good college library. Ask the librarian for
reference books, and for articles in engineering journals, concerning the
most recent developments. A search on the phrases “speech recognition”
and “voice recognition” can be conducted on the Web using Google


