Page 158 - The Definitive Guide to Building Java Robots



CHAPTER 5 ■ SPEECH 139

Figure 5-2. The Microsoft Text To Speech tab

What Is Speech Technology?

Speech technology consists of speech synthesis and speech recognition. The speech recognition
engines are responsible for converting acoustical signals to digital signals, and then to text.
Two modes of speech recognition are available:
• Dictation: Users read data directly into a microphone. The range of words the engine
can recognize is limited to the recognizer's grammar or dictionary of recognizable words.
• Command and control: Users speak commands or ask questions. The range of words the
engine can recognize in this case is usually defined by a limited grammar. This mode
often eliminates the need to “train” the recognizers.
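To make the idea of a limited grammar concrete, here is a minimal sketch of how a command-and-control application might map recognized text to actions. It is not tied to any particular speech engine: the CommandGrammar class, the phrases, and the action names are all hypothetical and chosen only for illustration. The point is that anything outside the small phrase set is rejected rather than guessed at.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Toy command-and-control grammar: only a fixed set of phrases is accepted. */
final class CommandGrammar {
    // Hypothetical phrases and action names, used only for illustration.
    private final Map<String, String> phraseToAction = new LinkedHashMap<>();

    CommandGrammar() {
        phraseToAction.put("move forward", "MOVE_FORWARD");
        phraseToAction.put("turn left", "TURN_LEFT");
        phraseToAction.put("turn right", "TURN_RIGHT");
        phraseToAction.put("stop", "STOP");
    }

    /** Returns the action for an in-grammar phrase, or empty if the phrase is not in the grammar. */
    Optional<String> match(String recognizedText) {
        if (recognizedText == null) {
            return Optional.empty();
        }
        // Normalize case and whitespace so "  Turn   LEFT " matches "turn left".
        String key = recognizedText.trim()
                .toLowerCase(Locale.ROOT)
                .replaceAll("\\s+", " ");
        return Optional.ofNullable(phraseToAction.get(key));
    }
}
```

Because the grammar holds only a handful of phrases, the engine has far fewer candidates to choose between than in dictation, which is one reason this mode can often work without training.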

The speech synthesizer engines are responsible for converting text to spoken language.
The process first breaks the words into phonemes, which are then transformed into a digital
audio signal for playback.
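The first stage of that process, turning words into phonemes, can be sketched with a simple dictionary lookup. This is only an illustration under stated assumptions: the ToyPhonemizer class is hypothetical, the two-word lexicon is hand-entered using ARPAbet-style symbols without stress marks, and real synthesizers use much larger lexicons plus letter-to-sound rules. The second stage, generating audio from the phonemes, is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy text-to-phoneme stage; the lexicon is a tiny hand-entered sample. */
final class ToyPhonemizer {
    // Hypothetical lexicon: word -> ARPAbet-style phonemes (stress marks omitted).
    private static final Map<String, String[]> LEXICON = new HashMap<>();
    static {
        LEXICON.put("hello", new String[] {"HH", "AH", "L", "OW"});
        LEXICON.put("robot", new String[] {"R", "OW", "B", "AA", "T"});
    }

    /** Splits text into words and looks each one up in the lexicon. */
    static List<String> toPhonemes(String text) {
        List<String> phonemes = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z']+")) {
            if (word.isEmpty()) {
                continue; // skip the empty token produced by leading punctuation
            }
            String[] entry = LEXICON.get(word);
            if (entry == null) {
                // Unknown word: mark it instead of guessing a pronunciation.
                phonemes.add("<" + word + ">");
            } else {
                for (String p : entry) {
                    phonemes.add(p);
                }
            }
        }
        return phonemes;
    }
}
```

Calling ToyPhonemizer.toPhonemes("Hello, robot!") returns the list [HH, AH, L, OW, R, OW, B, AA, T]. A word missing from the lexicon, such as "world", comes back as the marker <world>.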
In this chapter, I’ll introduce two types of speech recognition engines: one for continuous
dictation using JNI (see the following section), and one using command and control. I’ll also
introduce three different speech synthesizers: two in Java and one using JNI.
Before I start with speech recognition or synthesis, the following is a quick-start reference
to the Java Native Interface (JNI).
