11.3 Implementation Overview

This section gives an overview of the process for generating an expressive utterance and having Kismet say it with lip synchronization and facial animation. Figure 11.1 shows the controls of the expressive speech synthesizer as they appear on the NT machine. The active emotion is established by the emotion system, as described in chapter 8. Currently, Kismet can vocalize an utterance with one of seven expressive states (anger, disgust, fear, happy, sad, surprise, and neutral). The decision to engage in vocal behavior is made by the behavior system (chapter 9). This information is passed to the motor skills system, which issues the request to speak with a given affective state to the vocalization system. In the remainder of this chapter, I present how the vocalization system processes this request.
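
To make this handoff concrete, here is a minimal sketch under assumed interfaces: the active emotion arrives from the emotion system, and the motor skills system issues a speak request tagged with that affective state. The function, class, and parameter names are illustrative assumptions, not Kismet's actual API.

```python
# Illustrative only: a stand-in for the speak request that the motor skills
# system passes to the vocalization system. All names are assumptions.

EXPRESSIVE_STATES = ("anger", "disgust", "fear", "happy",
                     "sad", "surprise", "neutral")

def issue_speak_request(active_emotion, vocalization_system):
    """Ask the vocalization system to speak with the given affective state."""
    if active_emotion not in EXPRESSIVE_STATES:
        active_emotion = "neutral"  # fall back to the neutral voice
    vocalization_system.speak(affect=active_emotion)

class _StubVocalizationSystem:
    """Stand-in with the single method assumed above."""
    def speak(self, affect):
        print("vocalization system: speak with affect =", affect)

issue_speak_request("happy", _StubVocalizationSystem())
```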
The algorithm for generating and performing an expressive Kismet-like vocalization is as follows (a sketch of this pipeline appears after the list):

1. Update vocal affect parameters based on the current emotion.
2. Map from vocal affect parameters to synthesizer settings.
3. Generate the utterance to speak.
4. Assemble the full command and send it to the synthesizer.
5. Extract features from the speech signal for lip synchronization.
6. Send the speech signal to the sound card.
7. Execute lip synchronization movements.
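
The following is a toy, runnable walk-through of these seven steps. Every function and value is an illustrative stand-in rather than Kismet's implementation, which drives the DECtalk synthesizer, the sound card, and the robot's lip and jaw motors; the parameter names and numbers are placeholders, not the entries of tables 11.4 and 11.5.

```python
# A toy walk-through of the seven-step pipeline. All functions and values
# are illustrative stand-ins, not Kismet's implementation.

def update_affect_params(emotion):
    """Step 1: look up vocal affect parameters for the current emotion.
    Placeholder values; the real values come from table 11.5."""
    table = {
        "neutral": {"pitch_range": 0, "speech_rate": 0},
        "happy":   {"pitch_range": 6, "speech_rate": 4},
        "sad":     {"pitch_range": -4, "speech_rate": -6},
    }
    return table.get(emotion, table["neutral"])

def map_to_synth_settings(params):
    """Step 2: map affect parameters to synthesizer settings.
    Placeholder linear scaling; the real mapping is non-linear and a
    setting may combine several parameters."""
    return {name: 100 + 5 * value for name, value in params.items()}

def generate_utterance():
    """Step 3: produce the string to speak (a placeholder babble)."""
    return "ah boo goo bah"

def synthesize(utterance, settings):
    """Step 4: assemble the full command and 'send' it to the synthesizer.
    Returns a stand-in speech signal instead of real audio samples."""
    command = "%s %s" % (settings, utterance)
    return {"command": command, "samples": [0.0] * 1600}

def extract_lip_sync_features(signal):
    """Step 5: extract per-frame features (e.g., energy) for lip sync."""
    frame = 160
    samples = signal["samples"]
    return [sum(abs(s) for s in samples[i:i + frame])
            for i in range(0, len(samples), frame)]

def perform_expressive_utterance(emotion):
    params = update_affect_params(emotion)        # step 1
    settings = map_to_synth_settings(params)      # step 2
    utterance = generate_utterance()              # step 3
    signal = synthesize(utterance, settings)      # step 4
    features = extract_lip_sync_features(signal)  # step 5
    print("playing:", signal["command"])          # step 6: sound card stand-in
    print("lip-sync frames:", len(features))      # step 7: motor command stand-in

perform_expressive_utterance("happy")
```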
Mapping Vocal Affect Parameters to Synthesizer Settings

The vocal affect parameters outlined in section 11.2 are derived from the acoustic correlates of emotion in human speech. To have DECtalk produce these effects in synthesized speech, the vocal affect parameters must be computationally mapped onto the underlying synthesizer settings. There is a single fixed mapping per emotional quality. Cahn's mapping functions are adapted to Kismet's implementation with some minor modifications.
The vocal affect parameters take integer values from −10 to 10. Negative values correspond to weaker effects, positive values to stronger effects, and zero is the neutral setting. These values are set according to the currently specified emotion, as shown in table 11.5.
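
As an illustration of this representation (not the actual contents of table 11.5), a per-emotion parameter vector might be stored as follows; the parameter names are an assumed subset of those in section 11.2, and the non-zero numbers are placeholders only.

```python
# Illustrative only: per-emotion vocal affect parameter vectors as integers
# from -10 to 10, with 0 as the neutral setting. The parameter names are an
# assumed subset and the values are placeholders, not table 11.5's entries.

NEUTRAL = {"pitch_range": 0, "speech_rate": 0, "loudness": 0, "breathiness": 0}

EMOTION_TO_AFFECT_PARAMS = {
    "neutral": dict(NEUTRAL),
    "happy":   {"pitch_range": 6,  "speech_rate": 4,  "loudness": 3,  "breathiness": 0},
    "sad":     {"pitch_range": -5, "speech_rate": -6, "loudness": -4, "breathiness": 2},
    "anger":   {"pitch_range": 4,  "speech_rate": 5,  "loudness": 8,  "breathiness": -3},
}

def affect_params_for(emotion):
    """Return the fixed parameter vector for the active emotion, clamping
    each entry to the legal integer range from -10 to 10."""
    params = EMOTION_TO_AFFECT_PARAMS.get(emotion, NEUTRAL)
    return {name: max(-10, min(10, int(value))) for name, value in params.items()}
```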
Linear changes in these parameter values result in non-linear changes in the synthesizer settings. Furthermore, the mapping between parameters and synthesizer settings is not necessarily one-to-one. Each parameter contributes a percentage of the final synthesizer setting's value (table 11.4). When a synthesizer setting is modulated by more than one parameter, its