Page 208 - Designing Sociable Robots

Expressive Vocalization System

Table 11.2
A description of the DECtalk synthesizer settings (see the DECtalk Software Reference Guide). Figure 11.3
illustrates the nominal pitch contour for neutral speech, and the net effect of changing these values for different
expressive states. Cahn (1990) presents a detailed description of how each of these settings alters the pitch contour.

DECtalk Synthesizer Setting   Description
----------------------------  ------------------------------------------------------------------
average pitch (Hz)            The average pitch of the pitch contour.
assertiveness (%)             The degree to which the voice tends to end statements with a
                              conclusive fall.
baseline fall (Hz)            The desired fall (in Hz) of the baseline, the reference pitch
                              contour about which all rule-governed dynamic swings in pitch occur.
breathiness (dB)              Specifies the breathy quality of the voice due to the vibration of
                              the vocal folds.
comma pause (ms)              Duration of the pause due to a comma.
gain of frication             Gain of the frication sound source.
gain of aspiration            Gain of the aspiration sound source.
gain of voicing               Gain of the voicing sound source.
hat rise (Hz)                 Nominal hat rise to the pitch contour plateau upon the first
                              stressed syllable of the phrase. The hat-rise influence lasts
                              throughout the phrase.
laryngealization (%)          Creaky voice. Results when the glottal pulse is narrow and the
                              fundamental period is irregular.
loudness (dB)                 Controls the amplitude of the speech waveform.
lax breathiness (%)           Specifies the amount of breathiness applied to the end of a
                              sentence when going from voiced to voiceless sounds.
period pause (ms)             Duration of the pause due to a period.
pitch range (%)               Sets the range about the average pitch over which the pitch contour
                              expands and contracts. Specified as a percentage of the nominal
                              pitch range.
quickness (%)                 Controls the speed of response to sudden requests to change pitch
                              (due to pitch accents). Models the response time of the larynx.
speech rate (wpm)             Rate of speech in words per minute.
richness (%)                  Controls the spectral change at lower frequencies (enhances the
                              lower frequencies). Rich and brilliant voices are more forceful.
smoothness (%)                Controls the amount of high-frequency energy. There is less
                              high-frequency energy in a smooth voice. Varies inversely with
                              brilliance. Smoother voices sound friendlier.
stress rise (Hz)              The nominal height of the pitch rise and fall on each stressed
                              syllable. This has a local influence on the contour about the
                              stressed syllable.
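
As a rough illustration of how settings like those in table 11.2 might reach the synthesizer, the sketch below builds a string of DECtalk in-line commands from a dictionary of setting values. The two-letter option codes and the `[:dv ...]` and `[:rate ...]` forms are written from memory of the DECtalk Software Reference Guide and should be treated as assumptions to verify there. The helper name `dectalk_prefix` is hypothetical, and only a subset of the table is covered.

```python
# Option codes for the DECtalk "define voice" command, as recalled from the
# DECtalk Software Reference Guide. These are assumptions: verify each code
# against the guide before relying on it.
DV_CODES = {
    "average pitch": "ap",
    "assertiveness": "as",
    "baseline fall": "bf",
    "breathiness": "br",
    "hat rise": "hr",
    "laryngealization": "la",
    "lax breathiness": "lx",
    "pitch range": "pr",
    "quickness": "qu",
    "richness": "ri",
    "smoothness": "sm",
    "stress rise": "sr",
}


def dectalk_prefix(settings, speech_rate=None):
    """Build an in-line command prefix from a {setting name: value} dict.

    Values are rounded to integers. Names missing from DV_CODES raise KeyError.
    """
    parts = []
    if speech_rate is not None:
        parts.append(f"[:rate {int(round(speech_rate))}]")
    for name, value in settings.items():
        code = DV_CODES[name]
        parts.append(f"[:dv {code} {int(round(value))}]")
    return "".join(parts)


if __name__ == "__main__":
    # Illustrative numbers only; they are not the values from table 11.3.
    prefix = dectalk_prefix({"average pitch": 200, "pitch range": 150},
                            speech_rate=180)
    print(prefix + " Hello there.")
```

The prefix would be prepended to the text to be spoken, so one utterance carries its own voice settings.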

speech. These vocal affect parameters modify the DECtalk synthesizer settings (summarized
in table 11.2) according to the emotional quality to be expressed. The default values and
max/min bounds for these settings are given in table 11.3. There is currently a single fixed
mapping per emotional quality. Table 11.4, along with the equations presented in section 11.3,
summarizes how the vocal affect parameters (VAPs) are mapped to the DECtalk synthesizer settings.
Table 11.5 summarizes how each emotional quality of voice is mapped onto the VAPs. Slight
modifications to Cahn's specifications were made for Kismet, which should not be surprising
because a different, more child-like voice was used. The discussion below motivates the mappings
from VAPs to synthesizer settings as shown in figure 11.4. Cahn (1990) presents a detailed
discussion of how these mappings were derived.
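
To make the idea of a default value with max/min bounds concrete, here is a minimal sketch of one way a set of VAPs could shift a single synthesizer setting away from its default while staying inside its bounds. It is not the set of equations from section 11.3. The linear weighting, the assumed VAP scale of [-10, 10], and the names `SettingRange` and `apply_vaps` are all hypothetical, and the numbers in the usage example are placeholders, not entries from table 11.3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SettingRange:
    """Default value and bounds for one synthesizer setting."""
    default: float
    minimum: float
    maximum: float


VAP_LIMIT = 10.0  # assumed VAP scale of [-10, 10], with 0 meaning neutral


def apply_vaps(rng, weighted_vaps):
    """Return a setting value shifted from its default by weighted VAPs.

    weighted_vaps is a list of (weight, vap_value) pairs. This hypothetical
    linear scheme normalizes the weighted sum to [-1, 1], then moves the
    setting that fraction of the way toward its maximum (positive) or its
    minimum (negative).
    """
    total_weight = sum(abs(w) for w, _ in weighted_vaps)
    if total_weight == 0:
        return rng.default
    net = sum(w * v for w, v in weighted_vaps) / (total_weight * VAP_LIMIT)
    net = max(-1.0, min(1.0, net))  # guard against out-of-range VAP values
    if net >= 0:
        span = rng.maximum - rng.default
    else:
        span = rng.default - rng.minimum
    return rng.default + net * span


if __name__ == "__main__":
    # Placeholder range, not taken from table 11.3.
    pitch = SettingRange(default=100.0, minimum=50.0, maximum=200.0)
    print(apply_vaps(pitch, [(1.0, -5.0)]))
```

Scaling separately on each side of the default keeps a neutral VAP at the default even when the default is not centered between the bounds.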
