Page 208 - Designing Sociable Robots






                       Expressive Vocalization System                                       189





                       Table 11.2
                       A description of the DECtalk synthesizer settings (see the DECtalk Software Reference Guide). Figure 11.3
                       illustrates the nominal pitch contour for neutral speech, and the net effect of changing these values for different
                       expressive states. Cahn (1990) presents a detailed description of how each of these settings alters the pitch contour.

                       DECtalk Synthesizer Setting  Description
                       average pitch (Hz)   The average pitch of the pitch contour.
                       assertiveness (%)    The degree to which the voice tends to end statements with a conclusive fall.
                       baseline fall (Hz)   The desired fall (in Hz) of the baseline, the reference pitch contour
                                            about which all rule-governed dynamic swings in pitch occur.
                       breathiness (dB)     Specifies the breathy quality of the voice due to the vibration of the vocal folds.
                       comma pause (ms)     Duration of pause due to a comma.
                       gain of frication    Gain of frication sound source.
                       gain of aspiration   Gain of aspiration sound source.
                       gain of voicing      Gain of voicing sound source.
                       hat rise (Hz)        Nominal hat rise to the pitch contour plateau upon the first stressed syllable
                                            of the phrase. The hat-rise influence lasts throughout the phrase.
                       laryngealization (%) Creaky voice. Results when the glottal pulse is narrow and the fundamental
                                            period is irregular.
                       loudness (dB)        Controls amplitude of speech waveform.
                       lax breathiness (%)  Specifies the amount of breathiness applied to the end of a sentence when
                                            going from voiced to voiceless sounds.
                       period pause (ms)    Duration of pause due to period.
                       pitch range (%)      Sets the range about the average pitch that the pitch contour expands
                                            and contracts. Specified in terms of percent of the nominal pitch range.
                       quickness (%)        Controls the speed of response to sudden requests to change pitch
                                            (due to pitch accents). Models the response time of the larynx.
                       speech rate (wpm)    Rate of speech in words per minute.
                       richness (%)         Controls the spectral change at lower frequencies (enhances the lower
                                            frequencies). Rich and brilliant voices are more forceful.
                       smoothness (%)       Controls the amount of high frequency energy. There is less high frequency
                                            energy in a smooth voice. Varies inversely with brilliance. Smoother voices
                                            sound friendlier.
                       stress rise (Hz)     The nominal height of the pitch rise and fall on each stressed syllable.
                                            This has a local influence on the contour about the stressed syllable.


                       speech. These vocal affect parameters modify the DECtalk synthesizer settings (summarized
                       in table 11.2) according to the emotional quality to be expressed. The default values and
                       max/min bounds for these settings are given in table 11.3. There is currently a single fixed
                       mapping per emotional quality. Table 11.4, along with the equations presented in section 11.3,
                       summarizes how the vocal affect parameters are mapped to the DECtalk synthesizer settings.
                       Table 11.5 summarizes how each emotional quality of voice is mapped onto the VAPs. Slight
                       modifications to Cahn’s specifications were made for Kismet; this is not surprising,
                       since a different, more child-like voice was used. The discussion below motivates the mappings
                       from VAPs to synthesizer settings as shown in figure 11.4. Cahn (1990) presents a detailed
                       discussion of how these mappings were derived.
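
The idea of moving a synthesizer setting away from its default value, within its min/max bounds, can be illustrated with a minimal sketch. This is not the book's mapping: the actual equations are those of section 11.3, and the actual defaults and bounds are those of table 11.3, neither of which is reproduced here. The sketch assumes a VAP value on a scale of -10 to 10, a single signed weight per setting, and piecewise-linear interpolation from the default toward the relevant bound. The names `SettingSpec` and `map_vap_to_setting` and the numeric values are hypothetical.

```python
from dataclasses import dataclass

VAP_SCALE = 10.0  # assumption: VAP values lie in [-10, 10]


@dataclass(frozen=True)
class SettingSpec:
    """Default value and bounds for one synthesizer setting (hypothetical container)."""
    default: float
    minimum: float
    maximum: float


def map_vap_to_setting(spec: SettingSpec, vap: float, weight: float = 1.0) -> float:
    """Illustrative mapping only, not the equations of section 11.3.

    A positive weighted VAP moves the setting from its default toward its
    maximum; a negative one moves it toward its minimum. The result is
    always kept within [minimum, maximum].
    """
    # Normalize the weighted VAP to [-1, 1].
    x = max(-1.0, min(1.0, weight * vap / VAP_SCALE))
    if x >= 0:
        value = spec.default + x * (spec.maximum - spec.default)
    else:
        value = spec.default + x * (spec.default - spec.minimum)
    # Clamp as a final guard against rounding outside the bounds.
    return max(spec.minimum, min(spec.maximum, value))


# Placeholder numbers, not the values from table 11.3.
average_pitch = SettingSpec(default=300.0, minimum=200.0, maximum=400.0)
raised = map_vap_to_setting(average_pitch, vap=5.0)                 # halfway to the maximum
lowered = map_vap_to_setting(average_pitch, vap=5.0, weight=-1.0)   # halfway to the minimum
```

A negative weight lets the same VAP raise one setting while lowering another, which is one simple way to express an inverse relationship between a parameter and a setting.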