•  GetWordLength() randomly chooses a number between 1 and 3. This specifies the number of syllables in a given proto-word.
•  GetPunctuation() randomly chooses one of the end syntax markers shown in table 11.7. This choice is biased by emotional state to influence the end of the pitch contour.
•  GetAccent() randomly chooses one of six accents (including no accent) as shown in table 11.7.
•  assignStress() selects which syllable receives primary stress.
•  getVowel() randomly chooses one of eighteen vowel phonemes as shown in table 11.6.
•  getConsonant() randomly chooses one of twenty-six consonant phonemes as shown in table 11.6.
•  getStress() gets the primary stress accent.
•  getDuration() randomly chooses a number between 100 and 500 that specifies the vowel duration in msec. This selection is biased by emotional state: low-arousal states tend toward longer vowel durations, and high-arousal states toward shorter ones. A sketch combining these functions appears after this list.
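Taken together, these routines generate a babble-like proto-word. The following Python sketch is one way they might fit together; the phoneme, accent, and end-marker inventories are placeholders rather than the actual entries of tables 11.6 and 11.7, and the arousal bias is a simplified stand-in for Kismet's emotion system.

```python
import random

# Placeholder inventories; the real system draws from the 18 vowels and 26
# consonants of table 11.6 and the accents and end markers of table 11.7.
VOWELS = ["aa", "ae", "iy", "uw", "ey"]
CONSONANTS = ["b", "d", "k", "m", "n", "s"]
ACCENTS = ["", "/", "\\", "/\\", "\\/", "~"]   # six accents, "" = no accent
END_MARKERS = [".", "?", "!"]

def get_word_length():
    """Number of syllables in the proto-word: 1 to 3."""
    return random.randint(1, 3)

def get_duration(arousal):
    """Vowel duration in msec within 100-500, biased so that low arousal
    favors longer vowels and high arousal shorter ones (arousal in [0, 1])."""
    lo = 100 + (1.0 - arousal) * 200        # crude, illustrative bias
    return int(random.uniform(lo, lo + 200))

def make_proto_word(arousal=0.5):
    n = get_word_length()
    stressed = random.randrange(n)          # assignStress(): pick stressed syllable
    syllables = []
    for i in range(n):
        c = random.choice(CONSONANTS)       # getConsonant()
        v = random.choice(VOWELS)           # getVowel()
        accent = random.choice(ACCENTS)     # GetAccent()
        dur = get_duration(arousal)         # getDuration()
        stress = "'" if i == stressed else ""   # primary stress marker
        syllables.append(stress + c + v + "<" + str(dur) + ">" + accent)
    return " ".join(syllables) + random.choice(END_MARKERS)  # GetPunctuation()

print(make_proto_word(arousal=0.8))   # e.g., a short, high-arousal babble
```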

11.4  Kismet's Expressive Utterances


Given the phonemic string to be spoken and the updated synthesizer settings, Kismet can vocally express itself with different emotional qualities. To evaluate Kismet's speech, the produced utterances are analyzed with respect to the acoustical correlates of emotion. This reveals whether, for a specified emotional state, the implementation produces changes in the speech waveform similar to those observed in emotional human speech. It is also important to evaluate how human listeners perceive the affective modulations of the synthesized speech.

Analysis of Speech

To analyze the performance of the expressive vocalization system, the dominant acoustic features that are highly correlated with emotive state were extracted. The acoustic features and their modulation with emotion are summarized in table 11.1. Specifically, these are average pitch, pitch range, pitch variance, and mean energy. To measure speech rate, the overall time to speak and the total time of voiced segments were determined.
  These features were extracted from three phrases:

•  Look at that picture
•  Go to the city
•  It's been moved already
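
To make the analysis concrete, the sketch below shows one way such features could be computed for an utterance. It assumes that a pitch (F0) track and per-frame energy values have already been estimated by a standard pitch tracker, with unvoiced frames marked as zero; it is an illustration of the listed features, not the analysis tool actually used for Kismet.

```python
import numpy as np

def acoustic_features(f0_hz, frame_energy, frame_period_s=0.010):
    """Compute the features listed above from per-frame measurements.

    f0_hz:          per-frame pitch estimates in Hz (0 for unvoiced frames)
    frame_energy:   per-frame short-time energy
    frame_period_s: hop between successive frames in seconds
    """
    f0 = np.asarray(f0_hz, dtype=float)
    energy = np.asarray(frame_energy, dtype=float)
    voiced = f0 > 0

    return {
        "mean_pitch_hz":   f0[voiced].mean(),
        "pitch_range_hz":  f0[voiced].max() - f0[voiced].min(),
        "pitch_variance":  f0[voiced].var(),
        "mean_energy":     energy.mean(),
        # Speech-rate proxies: overall time to speak and total voiced time.
        "utterance_time_s": len(f0) * frame_period_s,
        "voiced_time_s":    int(voiced.sum()) * frame_period_s,
    }
```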