Page 222 - Designing Sociable Robots
problematic. For all other expressive qualities, the performance was significantly above
random. Furthermore, misclassifications were highly correlated with similar emotions. For
instance, “anger” was sometimes confused with “disgust” (sharing negative valence) or
“surprise/excitement” (both sharing high arousal). “Disgust” was confused with other
negative emotions. “Fear” was confused with other high arousal emotions (with
“surprise/excitement” in particular). The distribution for “happy” was more spread out, but
it was most often confused with “surprise/excitement,” with which it shares high arousal.
Kismet’s “sad” speech was confused with other negative emotions. The distribution for
“surprise/excitement” was broad, but it was most often confused with “fear.”
Since this study, the vocal affect parameter values have been adjusted to improve the
distinction between “fear” and “surprise.” Kismet’s fearful affect has gained a more apprehensive
quality by lowering the volume and giving the voice a slightly raspy quality (this
was the version that was analyzed in section 11.4). In a previous study I found that people
often associated the raspy vocal quality with whispering and apprehension. “Surprise”
has also been enhanced by increasing the amount of stress rise on the stressed syllable of
the final word. Cahn analyzed the sentence structure to introduce irregular pauses into her
implementation of “fear.” This makes a significant contribution to the interpretation of this
emotional state. In practice, however, Kismet only babbles, so modifying the pausing via
analysis of sentence structure is premature: there are no sentences to analyze.
Given the number and homogeneity of subjects, I cannot make strong claims regarding
Kismet’s ability to convey emotion through expressive speech. More extensive studies need
to be carried out, yet, for the purposes of evaluation, the current set of data is promising.
Misclassifications are particularly informative. The mistakes are highly correlated with
similar emotions, which suggests that arousal and valence are conveyed to people (arousal
being more consistently conveyed than valence). I am using the results of this study to
improve Kismet’s expressive qualities. In addition, Kismet expresses itself through multiple
modalities, not just through voice. Kismet’s facial expression and body posture should help
resolve the ambiguities encountered through voice alone.
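
The claim that confusions track shared arousal or valence can be checked mechanically. The following is a minimal sketch, not part of the original study: it assigns each emotion a coarse (arousal, valence) label and reports which dimension each confused pair has in common. The table name AFFECT, the function shared_dimensions, and every label marked "assumed" are illustrative assumptions; only the unmarked labels and the listed confusion pairs come from the discussion above, and no counts or rates are implied.

```python
# Coarse (arousal, valence) labels for each expressive quality.
# Labels marked "assumed" are NOT stated in the text; they are
# placeholders chosen for illustration only.
AFFECT = {
    "anger": ("high", "negative"),
    "disgust": ("low", "negative"),               # arousal assumed
    "fear": ("high", "negative"),                 # valence assumed
    "happy": ("high", "positive"),                # valence assumed
    "sad": ("low", "negative"),                   # arousal assumed
    "surprise/excitement": ("high", "positive"),  # valence assumed
}


def shared_dimensions(intended, perceived):
    """Return the set of affect dimensions on which two labels agree."""
    arousal_i, valence_i = AFFECT[intended]
    arousal_p, valence_p = AFFECT[perceived]
    shared = set()
    if arousal_i == arousal_p:
        shared.add("arousal")
    if valence_i == valence_p:
        shared.add("valence")
    return shared


# (intended, perceived) pairs named qualitatively in the text.
CONFUSIONS = [
    ("anger", "disgust"),
    ("anger", "surprise/excitement"),
    ("fear", "surprise/excitement"),
    ("happy", "surprise/excitement"),
    ("surprise/excitement", "fear"),
]

for intended, perceived in CONFUSIONS:
    dims = sorted(shared_dimensions(intended, perceived))
    print(f"{intended} -> {perceived}: shares {', '.join(dims) or 'nothing'}")
```

Under these assumed labels, every listed confusion shares at least one dimension, and most share arousal, which is consistent with the observation that arousal is conveyed more reliably than valence.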
11.5 Real-Time Lip Synchronization and Facial Animation
Given Kismet’s ability to express itself vocally, it is important that the robot also be able to
support this vocal channel with coordinated facial animation. This includes synchronized lip
movements to accompany speech along with facial animation to lend additional emphasis to
the stressed syllables. These complementary motor modalities greatly enhance the robot’s
delivery when it speaks, giving the impression that the robot “means” what it says. This
makes the interaction more engaging for the human and facilitates proto-dialogue.

