Page 228 - Designing Sociable Robots
By analyzing sentence structure, several more influences can be introduced. For instance,
carefully selecting the types of stress placed on emphasized and de-emphasized words, as
well as introducing different kinds of pausing, can strengthen the perception of
negative emotions such as fear, sadness, and disgust. Given the immediate goal of
proto-language, there is no sentence structure to analyze. Nonetheless, to extend Kismet’s
expressive abilities to English sentences, the grammatical and lexical constraints must be
carefully considered.
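As a minimal sketch of what such emotion-dependent stress and pause selection might look like, the following annotates a word sequence with stress marks and inter-word pauses drawn from a per-emotion table. The parameter names, stress labels, and pause durations are illustrative assumptions, not Kismet's or DECtalk's actual settings:

```python
# Hypothetical prosodic settings per emotion: which stress level to place on
# emphasized words and how long a pause (in ms) to insert between words.
# The labels and values are illustrative, not Kismet's actual parameters.
EMOTION_PROSODY = {
    "fear":    {"stress": "high", "pause_ms": 120},
    "sadness": {"stress": "low",  "pause_ms": 400},
    "disgust": {"stress": "mid",  "pause_ms": 250},
}

def annotate(words, emphasized, emotion):
    """Return a token list with stress marks on emphasized words and an
    emotion-dependent pause token between successive words."""
    settings = EMOTION_PROSODY[emotion]
    tokens = []
    for i, word in enumerate(words):
        if word in emphasized:
            tokens.append(f"[{settings['stress']}-stress]{word}")
        else:
            tokens.append(word)
        if i < len(words) - 1:
            tokens.append(f"[pause {settings['pause_ms']}ms]")
    return tokens
```

For example, `annotate(["go", "away"], {"away"}, "sadness")` yields `["go", "[pause 400ms]", "[low-stress]away"]`; a real system would translate such annotations into the synthesizer's own markup.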
In a slightly different vein, emotive sounds such as laughter, cries, coos, gurgles, screams,
shrieks, yawns, and so forth could be introduced. DECtalk supports the ability to play pre-
recorded sound files. An initial set of emotive sounds could be modulated to add variability.
Extensions to Utterance Generation
Kismet’s current manner of speech has wide appeal to those who have interacted with the
robot. There is sufficient variability in phoneme, accent, and end syntax choice to permit an
engaging proto-dialogue. If Kismet’s utterance has the intonation of a question, people will
treat it as such—often “re-stating” the question as an English sentence and then answering
it. If Kismet’s utterance has the intonation of a statement, they respond accordingly. They
may say something such as, “Oh, I see,” or perhaps issue another query such as, “So then
what did you do?” The utterances are complex enough to sound as if the robot is speaking
a different language.
Even so, the current utterance generation algorithm is really intended as a placeholder for
a more sophisticated generation algorithm. There is interest in computationally modeling
canonical babbling so that the robot makes vocalizations characteristic of an eight-month-old
child (de Boysson-Bardies, 1999). This would significantly limit the range of the utterances
the robot currently produces, but would facilitate the acquisition of proto-language. Kismet
varies many parameters at once, so the learning space is quite large. By modeling canonical
babbling, the robot can systematically explore how a limited set of parameters modulates
the way its voice sounds. Introducing variations upon a theme during vocal games with the
caregiver as well as on its own could simplify the learning process (see chapters 2 and 3).
By interfacing what the robot vocally generates with what it hears, the robot could begin
to explore its vocal capabilities, how to produce targeted effects, and how these utterances
influence the caregiver’s behavior.
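The exploration described above can be sketched as a sweep over a deliberately small parameter grid: reduplicated consonant-vowel syllables (as in "ba-ba-ba") are paired with each combination of a few voice settings, so the effect of each setting can be heard in isolation. The inventory and parameter values below are assumptions for illustration, not the actual repertoire of an eight-month-old or Kismet's synthesizer settings:

```python
import itertools

# Illustrative inventories for canonical (reduplicated CV) babbling and a
# small, systematically explorable grid of assumed voice parameters.
CONSONANTS = ["b", "d", "g", "m"]
VOWELS = ["a", "i", "u"]
PITCH_BASELINES_HZ = [250, 300, 350]   # assumed child-like pitch baselines
SPEECH_RATES_WPM = [120, 180]          # assumed speaking-rate settings

def babble_experiments(repetitions=3):
    """Yield (utterance, pitch_hz, rate_wpm) triples covering the full
    parameter grid, one reduplicated CV syllable per utterance."""
    for consonant, vowel in itertools.product(CONSONANTS, VOWELS):
        utterance = "-".join([consonant + vowel] * repetitions)
        for pitch, rate in itertools.product(PITCH_BASELINES_HZ,
                                             SPEECH_RATES_WPM):
            yield utterance, pitch, rate
```

With the grid above this produces 72 trials (12 syllables x 3 pitches x 2 rates); comparing what the robot hears across trials is what would let it learn how each parameter modulates its voice.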
Improvements to Lip Synchronization
Kismet’s lip synchronization and facial animation are compelling and well-matched to
Kismet’s behavior and appearance. The current implementation, however, could be im-
proved upon and extended in a couple of ways. First, the latencies throughout the system

