Page 228 - Designing Sociable Robots
                       Expressive Vocalization System

                       By analyzing sentence structure, several more influences can be introduced. For instance,
                       carefully selecting the types of stress placed on emphasized and de-emphasized words, as
                       well as introducing different kinds of pausing, can be used to strengthen the perception of
                       negative emotions such as fear, sadness, and disgust. Given the immediate goal of
                       proto-language, there is no sentence structure to analyze. Nonetheless, to extend Kismet’s
                       expressive abilities to English sentences, the grammatical and lexical constraints must be
                       carefully considered.
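                         The kind of emotion-to-prosody mapping described above can be sketched as a simple
                       rule table. The emotion names, stress placements, and pause lengths below are illustrative
                       assumptions, not Kismet's or DECtalk's actual settings:

```python
# A hypothetical sketch of sentence-level prosody rules: each emotion selects
# a stress placement and a pause style, which are then applied word by word.
# All parameter values here are illustrative assumptions.
PROSODY_RULES = {
    "fear":    {"stress": "early", "pause_ms": 250},  # short, abrupt pauses
    "sadness": {"stress": "none",  "pause_ms": 400},  # long, frequent pauses
    "disgust": {"stress": "late",  "pause_ms": 150},
}

def apply_prosody(words, emotion):
    """Return a list of (word, stressed, trailing_pause_ms) tuples."""
    rule = PROSODY_RULES[emotion]
    out = []
    for i, word in enumerate(words):
        if rule["stress"] == "early":
            stressed = (i == 0)
        elif rule["stress"] == "late":
            stressed = (i == len(words) - 1)
        else:
            stressed = False
        # Insert a pause after every word except the last.
        pause = rule["pause_ms"] if i < len(words) - 1 else 0
        out.append((word, stressed, pause))
    return out
```

                       A grammar-aware version would choose the stressed word from sentence structure rather
                       than by position, which is exactly the extension the text envisions for English sentences.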
                         In a slightly different vein, emotive sounds such as laughter, cries, coos, gurgles, screams,
                       shrieks, yawns, and so forth could be introduced. DECtalk supports the ability to play pre-
                       recorded sound files. An initial set of emotive sounds could be modulated to add variability.
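                         One way such variability could be added is to randomize a few playback parameters each
                       time a sound is triggered. The file names and modulation ranges below are assumptions for
                       illustration; the modulation itself would be applied to the audio before playback:

```python
import random

# A minimal sketch of adding variability to a fixed set of pre-recorded
# emotive sounds.  The file names and modulation ranges are illustrative
# assumptions; a real system would resample or pitch-shift the audio data.
EMOTIVE_SOUNDS = ["laugh.wav", "coo.wav", "gurgle.wav", "shriek.wav"]

def pick_variant(sound, rng=random):
    """Choose playback parameters that make each playback slightly different."""
    return {
        "file": sound,
        "pitch_scale": rng.uniform(0.9, 1.1),    # +/-10% pitch shift
        "tempo_scale": rng.uniform(0.85, 1.15),  # +/-15% speed change
        "gain_db": rng.uniform(-3.0, 0.0),       # mild volume variation
    }
```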

                       Extensions to Utterance Generation

                       Kismet’s current manner of speech has wide appeal to those who have interacted with the
                       robot. There is sufficient variability in phoneme, accent, and end syntax choice to permit an
                       engaging proto-dialogue. If Kismet’s utterance has the intonation of a question, people will
                       treat it as such—often “re-stating” the question as an English sentence and then answering
                       it. If Kismet’s utterance has the intonation of a statement, they respond accordingly. They
                       may say something such as, “Oh, I see,” or perhaps issue another query such as, “So then
                       what did you do?” The utterances are complex enough to sound as if the robot is speaking
                       a different language.
                         Even so, the current utterance generation algorithm is really intended as a placeholder for
                       a more sophisticated generation algorithm. There is interest in computationally modeling
                       canonical babbling so that the robot makes vocalizations characteristic of an eight-month-old
                       child (de Boysson-Bardies, 1999). This would significantly limit the range of the utterances
                       the robot currently produces, but would facilitate the acquisition of proto-language. Kismet
                       varies many parameters at once, so the learning space is quite large. By modeling canonical
                       babbling, the robot can systematically explore how a limited set of parameters modulates
                       the way its voice sounds. Introducing variations upon a theme during vocal games with the
                       caregiver as well as on its own could simplify the learning process (see chapters 2 and 3).
                       By interfacing what the robot vocally generates with what it hears, the robot could begin
                       to explore its vocal capabilities, how to produce targeted effects, and how these utterances
                       influence the caregiver’s behavior.
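
                         The exploration strategy described above, holding most vocal parameters fixed while
                       systematically varying a small set, can be sketched as follows. The syllable inventory and
                       the single varied parameter (relative pitch) are assumptions chosen for illustration:

```python
import itertools

# A hedged sketch of canonical babbling as systematic exploration of a small
# parameter space.  Reduplicated consonant-vowel syllables ("ba-ba-ba") are
# characteristic of eight-month-old babbling; the inventories and pitch
# levels below are illustrative assumptions, with everything else held fixed
# so the learning space stays small.
CONSONANTS = ["b", "d", "m"]
VOWELS = ["a", "i", "u"]
PITCHES = [0.8, 1.0, 1.2]  # pitch relative to the baseline voice

def babble_repertoire(repeats=3):
    """Enumerate reduplicated CV babbles, one per parameter combination."""
    for c, v, pitch in itertools.product(CONSONANTS, VOWELS, PITCHES):
        syllable = c + v
        yield {"utterance": "-".join([syllable] * repeats), "pitch": pitch}
```

                       Pairing each generated babble with what the robot subsequently hears would give the
                       kind of production-perception loop the text suggests for learning targeted vocal effects.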

                       Improvements to Lip Synchronization
                       Kismet’s lip synchronization and facial animation are compelling and well-matched to
                       Kismet’s behavior and appearance. The current implementation, however, could be
                       improved upon and extended in a couple of ways. First, the latencies throughout the system