Page 223 - Designing Sociable Robots






                       204                                                             Chapter 11





                       Guidelines from Animation

                       The earliest examples of lip synchronization for animated characters date back to the
                       1940s in classical animation (Blair, 1949), and back to the 1970s for computer-animated
                       characters (Parke, 1972). In these early works, all of the lip animation was crafted by hand
                       (a very time-consuming process). Over time, a set of guidelines evolved that are largely
                       adhered to by animation artists today (Madsen, 1969).
                         According to Madsen, simplicity is the secret to successful lip animation. Extreme ac-
                       curacy for cartoon animation often looks forced or unnatural. Thus, the goal in animation
                       is not to always imitate realistic lip motions, but to create a visual shorthand that passes
                       unchallenged by the viewer (Madsen, 1969). As the realism of the character increases,
                       however, the lip synchronization must become correspondingly more accurate.
                         Kismet is a fanciful and cartoon-like character, so the guidelines for cartoon animation
                       apply. In this case, the guidelines suggest that the animator focus on vowel lip motions
                       (especially o and w) accented with consonant postures (m, b, p) for lip closing. Precision
                       of these consonants gives credibility to the generalized patterns of vowels. The transitions
                       between vowels and consonants should be reasonable approximations of lip and jaw move-
                       ment. Fortunately, more latitude is granted for more fanciful characters. The mechanical
                       response time of Kismet’s lip and jaw motors places strict constraints on how fast the lips
                       and jaw can transition from posture to posture. Madsen also stresses that care must be taken
                       in conveying emotion, as the expression of voice and face can change dramatically.

                       Extracting Lip Synch Info

                       To implement lip synchronization on Kismet, a variety of information must be computed
                       in real-time from the speech signal. By placing DECtalk in memory mode and issuing the
                       command string (utterance with synthesizer settings), the DECtalk software generates the
                       speech waveform and writes it to memory (an 11.025 kHz waveform). In addition, DECtalk
                       extracts time-stamped phoneme information. From the speech waveform, one can compute
                       its time-varying energy over a window size of 335 samples, taking care to synchronize
                       the phoneme and energy information, and send (phoneme[t], energy[t]) pairs to the QNX
                       machine at 33 Hz to coordinate jaw and lip motor control. A similar technique using
                       DECtalk’s phoneme extraction capability is reported by Waters and Levergood (1993) for
                       real-time lip synchronization for computer-generated facial animation.
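
                       The energy and pairing step described above can be sketched roughly as follows.
                       This is only an illustration under stated assumptions, not the code that runs on
                       Kismet: energy is taken here as the mean squared amplitude over non-overlapping
                       335-sample windows (about 33 frames per second at 11.025 kHz), and the phoneme
                       list is a hypothetical sequence of (onset_seconds, label) tuples standing in for
                       DECtalk's time-stamped phoneme output. The function names are invented for the
                       example.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence, Tuple

SAMPLE_RATE_HZ = 11025  # waveform rate given in the text
WINDOW_SAMPLES = 335    # energy window given in the text (about 33 frames/s)


def frame_energies(samples: Sequence[float],
                   window: int = WINDOW_SAMPLES) -> List[float]:
    """Mean squared amplitude of each non-overlapping window.

    Assumption: the text does not say which energy measure or window
    overlap is used; mean square without overlap is chosen for simplicity.
    """
    energies = []
    for start in range(0, len(samples), window):
        frame = samples[start:start + window]
        energies.append(sum(x * x for x in frame) / len(frame))
    return energies


def phoneme_energy_pairs(
    samples: Sequence[float],
    phonemes: Sequence[Tuple[float, str]],
    sample_rate: int = SAMPLE_RATE_HZ,
    window: int = WINDOW_SAMPLES,
) -> List[Tuple[Optional[str], float]]:
    """Pair each energy frame with the phoneme active at the frame's start.

    `phonemes` is a hypothetical list of (onset_seconds, label) tuples,
    sorted by onset time. Frames before the first onset get label None.
    """
    onsets = [onset for onset, _ in phonemes]
    pairs = []
    for i, energy in enumerate(frame_energies(samples, window)):
        t = i * window / sample_rate          # frame start time in seconds
        idx = bisect_right(onsets, t) - 1     # last phoneme with onset <= t
        label = phonemes[idx][1] if idx >= 0 else None
        pairs.append((label, energy))
    return pairs
```

                       Keying both streams to the same frame index is what keeps the phoneme and
                       energy information synchronized: each pair describes the same 335-sample slice
                       of the waveform, so the pairs can be sent one per frame at roughly 33 Hz.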
                         To control the jaw, the QNX machine receives the phoneme and energy information and
                       updates the commanded jaw position at 10 Hz. The mapping from energy to jaw opening is
                       linear, bounded within a range where the minimum position corresponds to a closed mouth,
                       and the maximum position corresponds to an open mouth characteristic of surprise. Using
                       only energy to control jaw position produces a lively effect but has its limitations (Parke &
                       Waters, 1996). For Kismet, the phoneme information is used to make sure that the jaw is