Page 224 - Designing Sociable Robots

breazeal-79017  book  March 18, 2002  14:16





                       Expressive Vocalization System                                       205





                       closed when an m, p, or b is spoken or when there is silence. This would not necessarily be
                       the case if only energy were used: a voiced m, for instance, carries appreciable energy even
                       though the lips are closed.
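The contrast between a phoneme-based closure rule and an energy-only rule can be illustrated with a minimal, hypothetical sketch. The function names, the silence marker `"_"`, the energy threshold, and the sample energy values are all assumptions made for illustration, not values from the robot's actual system.

```python
# Hypothetical sketch: deciding lip closure from phoneme labels versus from
# energy alone. The silence marker "_", the threshold, and the sample energies
# below are illustrative assumptions, not values from the original system.

BILABIALS = {"m", "p", "b"}
SILENCE = "_"  # assumed silence marker


def lips_closed_from_phoneme(phoneme: str) -> bool:
    """Close the mouth for bilabial consonants or silence."""
    return phoneme in BILABIALS or phoneme == SILENCE


def lips_closed_from_energy(energy: float, threshold: float = 1.0) -> bool:
    """Energy-only rule: close the mouth whenever the signal is quiet."""
    return energy < threshold


# Made-up (phoneme, energy) samples: a voiced "m" is loud yet lip-closed,
# so the two rules disagree on it.
samples = [("m", 12.0), ("ay", 18.0), ("_", 0.2)]
for phoneme, energy in samples:
    print(phoneme, lips_closed_from_phoneme(phoneme), lips_closed_from_energy(energy))
```

In this sketch only the phoneme rule closes the lips for the loud "m"; both rules agree on the open vowel and on silence.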
                         Upon receiving the phoneme and energy information from the vocalization system, the
                       QNX vocal communication process passes it to the motor skill system via the DPRAM. The
                       motor skill system converts the energy into a measure of facial emphasis by scaling it
                       linearly, and maps each phoneme onto a lip posture. Both are then passed on to the lip
                       synchronization and facial animation processes of the motor system that controls the face
                       (described in chapter 10). Figure 11.4 illustrates the stages of computation from the raw
                       speech signal to lip posture, jaw opening, and facial emphasis.
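The motor skill system's two conversions described above (energy scaled linearly to facial emphasis, phoneme mapped to a lip posture) can be sketched as follows. This is a hypothetical illustration only: the posture names, the phoneme-to-posture table, the gain and offset, and the `FaceCommand` structure are assumptions, and the actual postures are defined by the face motor system of chapter 10. The DPRAM transfer is not modeled.

```python
from dataclasses import dataclass

# Hypothetical posture classes and table; the real lip postures belong to the
# face motor system (chapter 10) and are not reproduced here.
POSTURE_OF_PHONEME = {
    "m": "closed", "p": "closed", "b": "closed", "_": "closed",
    "w": "rounded", "uw": "rounded",
    "ay": "open", "ae": "open",
    "ih": "spread", "ix": "spread", "yx": "spread",
    "th": "teeth", "dh": "teeth",
}
DEFAULT_POSTURE = "neutral"  # fallback for phonemes not in the table


@dataclass
class FaceCommand:
    """What the motor skill system hands on to lip sync and facial animation."""
    lip_posture: str
    facial_emphasis: float


def energy_to_emphasis(energy: float, gain: float = 4.0, offset: float = 0.0) -> float:
    """Linear scaling of energy into facial emphasis (gain and offset assumed)."""
    return gain * energy + offset


def motor_skill_step(phoneme: str, energy: float) -> FaceCommand:
    """Convert one (phoneme, energy) sample into a command for the face system."""
    posture = POSTURE_OF_PHONEME.get(phoneme, DEFAULT_POSTURE)
    return FaceCommand(lip_posture=posture, facial_emphasis=energy_to_emphasis(energy))


# Usage: a short made-up stream of (phoneme, energy) samples for "why do".
stream = [("w", 5.0), ("ay", 15.0), ("d", 8.0), ("uw", 12.0)]
for phoneme, energy in stream:
    print(motor_skill_step(phoneme, energy))
```

Keeping the two conversions independent mirrors the text: emphasis depends only on energy, and posture depends only on the phoneme label.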


                       [Figure 11.4: four panels sharing a time axis (0 to 14000, in 0.1 ms units): speech data
                       for "Why do you think that"; energy; lip posture and phoneme; facial emphasis.]
                       Figure 11.4
                       Plot of speech signal, energy, phonemes/lip posture, and facial emphasis for the phrase “Why do you think that?”
                       Time is in 0.1 ms increments. The total amount of time to vocalize the phrase is 1.4 sec.
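As a quick check on the caption, the time axis of the speech-data panel runs to 14000 ticks, and at 0.1 ms per tick this matches the stated duration:

```latex
14000 \times 0.1\,\text{ms} = 1400\,\text{ms} = 1.4\,\text{s}
```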