Page 225 - Designing Sociable Robots






                       206                                                             Chapter 11





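                       [Figure 11.5 diagram: on the NT machine, the DECtalk speech synthesizer sends the speech
                       signal (250 ms latency) to the sound card and speaker, and sends energy and phoneme
                       information (< 1 ms) to QNX. On QNX, jaw control drives the jaw motor after a 250 ms delay
                       and forwards energy and phoneme information after a 100 ms delay through DPRAM (< 1 ms)
                       to the motor skill system on L, which polls at 40 Hz. The motor skill system passes emphasis
                       and lip posture through DPRAM (< 1 ms) to the face control system on L, which drives the
                       lips and face motors.]
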
                       Figure 11.5
                       Schematic of the flow of information for lip synchronization. This figure illustrates the latencies of the system and
                       the compensatory delays to maintain synchrony.


                         The computer network involved in lip synchronization is a bit convoluted, but supports
                       real-time performance. Figure 11.5 illustrates the information flow through the system and
                       denotes latencies. Within the NT machine, there is a latency of approximately 250 ms from
                       the time the synthesizer generates the speech signal and extracts phoneme information
                       until that speech signal is sent to the sound card. Immediately following the generation
                       and feature extraction phase, the NT machine sends this information to the QNX node that
                       controls the jaw motor. The latency of this stage is less than 1 ms. Within QNX, the energy
                       signal and phoneme information are used to compute the jaw position. To synchronize jaw
                       movement with sound production from the sound card, the jaw command position is delayed
                       by 250 ms. For the same reason, the QNX machine delays the transfer of energy and phoneme
                       information by 100 ms to the L-based machines. Dual-ported RAM communication is
                       sub-millisecond. The lip synchronization processes running on L poll and update their energy
                       and phoneme values at 40 Hz, much faster than the phoneme information is changing
                       and much faster than the actuators can respond. Energy is scaled to control the amount
                       of facial emphasis, and the phonemes are mapped to lip postures. The lip synchronization
                       performance is well-coordinated with speech output since the delays and latencies are fairly
                       consistent.
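
The compensation scheme described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the code that ran on the robot: the `Frame` and `DelayLine` names, the 1 ms simulation clock, and the example phoneme labels are hypothetical, and the sub-millisecond network and DPRAM hops are treated as zero. Only the 250 ms, 100 ms, and 40 Hz figures come from the text. The text does not say how much latency the face path itself contributes, so the sketch simply applies the stated delays.

```python
from collections import deque
from dataclasses import dataclass

# Figures quoted in the text (milliseconds, except the polling rate).
NT_AUDIO_LATENCY_MS = 250   # synthesizer output until the signal reaches the sound card
JAW_DELAY_MS = 250          # compensatory delay applied to jaw commands on QNX
FACE_DELAY_MS = 100         # delay before QNX forwards features to the L-based machines
POLL_RATE_HZ = 40           # polling rate of the lip synchronization processes


@dataclass(frozen=True)
class Frame:
    """One sample of speech features (hypothetical structure)."""
    t_ms: int       # time at which the synthesizer produced the features
    energy: float
    phoneme: str


class DelayLine:
    """Releases each item a fixed number of milliseconds after it was pushed."""

    def __init__(self, delay_ms: int):
        self.delay_ms = delay_ms
        self._queue = deque()   # (release_time_ms, item), in push order

    def push(self, now_ms: int, item) -> None:
        self._queue.append((now_ms + self.delay_ms, item))

    def pop_ready(self, now_ms: int) -> list:
        ready = []
        while self._queue and self._queue[0][0] <= now_ms:
            ready.append(self._queue.popleft()[1])
        return ready


def simulate(frames, end_ms):
    """Step a 1 ms clock and log when each stage acts on each frame."""
    audio_line = DelayLine(NT_AUDIO_LATENCY_MS)
    jaw_line = DelayLine(JAW_DELAY_MS)
    face_line = DelayLine(FACE_DELAY_MS)
    pending = deque(sorted(frames, key=lambda f: f.t_ms))
    poll_period_ms = 1000 // POLL_RATE_HZ   # 25 ms between polls
    latest_face = None                      # stand-in for the value held in DPRAM
    log = {"audio": [], "jaw": [], "face_poll": []}

    for now in range(end_ms + 1):
        # New features enter all three paths at the same instant.
        while pending and pending[0].t_ms <= now:
            frame = pending.popleft()
            audio_line.push(now, frame)
            jaw_line.push(now, frame)
            face_line.push(now, frame)
        for frame in audio_line.pop_ready(now):
            log["audio"].append((now, frame.phoneme))
        for frame in jaw_line.pop_ready(now):
            log["jaw"].append((now, frame.phoneme))
        for frame in face_line.pop_ready(now):
            latest_face = frame
        # The face side samples whatever value is currently available.
        if now % poll_period_ms == 0 and latest_face is not None:
            log["face_poll"].append((now, latest_face.phoneme))
    return log


if __name__ == "__main__":
    log = simulate([Frame(0, 0.8, "AA"), Frame(80, 0.3, "M")], end_ms=350)
    print("audio:", log["audio"])
    print("jaw:  ", log["jaw"])
    print("first face poll:", log["face_poll"][0])
```

Because the jaw delay equals the audio latency, the jaw events line up with the audio events in the log. The polling loop also shows why a 40 Hz rate is sufficient when features change slowly: each poll just reads the most recent value, so a consistent delay matters more than a high update rate.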