Page 225 - Designing Sociable Robots
breazeal-79017 book March 18, 2002 14:16
206 Chapter 11
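[Figure 11.5 diagram: on the NT machine, the DECtalk speech synthesizer sends the speech signal to the sound card and speaker (250 ms latency) and sends energy and phoneme data over DPRAM (< 1 ms) to the QNX motor skill system, which applies a 250 ms delay to jaw control of the jaw motor. QNX forwards energy and phoneme data after a 100 ms delay over DPRAM (< 1 ms) to the face control system on L, which polls at 40 Hz and drives the lips and face motors with emphasis and lip posture.]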
Figure 11.5
Schematic of the flow of information for lip synchronization. This figure illustrates the latencies of the system and
the compensatory delays to maintain synchrony.
The computer network involved in lip synchronization is a bit convoluted, but supports
real-time performance. Figure 11.5 illustrates the information flow through the system and
denotes latencies. Within the NT machine, there is a latency of approximately 250 ms from
the time the synthesizer generates the speech signal and extracts phoneme information
until that speech signal is sent to the sound card. Immediately following the generation
and feature extraction phase, the NT machine sends this information to the QNX node that
controls the jaw motor. The latency of this stage is less than 1 ms. Within QNX, the energy
signal and phoneme information are used to compute the jaw position. To synchronize jaw
movement with sound production from the sound card, the jaw command position is delayed
by 250 ms. For the same reason, the QNX machine delays the transfer of energy and phoneme
information by 100 ms to the L-based machines. Dual-ported RAM communication is
sub-millisecond. The lip synchronization processes running on L poll and update their energy
and phoneme values at 40 Hz, much faster than the phoneme information is changing
and much faster than the actuators can respond. Energy is scaled to control the amount
of facial emphasis, and the phonemes are mapped to lip postures. The lip synchronization
performance is well-coordinated with speech output since the delays and latencies are fairly
consistent.
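
The compensatory delay on the jaw path can be illustrated with a small simulation. The following Python sketch is not the robot's actual code; the class DelayLine, the function simulate, and the example energy and phoneme values are hypothetical names and data chosen for illustration. It takes the 250 ms sound card latency, the 250 ms jaw delay, and the 40 Hz polling rate from the text, treats the sub-millisecond link latencies as zero, and leaves out the 100 ms delay to L because the text does not state the downstream latency it compensates for.

```python
from collections import deque

SOUND_CARD_LATENCY_MS = 250  # NT: synthesis until the signal reaches the sound card (from the text)
JAW_DELAY_MS = 250           # QNX: compensatory delay on the jaw command (from the text)
POLL_HZ = 40                 # polling rate of the lip processes on L (from the text)
POLL_PERIOD_MS = 1000 // POLL_HZ  # 25 ms between polls


class DelayLine:
    """Hypothetical helper: releases each item delay_ms after it was pushed."""

    def __init__(self, delay_ms):
        self.delay_ms = delay_ms
        self._pending = deque()  # (release_time_ms, item), kept in arrival order

    def push(self, now_ms, item):
        self._pending.append((now_ms + self.delay_ms, item))

    def pop_ready(self, now_ms):
        ready = []
        while self._pending and self._pending[0][0] <= now_ms:
            ready.append(self._pending.popleft()[1])
        return ready


def simulate(frames, end_ms=1000):
    """frames: list of (t_ms, energy, phoneme) produced by the synthesizer.

    Returns (sound_time_ms, jaw_time_ms, phoneme) per frame, where sound_time_ms
    is when the audio reaches the sound card and jaw_time_ms is when the delayed
    jaw command is released. Link latencies under 1 ms are treated as zero.
    """
    jaw_line = DelayLine(JAW_DELAY_MS)
    incoming = deque(sorted(frames))
    jaw_release = {}
    for now in range(end_ms + 1):  # 1 ms simulation steps
        # Frames arrive at QNX essentially as soon as they are generated.
        while incoming and incoming[0][0] <= now:
            jaw_line.push(now, incoming.popleft())
        # Commands whose delay has elapsed are sent to the jaw motor.
        for t_ms, _energy, _phoneme in jaw_line.pop_ready(now):
            jaw_release[t_ms] = now
    return [
        (t_ms + SOUND_CARD_LATENCY_MS, jaw_release[t_ms], phoneme)
        for t_ms, _energy, phoneme in sorted(frames)
        if t_ms in jaw_release
    ]


if __name__ == "__main__":
    # Made-up frames: (generation time in ms, energy, phoneme label).
    example = [(0, 0.2, "AA"), (80, 0.9, "M"), (160, 0.5, "IY")]
    for sound_ms, jaw_ms, phoneme in simulate(example):
        print(phoneme, "sound at", sound_ms, "ms; jaw at", jaw_ms, "ms")
```

Under these assumptions each jaw command is released at the same moment its audio reaches the sound card. The match depends on both delays staying fixed, which is why consistent latencies matter more than small ones.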

