lip actuators, two for the upper lip corners and two for the lower lip corners. Each actuator
moves a lip corner either up (to smile) or down (to frown). There is also a single
degree-of-freedom jaw driven by a high-performance DC servo motor from the MEI card. This
level of performance is important for real-time lip synchronization with speech.
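To suggest what lip synchronization asks of these actuators, here is a minimal sketch in C of a phoneme-to-posture lookup. The phoneme labels, posture values, and lookup interface are illustrative assumptions, not Kismet's actual tables (and Kismet's own code is written in L, not C):

```c
/* Hedged sketch: phoneme labels and posture values are invented
 * for illustration, not taken from Kismet's implementation. */
#include <stdio.h>
#include <string.h>

struct viseme {
    const char *phoneme;   /* incoming phoneme label */
    double jaw;            /* jaw opening, 0 = closed .. 1 = wide */
    double lip_corners;    /* -1 = pulled down .. +1 = pulled up */
};

static const struct viseme table[] = {
    { "aa", 0.9,  0.0 },   /* open vowel: wide jaw */
    { "iy", 0.2,  0.6 },   /* spread vowel: corners raised */
    { "uw", 0.3, -0.2 },   /* rounded vowel */
    { "m",  0.0,  0.0 },   /* bilabial: jaw closed */
};

static const struct viseme *lookup(const char *ph)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(table[i].phoneme, ph) == 0)
            return &table[i];
    return NULL;
}

int main(void)
{
    const char *utterance[] = { "m", "aa", "m", "iy" };
    for (int i = 0; i < 4; i++) {
        const struct viseme *v = lookup(utterance[i]);
        if (v)
            printf("%-3s jaw=%.1f corners=%+.1f\n",
                   v->phoneme, v->jaw, v->lip_corners);
    }
    return 0;
}
```

Because each phoneme must be rendered on the jaw within a fraction of a second of the corresponding audio, the jaw motor's response time, not the lookup, is the limiting factor; hence the high-performance servo.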
The face control software runs on a Motorola 68332 node running L. This processor is
responsible for arbitrating among facial expression, real-time lip synchronization,
communicative social displays, and behavioral responses. It communicates with other 68332
nodes through a 16 KByte dual-ported RAM (DPRAM).
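The text does not spell out the arbitration scheme, but a fixed-priority sketch conveys the idea of a single process owning the face and granting it to the most urgent requestor. The requestor set is from the paragraph above; the priority ordering and interface are assumptions for illustration only:

```c
/* Hedged sketch of fixed-priority face arbitration; the ordering
 * below is an assumption, not Kismet's documented scheme. */
#include <stdio.h>

enum requestor { NONE, EXPRESSION, SOCIAL_DISPLAY, BEHAVIOR, LIP_SYNC };

/* Higher value wins; lip sync is ranked highest here on the
 * assumption that speech must never be visually out of step. */
static int priority(enum requestor r)
{
    switch (r) {
    case LIP_SYNC:       return 4;
    case BEHAVIOR:       return 3;
    case SOCIAL_DISPLAY: return 2;
    case EXPRESSION:     return 1;
    default:             return 0;
    }
}

static enum requestor arbitrate(const enum requestor *active, int n)
{
    enum requestor winner = NONE;
    for (int i = 0; i < n; i++)
        if (priority(active[i]) > priority(winner))
            winner = active[i];
    return winner;
}

int main(void)
{
    enum requestor active[] = { EXPRESSION, LIP_SYNC };
    printf("face granted to requestor %d\n",
           arbitrate(active, 2));   /* prints 4 (LIP_SYNC) */
    return 0;
}
```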
High-Level Perception, Behavior, Motivation, and Motor Skills
The high-level perception system, the behavior system, the motivation system, and the
motor skills system run on the network of Motorola 68332 micro-controllers. Each of
these systems communicates with the others by using threads if they are implemented on
the same processor, or via DPRAM communication if implemented on different processors.
Currently, each 68332 node can connect to at most eight DPRAMs. A single additional DPRAM
tethers the 68332 network to the network of PCs via a QNX node.
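A sketch of what such location-transparent messaging might look like follows: a send routine that hands off through an in-process queue when sender and receiver share a 68332, and writes through a DPRAM mailbox otherwise. The mailbox layout and function names are invented for illustration, not taken from Kismet:

```c
/* Hedged sketch: illustrates thread-vs-DPRAM routing only; names
 * and mailbox layout are assumptions, not Kismet's interfaces. */
#include <stdio.h>
#include <string.h>

#define MSG_BYTES 32

struct mailbox {
    volatile int full;          /* producer sets, consumer clears */
    char payload[MSG_BYTES];
};

/* A real DPRAM mailbox would sit at a fixed dual-ported address;
 * an ordinary variable stands in for it here. */
static struct mailbox dpram_box;

static int same_processor(int src, int dst) { return src == dst; }

static void send_msg(int src, int dst, const char *msg)
{
    if (same_processor(src, dst)) {
        /* same node: hand off through a shared in-process queue */
        printf("thread queue: %s\n", msg);
    } else {
        /* different nodes: write through the DPRAM mailbox */
        while (dpram_box.full)
            ;                   /* wait for the consumer to drain it */
        strncpy(dpram_box.payload, msg, MSG_BYTES - 1);
        dpram_box.full = 1;
    }
}

int main(void)
{
    send_msg(0, 0, "behavior -> motor skills");   /* same node */
    send_msg(0, 1, "perception -> behavior");     /* across DPRAM */
    printf("dpram payload: %s\n", dpram_box.payload);
    return 0;
}
```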
The Vocalization System
The robot’s vocalization capabilities are generated through an articulatory synthesizer. The
software, DECtalk v4.5, sold by Digital Equipment Corporation, is based on the Klatt
articulation synthesizer and runs on a PC under Windows NT with a Creative Labs sound card.
The parameters of the model are based on the physiological characteristics of the human
articulatory tract. Although typically used as a text-to-speech system, it was chosen over other
systems because it gives the user low-level control over the vocalizations through physio-
logically based parameter settings. These parameters make it possible to convey affective
information through vocalizations (Cahn, 1990), and to convey personality by designing a
custom voice for the robot. As such, Kismet’s voice is that of a young child. The system
also has the ability to play back files in a .wav format, so the robot could in principle
produce infant-like vocalizations (laughter, coos, gurgles, etc.) that the synthesizer itself
cannot generate.
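As a rough illustration of how affect might be pushed through such parameter settings, the sketch below maps a coarse affect label onto DECtalk-style inline commands for speaking rate, average pitch, and pitch range. The command forms ([:rate], [:dv ap/pr]) follow DECtalk's inline syntax as best recalled and should be checked against the manual; the numeric values and the affect-to-parameter mapping are illustrative assumptions, not Cahn's or Kismet's actual settings:

```c
/* Hedged sketch: affect labels, values, and the mapping are
 * invented; only the general command shape mirrors DECtalk. */
#include <stdio.h>

struct affect {
    const char *name;
    int rate;          /* words per minute */
    int avg_pitch;     /* average pitch, Hz */
    int pitch_range;   /* pitch range, percent */
};

static const struct affect affects[] = {
    { "happy", 250, 180, 150 },  /* faster, higher, livelier */
    { "sad",   150, 105,  60 },  /* slower, lower, flatter */
};

int main(void)
{
    char cmd[128];
    const struct affect *a = &affects[0];
    /* Prefix the utterance with inline settings so the
     * synthesizer renders it with the chosen affect. */
    snprintf(cmd, sizeof cmd, "[:rate %d][:dv ap %d pr %d] %s",
             a->rate, a->avg_pitch, a->pitch_range, "Hello there.");
    printf("%s\n", cmd);
    return 0;
}
```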
Instead of relying on written text as an interface to the synthesizer, the software can accept
strings of phonemes along with commands to specify the pitch and timing of the utterance.
Hence, Kismet's vocalization system generates both phoneme strings and command settings,
and says them in near real-time. The synthesizer also extracts phoneme and pitch information
that is used to coordinate real-time lip synchronization. Ultimately, this capability would
permit the robot to play and experiment with its own vocal tract, and to learn the effect
these vocalizations have on human behavior. Kismet’s voice is one of the most versatile

