Page 172 - Designing Sociable Robots
breazeal-79017 book March 18, 2002 14:7
The Behavior System 153
and begins to search for a face, which it re-acquires when the caregiver returns (t ≈ 42).
Eventually, the robot habituates to the interaction with the caregiver and begins to attend
to a toy that the caregiver has provided (60 < t < 75). While interacting with the toy, the
robot displays interest and moves its eyes to follow the moving toy. Kismet soon habituates
to this stimulus and returns to its play-dialogue with the caregiver (75 < t < 100). A final
disengagement phase occurs (t ≈ 100) when the robot’s attention shifts back to the toy.
Regulating Vocal Exchanges
Kismet employs different social cues to regulate the rate of vocal exchanges. These include
eye movements as well as postural and facial displays. The cues encourage subjects to slow
down and shorten their speech, which benefits the robot’s auditory processing.
To investigate Kismet’s performance in engaging people in proto-dialogues, I invited
three naive subjects to interact with Kismet. They ranged in age from 25 to 28 years;
one was male and two were female, and all were professionals. They were simply asked to
talk to the robot. Their interactions were video-recorded for further analysis. (Similar video
interactions can be viewed on the accompanying CD-ROM.)
Subjects often begin the session by speaking in longer phrases, using only the robot’s
vocal behavior to gauge when it is their turn to speak. They also expect the robot to respond
immediately after they finish talking. Within the first couple of exchanges, they may notice
that the robot interrupts them, and they begin to adapt to Kismet’s rate. They start to use
shorter phrases, wait longer for the robot to respond, and watch the robot’s turn-taking
cues more carefully. When the robot is ready for the person to speak, it prompts him or her
by craning its neck forward, raising its brows, and looking at the person’s face. It holds
this posture for a few seconds until the person responds; often the subject does so within
a second of the display. The robot then leans back to a neutral posture, assumes
a neutral expression, and tends to shift its gaze away from the person. This cue indicates
that the robot is about to speak. The robot typically issues one utterance, but it may issue
several. Nonetheless, as the exchange proceeds, the subjects tend to wait until prompted.
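The prompting cycle described above can be sketched as a small state machine. This is a simplified illustration under stated assumptions, not Kismet’s actual behavior-system implementation: the state names, the event names (`person_starts`, `person_stops`, `robot_done`), and the `step` function are hypothetical, the cue lists paraphrase the text, and the model ignores the interruptions that occur before subjects adapt.

```python
from enum import Enum, auto


class Turn(Enum):
    PROMPTING = auto()  # robot yields the floor and waits for the person
    LISTENING = auto()  # person holds the floor
    SPEAKING = auto()   # robot holds the floor (one or several utterances)


# Display cues paraphrased from the text; this table is illustrative only.
CUES = {
    Turn.PROMPTING: ("crane neck forward", "raise brows", "look at person's face"),
    Turn.LISTENING: (),  # the text names no specific display for this phase
    Turn.SPEAKING: ("lean back to neutral posture", "neutral expression", "gaze away"),
}

# Hypothetical event-driven transitions for one turn-taking cycle.
TRANSITIONS = {
    (Turn.PROMPTING, "person_starts"): Turn.LISTENING,
    (Turn.LISTENING, "person_stops"): Turn.SPEAKING,
    (Turn.SPEAKING, "robot_done"): Turn.PROMPTING,
}


def step(state: Turn, event: str) -> Turn:
    """Advance the sketch by one event.

    Unrecognized events leave the state unchanged, so the robot keeps
    holding its prompt posture while it waits for the person to respond.
    """
    return TRANSITIONS.get((state, event), state)


if __name__ == "__main__":
    state = Turn.PROMPTING
    for event in ["person_starts", "person_stops", "robot_done"]:
        state = step(state, event)
        print(state.name, CUES[state])
```

Keeping the transitions in a lookup table makes the cycle explicit: each display belongs to a state, and the person’s response is the only event that ends the prompting phase.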
Before the subjects adapt their behavior to the robot’s capabilities, the robot is more likely
to interrupt them. There also tend to be more frequent delays in the flow of “conversation,”
where the human prompts the robot again for a response. These “hiccups” in the flow often
appear in short clusters of mutual interruptions and pauses (typically over two to four
speaking turns) before the turns become coordinated and the flow smooths out. Analysis of
the video of these human-robot “conversations” provides evidence that people entrain to the
robot (see table 9.1): the “hiccups” become less frequent, and the human and robot are able
to carry on longer sequences of clean turn transitions. At this point the rate of vocal
exchange is well matched to the robot’s perceptual limitations, and the exchange is
reasonably fluid.
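The text does not specify how the video was coded for table 9.1. As a hedged illustration of how entrainment of this kind could be quantified, the sketch below assumes each turn transition has been hand-labeled `"clean"`, `"interrupt"`, or `"pause"` (hypothetical labels) and computes two simple measures: the fraction of transitions that are hiccups and the longest run of clean transitions. The function names are illustrative, and the example sequence is made up, not data from the study.

```python
from itertools import groupby
from typing import Sequence, Tuple


def hiccup_rate(labels: Sequence[str]) -> float:
    """Fraction of transitions that are hiccups (anything not 'clean')."""
    if not labels:
        return 0.0
    return sum(1 for label in labels if label != "clean") / len(labels)


def longest_clean_run(labels: Sequence[str]) -> int:
    """Length of the longest run of consecutive 'clean' transitions."""
    return max((len(list(run)) for label, run in groupby(labels) if label == "clean"),
               default=0)


def early_vs_late(labels: Sequence[str]) -> Tuple[float, float]:
    """Hiccup rate in the first half of a session versus the second half."""
    mid = len(labels) // 2
    return hiccup_rate(labels[:mid]), hiccup_rate(labels[mid:])


if __name__ == "__main__":
    # Made-up label sequence for illustration only.
    session = ["interrupt", "pause", "clean", "interrupt",
               "clean", "clean", "clean", "clean"]
    print(hiccup_rate(session), longest_clean_run(session), early_vs_late(session))
```

Under this kind of coding, entrainment would show up as a lower hiccup rate in the later half of a session and a growing longest clean run.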

