prohibit Kismet with a lower and lower voice until Kismet eventually frowned. Only then
did the subject stop her prohibitions.
During the course of the interaction, several interesting dynamic social phenomena arose.
Often these occurred in the context of prohibiting the robot. For instance, several of the
subjects reported experiencing a very strong emotional response immediately after “successfully”
prohibiting the robot. In these cases, the robot’s saddened face and body posture were
enough to arouse a strong sense of empathy. The subject would often immediately
stop and look to the experimenter with an anguished expression on her face, claiming
to feel “terrible” or “guilty.” Subjects were often very apologetic throughout their
prohibition session. In this “emotional” feedback cycle, the robot’s own affective response to
the subject’s vocalizations evoked a strong and similar emotional response in the subject
as well.
Another interesting social dynamic I observed involved affective mirroring between robot
and human. In this situation, the subject might first issue a medium-strength prohibition to
the robot, which causes it to dip its head. The subject responds by lowering her own head
and reiterating the prohibition, this time in a more foreboding tone. This causes the robot to dip
its head even further and look more dejected. The cycle continues to intensify until it
bottoms out with both subject and robot adopting dramatic body postures and facial
expressions that mirror one another. Subjects employed this technique to modulate how
strongly the prohibition was “communicated” to the robot.
7.7 Limitations and Extensions
The ability of naive subjects to interact with Kismet in this affective and dynamic manner
suggests that its response rate is acceptable. The timing delays in the system can and should
be improved, however. There is about a 500 ms delay from the time speech ends to receiving
an output from the classifier. Much of this delay is due to the underlying speech recognition
system, where there is a trade-off between shipping out the speech features to the NT
machine immediately after a pause in speech, and waiting long enough during that pause to
make sure that speech has completed. There is another delay of approximately one second
associated with interpreting the classifier’s output in affective terms and feeding it through to an
emotional response. The subject typically issues one to three short utterances during
this time (of consistent affective content). It is interesting that people rarely seem to
issue just one short utterance and wait for a response. Instead, they prefer to communicate
affective meanings in a sequence of a few closely related utterances (“That’s right, Kismet.
Very good! Good robot!”). In practice, people do not seem to notice or be bothered by the
delay. The majority of the delay involves waiting for a sufficiently strong vocalization to be
spoken, since only such vocalizations are recognized by the system.
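To make the pause-length trade-off concrete, the sketch below shows one way an end-of-utterance detector could be organized around a silence timeout. It is only an illustration of the trade-off described above, not Kismet's actual implementation; the names (segment_utterances, SILENCE_THRESHOLD, ENERGY_FLOOR) and parameter values are hypothetical.

    # Illustrative sketch (not Kismet's implementation) of the pause-length
    # trade-off discussed above. A short silence threshold ships features to the
    # affect classifier sooner, but risks splitting a single utterance in two;
    # a longer threshold adds to the ~500 ms end-of-speech delay.

    SILENCE_THRESHOLD = 0.3   # seconds of silence taken to mean the utterance ended
    ENERGY_FLOOR = 0.02       # frames below this energy are treated as silence

    def segment_utterances(frames, frame_period=0.01):
        """Group a stream of (energy, features) frames into utterances.

        An utterance is closed once SILENCE_THRESHOLD seconds of consecutive
        low-energy frames have accumulated; its features would then be handed
        to the affect classifier. Raising the threshold reduces false splits
        but lengthens the wait before the robot can react.
        """
        current, silent_time = [], 0.0
        for energy, features in frames:
            if energy < ENERGY_FLOOR:
                silent_time += frame_period
                if current and silent_time >= SILENCE_THRESHOLD:
                    yield current              # end of utterance: hand off to classifier
                    current, silent_time = [], 0.0
            else:
                silent_time = 0.0
                current.append(features)
        if current:                            # flush any trailing speech
            yield current

Tuning SILENCE_THRESHOLD in such a scheme directly trades responsiveness against the risk of cutting a speaker off mid-utterance, which is the same tension described for the NT machine hand-off above.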

