Page 117 - Designing Sociable Robots
7.6 Affective Human-Robot Communication
I have shown that the implemented classifier performs well on the primary caregivers’
utterances. Essentially, the classifier is trained to recognize the caregivers’ different prosodic
contours, which are shown to coincide with Fernald’s prototypical patterns. In order to
extend the use of the affective intent recognizer, I would like to evaluate the following issues:
• Will naive subjects speak to the robot in an exaggerated manner (in the same way as the
caregivers)? Will Kismet’s infant-like appearance urge the speakers to use motherese?
• If so, will the classifier be able to recognize the utterances, or will it be hindered by
variations in individuals’ speaking styles or languages?
• How will the speakers react to Kismet’s expressive feedback, and will the cues encourage
them to adjust their speech in a way they think that Kismet will understand?
Five female subjects, ranging from 23 to 54 years old, were asked to interact with Kismet
in different languages (English, Russian, French, German, and Indonesian). One of the
subjects was a caregiver of Kismet, who spoke to the robot in either English or Indonesian
for this experiment. Subjects were instructed to express each affective intent (approval,
attention, prohibition, and soothing) and signal when they felt that they had communicated
it to the robot. It was expected that many neutral utterances would be spoken during the
experiment. All sessions were recorded on video for further evaluation. (Note that
interactions similar to these experiments can be viewed in the first demonstration, “Recognition
of Affective Intent in Robot-Directed Speech,” on the included CD-ROM.)
Results
A set of 266 utterances was collected from the experiment sessions. Very long and empty
utterances (those containing no voiced segments) were not included. An objective observer
was asked to label these utterances and to rate them based on the perceived strength of their
affective message (except for neutral). As the classification results show (see table 7.6),
the classifier performs almost as well on neutral utterances as it did on the caregiver test
set, and performs reasonably well on all the strong classes except soothing and attentional bids.
As expected, performance degrades as the perceived strength of the utterance decreases.
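The soothing errors examined next turn on how a straight-line fit summarizes a pitch contour. The following is only a minimal sketch of that effect, not the feature extraction used on Kismet: the contours are synthetic, and the function name, frame spacing, and pitch values are assumptions chosen for illustration. It shows that a least-squares line through a symmetric rise-fall contour has a slope close to zero, whereas a steady down-sweep yields a clearly negative slope.

```python
import numpy as np


def linear_fit_slope(pitch_hz, frame_s=0.01):
    """Slope (Hz per second) of a least-squares line through a pitch contour.

    pitch_hz: one pitch value per analysis frame (hypothetical input format).
    frame_s:  assumed spacing between frames, in seconds.
    """
    t = np.arange(len(pitch_hz)) * frame_s
    # np.polyfit returns coefficients from highest degree down: [slope, intercept].
    slope, _intercept = np.polyfit(t, pitch_hz, 1)
    return slope


# Synthetic one-second contours sampled every 10 ms (illustrative values only).
t = np.linspace(0.0, 1.0, 101)
rise_fall = 200.0 + 80.0 * np.sin(np.pi * t)  # pitch rises, then falls symmetrically
down_sweep = 280.0 - 80.0 * t                 # pitch falls steadily

rise_fall_slope = linear_fit_slope(rise_fall)
down_sweep_slope = linear_fit_slope(down_sweep)

print(f"rise-fall slope:  {rise_fall_slope:.3f} Hz/s")
print(f"down-sweep slope: {down_sweep_slope:.3f} Hz/s")
```

Because the rise and the fall cancel in the fit, a slope feature alone cannot tell such a contour apart from a flat one, so other cues (such as energy) would have to carry the distinction.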
A closer look at the misclassified soothing utterances showed that many of them were
actually soft approvals. Their pitch contours contained a rise-fall segment,
but the energy level was low. A linear fit on these contours generates a flat slope, resulting
in a neutral classification. A few soothing utterances were mistaken for neutral despite
having the down-sweep frequency characteristic because they contained too many words
and coarse pitch contours. Attentional bids generated the worst classification performance

