Page 117 - Designing Sociable Robots











                       7.6  Affective Human-Robot Communication

                       I have shown that the implemented classifier performs well on the primary caregivers’
                       utterances. Essentially, the classifier is trained to recognize the caregivers’ different prosodic
                       contours, which were shown to coincide with Fernald’s prototypical patterns. To extend the
                       use of the affective intent recognizer, I would like to examine the following questions:

                       •  Will naive subjects speak to the robot in an exaggerated manner (in the same way as the
                       caregivers)? Will Kismet’s infant-like appearance urge the speakers to use motherese?

                       •  If so, will the classifier be able to recognize the utterances, or will it be hindered by
                       variations in individuals’ speaking styles or languages?

                       •  How will the speakers react to Kismet’s expressive feedback, and will the cues encourage
                       them to adjust their speech in a way they think that Kismet will understand?

                         Five female subjects, ranging from 23 to 54 years old, were asked to interact with Kismet
                       in different languages (English, Russian, French, German, and Indonesian). One of the
                       subjects was a caregiver of Kismet, who spoke to the robot in either English or Indonesian
                       for this experiment. Subjects were instructed to express each affective intent (approval,
                       attention, prohibition, and soothing) and signal when they felt that they had communicated
                       it to the robot. It was expected that many neutral utterances would be spoken during the
                       experiment. All sessions were recorded on video for further evaluation. (Note that
                       demonstrations similar to these experiments can be viewed in the first demonstration, “Recognition
                       of Affective Intent in Robot-Directed Speech,” on the included CD-ROM.)

                       Results

                       A set of 266 utterances was collected from the experiment sessions. Very long utterances and
                       empty utterances (those containing no voiced segments) were excluded. An objective observer
                       was asked to label these utterances and to rate each one (except the neutral ones) by the
                       perceived strength of its affective message. As shown in the classification results (see table 7.6),
                       the classifier performs almost as well on neutral utterances as it did on the caregiver test set,
                       and it performs reasonably well on all the strong classes except soothing and attentional bids.
                       As expected, performance degrades as the perceived strength of the utterance decreases.
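
                         A breakdown of this kind, by labeled class and perceived strength, can be tabulated as in the
                       minimal Python sketch below. The records, the strength values ("strong", "weak"), and the helper
                       name accuracy_by_bucket are all hypothetical and chosen only for illustration; they are not the
                       data or the tooling used in the experiment.

```python
from collections import defaultdict

# Hypothetical records: (observer_label, perceived_strength, classifier_output).
# These values are invented for illustration; they are NOT the study's data.
records = [
    ("approval", "strong", "approval"),
    ("approval", "weak", "neutral"),
    ("soothing", "strong", "neutral"),
    ("soothing", "strong", "soothing"),
    ("prohibition", "strong", "prohibition"),
    ("neutral", None, "neutral"),  # neutral utterances carry no strength rating
]


def accuracy_by_bucket(records):
    """Return {(label, strength): fraction of utterances classified correctly}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for label, strength, predicted in records:
        bucket = (label, strength)
        totals[bucket] += 1
        if predicted == label:
            hits[bucket] += 1
    return {bucket: hits[bucket] / totals[bucket] for bucket in totals}


if __name__ == "__main__":
    # Sort on the string form so the None strength never needs comparing.
    for bucket, acc in sorted(accuracy_by_bucket(records).items(), key=str):
        print(bucket, f"{acc:.2f}")
```

                       Grouping by the (label, strength) pair keeps strong and weak utterances of the same class in
                       separate rows, which is what makes a drop in accuracy with decreasing strength visible.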
                         A closer look at the misclassified soothing utterances showed that many of them were
                       actually soft approvals. The pitch contours contained a rise-fall segment,
                       but the energy level was low. A linear fit to these contours yields a flat slope, since the rise
                       and the fall largely cancel, resulting in a neutral classification. A few soothing utterances were
                       mistaken for neutral despite having the down-sweep frequency characteristic because they
                       contained too many words and coarse pitch contours. Attentional bids generated the worst
                       classification performance