Page 105 - Designing Sociable Robots

breazeal-79017  book  March 18, 2002  14:54





                       86                                                               Chapter 7





                       Voice as saliency marker  This raises a related issue, which is the caregiver’s ability to
                       use their affective speech as a means of marking a particular event as salient. This implies
                       that the robot should only recognize a vocalization as having affective content in the cases
                       where the caregiver specifically intends to praise, prohibit, soothe, or get the attention of
                       the robot. The robot should be able to recognize neutral robot-directed speech, even if it
                       is somewhat tender or friendly in nature (as is often the case with motherese). For this
                       reason, the recognizer only categorizes sufficiently exaggerated prosody such as praise,
                       prohibition, attention, and soothing (i.e., the caregiver has to say it as if she really means
                       it). Vocalizations with insufficient exaggeration are classified as neutral.
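
The neutral fallback described above can be sketched as a rejection threshold. This is a minimal illustration, assuming the recognizer produces a score per affective class plus a separate scalar measure of how exaggerated the prosody is. The names `classify_intent` and `exaggeration`, and the threshold value of 0.6, are hypothetical and are not taken from Kismet's implementation.

```python
# Affective classes named in the text; "neutral" is the fallback label.
AFFECTIVE_CLASSES = ("praise", "prohibition", "attention", "soothing")


def classify_intent(scores, exaggeration, threshold=0.6):
    """Return an affective label only when the prosody is exaggerated enough.

    scores: dict mapping affective class name to a score (hypothetical input).
    exaggeration: scalar in [0, 1] summarizing prosodic exaggeration
        (hypothetical measure; the source does not define one).
    threshold: assumed cutoff below which speech is treated as neutral.
    """
    if exaggeration < threshold:
        # Tender or friendly speech that is not strongly marked stays neutral.
        return "neutral"
    # Otherwise pick the best-scoring affective class.
    return max(AFFECTIVE_CLASSES, key=lambda c: scores.get(c, 0.0))
```

With this arrangement, raising the threshold makes the robot harder to praise or prohibit by accident, at the price of requiring the caregiver to speak with more emphasis.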

                       Acceptable versus unacceptable misclassification  Given that humans are not perfect
                       at recognizing the affective content in speech, the robot is sure to make mistakes as well.
                       However, some failure modes are more acceptable than others. For a teaching task, confusing
                       strongly valenced intent for neutrally valenced intent is better than confusing oppositely
                       valenced intents. For instance, confusing approval for an attentional bid, or prohibition for
                       neutral speech, is better than interpreting prohibition as praise. Ideally, the recognizer’s
                       failure modes will minimize errors of the latter kind.
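
One way to make this preference concrete is to score each confusion by the valence distance between the intended and the recognized class. The sketch below is illustrative only. The valence assignments for praise, prohibition, attention, and neutral follow the examples in the text; treating soothing as positively valenced is an assumption, and the function names are hypothetical.

```python
# Assumed valence per class: +1 positive, 0 neutral, -1 negative.
# The value for "soothing" is an assumption, not stated in the passage.
VALENCE = {
    "praise": 1,
    "soothing": 1,
    "attention": 0,
    "neutral": 0,
    "prohibition": -1,
}


def misclassification_cost(intended, recognized):
    """Cost grows with valence distance; opposite valence is the worst case."""
    return abs(VALENCE[intended] - VALENCE[recognized])


def mean_cost(confusions):
    """Average cost over a dict of {(intended, recognized): count}."""
    total = sum(confusions.values())
    if total == 0:
        return 0.0
    weighted = sum(
        misclassification_cost(intended, recognized) * count
        for (intended, recognized), count in confusions.items()
    )
    return weighted / total
```

Under these assumptions, mistaking prohibition for praise costs 2, while mistaking praise for an attentional bid or prohibition for neutral speech costs 1. Note that this measure looks only at valence, so a confusion between two classes of equal valence (for example, praise and soothing) costs 0 even though it is still an error.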

                       Expressive feedback  Nonetheless, mistakes in communication will be made. This
                       motivates the need for feedback from the robot back to the caregiver. Fundamentally, the
                       caregiver is trying to communicate his/her intent to the robot. The caregiver has no idea
                       whether or not the robot interpreted the intent correctly without some form of feedback. By
                       interfacing the output of the recognizer to Kismet’s emotional system, the robot’s ability to
                       express itself through facial expression, voice quality, and body posture conveys the robot’s
                       affective interpretation of the message. This allows people to reiterate themselves until they
                       believe they have been properly understood. It also enables the caregiver to reiterate the
                       message until the intent is communicated strongly enough (perhaps what the robot just did
                       was very good, and the robot should be really happy about it).

                       Speaker dependence versus independence  An interesting question is whether the
                       recognizer should be speaker-dependent or speaker-independent. There are obviously advantages
                       and disadvantages to both, and the appropriate choice depends on the application. Typically,
                       it is easier to get higher recognition performance from a speaker-dependent system. In the
                       case of a personal robot, this is a good alternative since the robot should be personalized to
                       a particular human over time, not preferentially tuned to others. If the robot must interact
                       with a wide variety of people, then the speaker-independent system is preferable. The
                       underlying question in both cases is what level of performance is necessary for people to feel
                       that the robot is responsive and understands them well enough so that it is not challenging
                       or frustrating to communicate with it and train it.