Page 105 - Designing Sociable Robots
breazeal-79017 book March 18, 2002 14:54
86 Chapter 7
Voice as saliency marker This raises a related issue, which is the caregiver’s ability to
use their affective speech as a means of marking a particular event as salient. This implies
that the robot should only recognize a vocalization as having affective content in the cases
where the caregiver specifically intends to praise, prohibit, soothe, or get the attention of
the robot. The robot should be able to recognize neutral robot-directed speech, even if it
is somewhat tender or friendly in nature (as is often the case with motherese). For this
reason, the recognizer categorizes only sufficiently exaggerated prosody as praise,
prohibition, attention, and soothing (i.e., the caregiver has to say it as if she really means
it). Vocalizations with insufficient exaggeration are classified as neutral.
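The gating rule above can be illustrated with a minimal sketch. It is not the recognizer described in this chapter: it assumes a hypothetical upstream stage that already produces a score in [0, 1] for each affective class, and the names `classify_intent` and `min_score`, as well as the threshold value 0.7, are illustrative assumptions.

```python
# Hypothetical sketch: fall back to "neutral" unless one affective class
# is scored strongly enough. Scores are assumed to come from some upstream
# prosody classifier that is not shown here.
AFFECTIVE_CLASSES = ("praise", "prohibition", "attention", "soothing")


def classify_intent(scores, min_score=0.7):
    """Return the best-scoring affective class, or "neutral".

    scores: dict mapping class name to a score in [0, 1] (assumed input).
    min_score: assumed threshold standing in for "sufficiently exaggerated".
    """
    best = max(AFFECTIVE_CLASSES, key=lambda c: scores.get(c, 0.0))
    if scores.get(best, 0.0) < min_score:
        return "neutral"
    return best
```

Under these assumptions, a mildly friendly utterance whose strongest score is 0.5 would be labeled neutral, while the same words spoken with exaggerated prosody (a score of 0.9, say) would be labeled with the corresponding affective class.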
Acceptable versus unacceptable misclassification Given that humans are not perfect
at recognizing the affective content in speech, the robot is sure to make mistakes as well.
However, some failure modes are more acceptable than others. For a teaching task, confusing
strongly valenced intent for neutrally valenced intent is better than confusing oppositely
valenced intents. For instance, confusing approval for an attentional bid, or prohibition for
neutral speech, is better than interpreting prohibition as praise. Ideally, the recognizer’s
failure modes will minimize errors of this opposite-valence kind.
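One way to make this preference concrete is to weight each confusion by how far apart the two intents are in valence. The sketch below is only an illustration: the valence assignments (in particular treating soothing as positive and attention as neutral) and the cost formula are assumptions, not values given in the text.

```python
# Hypothetical valence assignments: +1 positive, 0 neutral, -1 negative.
VALENCE = {
    "praise": 1,
    "soothing": 1,
    "attention": 0,
    "neutral": 0,
    "prohibition": -1,
}


def confusion_cost(true_label, predicted_label):
    """Assumed cost: 0 if correct, otherwise 1 plus the valence gap."""
    if true_label == predicted_label:
        return 0
    return 1 + abs(VALENCE[true_label] - VALENCE[predicted_label])


def mean_cost(pairs):
    """Average cost over (true_label, predicted_label) pairs."""
    return sum(confusion_cost(t, p) for t, p in pairs) / len(pairs)
```

With these assumed weights, mistaking prohibition for neutral speech costs 2, while mistaking prohibition for praise costs 3, so a recognizer evaluated by `mean_cost` is penalized most for opposite-valence errors.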
Expressive feedback Nonetheless, mistakes in communication will be made. This mo-
tivates the need for feedback from the robot back to the caregiver. Fundamentally, the
caregiver is trying to communicate his/her intent to the robot. The caregiver has no idea
whether or not the robot interpreted the intent correctly without some form of feedback. By
interfacing the output of the recognizer to Kismet’s emotional system, the robot’s ability to
express itself through facial expression, voice quality, and body posture conveys the robot’s
affective interpretation of the message. This allows people to repeat themselves until they
believe they have been properly understood. It also enables the caregiver to reiterate the
message until the intent is communicated strongly enough (perhaps what the robot just did
was very good, and the robot should be really happy about it).
Speaker dependence versus independence An interesting question is whether the recog-
nizer should be speaker-dependent or speaker-independent. There are obviously advantages
and disadvantages to both, and the appropriate choice depends on the application. Typically,
it is easier to get higher recognition performance from a speaker-dependent system. In the
case of a personal robot, this is a good alternative since the robot should be personalized to
a particular human over time, not preferentially tuned to others. If the robot must interact
with a wide variety of people, then the speaker-independent system is preferable. The un-
derlying question in both cases is what level of performance is necessary for people to feel
that the robot is responsive and understands them well enough so that it is not challenging
or frustrating to communicate with it and train it.

