Page 99 - Socially Intelligent Agents Creating Relationships with Computers and Robots
emotion (non-angry, non-happy, etc.) was 85–92%. The important question is
how to combine opinions of the experts to obtain the class of a given sample.
A simple and natural rule is to choose the class with the expert value closest to
1. This rule gives a total accuracy of about 60% for the 10-neuron architecture,
and about 53% for the 20-neuron architecture. Another approach to rule selection
is to use the outputs of the expert recognizers as input vectors for a new neural
network. In this case, we give the neural network the opportunity to learn the
most appropriate rule itself.⁹ The total accuracy we obtained was about 63%
for both 10- and 20-node architectures. The average accuracy for sadness was
rather high (∼76%). Unfortunately, the accuracy of expert recognizers was not
high enough to increase the overall accuracy of recognition.
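The "closest to 1" combination rule described above can be illustrated with a minimal sketch. It assumes each expert returns a single number per utterance; the function name combine_experts and the sample values are hypothetical and are not taken from the study.

```python
def combine_experts(expert_outputs):
    """Return the emotion whose expert output is closest to 1.

    expert_outputs maps an emotion name to the output of the expert
    recognizer trained for that emotion (hypothetical interface).
    """
    return min(expert_outputs, key=lambda emotion: abs(1.0 - expert_outputs[emotion]))


# Illustrative expert outputs for one utterance (made-up values).
sample = {"normal": 0.20, "happy": 0.35, "angry": 0.91, "sad": 0.10, "afraid": 0.42}
print(combine_experts(sample))  # angry
```

The second approach in the text replaces this fixed rule with a trained network that takes the same vector of expert outputs as its input.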
4. Development
The following pieces of software were developed during the second stage:
ERG – Emotion Recognition Game; ER – Emotion Recognition Software for
call centers; and SpeakSoftly – a dialog emotion recognition program. The
first program was developed mainly to demonstrate the results of the research
described above. The second is a full-fledged prototype of an industrial
solution for computerized call centers. The third program simply adds a different
user interface to the core of the ER system; it was developed to demonstrate
real-time emotion recognition. Due to space constraints, only the second system
is described here.
4.1 ER: Emotion Recognition Software For Call Centers
Goal. Our goal was to create an emotion recognition agent that can process
telephone-quality voice messages (8 kHz/8 bit) and can be used as part of a
decision support system for prioritizing voice messages and assigning a proper
agent to respond to the message.
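As a rough illustration of the prioritization step, the sketch below orders voice messages by a per-message score. It assumes the recognizer yields one number per message, with higher values meaning a more emotionally charged caller; the VoiceMessage class, its fields, and the scores are hypothetical and do not describe the ER system's actual interface.

```python
from dataclasses import dataclass


@dataclass
class VoiceMessage:
    message_id: str
    score: float  # hypothetical recognizer output in [0, 1]; higher = more urgent


def prioritize(messages):
    """Order messages so that the highest-scoring ones are handled first."""
    return sorted(messages, key=lambda m: m.score, reverse=True)


queue = prioritize([
    VoiceMessage("msg-1", 0.15),
    VoiceMessage("msg-2", 0.82),
    VoiceMessage("msg-3", 0.47),
])
print([m.message_id for m in queue])  # ['msg-2', 'msg-3', 'msg-1']
```

Assigning a suitable human agent to each message would be a separate step and is not sketched here.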
Recognizer. It was no surprise that anger was identified as the most important
emotion for call centers. Taking into account the importance of anger
and the scarcity of data for some other emotions, we decided to create a recognizer
that can distinguish between two states: “agitation”, which includes
anger, happiness and fear, and “calm”, which includes normal state and sadness.
To create the recognizer we used a corpus of 56 telephone messages
of varying length (from 15 to 90 seconds) expressing mostly normal and angry
emotions that were recorded by eighteen non-professional actors. These
utterances were automatically split into 1–3 second chunks, which were then
evaluated and labeled by people. The labeled chunks were used for creating recognizers¹⁰
using the methodology developed in the first study.
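The two-state grouping and the chunking step can be sketched as follows. The text does not say how the messages were split, so the fixed 2-second window with a merged short tail is purely an assumption chosen to keep every chunk within 1–3 seconds; the names to_state and split_into_chunks are hypothetical.

```python
SAMPLE_RATE = 8000  # telephone-quality audio, 8 kHz

AGITATION = {"anger", "happiness", "fear"}
CALM = {"normal", "sadness"}


def to_state(emotion):
    """Collapse a labeled emotion into one of the two recognizer states."""
    if emotion in AGITATION:
        return "agitation"
    if emotion in CALM:
        return "calm"
    raise ValueError(f"unknown emotion label: {emotion}")


def split_into_chunks(samples, chunk_seconds=2.0, min_seconds=1.0):
    """Split a list of samples into fixed-length chunks (assumed scheme).

    A final chunk shorter than min_seconds is merged into the previous one,
    so with the defaults every chunk of a long message lasts 1-3 seconds.
    """
    size = int(chunk_seconds * SAMPLE_RATE)
    min_size = int(min_seconds * SAMPLE_RATE)
    chunks = [samples[i:i + size] for i in range(0, len(samples), size)]
    if len(chunks) > 1 and len(chunks[-1]) < min_size:
        tail = chunks.pop()
        chunks[-1] = chunks[-1] + tail  # list concatenation, not numeric addition
    return chunks


message = [0] * 36000  # 4.5 seconds of silent placeholder samples
chunks = split_into_chunks(message)
print([len(c) / SAMPLE_RATE for c in chunks])  # [2.0, 2.5]
print(to_state("anger"), to_state("sadness"))  # agitation calm
```

A splitter that cuts at pauses in speech would be an equally plausible reading of "automatically split"; the fixed window is used here only because it is the simplest scheme that satisfies the stated chunk lengths.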