and suggested theories (reviews of about 60 years of research can be found in
[2, 11]). On the other hand, AI researchers have made contributions in the following areas: emotional speech synthesis [3, 9], recognition of emotions [5],
and using agents for decoding and expressing emotions [12].
2. Motivation
The project is motivated by the question of how recognition of emotions in speech could be used in business. One potential application is detecting the emotional state in telephone call center conversations and providing feedback to an operator or a supervisor for monitoring purposes. Another application is sorting voice mail messages according to the emotions expressed by the caller.
Given this orientation, we solicited data for this study from people who are not professional actors or actresses. We focused on negative emotions such as anger, sadness, and fear. We targeted telephone-quality speech (bandwidth below 3.4 kHz) and relied on the voice signal alone, which means that we excluded modern speech recognition techniques. There are several reasons for this. First, speech recognition treats emotion as noise that decreases recognition accuracy. Second, although some words and phrases are correlated with particular emotions, the situation is usually much more complex, and the same word or phrase can express a whole spectrum of emotions. Third, speech recognition techniques require much better signal quality and more computational power.
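To make the telephone-quality constraint concrete, the sketch below simulates it by resampling a high-quality recording to the standard 8 kHz telephony rate, which limits the usable bandwidth to 4 kHz (roughly the 3.4 kHz channel limit mentioned above). This is an illustrative assumption on our part, using the Python library librosa and a hypothetical file name; it is not part of the original study.

    import librosa

    # Load a high-quality recording at its native sampling rate
    # (the file name is hypothetical).
    y, sr = librosa.load("utterance.wav", sr=None)

    # Resample to 8 kHz, the standard telephony rate; by the Nyquist limit
    # this restricts the signal to frequencies below 4 kHz, approximating
    # the narrow telephone channel.
    y_tel = librosa.resample(y, orig_sr=sr, target_sr=8000)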
To achieve our objectives, we decided to proceed in two stages: research and development. The objectives of the first stage are to learn how well people recognize emotions in speech, to find out which features of the speech signal could be useful for emotion recognition, and to explore different mathematical models for creating reliable recognizers. The objective of the second stage is to create a real-time recognizer for call center applications.
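As an illustration of the kind of speech-signal features such recognizers commonly rely on, the sketch below computes simple pitch and energy statistics for a single utterance. This is our own sketch under assumptions (the librosa library, a hypothetical file name, and a generic prosodic feature set); it does not reproduce the study's actual features or models.

    import numpy as np
    import librosa

    # Load one utterance (the file name is hypothetical).
    y, sr = librosa.load("utterance.wav", sr=None)

    # Fundamental frequency (pitch) track estimated with the YIN algorithm.
    f0 = librosa.yin(y, fmin=75, fmax=400, sr=sr)

    # Short-time energy (RMS) per frame.
    rms = librosa.feature.rms(y=y)[0]

    # Summary statistics of pitch and energy are classic prosodic features
    # that a recognizer could be trained on.
    features = {
        "f0_mean": float(np.mean(f0)),
        "f0_std": float(np.std(f0)),
        "f0_range": float(np.max(f0) - np.min(f0)),
        "rms_mean": float(np.mean(rms)),
        "rms_std": float(np.std(rms)),
    }
    print(features)

A feature vector of this kind could then be passed to any standard classifier during the research stage.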
3. Research
For the first stage we had to create and evaluate a corpus of emotional data, evaluate how well people recognize the emotions in it, and select data for machine learning. We decided to use high-quality speech data for this stage.
3.1 Corpus of Emotional Data
We asked thirty of our colleagues to record the following four short sentences: “This is not what I expected”, “I’ll be right there”, “Tomorrow is my
birthday”, and “I’m getting married next week.” Each sentence was recorded
by every subject five times; each time, the subject portrayed one of the follow-