

ing emotional states: happiness, anger, sadness, fear, and the normal (unemotional) state. Five subjects recorded the sentences twice with different recording parameters. Thus, each subject recorded 20 or 40 utterances, yielding a corpus of 700 utterances, with 140 utterances per emotional state.
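As a consistency check on these counts (the total of thirty recording subjects is inferred from the arithmetic rather than stated on this page):

    25 subjects × 20 utterances + 5 subjects × 40 utterances = 500 + 200 = 700
    700 utterances / 5 emotional states = 140 utterances per state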
3.2     People Performance and Data Selection
We designed an experiment to answer the following questions: How well can people without special training portray and recognize emotions in speech? Which kinds of emotions are easier or harder to recognize?

We implemented an interactive program that selected and played back the utterances in random order and allowed a user to classify each utterance according to its emotional content. Twenty-three subjects took part in the evaluation stage, twenty of whom had participated in the earlier recording stage. Table 9.1 shows the performance confusion matrix. We can see that the most easily recognized category is anger (72.2%) and the least easily recognized is fear (49.5%). There is considerable confusion between sadness and fear, between sadness and the unemotional state, and between happiness and fear. The mean accuracy is 63.5%, in agreement with other experimental studies [11, 2].
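A minimal sketch of such an evaluation program is given below, assuming the utterances are WAV files whose true emotion is encoded in the filename (e.g. angry_s03_u2.wav) and that a command-line audio player is available; the file layout, the play_utterance helper, and the aplay call are illustrative assumptions, not the original implementation.

import random
import subprocess
from pathlib import Path

EMOTIONS = ["normal", "happy", "angry", "sad", "afraid"]

def play_utterance(path):
    # Placeholder playback hook; swap in whatever player is available locally.
    subprocess.run(["aplay", str(path)], check=False)  # e.g. ALSA player on Linux

def run_evaluation(corpus_dir):
    """Play all utterances in random order and collect one subject's labels.

    Returns a list of (true_emotion, judged_emotion) pairs for tabulation.
    """
    files = sorted(Path(corpus_dir).glob("*.wav"))
    random.shuffle(files)                      # random presentation order
    judgments = []
    for f in files:
        true_label = f.stem.split("_")[0]      # emotion encoded in the filename
        play_utterance(f)
        answer = ""
        while answer not in EMOTIONS:          # force a valid category
            answer = input(f"Emotion? {EMOTIONS}: ").strip().lower()
        judgments.append((true_label, answer))
    return judgments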



Table 9.1.  Performance Confusion Matrix (row percentages).

          Category   Normal   Happy   Angry    Sad   Afraid   Total
          Normal       66.3     2.5     7.0   18.2      6.0    100%
          Happy        11.9    61.4    10.1    4.1     12.5    100%
          Angry        10.6     5.2    72.2    5.6      6.3    100%
          Sad          11.8     1.0     4.7   68.3     14.3    100%
          Afraid       11.8     9.4     5.1   24.2     49.5    100%
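The percentages in Table 9.1 are row-normalized counts: each cell gives the share of utterances of a given true emotion that evaluators assigned to each category, so every row sums to 100%. A sketch of that tabulation, reusing the (true, judged) pairs produced by the hypothetical run_evaluation above:

from collections import Counter

def confusion_matrix(judgments, emotions):
    """Row-normalized confusion matrix in the style of Table 9.1.

    judgments: iterable of (true_emotion, judged_emotion) pairs.
    Returns {true: {judged: percent}} with each row summing to ~100.
    """
    counts = {e: Counter() for e in emotions}
    for true, judged in judgments:
        counts[true][judged] += 1
    matrix = {}
    for true in emotions:
        total = sum(counts[true].values()) or 1   # guard against empty rows
        matrix[true] = {j: 100.0 * counts[true][j] / total for j in emotions}
    return matrix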



The left half of Table 9.2 shows statistics for evaluators for each emotion category. We can see that the variance for anger and sadness is significantly less than for the other emotion categories. This means that people understand how to express and decode anger and sadness better than the other emotions. The right half of Table 9.2 shows statistics for "actors", i.e., how well subjects portray emotions. Comparing the left and right halves of Table 9.2, it is interesting to see that the ability to portray emotions (total mean 62.9%) is at approximately the same level as the ability to recognize them (total mean 63.2%), but the variance for portraying is much larger.
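Table 9.2 itself is not reproduced on this page, but statistics of this kind can be computed from per-subject accuracies. The sketch below shows one plausible tabulation; the record format and the use of population variance are assumptions, and the same computation keyed on the speaking subject rather than the listener yields the "actor" statistics.

from statistics import mean, pvariance

def subject_stats(records, emotions):
    """Per-emotion mean and variance of recognition accuracy across subjects.

    records: (subject_id, true_emotion, correct) triples, one per judgment,
    where correct is a bool. Keying subject_id on the listener gives the
    evaluator statistics; keying it on the speaker gives the actor statistics.
    """
    per_subject = {}                # {emotion: {subject: [correct flags]}}
    for subj, emo, correct in records:
        per_subject.setdefault(emo, {}).setdefault(subj, []).append(correct)
    stats = {}
    for emo in emotions:
        accs = [100.0 * sum(flags) / len(flags)
                for flags in per_subject.get(emo, {}).values()]
        stats[emo] = (mean(accs), pvariance(accs)) if accs else (None, None)
    return stats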
From the corpus of 700 utterances we selected five nested data sets, which include utterances that were recognized as portraying the given emotion by at least p per cent of the subjects (with p = 70, 80, 90, 95, and 100%). We will refer to these data sets as s70, s80, s90, s95, and s100. The sets contain