Table 9.2.  Evaluators’ and Actors’ statistics.

                   Evaluators’ statistics               Actors’ statistics
Category    Mean   s.d.  Median   Min    Max     Mean   s.d.  Median   Min    Max

Normal      66.3   13.7   64.3    29.3   95.7    65.1   16.4   68.5    26.1   89.1
Happy       61.4   11.8   62.9    31.4   78.6    59.8   21.1   66.3     2.2   91.3
Angry       72.2    5.3   72.1    62.9   84.3    71.7   24.5   78.2    13.0  100.0
Sad         68.3    7.8   68.6    50.0   80.0    68.1   18.4   72.6    32.6   93.5
Afraid      49.5   13.3   51.4    22.1   68.6    49.7   18.6   48.9    17.4   88.0



the following number of items: s70: 369 utterances or 52.7% of the corpus;
s80: 257/36.7%; s90: 149/21.3%; s95: 94/13.4%; and s100: 55/7.9%. We can see
that only 7.9% of the utterances in the corpus were recognized by all subjects,
and this share grows steadily to 52.7% for the data set s70, which corresponds
to the 70% level of concordance in decoding emotion in speech. The distribution
of utterances among emotion categories is close to uniform for s70: ∼20% each
for the normal state and happiness, ∼25% each for anger and sadness, and ∼10%
for fear. For data sets with a higher level of concordance, however, anger
gradually begins to dominate, while the proportions of the normal state,
happiness, and sadness decrease. Interestingly, the proportion of fear stays at
approximately the same level (∼7–10%) across all data sets. This analysis
suggests that anger is easier to portray and recognize because it is easier to
come to a consensus about what anger is.
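
As a rough illustration of how such agreement-based data sets could be
assembled, the sketch below keeps an utterance only if a given fraction of
evaluators agree on its most frequent label. The corpus layout, field names,
and the agreement measure used here are hypothetical and are not taken from
the study.

    from collections import Counter

    # Hypothetical corpus: each utterance carries the emotion labels assigned
    # by the evaluators who listened to it (layout is illustrative only).
    utterances = [
        {"id": "u001", "labels": ["angry", "angry", "angry", "sad", "angry"]},
        {"id": "u002", "labels": ["happy", "normal", "happy", "happy", "happy"]},
        # ... the real corpus contained 700 utterances
    ]

    def agreement(labels):
        """Fraction of evaluators who agree on the most frequent label."""
        _, top_count = Counter(labels).most_common(1)[0]
        return top_count / len(labels)

    def concordance_subset(corpus, threshold):
        """Keep utterances whose inter-evaluator agreement meets the threshold."""
        return [u for u in corpus if agreement(u["labels"]) >= threshold]

    # Nested data sets analogous to s70 ... s100 in the text.
    subsets = {f"s{round(t * 100)}": concordance_subset(utterances, t)
               for t in (0.70, 0.80, 0.90, 0.95, 1.00)}

    for name, subset in subsets.items():
        print(name, len(subset))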

3.3  Feature Extraction

All studies in the field point to pitch (fundamental frequency) as the main
vocal cue for emotion recognition. Other acoustic variables contributing to
vocal emotion signaling are [1]: vocal energy, frequency spectral features,
formants (usually only the first one or two formants, F1 and F2, are
considered), and temporal features (speech rate and pausing). Another approach
to feature extraction is to enrich the feature set with derivative features
such as LPCC (linear predictive coding cepstrum) parameters of the signal [12]
or features of the smoothed pitch contour and its derivatives [5].
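
As a purely illustrative example of extracting two of these cues, the sketch
below computes an F0 contour and frame-level energy with the librosa library.
The chapter does not prescribe any particular toolkit, and formant or LPCC
estimation would require additional LPC-based analysis not shown here; the
file name and statistics layout are likewise hypothetical.

    import numpy as np
    import librosa

    def pitch_and_energy(path):
        """Estimate the F0 contour and frame-level RMS energy for one utterance."""
        y, sr = librosa.load(path, sr=None)

        # Fundamental frequency (pitch) contour via the pYIN estimator.
        f0, voiced_flag, _ = librosa.pyin(
            y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
        )

        # Short-time energy (RMS) per analysis frame.
        energy = librosa.feature.rms(y=y)[0]

        # Keep only voiced frames for the pitch contour.
        return f0[voiced_flag], energy

    # Descriptive statistics of the kind used as features (mean, s.d., range, ...).
    # f0, energy = pitch_and_energy("utterance.wav")   # hypothetical file name
    # stats = {"f0_mean": np.mean(f0), "f0_std": np.std(f0),
    #          "f0_range": np.max(f0) - np.min(f0),
    #          "energy_mean": energy.mean(), "energy_std": energy.std()}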
  For our study we estimated the following acoustic variables: fundamental
frequency F0, energy, speaking rate, and the first three formants (F1, F2, and
F3) and their bandwidths (BW1, BW2, and BW3), and calculated some descriptive
statistics for them³. We then ranked the statistics using feature selection
techniques and picked a set of the most “important” features. We used the
RELIEF-F algorithm [8] for feature selection⁴ and identified the 14 top
features⁵. To investigate how sets of features influence the accuracy of
emotion recognition algorithms, we formed three nested sets of features based
on their sum of ranks⁶.
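
For illustration only, the sketch below ranks features with a simplified
Relief-style weighting; the study itself used the full RELIEF-F algorithm [8],
and the feature matrix, labels, and subset sizes shown here are hypothetical.

    import numpy as np

    def relief_weights(X, y, n_iters=200, rng=None):
        """Simplified Relief-style feature weighting (a sketch, not RELIEF-F itself).

        For each sampled instance, a feature's weight grows when it separates the
        instance from its nearest miss (different class) and shrinks when it
        differs from its nearest hit (same class).
        """
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        rng = np.random.default_rng(rng)
        n_samples, n_features = X.shape

        # Scale features to [0, 1] so difference magnitudes are comparable.
        span = X.max(axis=0) - X.min(axis=0)
        span[span == 0] = 1.0
        Xs = (X - X.min(axis=0)) / span

        w = np.zeros(n_features)
        for _ in range(n_iters):
            i = rng.integers(n_samples)
            dists = np.abs(Xs - Xs[i]).sum(axis=1)
            dists[i] = np.inf
            same = (y == y[i])
            same[i] = False
            hit = np.argmin(np.where(same, dists, np.inf))    # nearest same-class sample
            miss = np.argmin(np.where(~same, dists, np.inf))  # nearest other-class sample
            w += np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])
        return w / n_iters

    # Rank features and build nested subsets analogous to the three sets in the text.
    # X: (n_utterances, n_acoustic_statistics), y: emotion labels (hypothetical data).
    # weights = relief_weights(X, y)
    # ranking = np.argsort(weights)[::-1]               # best features first
    # nested_sets = [ranking[:k] for k in (8, 10, 14)]  # illustrative sizes only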