Page 98 - Socially Intelligent Agents Creating Relationships with Computers and Robots

Emotion Recognition Agents for Speech Signal

3.4 Computer Recognition

To recognize emotions in speech, we tried the following approaches: K-nearest
neighbors, neural networks, ensembles of neural network classifiers, and a set
of experts. In general, the approach based on ensembles of neural network
recognizers outperformed the others, and it was chosen for implementation at
the next stage. The results obtained with each technique are summarized below.

K-nearest neighbors. We used 70% of the s70 data set as the database of cases
for comparison and the remaining 30% as the test set. We ran the algorithm for
K = 1 to 15 with feature sets of 8, 10, and 14 features. The best average
recognition accuracy (∼55%) was reached with 8 features, but the average
accuracy for anger was much higher (∼65%) with the 10- and 14-feature sets.
All recognizers performed very poorly for fear (about 5–10%).
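A minimal Python sketch of the K-nearest-neighbor setup described above: a random 70/30 split into reference cases and test cases, a majority vote among the K nearest cases, and accuracy reported per emotion and as the mean over emotions. The Euclidean distance, the tie-breaking rule, reading "average accuracy" as the mean of per-emotion accuracies, and the generator `make_synthetic` are all assumptions for illustration; the s70 data and its features are not reproduced here.

```python
import math
import random
from collections import Counter

# The five emotion categories used in the chapter.
EMOTIONS = ["normal", "happiness", "anger", "sadness", "fear"]


def make_synthetic(n_per_class=30, n_features=8, seed=1):
    """Hypothetical stand-in for s70: one Gaussian cluster per emotion."""
    rng = random.Random(seed)
    data = []
    for i, label in enumerate(EMOTIONS):
        for _ in range(n_per_class):
            data.append(([i + rng.gauss(0.0, 0.3) for _ in range(n_features)], label))
    return data


def split_70_30(data, seed=0):
    """Shuffle, then keep 70% as reference cases and 30% as test cases."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 7 // 10
    return shuffled[:cut], shuffled[cut:]


def knn_predict(cases, x, k):
    """Majority label among the k reference cases nearest to x (Euclidean, assumed)."""
    nearest = sorted(cases, key=lambda case: math.dist(case[0], x))[:k]
    # most_common orders equal counts by first appearance, so a tie goes to
    # the tied label that appears first in distance order.
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def per_class_accuracy(cases, test, k):
    """Fraction of test utterances recognized correctly, per emotion."""
    hits, totals = Counter(), Counter()
    for x, label in test:
        totals[label] += 1
        hits[label] += knn_predict(cases, x, k) == label
    return {label: hits[label] / totals[label] for label in totals}


if __name__ == "__main__":
    cases, test = split_70_30(make_synthetic())
    for k in range(1, 16):  # K = 1 .. 15, as in the text
        acc = per_class_accuracy(cases, test, k)
        print(k, round(sum(acc.values()) / len(acc), 3))
```

Varying `n_features` over 8, 10, and 14 mirrors the feature-set comparison, although on synthetic clusters the numbers say nothing about the reported ∼55% and ∼65% figures.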

Neural networks. We used a two-layer backpropagation neural network
architecture with an 8-, 10-, or 14-element input vector, 10 or 20 nodes in the
hidden sigmoid layer, and five nodes in the linear output layer. To train and
test our algorithms, we used the data sets s70, s80, and s90, each randomly
split into training (70% of utterances) and test (30%) subsets. We created
several neural network classifiers trained with different initial weight
matrices. Applied to the s70 data set with the 8-feature set, this approach gave
an average accuracy of about 65%, distributed across emotion categories as
follows: normal state 55–65%, happiness 60–70%, anger 60–80%, sadness 60–70%,
and fear 25–50%.
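The architecture above (sigmoid hidden layer, five linear output nodes, backpropagation) can be sketched in plain Python as follows. The squared-error loss with one-hot targets, per-sample gradient descent, uniform initialization in [-0.5, 0.5], the learning rate and epoch count, and the synthetic data from `make_synthetic` are assumptions, not details given in the text; the `seed` argument plays the role of the "different initial weight matrices".

```python
import math
import random


class TwoLayerNet:
    """Sigmoid hidden layer and linear output layer, trained by backpropagation."""

    def __init__(self, n_in, n_hidden, n_out=5, seed=0):
        # A different seed gives a different initial weight matrix.
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
             for row, b in zip(self.w1, self.b1)]
        y = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(self.w2, self.b2)]
        return h, y

    def train_step(self, x, target, lr):
        h, y = self.forward(x)
        # Linear outputs with E = 0.5 * sum((y - t)^2) give dE/dy = y - t.
        d_out = [yk - tk for yk, tk in zip(y, target)]
        # Backpropagate through w2 and the sigmoid derivative h * (1 - h),
        # using w2 before it is updated.
        d_hid = [hj * (1.0 - hj) * sum(dk * self.w2[k][j] for k, dk in enumerate(d_out))
                 for j, hj in enumerate(h)]
        for k, dk in enumerate(d_out):
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * dk * hj
            self.b2[k] -= lr * dk
        for j, dj in enumerate(d_hid):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dj * xi
            self.b1[j] -= lr * dj

    def predict(self, x):
        """Index of the output node with the largest activation."""
        _, y = self.forward(x)
        return max(range(len(y)), key=y.__getitem__)


def train(net, data, epochs=100, lr=0.1, seed=0):
    """Per-sample gradient descent with one-hot targets (assumed training scheme)."""
    rng = random.Random(seed)
    data = list(data)
    n_out = len(net.b2)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            net.train_step(x, [1.0 if k == label else 0.0 for k in range(n_out)], lr)
    return net


def make_synthetic(n_per_class=20, n_features=8, n_classes=5, seed=1):
    """Hypothetical data: class c is centered at 2.0 on feature c, 0.0 elsewhere."""
    rng = random.Random(seed)
    return [([rng.gauss(2.0 if i == c else 0.0, 0.3) for i in range(n_features)], c)
            for c in range(n_classes) for _ in range(n_per_class)]


if __name__ == "__main__":
    data = make_synthetic()
    net = train(TwoLayerNet(n_in=8, n_hidden=10, seed=3), data)
    print(sum(net.predict(x) == c for x, c in data) / len(data))
```

Setting `n_hidden` to 10 or 20 covers the two architectures, and training several instances that differ only in `seed` yields the set of differently initialized classifiers.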

Ensembles of neural network classifiers. We used ensembles of 7 to 15
classifiers. For ensembles of 15 neural networks on the s70 data set, across all
three feature sets and both neural network architectures (10 and 20 neurons in
the hidden layer), the results were as follows. The accuracy for happiness
remained the same (∼65%) for the different feature sets and architectures. The
accuracy for fear was relatively low (35–53%). The accuracy for anger started at
73% for the 8-feature set and increased to 81% for the 14-feature set. The
accuracy for sadness varied from 73% to 83% and reached its maximum for the
10-feature set. The average total accuracy was about 70%.
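The text does not state how the member networks' outputs were combined. The sketch below assumes plurality voting over the labels predicted by each member, and shows summing the members' per-class output vectors (natural for networks with linear outputs) as an alternative rule. Members are any callables, for example networks trained from different initial weights; the demo members are hypothetical.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


def ensemble_predict(members: Sequence[Callable], x) -> Hashable:
    """Plurality vote over the labels predicted by each member classifier."""
    votes = Counter(member(x) for member in members)
    # most_common orders equal counts by first appearance, so a tie goes to
    # the tied label predicted by the earliest member in the list.
    return votes.most_common(1)[0][0]


def ensemble_average(scorers: Sequence[Callable], x, labels: Sequence[Hashable]) -> Hashable:
    """Alternative rule: sum each member's per-class outputs, take the largest."""
    totals = [0.0] * len(labels)
    for score in scorers:
        for i, s in enumerate(score(x)):
            totals[i] += s
    # Summing and averaging give the same argmax.
    return labels[max(range(len(labels)), key=totals.__getitem__)]


if __name__ == "__main__":
    # Seven hypothetical members, the smallest ensemble size in the text.
    predicted = ["anger"] * 4 + ["fear"] * 2 + ["sadness"]
    members = [lambda x, lab=lab: lab for lab in predicted]
    print(ensemble_predict(members, x=None))  # anger
```

An odd member count such as 7 or 15 rules out ties only between two labels; with five emotion classes a vote can still split evenly (for example 5, 5, 5 among 15 members), which is why the sketch fixes a tie-break explicitly.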

Set of experts. This approach is based on the following idea: instead of
training a single neural network to recognize all emotions, we can train a set
of specialists, or experts, each of which recognizes only one emotion, and then
combine their results to classify a given sample. The average accuracy of
emotion recognition for this approach was about 70%, except for fear, which was
∼44% for the 10-neuron architecture and ∼56% for the 20-neuron architecture.
The accuracy of non-