
84                                             Socially Intelligent Agents

                               5. The top 14 features are: F0 maximum, F0 standard deviation, F0 range, F0 mean, BW1 mean, BW2
                             mean, energy standard deviation, speaking rate, F0 slope, F1 maximum, energy maximum, energy range,
                             F2 range, and F1 range.
                               6. The first set included the top 8 features (from F0 maximum to speaking rate), the second extended
                             the first by the next 2 features (F0 slope and F1 maximum), and the third included all 14 top features.
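The nested feature subsets described in notes 5 and 6 can be written out as slices of the ranked feature list. A minimal sketch (the short feature names are informal labels for the features listed above, not identifiers from the authors' code):

```python
# Features in the ranked order given in note 5 (informal short names).
top14 = ["F0 max", "F0 std", "F0 range", "F0 mean", "BW1 mean",
         "BW2 mean", "energy std", "speaking rate", "F0 slope",
         "F1 max", "energy max", "energy range", "F2 range", "F1 range"]

# Note 6: the three nested input sets are prefixes of the ranked list.
subsets = {8: top14[:8], 10: top14[:10], 14: top14}
```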
                               7. An ensemble consists of an odd number of neural network classifiers trained on different subsets.
                             The ensemble makes a decision based on the majority voting principle.
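The majority-voting rule in note 7 can be sketched in a few lines; with an odd number of classifiers and two classes, a tie is impossible (the emotion labels below are hypothetical examples, not outputs of the authors' system):

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label emitted by the most classifiers in the ensemble."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical outputs of five classifiers for one utterance:
votes = ["angry", "angry", "sad", "angry", "neutral"]
print(majority_vote(votes))  # -> angry
```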
   8. To train the experts, we used a two-layer backpropagation neural network architecture with an
 8-element input vector, 10 or 20 nodes in the hidden sigmoid layer, and one node in the output linear layer.
                             We also used the same subsets of the s70 data set as training and test sets but with only two classes (for
                             example, angry – non-angry).
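A forward pass through the expert architecture of note 8 can be sketched as follows. This is only an illustrative shape check with random, untrained weights, not the authors' implementation; the class and variable names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TwoLayerNet:
    """Two-layer net: sigmoid hidden layer, linear output layer."""
    def __init__(self, n_in=8, n_hidden=10, n_out=1):
        # Random placeholder weights; in practice these are trained
        # by backpropagation on the s70 subsets.
        self.W1 = rng.normal(scale=0.1, size=(n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(scale=0.1, size=(n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, x):
        h = sigmoid(x @ self.W1 + self.b1)   # hidden sigmoid layer
        return h @ self.W2 + self.b2         # linear output node

expert = TwoLayerNet(n_in=8, n_hidden=10, n_out=1)
score = expert.forward(np.zeros(8))          # one angry-vs-non-angry score
```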
                               9. To explore this approach, we used a two-layer backpropagation neural network architecture with a
                             5-element input vector, 10 or 20 nodes in the hidden sigmoid layer and five nodes in the output linear layer.
We selected five of the best experts and generated several dozen neural network recognizers.
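The combiner network of note 9 maps the five expert outputs to five per-emotion scores. A minimal sketch under the same caveat as above (random placeholder weights, hypothetical names; five emotion classes assumed):

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# 5 expert scores in -> 10 hidden sigmoid nodes -> 5 linear emotion scores.
W1 = rng.normal(scale=0.1, size=(5, 10))
b1 = np.zeros(10)
W2 = rng.normal(scale=0.1, size=(10, 5))
b2 = np.zeros(5)

expert_scores = np.array([0.9, 0.1, 0.2, 0.4, 0.3])  # hypothetical inputs
hidden = sigmoid(expert_scores @ W1 + b1)
emotion_scores = hidden @ W2 + b2
predicted = int(np.argmax(emotion_scores))  # index of the winning emotion
```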
   10. We created ensembles of 15 neural network recognizers for the 8-, 10-, and 14-feature inputs and
 the 10- and 20-node architectures. The average accuracy of the ensembles of recognizers lies in the range
 73–77%, reaching its maximum of ∼77% for the 8-feature input and 10-node architecture.

                             References

                              [1] R. Banse and K.R. Scherer. Acoustic profiles in vocal emotion expression. Journal of
                                 Personality and Social Psychology, 70: 614–636, 1996.
 [2] R. van Bezooijen. The characteristics and recognizability of vocal expression of emo-
    tions. Foris, Dordrecht, The Netherlands, 1984.
                              [3] J.E. Cahn. Generation of Affect in Synthesized Speech. In Proc. 1989 Conference of
                                 the American Voice I/O Society, pages 251–256. Newport Beach, CA, September 11–13,
                                 1989.
                              [4] C. Darwin. The expression of the emotions in man and animals. University of Chicago
                                 Press, 1965 (Original work published in 1872).
                              [5] F. Dellaert, T. Polzin, and A. Waibel. Recognizing emotions in speech. In Proc. Intl. Conf.
                                 on Spoken Language Processing, pages 734–737. Philadelphia, PA, October 3–6, 1996.
 [6] C. Elliott and J. Brzezinski. Autonomous Agents as Synthetic Characters. AI Magazine,
                                 19: 13–30, 1998.
 [7] L. Hansen and P. Salamon. Neural Network Ensembles. IEEE Transactions on Pattern
    Analysis and Machine Intelligence, 12: 993–1001, 1990.
                              [8] I. Kononenko. Estimating attributes: Analysis and extension of RELIEF. In L. De Raedt
    and F. Bergadano, editors, Proc. European Conf. on Machine Learning (ECML’94),
                                 pages 171–182. Catania, Italy, April 6–8, 1994.
                              [9] I.R. Murray and J.L. Arnott. Toward the simulation of emotion in synthetic speech: A
                                 review of the literature on human vocal emotions. J. Acoust. Society of America, 93(2):
                                 1097–1108, 1993.
                             [10] R. Picard. Affective computing. MIT Press, Cambridge, MA, 1997.
                             [11] K.R. Scherer, R. Banse, H.G. Wallbott, and T. Goldbeck. Vocal clues in emotion encoding
                                 and decoding. Motivation and Emotion, 15: 123–148, 1991.
                             [12] N. Tosa and R. Nakatsu. Life-like communication agent: Emotion sensing character
                                 “MIC” and feeling session character “MUSE”. In Proc. Third IEEE Intl. Conf. on Multi-
                                 media Computing and Systems, pages 12–19. Hiroshima, Japan, June 17–23, 1996.