
84                                             Socially Intelligent Agents

                               5. The top 14 features are: F0 maximum, F0 standard deviation, F0 range, F0 mean, BW1 mean, BW2
                             mean, energy standard deviation, speaking rate, F0 slope, F1 maximum, energy maximum, energy range,
                             F2 range, and F1 range.
                               6. The first set included the top 8 features (from F0 maximum to speaking rate), the second extended
                             the first by the next 2 features (F0 slope and F1 maximum), and the third included all 14 top features.
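The nested feature subsets described in notes 5 and 6 can be written out as slices of the ranked feature list. A minimal sketch (the short feature names are informal labels for the features listed above, not identifiers from the authors' code):

```python
# Features in the ranked order given in note 5 (informal short names).
top14 = ["F0 max", "F0 std", "F0 range", "F0 mean", "BW1 mean",
         "BW2 mean", "energy std", "speaking rate", "F0 slope",
         "F1 max", "energy max", "energy range", "F2 range", "F1 range"]

# Note 6: the three nested input sets are prefixes of the ranked list.
subsets = {8: top14[:8], 10: top14[:10], 14: top14}
```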
                               7. An ensemble consists of an odd number of neural network classifiers trained on different subsets.
                             The ensemble makes a decision based on the majority voting principle.
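The majority-voting rule in note 7 can be sketched in a few lines; with an odd number of classifiers and two classes, a tie is impossible (the emotion labels below are hypothetical examples, not outputs of the authors' system):

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label emitted by the most classifiers in the ensemble."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical outputs of five classifiers for one utterance:
votes = ["angry", "angry", "sad", "angry", "neutral"]
print(majority_vote(votes))  # -> angry
```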
   8. To train the experts, we used a two-layer backpropagation neural network architecture with an
 8-element input vector, 10 or 20 nodes in the hidden sigmoid layer, and one node in the output linear layer.
                             We also used the same subsets of the s70 data set as training and test sets but with only two classes (for
                             example, angry – non-angry).
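A forward pass through the expert architecture of note 8 can be sketched as follows. This is only an illustrative shape check with random, untrained weights, not the authors' implementation; the class and variable names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TwoLayerNet:
    """Two-layer net: sigmoid hidden layer, linear output layer."""
    def __init__(self, n_in=8, n_hidden=10, n_out=1):
        # Random placeholder weights; in practice these are trained
        # by backpropagation on the s70 subsets.
        self.W1 = rng.normal(scale=0.1, size=(n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(scale=0.1, size=(n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, x):
        h = sigmoid(x @ self.W1 + self.b1)   # hidden sigmoid layer
        return h @ self.W2 + self.b2         # linear output node

expert = TwoLayerNet(n_in=8, n_hidden=10, n_out=1)
score = expert.forward(np.zeros(8))          # one angry-vs-non-angry score
```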
                               9. To explore this approach, we used a two-layer backpropagation neural network architecture with a
                             5-element input vector, 10 or 20 nodes in the hidden sigmoid layer and five nodes in the output linear layer.
We selected five of the best experts and generated several dozen neural network recognizers.
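The combiner network of note 9 maps the five expert outputs to five per-emotion scores. A minimal sketch under the same caveat as above (random placeholder weights, hypothetical names; five emotion classes assumed):

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# 5 expert scores in -> 10 hidden sigmoid nodes -> 5 linear emotion scores.
W1 = rng.normal(scale=0.1, size=(5, 10))
b1 = np.zeros(10)
W2 = rng.normal(scale=0.1, size=(10, 5))
b2 = np.zeros(5)

expert_scores = np.array([0.9, 0.1, 0.2, 0.4, 0.3])  # hypothetical inputs
hidden = sigmoid(expert_scores @ W1 + b1)
emotion_scores = hidden @ W2 + b2
predicted = int(np.argmax(emotion_scores))  # index of the winning emotion
```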
   10. We created ensembles of 15 neural network recognizers for the 8-, 10-, and 14-feature inputs and
 the 10- and 20-node architectures. The average accuracy of the ensembles of recognizers lies in the range
 73–77%, reaching its maximum of ∼77% for the 8-feature input and 10-node architecture.

                             References

                              [1] R. Banse and K.R. Scherer. Acoustic profiles in vocal emotion expression. Journal of
                                 Personality and Social Psychology, 70: 614–636, 1996.
 [2] R. van Bezooijen. The characteristics and recognizability of vocal expression of emo-
    tions. Foris, Dordrecht, The Netherlands, 1984.
                              [3] J.E. Cahn. Generation of Affect in Synthesized Speech. In Proc. 1989 Conference of
                                 the American Voice I/O Society, pages 251–256. Newport Beach, CA, September 11–13,
                                 1989.
                              [4] C. Darwin. The expression of the emotions in man and animals. University of Chicago
                                 Press, 1965 (Original work published in 1872).
                              [5] F. Dellaert, T. Polzin, and A. Waibel. Recognizing emotions in speech. In Proc. Intl. Conf.
                                 on Spoken Language Processing, pages 734–737. Philadelphia, PA, October 3–6, 1996.
 [6] C. Elliott and J. Brzezinski. Autonomous Agents as Synthetic Characters. AI Magazine,
                                 19: 13–30, 1998.
 [7] L. Hansen and P. Salamon. Neural Network Ensembles. IEEE Transactions on Pattern
    Analysis and Machine Intelligence, 12: 993–1001, 1990.
                              [8] I. Kononenko. Estimating attributes: Analysis and extension of RELIEF. In L. De Raedt
    and F. Bergadano, editors, Proc. European Conf. on Machine Learning (ECML’94),
                                 pages 171–182. Catania, Italy, April 6–8, 1994.
                              [9] I.R. Murray and J.L. Arnott. Toward the simulation of emotion in synthetic speech: A
                                 review of the literature on human vocal emotions. J. Acoust. Society of America, 93(2):
                                 1097–1108, 1993.
                             [10] R. Picard. Affective computing. MIT Press, Cambridge, MA, 1997.
                             [11] K.R. Scherer, R. Banse, H.G. Wallbott, and T. Goldbeck. Vocal clues in emotion encoding
                                 and decoding. Motivation and Emotion, 15: 123–148, 1991.
                             [12] N. Tosa and R. Nakatsu. Life-like communication agent: Emotion sensing character
                                 “MIC” and feeling session character “MUSE”. In Proc. Third IEEE Intl. Conf. on Multi-
                                 media Computing and Systems, pages 12–19. Hiroshima, Japan, June 17–23, 1996.