openSMILE: the Munich Versatile and Fast Open-Source Audio Feature Extractor

ABSTRACT
We introduce the openSMILE feature extraction toolkit, which unites feature extraction algorithms from the speech processing and the Music Information Retrieval communities. Audio low-level descriptors such as CHROMA and CENS features, loudness, Mel-frequency cepstral coefficients, perceptual linear predictive cepstral coefficients, linear predictive coefficients, line spectral frequencies, fundamental frequency, and formant frequencies are supported. Delta regression and various statistical functionals can be applied to the low-level descriptors. openSMILE is implemented in C++ with no third-party dependencies for the core functionality. It is fast, runs on Unix and Windows platforms, and has a modular, component-based architecture which makes extensions via plug-ins easy. It supports on-line incremental processing for all implemented features as well as off-line and batch processing. Numeric compatibility with future versions is ensured by means of unit tests. openSMILE can be downloaded from http://opensmile.sourceforge.net/.
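To illustrate the two-stage pipeline the abstract describes (low-level descriptors, then delta regression and statistical functionals over them), here is a minimal sketch in Python/NumPy. The function names, the regression window width, and the edge-padding strategy are illustrative assumptions for this sketch, not openSMILE's actual C++ API; the delta formula is the standard HTK-style regression over a symmetric window.

```python
import numpy as np

def delta(lld, width=2):
    """Delta-regression coefficients over a (frames x dims) LLD matrix.

    Uses the standard regression formula
        d_t = sum_{k=1..width} k * (x_{t+k} - x_{t-k}) / (2 * sum k^2),
    repeating the first/last frame at the edges (one common convention;
    openSMILE's exact edge handling may differ).
    """
    n = len(lld)
    padded = np.pad(lld, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, width + 1))
    num = sum(
        k * (padded[width + k : width + k + n] - padded[width - k : width - k + n])
        for k in range(1, width + 1)
    )
    return num / denom

def functionals(lld):
    """A few statistical functionals summarising each LLD contour
    over time, yielding one fixed-length vector per dimension."""
    return {
        "mean": lld.mean(axis=0),
        "std": lld.std(axis=0),
        "min": lld.min(axis=0),
        "max": lld.max(axis=0),
    }

# Example: a toy one-dimensional "descriptor" contour (a linear ramp).
contour = np.arange(10, dtype=float).reshape(-1, 1)
deltas = delta(contour)          # interior frames of a unit ramp give slope 1.0
summary = functionals(contour)   # e.g. summary["mean"] == [4.5]
```

Concatenating the functionals of the descriptors and of their deltas is what turns a variable-length frame sequence into the fixed-length feature vector that classifiers such as those in the emotion-challenge feature sets expect.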