DOI: 10.1145/1873951.1874246 (short paper)

openSMILE: the Munich versatile and fast open-source audio feature extractor

Published: 25 October 2010

ABSTRACT

We introduce the openSMILE feature extraction toolkit, which unites feature extraction algorithms from the speech processing and Music Information Retrieval communities. Audio low-level descriptors such as CHROMA and CENS features, loudness, Mel-frequency cepstral coefficients, perceptual linear predictive cepstral coefficients, linear predictive coefficients, line spectral frequencies, fundamental frequency, and formant frequencies are supported. Delta regression and various statistical functionals can be applied to the low-level descriptors. openSMILE is implemented in C++ with no third-party dependencies for the core functionality. It is fast, runs on Unix and Windows platforms, and has a modular, component-based architecture that makes extensions via plug-ins easy. It supports on-line incremental processing for all implemented features as well as off-line and batch processing. Numeric compatibility with future versions is ensured by means of unit tests. openSMILE can be downloaded from http://opensmile.sourceforge.net/.
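The abstract mentions two processing steps that are easy to picture concretely: delta regression over frame-wise low-level descriptors (LLDs), and statistical functionals that collapse those frame-level trajectories into a fixed-length segment vector. The sketch below is not openSMILE code or its API; it is a minimal NumPy illustration of the standard HTK-style delta-regression formula and a handful of common functionals. The function names, the window size N, and the particular functionals shown are assumptions chosen for illustration only.

```python
import numpy as np

def delta(lld, N=2):
    # HTK-style delta regression over a (frames x descriptors) matrix:
    # d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2)
    T = len(lld)
    padded = np.pad(lld, ((N, N), (0, 0)), mode="edge")   # repeat edge frames
    denom = 2.0 * sum(n * n for n in range(1, N + 1))
    out = np.zeros_like(lld, dtype=float)
    for n in range(1, N + 1):
        out += n * (padded[N + n:N + n + T] - padded[N - n:N - n + T])
    return out / denom

def functionals(lld):
    # A few common segment-level statistics, applied per descriptor column.
    return np.concatenate([
        lld.mean(axis=0),
        lld.std(axis=0),
        lld.min(axis=0),
        lld.max(axis=0),
        lld.max(axis=0) - lld.min(axis=0),   # range
    ])

# Example: 100 frames of 13 hypothetical MFCC-like descriptors.
mfcc = np.random.randn(100, 13)
lld = np.hstack([mfcc, delta(mfcc)])   # append delta-regression coefficients
segment_vector = functionals(lld)      # fixed-length vector (5 * 26 values)
```

In openSMILE itself these steps are assembled from components of its modular architecture rather than hand-coded as above, but the sketch reflects the data flow the abstract describes: frame-level descriptors, their delta regression, and functionals over the segment.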


Published in

MM '10: Proceedings of the 18th ACM International Conference on Multimedia
October 2010, 1836 pages
ISBN: 9781605589336
DOI: 10.1145/1873951
Copyright © 2010 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
