DOI: 10.1145/2647868.2654904

Music Emotion Recognition by Multi-label Multi-layer Multi-instance Multi-view Learning

Published: 03 November 2014

ABSTRACT

Music emotion recognition, which aims to automatically recognize the affective content of a piece of music, has become a key component of music search, exploration, and social networking applications. Although researchers have paid increasing attention to music emotion recognition, recognition performance has reached a bottleneck in recent years. One major reason is that expert labels for music emotion are mostly song-level, while emotion usually varies within a song. Traditional methods treat each song as a single instance and build models on song-level features; they therefore ignore the dynamics of music emotion and fail to capture accurate emotion-feature correlations. In this paper, we model music emotion recognition as a novel multi-label multi-layer multi-instance multi-view learning problem: music is formulated as a hierarchical multi-instance structure (e.g., song-segment-sentence) in which each of a song's multiple emotion labels corresponds to at least one instance, and instances at each layer carry multiple feature views. We propose the Hierarchical Music Emotion Recognition model (HMER), a novel hierarchical Bayesian model built on sentence-level music and lyrics features. HMER captures music emotion dynamics through the song-segment-sentence hierarchical structure and also models emotion correlations among music segments and among sentences. Experimental results show that HMER outperforms several state-of-the-art methods in terms of $F_1$ score and mean average precision.
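To make the abstract's formulation concrete, the following is a minimal illustrative sketch in Python of the data layout it describes (song-segment-sentence bags with multi-view features and song-level multi-label annotation) and of the two reported evaluation metrics. It is not the paper's implementation: HMER itself is a hierarchical Bayesian model, and all class and function names below are hypothetical.

```python
# Sketch only: the data layout described in the abstract plus the two
# reported metrics. All names here are hypothetical, not from the paper.
from dataclasses import dataclass
from typing import Dict, List, Set

@dataclass
class Sentence:
    # Lowest-layer instance with multiple feature views,
    # e.g. {"audio": [...], "lyrics": [...]}.
    views: Dict[str, List[float]]

@dataclass
class Segment:
    # Middle-layer instance: a bag of sentences.
    sentences: List[Sentence]

@dataclass
class Song:
    # Top-layer bag with song-level multi-label annotation; each label is
    # assumed to be supported by at least one instance somewhere below.
    segments: List[Segment]
    labels: Set[str]

def micro_f1(y_true: List[Set[str]], y_pred: List[Set[str]]) -> float:
    """Micro-averaged F1 over a multi-label test set."""
    tp = sum(len(t & p) for t, p in zip(y_true, y_pred))
    fp = sum(len(p - t) for t, p in zip(y_true, y_pred))
    fn = sum(len(t - p) for t, p in zip(y_true, y_pred))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

def mean_average_precision(y_true: List[Set[str]],
                           rankings: List[List[str]]) -> float:
    """MAP over songs, given a ranked emotion-label list per song."""
    def ap(relevant: Set[str], ranked: List[str]) -> float:
        hits, score = 0, 0.0
        for i, label in enumerate(ranked, 1):
            if label in relevant:
                hits += 1
                score += hits / i
        return score / len(relevant) if relevant else 0.0
    return sum(ap(t, r) for t, r in zip(y_true, rankings)) / len(y_true)
```

The nested bag structure mirrors the song-segment-sentence hierarchy: song-level labels constrain the bags, while the model reasons over the sentence-level instances and their views.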


Published in

MM '14: Proceedings of the 22nd ACM International Conference on Multimedia
November 2014
1310 pages
ISBN: 9781450330633
DOI: 10.1145/2647868

        Copyright © 2014 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 3 November 2014


        Qualifiers

        • research-article

        Acceptance Rates

MM '14 Paper Acceptance Rate: 55 of 286 submissions, 19%
Overall Acceptance Rate: 995 of 4,171 submissions, 24%

