ABSTRACT
Music emotion recognition, which aims to automatically recognize the affective content of a piece of music, has become a key component of music search, exploration, and social networking applications. Although music emotion recognition has attracted growing research attention, recognition performance has plateaued in recent years. One major reason is that expert emotion labels are mostly assigned at the song level, whereas music emotion usually varies within a song. Traditional methods treat each song as a single instance and build models on song-level features. As a result, they ignore the dynamics of music emotion and fail to capture accurate emotion-feature correlations. In this paper, we formulate music emotion recognition as a novel multi-label multi-layer multi-instance multi-view learning problem: music is represented as a hierarchical multi-instance structure (e.g., song-segment-sentence), in which each of several emotion labels corresponds to at least one instance, and the instances at each layer are described by multiple views. We propose the Hierarchical Music Emotion Recognition model (HMER), a novel hierarchical Bayesian model that uses sentence-level music and lyrics features. HMER captures music emotion dynamics through the song-segment-sentence hierarchy and also accounts for emotion correlations among both music segments and sentences. Experimental results show that HMER outperforms several state-of-the-art methods in terms of $F_1$ score and mean average precision.
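The labeling assumption behind the song-segment-sentence formulation can be illustrated with a small data structure. This is a minimal sketch, not HMER itself: it only shows the multi-instance rule that a segment (or song) carries an emotion label if at least one of its sentences does, with each sentence described by two views (audio and lyrics). The class names, feature values, and emotion tags are hypothetical. In the actual learning problem, sentence-level labels are latent and only song-level labels are observed; here they are filled in by hand purely to show how song-level labels relate to instances.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Sentence:
    # Two hypothetical views of the same instance: audio features and lyric words.
    audio: list[float]
    lyrics: list[str]
    # Instance-level emotion labels (latent in practice; set by hand here).
    emotions: set[str] = field(default_factory=set)


@dataclass
class Segment:
    sentences: list[Sentence]

    def emotions(self) -> set[str]:
        # Multi-instance rule: a segment has a label if at least one sentence has it.
        return set().union(*(s.emotions for s in self.sentences))


@dataclass
class Song:
    segments: list[Segment]

    def emotions(self) -> set[str]:
        # Same rule one layer up: song labels are the union of segment labels.
        return set().union(*(seg.emotions() for seg in self.segments))


def sentences_with(song: Song, label: str) -> list[tuple[int, int]]:
    # Locate the (segment, sentence) instances that explain a song-level label.
    return [
        (i, j)
        for i, seg in enumerate(song.segments)
        for j, s in enumerate(seg.sentences)
        if label in s.emotions
    ]


verse = Segment([
    Sentence([0.2, 0.1], ["alone", "tonight"], {"sad"}),
    Sentence([0.3, 0.1], ["cold", "rain"], {"sad"}),
])
chorus = Segment([
    Sentence([0.8, 0.9], ["dance", "light"], {"happy"}),
    Sentence([0.7, 0.6], ["sing", "loud"]),
])
song = Song([verse, chorus])

print(sorted(song.emotions()))        # ['happy', 'sad']
print(sentences_with(song, "happy"))  # [(1, 0)]
```

A song-level annotation such as {happy, sad} only says that each emotion appears somewhere in the song; recovering which segments and sentences carry it is what a hierarchical model must infer, which is why song-level features alone blur within-song emotion dynamics.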
Index Terms
- Music Emotion Recognition by Multi-label Multi-layer Multi-instance Multi-view Learning