research-article

Music Emotion Recognition by Multi-label Multi-layer Multi-instance Multi-view Learning

Authors:
Bin Wu, Erheng Zhong, Andrew Horner, Qiang Yang

Hong Kong University of Science and Technology, Hong Kong (all authors)

MM '14: Proceedings of the 22nd ACM international conference on Multimedia, November 2014, Pages 117–126
https://doi.org/10.1145/2647868.2654904

Published: 3 November 2014

ABSTRACT

Music emotion recognition, which aims to automatically recognize the affective content of a piece of music, has become one of the key components of music searching, exploring, and social networking applications. Although researchers have given more and more attention to music emotion recognition studies, the recognition performance has come to a bottleneck in recent years. One major reason is that experts' labels for music emotion are mostly song-level, while music emotion usually varies within a song. Traditional methods have considered each song as a single instance and have built models based on song-level features. However, they ignored the dynamics of music emotion and failed to capture accurate emotion-feature correlations. In this paper, we model music emotion recognition as a novel multi-label multi-layer multi-instance multi-view learning problem: music is formulated as a hierarchical multi-instance structure (e.g., song-segment-sentence) where multiple emotion labels correspond to at least one of the instances with multiple views of each layer. We propose a Hierarchical Music Emotion Recognition model (HMER) -- a novel hierarchical Bayesian model using sentence-level music and lyrics features. It captures music emotion dynamics with a song-segment-sentence hierarchical structure. HMER also considers emotion correlations between both music segments and sentences. Experimental results show that HMER outperforms several state-of-the-art methods in terms of $F_1$ score and mean average precision.
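The song-segment-sentence hierarchy and the "at least one instance" reading of a label can be illustrated with a small sketch. This is not HMER, which the abstract describes as a hierarchical Bayesian model. It only shows the standard multi-instance assumption applied layer by layer, using max-aggregation as a stand-in. The class names, the score dictionaries, the emotion labels, and the 0.5 threshold are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Sentence:
    # Hypothetical per-label emotion scores in [0, 1] for one sentence.
    scores: Dict[str, float]


@dataclass
class Segment:
    sentences: List[Sentence] = field(default_factory=list)


@dataclass
class Song:
    segments: List[Segment] = field(default_factory=list)


def segment_scores(segment: Segment) -> Dict[str, float]:
    """A segment carries a label as strongly as its strongest sentence."""
    out: Dict[str, float] = {}
    for sentence in segment.sentences:
        for label, value in sentence.scores.items():
            out[label] = max(out.get(label, 0.0), value)
    return out


def song_scores(song: Song) -> Dict[str, float]:
    """A song carries a label as strongly as its strongest segment."""
    out: Dict[str, float] = {}
    for segment in song.segments:
        for label, value in segment_scores(segment).items():
            out[label] = max(out.get(label, 0.0), value)
    return out


def predict_labels(song: Song, threshold: float = 0.5) -> List[str]:
    """Multi-label output: every label whose song-level score clears the threshold."""
    return sorted(l for l, v in song_scores(song).items() if v >= threshold)


# Toy song whose emotion changes between segments.
song = Song(segments=[
    Segment([Sentence({"sad": 0.8, "angry": 0.1}),
             Sentence({"sad": 0.6, "angry": 0.2})]),
    Segment([Sentence({"sad": 0.2, "angry": 0.9})]),
])

print(predict_labels(song))  # ['angry', 'sad']
```

In the toy song, a single song-level average would dilute both emotions, while the layered view keeps "sad" from the first segment and "angry" from the second, which is the kind of within-song variation the abstract argues song-level features miss.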

Index Terms

1. Applied computing
  1. Arts and humanities
    1. Sound and music computing
2. Information systems
  1. Information retrieval
    1. Specialized information retrieval
      1. Multimedia and multimodal retrieval
        1. Music retrieval

Published in
MM '14: Proceedings of the 22nd ACM international conference on Multimedia
November 2014
1310 pages
ISBN:9781450330633
DOI:10.1145/2647868
General Chairs:
Kien A. Hua (University of Central Florida, USA),
Yong Rui (Microsoft Research, China),
Ralf Steinmetz (Technische Universität Darmstadt, Germany)
Program Chairs:
Alan Hanjalic (Delft University of Technology, Netherlands),
Apostol (Paul) Natsev (Google, USA),
Wenwu Zhu (Tsinghua University, China)
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
multi-label multi-layer multi-instance multi-view learning
music emotion recognition
Qualifiers
- research-article
Acceptance Rates
MM '14 paper acceptance rate: 55 of 286 submissions, 19%
Overall acceptance rate: 995 of 4,171 submissions, 24%