ABSTRACT
In this work we investigate data-driven word embeddings as a vector representation for classifying song lyrics into their semantic topics. Previous research on topic classification of song lyrics has used traditional frequency-based text representations. Empirically learned word embeddings, on the other hand, have produced noticeable performance improvements in text classification tasks because they capture semantic relationships between words from large amounts of data. Averaging the word vectors of a short text is known to work reasonably well compared with more comprehensive models that exploit word order. We therefore use the averaged word vectors of the lyrics and of users' interpretations of them, both of which are generally short, as the features for this classification task. This simple approach achieved a promising classification accuracy of 57%. Based on this result, we see potential for other data-driven approaches to feature construction, such as sequences of word vectors and doc2vec models, to further improve the system's performance.
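The averaged-word-vector feature described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The two-dimensional table TOY_EMBEDDINGS and the topic labels "love" and "war" are hypothetical stand-ins for pretrained embeddings such as word2vec or fastText vectors. Tokenization is naive lowercase whitespace splitting. Lyrics and interpretations are pooled into a single average. Because the abstract does not name the classifier, a simple nearest-centroid classifier with cosine similarity is swapped in purely for illustration.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]

# Hypothetical toy embeddings; a real system would load pretrained vectors.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "love": [1.0, 0.0], "heart": [0.9, 0.1], "kiss": [0.8, 0.2],
    "war": [0.0, 1.0], "fight": [0.1, 0.9], "gun": [0.2, 0.8],
}


def average_embedding(tokens: Sequence[str], emb: Dict[str, Vector], dim: int) -> Vector:
    """Mean of the vectors of in-vocabulary tokens; zero vector if none are known."""
    known = [emb[t] for t in tokens if t in emb]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def song_feature(lyrics: str, interpretations: List[str], emb: Dict[str, Vector], dim: int) -> Vector:
    """Assumption: lyrics and all user interpretations are pooled into one token list."""
    tokens = lyrics.lower().split()
    for text in interpretations:
        tokens.extend(text.lower().split())
    return average_embedding(tokens, emb, dim)


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def train_centroids(features: List[Vector], labels: List[str]) -> Dict[str, Vector]:
    """Illustrative classifier (not from the paper): one mean feature vector per topic."""
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = {}
    for f, y in zip(features, labels):
        if y not in sums:
            sums[y], counts[y] = [0.0] * len(f), 0
        sums[y] = [s + x for s, x in zip(sums[y], f)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids: Dict[str, Vector], feature: Vector) -> str:
    """Assign the topic whose centroid is most cosine-similar to the feature."""
    return max(centroids, key=lambda y: cosine(centroids[y], feature))


# Hypothetical training songs: (lyrics, user interpretations, topic).
TRAIN = [
    ("my heart is yours", ["a song about love"], "love"),
    ("we fight tonight", ["about war and a gun"], "war"),
]
FEATURES = [song_feature(lyr, interp, TOY_EMBEDDINGS, 2) for lyr, interp, _ in TRAIN]
CENTROIDS = train_centroids(FEATURES, [topic for _, _, topic in TRAIN])

if __name__ == "__main__":
    query = song_feature("a kiss goodbye", ["heartbreak and love"], TOY_EMBEDDINGS, 2)
    print(predict(CENTROIDS, query))
```

Out-of-vocabulary words (here "goodbye" and "heartbreak") are simply skipped. Averaging discards word order, which is exactly the simplification the abstract contrasts with order-aware models such as sequences of word vectors.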
Index Terms
- Exploratory Investigation of Word Embedding in Song Lyric Topic Classification: Promising Preliminary Results
Recommendations
A Trend Analysis on Concreteness of Popular Song Lyrics
DLfM '19: Proceedings of the 6th International Conference on Digital Libraries for Musicology
Recently, music complexity has drawn attention from researchers in Music Digital Libraries area. In particular, computational methods to measure music complexity have been studied to provide better music services in large-scale music digital libraries. ...
A topic-enhanced word embedding for Twitter sentiment classification
Word representation is crucial to lexical features used in Twitter sentiment analysis models. Recent work has demonstrated that dense, low-dimensional and real-valued word embedding gives competitive performance for Twitter sentiment classification. We ...
Development of a Song Lyric Corpus for the English Language
Natural Language Processing and Information Systems
Web Scraping Tools are simplifying the task of creating large databases for various applications such as the construction of corpus aimed at the development of applications for natural language processing. Many of these applications require a ...