Abstract
Turkish classical music is organized around makams: melodic configurations defined by characteristic sequences of pitches and intervals. The tradition carries rich cultural significance, yet identifying the makam of a given piece is difficult, even for experienced musical experts, which motivates automated and accurate classification techniques. We introduce a residual LSTM neural network that classifies makams from spectrogram inputs by learning the distinctive sequential pitch patterns found within audio segments. The architecture combines two-dimensional convolutional layers, which capture local time-frequency (spatial) structure, with one-dimensional convolutional and LSTM layers, which model temporal dependencies, all embedded within a residual framework. This combination supports detailed temporal analysis of shifting frequencies in logarithmically scaled spectrograms and the recognition of consecutive pitch patterns within segments. Using stratified cross-validation on a dataset of 1154 pieces spanning 15 makams, the model achieved an accuracy of 95.60% on a subset of 9 makams and 89.09% on all 15 makams. Precision remained consistent even for makam pairs known for closely related pitch sequences. Finally, we benchmarked the model against established methods from the literature to provide a comparative assessment of the proposed workflow.
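Two data-handling steps named in the abstract can be illustrated compactly: turning audio into logarithmically scaled spectrogram segments, and assigning pieces to stratified cross-validation folds. The NumPy sketch below is not the authors' pipeline. The function names, the Hann window, `n_fft=2048`, `hop=512`, the segment length and the fold count are all hypothetical choices, and "logarithmic scaling" is interpreted here as decibel (log-magnitude) scaling, whereas the paper may instead use a log-frequency axis. Folds are assigned to whole pieces (an assumption about a sensible protocol, not a detail stated in the abstract); each segment would then inherit its piece's fold.

```python
import numpy as np


def log_spectrogram(signal, n_fft=2048, hop=512, eps=1e-10):
    """Return a dB-scaled STFT magnitude spectrogram, shape (n_fft // 2 + 1, n_frames).

    Hypothetical settings: Hann window, no padding, 20*log10 magnitude scaling.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < n_fft:
        raise ValueError("signal is shorter than one analysis frame")
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    # Each column is one windowed frame of the signal.
    frames = np.stack(
        [signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)], axis=1
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=0))
    return 20.0 * np.log10(magnitude + eps)


def segment(spec, seg_frames=128):
    """Cut a (freq, time) spectrogram into non-overlapping segments.

    Returns shape (n_segments, freq, seg_frames); trailing frames that do not
    fill a whole segment are dropped.
    """
    n_freq, n_time = spec.shape
    n_seg = n_time // seg_frames
    trimmed = spec[:, :n_seg * seg_frames]
    return trimmed.reshape(n_freq, n_seg, seg_frames).transpose(1, 0, 2)


def stratified_folds(labels, k=5, seed=0):
    """Assign each piece to one of k folds so every makam is spread evenly.

    Pieces of each class are shuffled and dealt round-robin; a running offset
    keeps the total fold sizes balanced across classes.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    folds = np.empty(len(labels), dtype=int)
    offset = 0
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        folds[idx] = (np.arange(len(idx)) + offset) % k
        offset += len(idx)
    return folds


if __name__ == "__main__":
    sr = 22050  # hypothetical sample rate
    t = np.arange(3 * sr) / sr
    spec = log_spectrogram(np.sin(2 * np.pi * 440.0 * t))
    segs = segment(spec, seg_frames=32)
    print(spec.shape, segs.shape)  # (1025, 126) (3, 1025, 32)
    labels = ["rast"] * 10 + ["hicaz"] * 5  # illustrative labels, not real data
    print(np.bincount(stratified_folds(labels, k=5)))  # [3 3 3 3 3]
```

Splitting at the piece level rather than the segment level is the design choice that matters most here: segments cut from the same recording are highly correlated, so letting them straddle training and test folds would likely inflate accuracy. scikit-learn's `StratifiedGroupKFold` is a library alternative that stratifies by label while keeping groups (pieces) intact.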
Data availability
The datasets analyzed during the current study are available in the SymbTr database: https://github.com/MTG/SymbTr.
References
Akkoc C (2002) Non-deterministic scales used in traditional Turkish music. J New Music Res 31(4):285–293. https://doi.org/10.1076/jnmr.31.4.285.14169
Akkoc C, Sethares W, Karaosmanoğlu M (2015) Experiments on the relationship between Perde and Seyir in Turkish Makam Music. Music Percept: Interdisc J 32:322–343. https://doi.org/10.1525/mp.2015.32.4.322
Bozkurt B, Karaosmanoğlu M, Karacali B, Unal E (2014) Usul and Makam driven automatic melodic segmentation for Turkish music. J New Music Res 43:375–389. https://doi.org/10.1080/09298215.2014.924535
Demirel E, Bozkurt B, Serra X (2018) Automatic makam recognition using chroma features. In: Proceedings of the 8th International Workshop on Folk Music Analysis, Thessaloniki, Greece. Aristotle University of Thessaloniki, pp 19–24
Unal E, Bozkurt B, Karaosmanoğlu M (2014) A hierarchical approach to makam classification of Turkish Makam music, using symbolic data. J New Music Res 43:132–146. https://doi.org/10.1080/09298215.2013.870211
Bozkurt B, Gedik A, Karaosmanoğlu M (2009) Music information retrieval for Turkish music: problems, solutions and tools. In: 2009 IEEE 17th Signal Processing and Communications Applications Conference, Antalya, Turkey, pp 804–807. https://doi.org/10.1109/SIU.2009.5136518
Bozkurt B, Karaçali B (2015) A computational analysis of Turkish makam music based on a probabilistic characterization of segmented phrases. J Math Music 9:1–22
Gedik A, Bozkurt B (2009) Evaluation of the makam scale theory of arel for music information retrieval on traditional Turkish Art Music. J New Music Res 38:103–116. https://doi.org/10.1080/09298210903171152
Hammarlund A, Olsson T, Ozdalga E (2001) Sufism, music and society in Turkey and the Middle East, 1st edn. Routledge. https://doi.org/10.4324/9780203346976
Cholevas M (2014) Makam: Modality and style in Turkish art music. In: Pätzold C, Walter CJ (eds). Mikrotonalität—Praxis und Utopie, pp 197–203. Schott Music
Wright O (1992) Words without songs: A musicological study of an early Ottoman anthology and its precursors. University of London, School of Oriental and African Studies
Pappas M, Beşiroğlu ŞŞ (2007) Apostolos Konstas’ın Nazariyat Kitabı’na İlişkin Bir İnceleme [A study on Apostolos Konstas’ book of music theory]. İTÜ Dergisi 4(2):33–42. http://itudergi.itu.edu.tr/index.php/itudergisi_b/article/view/244. Accessed 15 Sept 2022
Pappas M (2007) Apostolos Konstas’ın Nazariyat Kitabı [Apostolos Konstas’ book of music theory]. PhD thesis, İstanbul Technical University, Institute of Social Sciences. http://hdl.handle.net/11527/12478. Accessed 12 Aug 2022
Beken M, Signell K (2006) Confirming, delaying, and deceptive elements in Turkish improvisations. In: Maqām traditions of Turkic peoples. Trafo, Berlin
Signell KL (1977) Makam: Modal practice in Turkish Art Music. Asian Music Publications
Yöre S (2012) Maqam in music as a concept, scale and phenomenon. Zeitschrift für die Welt der Türken 4(3)
Downie JS (2003) Music information retrieval. Ann Rev Inf Sci Technol 37:295–340
Typke R, Wiering F, Veltkamp R (2005) A survey of music information retrieval systems. In: Proceedings of the 6th International Conference on Music Information Retrieval (ISMIR), Queen Mary, University of London, pp 153–160
Cornelis O, Lesaffre M, Moelants D, Leman M (2010) Access to ethnic music: Advances and perspectives in content-based music information retrieval. Signal Process 90(4):1008–1031. https://doi.org/10.1016/j.sigpro.2009.06.020
McFee B, Humphrey EJ, Urbano J (2016) A plan for sustainable MIR evaluation. In: Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR)
Dieleman S, Schrauwen B (2014) End-to-end learning for music audio. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy, pp 6964–6968. https://doi.org/10.1109/ICASSP.2014.6854950
Fathollahi M, Razzazi F (2021) Music similarity measurement and recommendation system using convolutional neural networks. Int J Multimed Inf Retr 10:1–11. https://doi.org/10.1007/s13735-021-00206-5
Kong Q, Choi K, Wang Y (2020) Large-scale MIDI-based composer classification. arXiv:2010.14805
Kong Q, Cao Y, Iqbal T, Wang Y, Plumbley M (2020) PANNs: Large-scale pretrained audio neural networks for audio pattern recognition. IEEE/ACM Trans Audio Speech Lang Process 28:2880–2894. https://doi.org/10.1109/TASLP.2020.3030497
Baniya BK, Lee J (2016) Importance of audio feature reduction in automatic music genre classification. Multimed Tools Appl 75(6):3013–3026. https://doi.org/10.1007/s11042-014-2418-z
Müller M, Kurth F, Clausen M (2005) Audio matching via chroma-based statistical features. In: Proceedings of the 6th International Conference on Music Information Retrieval (ISMIR), pp 288–295
Alpkocak A, Gedik AC (2005) Classification of Turkish songs according to makams by using n-grams. In: 2006 15th Turkish Symposium on Artificial Intelligence and Neural Networks, Muğla
Sağun MAK, Bolat B (2016) Classification of classic Turkish music makams by using deep belief networks. In: 2016 International Symposium on Innovations in Intelligent Systems and Applications (INISTA). Sinaia, pp 1–6. https://doi.org/10.1109/INISTA.2016.7571850
Kizrak MA, Bolat B (2015) Classification of classic Turkish Music Makams by using deep belief networks. In: 2015 23rd Signal Processing and Communications Applications Conference (SIU) 2015, pp 527–530. https://doi.org/10.1109/SIU.2015.7129877
Liu C, Feng L, Liu G, Wang H, Liu S (2021) Bottom-up broadcast neural network for music genre classification. Multimed Tools Appl 80(5):7313–7331. https://doi.org/10.1007/s11042-020-09643-6
Li J, Han L, Li X, Zhu J, Yuan B, Gou Z (2022) An evaluation of deep neural network models for music classification using spectrograms. Multimed Tools Appl 81(4):4621–4647. https://doi.org/10.1007/s11042-020-10465-9
Fong H, Kumar V, Sudhir K (2021) A theory-based interpretable deep learning architecture for music emotion. Available at SSRN: https://ssrn.com/abstract=4025386, https://doi.org/10.2139/ssrn.4025386
Alqudah AM, Qazan S, Al-Ebbini L, Alquran H, Qasmieh IA (2022) ECG heartbeat arrhythmias classification: a comparison study between different types of spectrum representation and convolutional neural networks architectures. J Ambient Intell Humaniz Comput 13:4877–4907
Arpitha Y, Madhumathi GL, Balaji N (2022) Spectrogram analysis of ECG signal and classification efficiency using MFCC feature extraction technique. J Ambient Intell Humaniz Comput 13:757–767
Harmanny RIA, de Wit JJM, Cabic GP (2014) Radar micro-Doppler feature extraction using the spectrogram and the cepstrogram. In: 2014 11th European Radar Conference, Rome, pp 165–168. https://doi.org/10.1109/EuRAD.2014.6991233
Klatt D, Stevens K (1973) On the automatic recognition of continuous speech: Implications from a spectrogram-reading experiment. IEEE Trans Audio Electroacoustics 21(3):210–217. https://doi.org/10.1109/TAU.1973.1162453
Mehta J, Gandhi D, Thakur G, Kanani P (2021) Music genre classification using transfer learning on log-based MEL spectrogram. In: 2021 5th International Conference on Computing Methodologies and Communication (ICCMC), pp 1101–1107. https://doi.org/10.1109/ICCMC51019.2021.9418035
Han K-P, Park Y-S, Jeon S-G, Lee G-C, Ha Y-H (1998) Genre classification system of TV sound signals based on a spectrogram analysis. IEEE Trans Consum Electron 44(1):33–42. https://doi.org/10.1109/30.663728
Mao Y, Zhong G, Wang H, Huang K (2022) Music-CRN: An efficient content-based music classification and recommendation network. Cogn Comput 14(6):2306–2316. https://doi.org/10.1007/s12559-022-10039-X
Choi K, Fazekas G, Sandler M, Cho K (2017) Convolutional recurrent neural networks for music classification. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp 2392–2396. https://doi.org/10.1109/ICASSP.2017.7952585
Kumaraswamy B (2022) Optimized deep learning for genre classification via improved moth flame algorithm. Multimed Tools Appl 81(12):17071–17093. https://doi.org/10.1007/s11042-022-12254-y
Araño KA, Gloor P, Orsenigo C, Vercellis C (2021) When old meets new: emotion recognition from speech signals. Cogn Comput 13:771–783. https://doi.org/10.1007/s12559-021-09865-2
Petran LA (1932) An experimental study of pitch recognition. Psychol Monogr 42(6):1–124
Deutsch D (1982) The influence of melodic context on pitch recognition judgment. Percept Psychophys 31(5):407–410. https://doi.org/10.3758/BF03204849
Laske OE (1988) Introduction to cognitive musicology. Comput Music J 12(1):43–57
de Cheveigné A (2005) Pitch perception models. In: Plack CJ, Fay RR, Oxenham AJ, Popper AN (eds) Pitch. Springer Handbook of Auditory Research, vol 24. Springer, New York. https://doi.org/10.1007/0-387-28958-5_6
Holzapfel A, Benetos E (2019) Automatic music transcription and ethnomusicology: A user study. In: Proceedings of the 20th International Society for Music Information Retrieval Conference (ISMIR)
Moorer JA (1977) On the transcription of musical sound by computer. Comput Music J 1(4):32–38
Calvo-Zaragoza J, Rizo D (2018) End-to-end neural optical music recognition of monophonic scores. Appl Sci 8:606
Calvo-Zaragoza J, Hajic J, Pacha A (2020) Understanding optical music recognition. ACM Comput Surv (CSUR) 53:1–35
Cheuk KW, Herremans D, Su L (2021) ReconVAT: A semi-supervised automatic music transcription framework for low-resource real-world data. In: Proceedings of the 29th ACM International Conference on Multimedia (MM ’21). Association for Computing Machinery, New York, pp 3918–3926. https://doi.org/10.1145/3474085.3475405
Hernandez-Olivan C, Zay Pinilla I, Hernandez-Lopez C, Beltran JR (2021) A comparison of deep learning methods for timbre analysis in polyphonic automatic music transcription. Electronics 10(7):810. https://doi.org/10.3390/electronics10070810
Benetos E, Holzapfel A (2015) Automatic transcription of Turkish microtonal music. J Acoust Soc Am 138(4):2118–2130
Karaosmanoglu MK (2012) A Turkish makam music symbolic database for music information retrieval: SymbTr. In: Proceedings of the 13th International Society for Music Information Retrieval Conference (ISMIR)
Mus2 Software. http://www.mus2.com.tr/. Accessed 8 Jun 2022
Kizrak MA, Bayram KS, Bolat B (2014) Classification of classic Turkish Music Makams. In: 2014 IEEE International Symposium on Innovations in Intelligent Systems and Applications (INISTA) Proceedings, Alberobello, pp 394–397. https://doi.org/10.1109/INISTA.2014.6873650
McFee B, Raffel C, Liang D, Ellis D, McVicar M, Battenberg E, Nieto O (2015) librosa: Audio and music signal analysis in Python. In: Proceedings of the 14th Python in Science Conference (SciPy 2015), pp 18–24. https://doi.org/10.25080/Majora-7b98e3ed-003
Müller M, Ellis D, Klapuri A, Richard G (2011) Signal processing for music analysis. IEEE J Sel Top Signal Process 5(6):1088–1110. https://doi.org/10.1109/JSTSP.2011.2112333
Allen JB (1982) Applications of the short time Fourier transform to speech processing and spectral analysis. In: ICASSP ’82. IEEE International Conference on Acoustics, Speech, and Signal Processing, Paris, pp 1012–1015. https://doi.org/10.1109/ICASSP.1982.1171703
Oppenheim AV (1970) Speech spectrograms using the fast Fourier transform. IEEE Spectr 7:57–62. https://doi.org/10.1109/MSPEC.1970.5213512
Cohen L (1989) Time-frequency distributions: a review. Proc IEEE 77(7):941–981. https://doi.org/10.1109/5.30749
Li X, Yan Y, Soraghan J, Wang Z, Ren J (2022) A Music Cognition-Guided Framework for Multi-pitch Estimation. Cognit Comput. https://doi.org/10.1007/s12559-022-10031-5
Tzanetakis G, Cook P (2002) Musical genre classification of audio signals. IEEE Trans Speech Audio Process 10(5):293–302. https://doi.org/10.1109/TSA.2002.800560
Travieso CM, Alonso JB (2013) Special Issue on Advanced Cognitive Systems Based on Nonlinear Analysis. Cognit Comput 5(4):397–398. https://doi.org/10.1007/s12559-013-9237-9
Rabiner LR, Schafer RW (2011) Theory and applications of digital speech processing. Pearson. https://books.google.com.tr/books?id=ME67RAAACAAJ. Accessed 10 Jul 2022
Pes B (2021) Learning from high-dimensional and class-imbalanced datasets using random forests. Information 12(8):286. https://doi.org/10.3390/info12080286
Arbelaitz O, Gurrutxaga I, Muguerza J, Pérez JM (2013) Applying resampling methods for imbalanced datasets to not so imbalanced datasets. In: Bielza C, Salmerón A, Alonso-Betanzos A, Hidalgo JI, Martínez L, Troncoso A, Corchado E, Corchado JM (eds) Advances in artificial intelligence. Springer, Berlin, Heidelberg, pp 111–120
James G, Witten D, Hastie T, Tibshirani R (2013) An introduction to statistical learning: With applications in R. Springer, New York. https://doi.org/10.1007/978-1-4614-7138-7
Estabrooks A, Jo DT, Japkowicz N (2004) A Multiple Resampling Method for Learning from Imbalanced Data Sets. Comput Intell 20:18–36. https://doi.org/10.1111/j.0824-7935.2004.t01-1-00228.x
Anguita D, Ghelardoni L, Ghio A, Oneto L, Ridella S (2012) The ‘K’ in K-fold cross validation. In: ESANN 2012 proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, Bruges (Belgium)
Haykin S, Lippmann R (1994) Neural networks, a comprehensive foundation. Int J Neural Syst 5(4):363–364
Berrar D (2018) Cross-validation. In: Encyclopedia of bioinformatics and computational biology, pp 542–545. https://doi.org/10.1016/B978-0-12-809633-8.20349-X
Sechidis K, Tsoumakas G, Vlahavas I (2011) On the stratification of multi-label data. In: Gunopulos D, Hofmann T, Malerba D, Vazirgiannis M (eds) Machine learning and knowledge discovery in databases. Springer, Berlin, Heidelberg, pp 145–158
Costa Y, de Oliveira L, Silla C (2017) An evaluation of convolutional neural networks for music classification using spectrograms. Appl Soft Comput 52. https://doi.org/10.1016/j.asoc.2016.12.024
Nanni L, Costa Y, Aguiar R, Silla C, Brahnam S (2018) Ensemble of deep learning, visual and acoustic features for music genre classification. J New Music Res 47:1–15. https://doi.org/10.1080/09298215.2018.1438476
Choi K, Fazekas G, Sandler M, Cho K (2018) A comparison of audio signal preprocessing methods for deep neural networks on music tagging. In: 26th European Signal Processing Conference (EUSIPCO), pp 1870–1874. https://doi.org/10.23919/EUSIPCO.2018.8553106
Athulya MK, Sindhu S (2021) Deep learning based music genre classification using spectrogram. In: Proceedings of the International Conference on IoT Based Control Networks & Intelligent Systems (ICICNIS 2021). SSRN. https://doi.org/10.2139/ssrn.3883911
Chollet F et al (2015) Keras. https://github.com/keras-team/keras
Karim F, Majumdar S, Darabi H, Chen S (2018) LSTM fully convolutional networks for time series classification. IEEE Access 6:1662–1669. https://doi.org/10.1109/ACCESS.2017.2779939
Yu Y, Si X, Hu C, Zhang J (2019) A review of recurrent neural networks: LSTM cells and network architectures. Neural Comput 31(7):1235–1270. https://doi.org/10.1162/neco_a_01199
Greff K, Srivastava RK, Koutník J, Steunebrink BR, Schmidhuber J (2017) LSTM: A search space Odyssey. IEEE Trans Neural Netw Learn Syst 28(10):2222–2232. https://doi.org/10.1109/TNNLS.2016.2582924
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
Eck D, Schmidhuber J (2002) Finding temporal structure in music: blues improvisation with LSTM recurrent networks. In: Proceedings of the 12th IEEE Workshop on Neural Networks for Signal Processing, pp 747–756. https://doi.org/10.1109/NNSP.2002.1030094
Ycart A, Benetos E (2017) A study on LSTM networks for polyphonic music sequence modelling. In: Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR), Suzhou
Zuo Z, Shuai B, Wang G, Liu X, Wang X, Wang B (2016) Learning contextual dependence with convolutional hierarchical recurrent neural networks. IEEE Trans Image Process 25(7):2983–2996. https://doi.org/10.1109/TIP.2016.2548241
Lin M, Chen Q, Yan S (2013) Network in network. arXiv preprint arXiv:1312.4400. https://doi.org/10.48550/ARXIV.1312.4400
Kingma D, Ba J (2014) Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980
Mannor S, Peleg D, Rubinstein R (2005) The cross entropy method for classification. In: Proceedings of the 22nd international conference on Machine learning (ICML ’05). Association for Computing Machinery, New York, pp 561–568. https://doi.org/10.1145/1102351.1102422
Hawkins DM (2004) The Problem of Overfitting. J Chem Inf Comput Sci 44(1):1–12. https://doi.org/10.1021/ci0342472
Wang S, Manning CD (2013) Fast dropout training. In: Proceedings of the 30th International Conference on Machine Learning, PMLR 28(2):118–126
Kingma DP, Salimans T, Welling M (2015) Variational dropout and the local reparameterization trick. In: Cortes C, Lawrence N, Lee D, Sugiyama M, Garnett R (eds) Advances in neural information processing systems. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2015/file/bc7316929fe1545bf0b98d114ee3ecb8-Paper.pdf. Accessed 8 Sept 2022
Szegedy C, Ioffe S, Vanhoucke V (2016) Inception-v4, Inception-ResNet and the impact of residual connections on learning. arXiv:1602.07261. http://arxiv.org/abs/1602.07261. Accessed 8 Sept 2022
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 770–778. https://doi.org/10.1109/CVPR.2016.90
Wang Z, Yan W, Oates T (2017) Time series classification from scratch with deep neural networks: A strong baseline. In: 2017 International Joint Conference on Neural Networks (IJCNN), pp 1578–1585. https://doi.org/10.1109/IJCNN.2017.7966039
Cortes C, Mohri M, Rostamizadeh A (2009) L2 regularization for learning kernels. In: Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI 2009). https://doi.org/10.48550/ARXIV.1205.2653
van Laarhoven T (2017) L2 regularization versus batch and weight normalization. arXiv:1706.05350. https://doi.org/10.48550/arXiv.1706.05350
Japkowicz N, Shah M (2011) Evaluating learning algorithms: A classification perspective. Cambridge University Press
Sokolova M, Japkowicz N, Szpakowicz S (2006) Beyond accuracy, F-score and ROC: A family of discriminant measures for performance evaluation. In: Sattar A, Kang B (eds) AI 2006: Advances in Artificial Intelligence. Lecture notes in computer science, vol 4304. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11941439_114
Acknowledgment
The authors thank Dr. Gönül Paçacı Tunçay for insightful discussions on the scale properties and progression typology of Turkish classical music makams.
Ethics declarations
Compliance with ethical standards
The submitted work is original and has not been published elsewhere in any form or language. The results, data and figures in this manuscript have not been published elsewhere, nor are they under consideration by another publisher.
The authors understand that Multimedia Tools and Applications is a transformative journal: when research is accepted for publication, it can be published either with immediate gold open access or via the traditional publishing route.
The authors have read the Springer journal policies on author responsibilities and submitted this manuscript in accordance with those policies.
All of the material is owned by the authors and/or no permissions are required.
Human and animal rights
Research involving human participants and/or animals: This article does not contain any studies with animals performed by any of the authors. No informed consent was required from any party.
Competing interests
Disclosure of potential conflicts of interest: The authors have no relevant financial or non-financial interests to disclose. The authors declare that they have no competing interests as defined by Springer, or other interests that might be perceived to influence the results and/or discussion reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Mirza, F.K., Gürsoy, A.F., Baykaş, T. et al. Residual LSTM neural network for time dependent consecutive pitch string recognition from spectrograms: a study on Turkish classical music makams. Multimed Tools Appl 83, 41243–41271 (2024). https://doi.org/10.1007/s11042-023-17105-y
DOI: https://doi.org/10.1007/s11042-023-17105-y