Abstract
Domestic canaries produce complex vocal patterns embedded in various levels of abstraction. Studying such temporal organization is of particular relevance to understand how animal brains represent and process vocal inputs such as language. However, this requires a large amount of annotated data. We propose a fast and easy-to-train transducer model based on RNN architectures to automate parts of the annotation process, similarly to a speech recognition task. We demonstrate that RNN architectures can be efficiently applied to spectral features (MFCC) to annotate songs at the time-frame level and at the phrase level. We achieved around 95% accuracy at frame level on particularly complex canary songs, and echo state networks (ESNs) achieved around 5% word error rate (WER) at phrase level. Moreover, we were able to build this model using only around 13 to 20 min of annotated songs. Training the ESN takes only 35 s using 2 h and 40 min of data, allowing experiments to be run quickly without the need for powerful hardware.
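The phrase-level metric reported above, word error rate (WER), is the standard speech-recognition measure: the Levenshtein edit distance between the predicted and reference label sequences, normalized by the reference length. As a minimal sketch (the function name and token convention are illustrative, not from the paper), treating each phrase label as one "word":

```python
def wer(reference, hypothesis):
    """Word error rate: Levenshtein distance over token sequences,
    normalized by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting i reference tokens
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting j hypothesis tokens
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution out of four reference phrases -> WER = 0.25
print(wer("A B C D", "A X C D"))
```

A 5% WER on phrase sequences thus means roughly one phrase-level substitution, insertion, or deletion per twenty phrases in the reference annotation.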
Notes
- 1.
The frame stride and window width are called hop_length and win_length, respectively, in librosa.
- 2.
As the work in [4] does not appear to have been peer-reviewed yet, we will not make further comparisons, to avoid any mistakes that could originate from unverified results.
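Note 1 above maps the paper's framing vocabulary onto librosa's parameter names. The framing itself is simple arithmetic: a window of win_length samples is advanced by hop_length samples per frame. A self-contained NumPy sketch of that convention (the helper name is illustrative; librosa's own feature.mfcc applies the same parameters internally):

```python
import numpy as np

def frame_signal(signal, win_length, hop_length):
    """Slice a 1-D signal into overlapping frames.

    win_length is the window width and hop_length the frame stride,
    matching the librosa parameter names in note 1.
    """
    n_frames = 1 + (len(signal) - win_length) // hop_length
    return np.stack([
        signal[i * hop_length : i * hop_length + win_length]
        for i in range(n_frames)
    ])

frames = frame_signal(np.arange(10.0), win_length=4, hop_length=2)
# 4 frames of width 4: samples 0..3, 2..5, 4..7, 6..9
print(frames.shape)
```

Each frame then yields one MFCC vector, so the frame stride fixes the temporal resolution of the frame-level annotations.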
References
Anderson, S.E., Dave, A.S., Margoliash, D.: Template-based automatic recognition of birdsong syllables from continuous recordings. J. Acoust. Soc. Am. 100(2), 1209–1219 (1996)
Bay, M., Ehmann, A.F., Downie, J.S.: Evaluation of multiple-F0 estimation and tracking systems. In: ISMIR (2009)
Chu, W., Blumstein, D.T.: Noise robust bird song detection using syllable pattern-based hidden Markov models. In: 2011 IEEE ICASSP, pp. 345–348 (2011)
Cohen, Y., Nicholson, D., Gardner, T.J.: TweetyNet: a neural network that enables high-throughput, automated annotation of birdsong. bioRxiv p. 2020.08.28.272088 (2020)
Gallicchio, C., Micheli, A., Pedrelli, L.: Fast spectral radius initialization for recurrent neural networks. In: Oneto, L., Navarin, N., Sperduti, A., Anguita, D. (eds.) INNSBDDL 2019. PINNS, vol. 1, pp. 380–390. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-16841-4_39
Gers, F.A., Schmidhuber, J., Cummins, F.: Learning to forget: continual prediction with LSTM. In: ICANN 1999, vol. 2, pp. 850–855 (1999)
Giraudon, J., Trouvain, N., Cazala, A., Del Negro, C., Hinaut, X.: Labeled songs of domestic canary M1–2016-spring (Serinus canaria), May 2021
Jaeger, H.: The "echo state" approach to analysing and training recurrent neural networks - with an erratum note. German National Research Center for Information Technology GMD Technical Report 148 (2001)
Juang, B.H., Rabiner, L., Wilpon, J.: On the use of bandpass liftering in speech recognition. IEEE Trans. Acoust. Speech Signal Process. 35(7), 947–954 (1987)
Kaewtip, K., Alwan, A., O’Reilly, C., Taylor, C.E.: A robust automatic birdsong phrase classification: a template-based approach. J. Acoust. Soc. Am. 140(5), 3691–3701 (2016)
Koumura, T., Okanoya, K.: Automatic recognition of element classes and boundaries in the birdsong with variable sequences. PLOS ONE 11(7), e0159188 (2016)
Leitner, S., Catchpole, C.K.: Syllable repertoire and the size of the song control system in captive canaries (Serinus canaria). J. Neurobiol. 60(1), 21–27 (2004)
Markowitz, J.E., Ivie, E., Kligler, L., Gardner, T.J.: Long-range order in canary song. PLOS Comput. Biol. 9(5), e1003052 (2013)
McFee, B., et al.: Librosa/librosa: 0.8.0. Zenodo (2020)
Sainburg, T., Thielk, M., Gentner, T.Q.: Latent space visualization, characterization, and generation of diverse vocal communication signals. bioRxiv, p. 870311 (2020)
Tachibana, R.O., Oosugi, N., Okanoya, K.: Semi-automatic classification of birdsong elements using a linear support vector machine. PLOS ONE 9(3), e92584 (2014)
Tan, L.N., Alwan, A., Kossan, G., Cody, M.L., Taylor, C.E.: Dynamic time warping and sparse representation classification for birdsong phrase classification using limited training data. J. Acoust. Soc. Am. 137(3), 1069–1080 (2015)
Trouvain, N., Pedrelli, L., Dinh, T.T., Hinaut, X.: ReservoirPy: an efficient and user-friendly library to design echo state networks. In: Farkaš, I., Masulli, P., Wermter, S. (eds.) ICANN 2020. LNCS, vol. 12397, pp. 494–505. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-61616-8_40
Waser, M.S., Marler, P.: Song learning in canaries. J. Comp. Physiol. Psychol. 91(1), 1–7 (1977)
Acknowledgment
We would like to thank Catherine Del Negro, Aurore Cazala and Juliette Giraudon for the recording and transcription of the canary data. We also thank Inria for the ADT grant Scikit-ESN.
© 2021 Springer Nature Switzerland AG
Cite this paper
Trouvain, N., Hinaut, X. (2021). Canary Song Decoder: Transduction and Implicit Segmentation with ESNs and LTSMs. In: Farkaš, I., Masulli, P., Otte, S., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2021. ICANN 2021. Lecture Notes in Computer Science(), vol 12895. Springer, Cham. https://doi.org/10.1007/978-3-030-86383-8_6
Print ISBN: 978-3-030-86382-1
Online ISBN: 978-3-030-86383-8