Abstract
Domestic canaries produce complex vocal patterns embedded in various levels of abstraction. Studying such temporal organization is of particular relevance to understand how animal brains represent and process vocal inputs such as language. However, this requires a large amount of annotated data. We propose a fast and easy-to-train transducer model based on RNN architectures to automate parts of the annotation process, similarly to a speech recognition task. We demonstrate that RNN architectures can be efficiently applied to spectral features (MFCC) to annotate songs at the time-frame level and at the phrase level. We achieved around 95% accuracy at frame level on particularly complex canary songs, and echo state networks (ESNs) achieved around 5% word error rate (WER) at phrase level. Moreover, we were able to build this model using only around 13 to 20 min of annotated songs. Training the ESN takes only 35 s using 2 h and 40 min of data, allowing experiments to be run quickly without the need for powerful hardware.
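The phrase-level metric reported above, word error rate (WER), is the standard speech-recognition measure: the Levenshtein edit distance between the predicted and reference label sequences, normalized by the reference length. As a minimal sketch (the function name and token convention are illustrative, not from the paper), treating each phrase label as one "word":

```python
def wer(reference, hypothesis):
    """Word error rate: Levenshtein distance over token sequences,
    normalized by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting i reference tokens
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting j hypothesis tokens
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution out of four reference phrases -> WER = 0.25
print(wer("A B C D", "A X C D"))
```

A 5% WER on phrase sequences thus means roughly one phrase-level substitution, insertion, or deletion per twenty phrases in the reference annotation.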
Notes
- 1.
The frame stride and window width are called hop_length and win_length, respectively, in librosa.
- 2.
As the work in [4] does not appear to have been peer-reviewed yet, we will not make further comparisons, to avoid any mistakes that could originate from unverified results.
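Note 1 above maps the paper's framing vocabulary onto librosa's parameter names. The framing itself is simple arithmetic: a window of win_length samples is advanced by hop_length samples per frame. A self-contained NumPy sketch of that convention (the helper name is illustrative; librosa's own feature.mfcc applies the same parameters internally):

```python
import numpy as np

def frame_signal(signal, win_length, hop_length):
    """Slice a 1-D signal into overlapping frames.

    win_length is the window width and hop_length the frame stride,
    matching the librosa parameter names in note 1.
    """
    n_frames = 1 + (len(signal) - win_length) // hop_length
    return np.stack([
        signal[i * hop_length : i * hop_length + win_length]
        for i in range(n_frames)
    ])

frames = frame_signal(np.arange(10.0), win_length=4, hop_length=2)
# 4 frames of width 4: samples 0..3, 2..5, 4..7, 6..9
print(frames.shape)
```

Each frame then yields one MFCC vector, so the frame stride fixes the temporal resolution of the frame-level annotations.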
References
Anderson, S.E., Dave, A.S., Margoliash, D.: Template-based automatic recognition of birdsong syllables from continuous recordings. J. Acoust. Soc. Am. 100(2), 1209–1219 (1996)
Bay, M., Ehmann, A.F., Downie, J.S.: Evaluation of multiple-F0 estimation and tracking systems. In: ISMIR (2009)
Chu, W., Blumstein, D.T.: Noise robust bird song detection using syllable pattern-based hidden Markov models. In: 2011 IEEE ICASSP, pp. 345–348 (2011)
Cohen, Y., Nicholson, D., Gardner, T.J.: TweetyNet: a neural network that enables high-throughput, automated annotation of birdsong. bioRxiv p. 2020.08.28.272088 (2020)
Gallicchio, C., Micheli, A., Pedrelli, L.: Fast spectral radius initialization for recurrent neural networks. In: Oneto, L., Navarin, N., Sperduti, A., Anguita, D. (eds.) INNSBDDL 2019. PINNS, vol. 1, pp. 380–390. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-16841-4_39
Gers, F.A., Schmidhuber, J., Cummins, F.: Learning to forget: continual prediction with LSTM. In: ICANN 1999, vol. 2, pp. 850–855 (1999)
Giraudon, J., Trouvain, N., Cazala, A., Del Negro, C., Hinaut, X.: Labeled songs of domestic canary M1–2016-spring (Serinus canaria), May 2021
Jaeger, H.: The "echo state" approach to analysing and training recurrent neural networks - with an erratum note. German National Research Center for Information Technology GMD Technical Report 148 (2001)
Juang, B.H., Rabiner, L., Wilpon, J.: On the use of bandpass liftering in speech recognition. IEEE Trans. Acoust. Speech Signal Process. 35(7), 947–954 (1987)
Kaewtip, K., Alwan, A., O’Reilly, C., Taylor, C.E.: A robust automatic birdsong phrase classification: a template-based approach. J. Acoust. Soc. Am. 140(5), 3691–3701 (2016)
Koumura, T., Okanoya, K.: Automatic recognition of element classes and boundaries in the birdsong with variable sequences. PLOS ONE 11(7), e0159188 (2016)
Leitner, S., Catchpole, C.K.: Syllable repertoire and the size of the song control system in captive canaries (Serinus canaria). J. Neurobiol. 60(1), 21–27 (2004)
Markowitz, J.E., Ivie, E., Kligler, L., Gardner, T.J.: Long-range order in canary song. PLOS Comput. Biol. 9(5), e1003052 (2013)
McFee, B., et al.: Librosa/librosa: 0.8.0. Zenodo (2020)
Sainburg, T., Thielk, M., Gentner, T.Q.: Latent space visualization, characterization, and generation of diverse vocal communication signals. bioRxiv, p. 870311 (2020)
Tachibana, R.O., Oosugi, N., Okanoya, K.: Semi-automatic classification of birdsong elements using a linear support vector machine. PLOS ONE 9(3), e92584 (2014)
Tan, L.N., Alwan, A., Kossan, G., Cody, M.L., Taylor, C.E.: Dynamic time warping and sparse representation classification for birdsong phrase classification using limited training data. J. Acoust. Soc. Am. 137(3), 1069–1080 (2015)
Trouvain, N., Pedrelli, L., Dinh, T.T., Hinaut, X.: ReservoirPy: an efficient and user-friendly library to design echo state networks. In: Farkaš, I., Masulli, P., Wermter, S. (eds.) ICANN 2020. LNCS, vol. 12397, pp. 494–505. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-61616-8_40
Waser, M.S., Marler, P.: Song learning in canaries. J. Comp. Physiol. Psychol. 91(1), 1–7 (1977)
Acknowledgment
We would like to thank Catherine Del Negro, Aurore Cazala and Juliette Giraudon for the recording and transcription of the canary data. We also thank Inria for the ADT grant Scikit-ESN.
© 2021 Springer Nature Switzerland AG
Cite this paper
Trouvain, N., Hinaut, X. (2021). Canary Song Decoder: Transduction and Implicit Segmentation with ESNs and LTSMs. In: Farkaš, I., Masulli, P., Otte, S., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2021. ICANN 2021. Lecture Notes in Computer Science(), vol 12895. Springer, Cham. https://doi.org/10.1007/978-3-030-86383-8_6
Print ISBN: 978-3-030-86382-1
Online ISBN: 978-3-030-86383-8