Canary Song Decoder: Transduction and Implicit Segmentation with ESNs and LSTMs

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12895)

Abstract

Domestic canaries produce complex vocal patterns embedded at various levels of abstraction. Studying this temporal organization is of particular relevance to understanding how animal brains represent and process vocal inputs such as language. However, such studies require a large amount of annotated data. We propose a fast and easy-to-train transducer model based on RNN architectures to automate part of the annotation process, treating it similarly to a speech recognition task. We demonstrate that RNN architectures can be efficiently applied to spectral features (MFCCs) to annotate songs at the time-frame level and at the phrase level. We achieved around 95% accuracy at the frame level on particularly complex canary songs, and ESNs achieved around 5% word error rate (WER) at the phrase level. Moreover, we were able to build this model using only around 13 to 20 min of annotated songs. Training the ESN on 2 h and 40 min of data takes only 35 s, allowing experiments to be run quickly without the need for powerful hardware.
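The approach the abstract outlines (spectral feature frames in, one annotation label per time frame out) can be illustrated with a minimal ESN sketch in plain NumPy. This is a hypothetical toy, not the authors' model: random synthetic features stand in for MFCCs, and the reservoir size, spectral radius, and ridge coefficient are illustrative choices only (the paper itself builds on the ReservoirPy library [18]).

```python
# Minimal echo state network (ESN) transducer sketch. A fixed random
# reservoir maps each input frame to a high-dimensional state; only a
# linear readout is trained, by ridge regression, to predict one label
# per time frame. All sizes and coefficients here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res, n_classes = 13, 100, 3

# Fixed random input and recurrent weights; the recurrent matrix is
# rescaled to spectral radius 0.9, a common echo-state heuristic.
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))

def run_reservoir(frames):
    """Collect one tanh reservoir state per input frame."""
    x = np.zeros(n_res)
    states = []
    for u in frames:
        x = np.tanh(W_in @ u + W @ x)
        states.append(x)
    return np.array(states)

def make_song(n_frames=300):
    """Toy 'song': blocks of frames drawn around class-specific means,
    mimicking frame-wise annotation targets (stand-ins for MFCCs)."""
    labels = rng.integers(0, n_classes, n_frames // 20).repeat(20)
    means = np.eye(n_classes, n_in) * 3.0
    frames = means[labels] + rng.normal(0, 0.5, (len(labels), n_in))
    return frames, labels

frames, labels = make_song()
X = run_reservoir(frames)
Y = np.eye(n_classes)[labels]          # one-hot frame-level targets

# Ridge regression readout: W_out = Y^T X (X^T X + r I)^-1
ridge = 1e-3
W_out = Y.T @ X @ np.linalg.inv(X.T @ X + ridge * np.eye(n_res))

test_frames, test_labels = make_song()
pred = np.argmax(run_reservoir(test_frames) @ W_out.T, axis=1)
acc = (pred == test_labels).mean()
print(f"frame-level accuracy: {acc:.2f}")
```

Only the readout matrix is learned, which is why training is a single linear solve rather than gradient descent; this is what makes the paper's 35 s training time on hours of data plausible.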


Notes

  1. Frame stride and window width are respectively called hop_length and win_length in librosa.

  2. As the work in [4] does not appear to have been peer-reviewed yet, we do not make further comparisons, to avoid any mistakes that could originate from unverified results.
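The two framing parameters mentioned in note 1 are easy to make concrete. Below is a pure-NumPy sketch (the numeric values are hypothetical, not the paper's settings) of how a window width and a frame stride slice a signal into the overlapping frames on which spectral features such as MFCCs are then computed; librosa's win_length and hop_length parameters play exactly these roles.

```python
import numpy as np

# Hypothetical analysis parameters (illustrative only).
sr = 44100          # sampling rate in Hz
win_length = 1024   # window width, in samples
hop_length = 256    # frame stride, in samples

# One second of noise as a stand-in signal.
signal = np.random.default_rng(1).normal(size=sr)

# Frame i covers samples [i * hop_length, i * hop_length + win_length).
n_frames = 1 + (len(signal) - win_length) // hop_length
frames = np.stack([signal[i * hop_length : i * hop_length + win_length]
                   for i in range(n_frames)])

print(frames.shape)           # (n_frames, win_length)
frame_rate = sr / hop_length  # frame-level labels produced per second
```

Since consecutive frames overlap whenever the stride is smaller than the window, each annotation label at "frame level" covers a short, partially redundant slice of the audio.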

References

  1. Anderson, S.E., Dave, A.S., Margoliash, D.: Template-based automatic recognition of birdsong syllables from continuous recordings. J. Acoust. Soc. Am. 100(2), 1209–1219 (1996)

  2. Bay, M., Ehmann, A.F., Downie, J.S.: Evaluation of multiple-F0 estimation and tracking systems. In: ISMIR (2009)

  3. Chu, W., Blumstein, D.T.: Noise robust bird song detection using syllable pattern-based hidden Markov models. In: 2011 IEEE ICASSP, pp. 345–348 (2011)

  4. Cohen, Y., Nicholson, D., Gardner, T.J.: TweetyNet: a neural network that enables high-throughput, automated annotation of birdsong. bioRxiv p. 2020.08.28.272088 (2020)

  5. Gallicchio, C., Micheli, A., Pedrelli, L.: Fast spectral radius initialization for recurrent neural networks. In: Oneto, L., Navarin, N., Sperduti, A., Anguita, D. (eds.) INNSBDDL 2019. PINNS, vol. 1, pp. 380–390. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-16841-4_39

  6. Gers, F.A., Schmidhuber, J., Cummins, F.: Learning to forget: continual prediction with LSTM. In: ICANN 1999, vol. 2, pp. 850–855 (1999)

  7. Giraudon, J., Trouvain, N., Cazala, A., Del Negro, C., Hinaut, X.: Labeled songs of domestic canary M1–2016-spring (Serinus canaria), May 2021

  8. Jaeger, H.: The "echo state" approach to analysing and training recurrent neural networks, with an erratum note. German National Research Center for Information Technology GMD Technical Report 148 (2001)

  9. Juang, B.H., Rabiner, L., Wilpon, J.: On the use of bandpass liftering in speech recognition. IEEE Trans. Acoust. Speech Signal Process. 35(7), 947–954 (1987)

  10. Kaewtip, K., Alwan, A., O’Reilly, C., Taylor, C.E.: A robust automatic birdsong phrase classification: a template-based approach. J. Acoust. Soc. Am. 140(5), 3691–3701 (2016)

  11. Koumura, T., Okanoya, K.: Automatic recognition of element classes and boundaries in the birdsong with variable sequences. PLOS ONE 11(7), e0159188 (2016)

  12. Leitner, S., Catchpole, C.K.: Syllable repertoire and the size of the song control system in captive canaries (Serinus canaria). J. Neurobiol. 60(1), 21–27 (2004)

  13. Markowitz, J.E., Ivie, E., Kligler, L., Gardner, T.J.: Long-range order in canary song. PLOS Comput. Biol. 9(5), e1003052 (2013)

  14. McFee, B., et al.: Librosa/librosa: 0.8.0. Zenodo (2020)

  15. Sainburg, T., Thielk, M., Gentner, T.Q.: Latent space visualization, characterization, and generation of diverse vocal communication signals. bioRxiv, p. 870311 (2020)

  16. Tachibana, R.O., Oosugi, N., Okanoya, K.: Semi-automatic classification of birdsong elements using a linear support vector machine. PLOS ONE 9(3), e92584 (2014)

  17. Tan, L.N., Alwan, A., Kossan, G., Cody, M.L., Taylor, C.E.: Dynamic time warping and sparse representation classification for birdsong phrase classification using limited training data. J. Acoust. Soc. Am. 137(3), 1069–1080 (2015)

  18. Trouvain, N., Pedrelli, L., Dinh, T.T., Hinaut, X.: ReservoirPy: an efficient and user-friendly library to design echo state networks. In: Farkaš, I., Masulli, P., Wermter, S. (eds.) ICANN 2020. LNCS, vol. 12397, pp. 494–505. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-61616-8_40

  19. Waser, M.S., Marler, P.: Song learning in canaries. J. Comp. Physiol. Psychol. 91(1), 1–7 (1977)


Acknowledgment

We would like to thank Catherine Del Negro, Aurore Cazala and Juliette Giraudon for the recording and transcription of the canary data. We also thank Inria for the ADT grant Scikit-ESN.

Author information

Corresponding author

Correspondence to Xavier Hinaut.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Trouvain, N., Hinaut, X. (2021). Canary Song Decoder: Transduction and Implicit Segmentation with ESNs and LSTMs. In: Farkaš, I., Masulli, P., Otte, S., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2021. Lecture Notes in Computer Science, vol 12895. Springer, Cham. https://doi.org/10.1007/978-3-030-86383-8_6

  • DOI: https://doi.org/10.1007/978-3-030-86383-8_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86382-1

  • Online ISBN: 978-3-030-86383-8

  • eBook Packages: Computer Science, Computer Science (R0)
