Object Category Detection Using Audio-Visual Cues

  • Conference paper
Computer Vision Systems (ICVS 2008)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5008)

Abstract

Categorization is one of the fundamental building blocks of cognitive systems. Object categorization has traditionally been addressed in the vision domain, even though cognitive agents are intrinsically multimodal. Indeed, biological systems combine several modalities to achieve robust categorization. In this paper we propose a multimodal approach to object category detection that uses audio and visual information. The auditory channel is modeled on biologically motivated spectral features via a discriminative classifier. The visual channel is modeled by a state-of-the-art part-based model. Multimodality is achieved using two fusion schemes, one high-level and the other low-level. Experiments on six different object categories, under increasingly difficult conditions, show the strengths and weaknesses of the two approaches and clearly underline the open challenges for multimodal category detection.
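
The abstract does not spell out the two fusion schemes, so the following is only a generic sketch of the distinction it draws, not the authors' method. It assumes that low-level fusion means concatenating audio and visual feature vectors before a single classifier, and that high-level fusion means combining per-category confidence scores from two separately trained classifiers. The function names, the `audio_weight` parameter, and the toy scores are all hypothetical.

```python
import numpy as np


def low_level_fusion(audio_feat, visual_feat):
    """Feature-level fusion (assumed form): concatenate per-sample features.

    Both inputs have shape (n_samples, n_features_of_that_modality).
    The result would be fed to one classifier trained on the joint vector.
    """
    return np.concatenate([audio_feat, visual_feat], axis=1)


def high_level_fusion(audio_scores, visual_scores, audio_weight=0.5):
    """Score-level fusion (assumed form): weighted sum of classifier outputs.

    Both inputs have shape (n_samples, n_categories) and hold the
    confidence each single-modality classifier assigns to each category.
    """
    return audio_weight * audio_scores + (1.0 - audio_weight) * visual_scores


# Toy confidences for 2 samples over 3 categories (made-up numbers).
audio_scores = np.array([[0.7, 0.2, 0.1],
                         [0.3, 0.4, 0.3]])
visual_scores = np.array([[0.4, 0.5, 0.1],
                          [0.1, 0.2, 0.7]])

fused = high_level_fusion(audio_scores, visual_scores, audio_weight=0.5)
predicted = fused.argmax(axis=1)  # index of the highest fused score per sample

# Toy features: 4 audio dimensions and 3 visual dimensions per sample.
joint = low_level_fusion(np.zeros((2, 4)), np.ones((2, 3)))
print(predicted, joint.shape)
```

In this sketch the high-level scheme can still produce a decision when one modality is uninformative, because the weight can be shifted toward the other, whereas the low-level scheme lets a single classifier exploit correlations between the modalities. Which trade-off holds in practice is what the paper's experiments examine.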

Editor information

Antonios Gasteratos, Markus Vincze, John K. Tsotsos

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Luo, J., Caputo, B., Zweig, A., Bach, J.-H., Anemüller, J. (2008). Object Category Detection Using Audio-Visual Cues. In: Gasteratos, A., Vincze, M., Tsotsos, J.K. (eds) Computer Vision Systems. ICVS 2008. Lecture Notes in Computer Science, vol 5008. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-79547-6_52

  • DOI: https://doi.org/10.1007/978-3-540-79547-6_52

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-79546-9

  • Online ISBN: 978-3-540-79547-6

  • eBook Packages: Computer Science, Computer Science (R0)
