Object Category Detection Using Audio-Visual Cues

Luo, Jie; Caputo, Barbara; Zweig, Alon; Bach, Jörg-Hendrik; Anemüller, Jörn

doi:10.1007/978-3-540-79547-6_52

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5008)

Included in the following conference series:

International Conference on Computer Vision Systems

Abstract

Categorization is one of the fundamental building blocks of cognitive systems. Object categorization has traditionally been addressed in the vision domain, even though cognitive agents are intrinsically multimodal. Indeed, biological systems combine several modalities in order to achieve robust categorization. In this paper we propose a multimodal approach to object category detection, using audio and visual information. The auditory channel is modeled on biologically motivated spectral features via a discriminative classifier. The visual channel is modeled by a state-of-the-art part-based model. Multimodality is achieved using two fusion schemes, one high-level and the other low-level. Experiments on six different object categories, under increasingly difficult conditions, show the strengths and weaknesses of the two approaches and clearly underline the open challenges for multimodal category detection.
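
The abstract does not spell out the two fusion schemes, so the following is only a minimal sketch of the general distinction, not the authors' method. It assumes that low-level fusion means concatenating audio and visual feature vectors before a single classifier, and that high-level fusion means a weighted sum of the scores produced by separate per-modality classifiers. The function names and the `audio_weight` parameter are hypothetical.

```python
from typing import List, Sequence


def low_level_fusion(audio_features: Sequence[float],
                     visual_features: Sequence[float]) -> List[float]:
    """Feature-level fusion (assumed form): concatenate the two feature
    vectors so that one classifier can be trained on the joint vector."""
    return list(audio_features) + list(visual_features)


def high_level_fusion(audio_scores: Sequence[float],
                      visual_scores: Sequence[float],
                      audio_weight: float = 0.5) -> List[float]:
    """Decision-level fusion (assumed form): weighted sum of the
    per-category scores from separate audio and visual classifiers."""
    if len(audio_scores) != len(visual_scores):
        raise ValueError("both modalities must score the same categories")
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    visual_weight = 1.0 - audio_weight
    return [audio_weight * a + visual_weight * v
            for a, v in zip(audio_scores, visual_scores)]


if __name__ == "__main__":
    # Toy values, purely illustrative (not data from the paper).
    joint = low_level_fusion([0.2, 0.7], [0.9])
    fused = high_level_fusion([0.8, 0.1], [0.4, 0.3], audio_weight=0.25)
    print(joint, fused)
```

In this sketch the trade-off is visible in the interfaces: the low-level variant needs both modalities present at training time, while the high-level variant lets each classifier be trained independently and reweighted when one channel becomes unreliable.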

Author information

Authors and Affiliations

IDIAP Research Institute, Centre du Parc, 1920, Martigny, Switzerland
Jie Luo & Barbara Caputo
Swiss Federal Institute of Technology in Lausanne (EPFL), 1015, Lausanne, Switzerland
Jie Luo & Barbara Caputo
Hebrew University of Jerusalem, 91904, Jerusalem, Israel
Alon Zweig
Carl von Ossietzky University Oldenburg, 26111, Oldenburg, Germany
Jörg-Hendrik Bach & Jörn Anemüller

Editor information

Antonios Gasteratos, Markus Vincze, John K. Tsotsos

About this paper

Cite this paper

Luo, J., Caputo, B., Zweig, A., Bach, JH., Anemüller, J. (2008). Object Category Detection Using Audio-Visual Cues. In: Gasteratos, A., Vincze, M., Tsotsos, J.K. (eds) Computer Vision Systems. ICVS 2008. Lecture Notes in Computer Science, vol 5008. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-79547-6_52

DOI: https://doi.org/10.1007/978-3-540-79547-6_52
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-79546-9
Online ISBN: 978-3-540-79547-6
eBook Packages: Computer Science, Computer Science (R0)