Abstract
Automatic image annotation aims at predicting a set of textual labels for an image that describe its semantics. These are usually taken from an annotation vocabulary of a few hundred labels. Because of the large vocabulary, there is high variance in the number of images corresponding to different labels (“class-imbalance”). Additionally, due to the limitations of manual annotation, a significant number of available images are not annotated with all the relevant labels (“weak-labelling”). These two issues adversely affect the performance of most existing image annotation models. In this work, we propose 2PKNN, a two-step variant of the classical K-nearest neighbour algorithm, that addresses these two issues in the image annotation task. The first step of 2PKNN uses “image-to-label” similarities, while the second step uses “image-to-image” similarities, thus combining the benefits of both. Since the performance of nearest-neighbour based methods depends greatly on how features are compared, we also propose a metric learning framework over 2PKNN that jointly learns weights for multiple features as well as distances. This is done in a large-margin setting by generalizing a well-known (single-label) classification metric learning algorithm to multi-label prediction. For scalability, we implement it by alternating between stochastic sub-gradient descent and projection steps.
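The two-step prediction described above can be illustrated with a minimal sketch. This is not the authors' implementation: the Euclidean distance, the exponential similarity, and the neighbourhood size K are illustrative assumptions. The key idea it captures is that step one keeps the K nearest training images *per label*, so even rare labels contribute neighbours (mitigating class imbalance), and step two scores labels by similarity-weighted votes from the union of these semantic neighbourhoods.

```python
import numpy as np

def two_pass_knn(test_feat, train_feats, train_labels, n_labels, K=5):
    """Hypothetical sketch of a 2PKNN-style predictor.

    Step 1 ("image-to-label"): for each label, keep the K training
    images carrying that label that are closest to the test image,
    forming a semantic neighbourhood in which every label appears.
    Step 2 ("image-to-image"): score each label by similarity-weighted
    votes from the union of these per-label neighbourhoods.
    """
    dists = np.linalg.norm(train_feats - test_feat, axis=1)

    # Step 1: per-label nearest neighbours
    neighbourhood = set()
    for lbl in range(n_labels):
        members = [i for i in range(len(train_feats)) if lbl in train_labels[i]]
        members.sort(key=lambda i: dists[i])
        neighbourhood.update(members[:K])

    # Step 2: similarity-weighted voting over the merged neighbourhood
    scores = np.zeros(n_labels)
    for i in neighbourhood:
        w = np.exp(-dists[i])  # turn distance into a similarity weight
        for lbl in train_labels[i]:
            scores[lbl] += w
    return scores  # rank labels by score to annotate the image
```

Because step one selects neighbours per label rather than globally, a label with only a handful of training images still places its K best exemplars into the neighbourhood, which a plain K-NN over the whole training set would likely miss.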
Extensive experiments demonstrate that, though conceptually simple, 2PKNN alone performs comparably to the current state-of-the-art on three challenging image annotation datasets, and shows significant improvements after metric learning.
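The scalable metric learning mentioned in the abstract alternates stochastic sub-gradient steps with projection steps, in the spirit of Pegasos. The sketch below is an illustrative stand-in, not the paper's exact formulation: it learns non-negative weights over several base distances from margin constraints on triplets (i should be closer to a similarly-labelled j than to a dissimilar k), with the triplet format and hyper-parameters chosen here for demonstration only.

```python
import numpy as np

def learn_weights(base_dists, triplets, lam=0.01, lr=0.1, epochs=50, seed=0):
    """Sketch of large-margin weight learning over multiple base distances.

    base_dists[f][i, j] is the f-th base distance between images i and j.
    Each triplet (i, j, k) demands that, under the learned weights w,
    the combined distance w.d(i, j) be smaller than w.d(i, k) by a unit
    margin.  We alternate a stochastic sub-gradient step on the hinge
    loss with projection onto the non-negative orthant.
    """
    rng = np.random.default_rng(seed)
    F = len(base_dists)
    w = np.ones(F) / F  # start from uniform feature weights
    for _ in range(epochs):
        i, j, k = triplets[rng.integers(len(triplets))]
        d_ij = np.array([D[i, j] for D in base_dists])
        d_ik = np.array([D[i, k] for D in base_dists])
        margin = 1.0 + w @ d_ij - w @ d_ik
        # Sub-gradient of hinge loss plus L2 regularization
        grad = lam * w + (d_ij - d_ik if margin > 0 else 0.0)
        w -= lr * grad
        w = np.maximum(w, 0.0)  # projection step: keep weights valid
    return w
```

Alternating cheap stochastic updates with a trivial projection keeps the per-iteration cost independent of the training-set size, which is what makes this style of optimization attractive for large annotation datasets.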
© 2012 Springer-Verlag Berlin Heidelberg
Verma, Y., Jawahar, C.V. (2012). Image Annotation Using Metric Learning in Semantic Neighbourhoods. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds) Computer Vision – ECCV 2012. ECCV 2012. Lecture Notes in Computer Science, vol 7574. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33712-3_60
Print ISBN: 978-3-642-33711-6
Online ISBN: 978-3-642-33712-3