
A two-stage hybrid probabilistic topic model for refining image annotation

  • Original Article
  • Published:
International Journal of Machine Learning and Cybernetics

Abstract

Refining image annotation has become one of the core research topics in computer vision and pattern recognition because of its great potential for image retrieval. However, the field is still in its infancy, and current methods are not sophisticated enough to extract accurate semantic concepts from low-level image features alone. In this paper, we propose a two-stage hybrid probabilistic topic model to improve the quality of automatic image annotation. First, a probabilistic latent semantic analysis (PLSA) model with asymmetric modalities is learned to estimate the posterior probability of each annotation keyword, which establishes the image-to-word relation. Next, a label similarity graph is constructed as a weighted linear combination of label similarity and the visual similarity of the images associated with the corresponding labels. In this way, information from low-level visual features and high-level semantic concepts is integrated by taking both word-to-word and image-to-image relations into account. Finally, rank-two relaxation heuristics are exploited to further mine the correlations among the candidate annotations and obtain the refined annotations; this refinement plays a critical role in semantics-based image retrieval. Extensive experiments show that the proposed model achieves both superior annotation accuracy and better retrieval performance.
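The abstract names two generic building blocks for the refinement stage: a graph whose weights are a weighted linear combination of label similarity and visual similarity, and the rank-two relaxation heuristic of Burer, Monteiro and Zhang (ref. 6) for binary quadratic programs. The Python/NumPy sketch below illustrates both under stated assumptions; it is not the authors' implementation. The mixing weight `lam`, the helper names `combined_graph`, `cut_value` and `rank_two_maxcut`, the gradient-descent step size and the number of restarts are all hypothetical. The heuristic is shown for max-cut, the simplest binary quadratic program it handles. The abstract does not say how refinement is posed as such a program (the authors' earlier ICIP'13 work, ref. 52, uses max-bisection), nor whether similarities are converted into dissimilarities before partitioning.

```python
import numpy as np


def combined_graph(label_sim, visual_sim, lam=0.5):
    """Hypothetical weighted linear combination: W = lam * label_sim + (1 - lam) * visual_sim."""
    W = lam * np.asarray(label_sim, dtype=float) + (1.0 - lam) * np.asarray(visual_sim, dtype=float)
    W = 0.5 * (W + W.T)       # keep the graph undirected
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W


def cut_value(W, x):
    """Total weight of edges whose endpoints get opposite signs in x (entries +1/-1)."""
    return 0.25 * float(np.sum(W * (1.0 - np.outer(x, x))))


def rank_two_maxcut(W, steps=1000, restarts=5, seed=0):
    """Rank-two relaxation heuristic for max-cut (after Burer, Monteiro and Zhang, 2002).

    Each vertex i becomes a unit vector (cos t_i, sin t_i). We minimise
    f(t) = 1/2 * sum_ij W_ij * cos(t_i - t_j) by plain gradient descent and
    round the angles to a +1/-1 partition with an angular sweep.
    Step size and restart count are illustrative assumptions.
    """
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    rng = np.random.default_rng(seed)
    step = 0.5 / max(np.abs(W).sum(axis=1).max(), 1e-12)  # conservative step size
    best_x, best_val = None, -np.inf
    for _ in range(restarts):
        t = rng.uniform(0.0, 2.0 * np.pi, size=n)
        for _ in range(steps):
            diff = t[:, None] - t[None, :]
            grad = -np.sum(W * np.sin(diff), axis=1)  # df/dt_i for symmetric W
            t = t - step * grad
        t = np.mod(t, 2.0 * np.pi)
        # A half-circle [start, start + pi) changes its member set only when a
        # boundary crosses some t_i, so trying start = t_i and t_i + pi covers
        # every partition the sweep can produce.
        for start in np.concatenate([t, np.mod(t + np.pi, 2.0 * np.pi)]):
            x = np.where(np.mod(t - start, 2.0 * np.pi) < np.pi, 1.0, -1.0)
            val = cut_value(W, x)
            if val > best_val:
                best_x, best_val = x, val
    return best_x, best_val


# Usage: the 4-cycle 0-1-2-3-0 with unit weights, whose best cut
# separates {0, 2} from {1, 3} and cuts all four edges.
C4 = np.zeros((4, 4))
for i in range(4):
    C4[i, (i + 1) % 4] = C4[(i + 1) % 4, i] = 1.0
x, val = rank_two_maxcut(C4)
print(x, val)
```

Relaxing each sign to an angle is what makes the method "rank two": the Gram matrix of the unit vectors (cos t_i, sin t_i) has rank at most two, which keeps the relaxation cheap compared with a full semidefinite relaxation. The published heuristic includes further refinements (for example, perturbing the best solution between restarts) that this sketch omits.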

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6

Notes

  1. Here, "label" refers to an initial annotation generated by the PLSA model.

  2. http://vision.sista.arizona.edu/kobus/research/data/eccv_2002/index.html

  3. http://appsrv.cse.cuhk.edu.hk/~jkzhu/felib.html

  4. Downloaded from http://press.liacs.nl/mirflickr/dlform.php

References

  1. Bhagat P, Choudhary P (2018) Image annotation: then and now. Image Vis Comput 80:1–23

  2. Binder A, Samek W, Müller K et al (2013) Enhanced representation and multi-task learning for image annotation. Comput Vis Image Underst 117(5):466–478

  3. Blei D, Lafferty J (2007) Correlated topic models. Ann Appl Stat 1(1):17–35

  4. Blei D (2012) Probabilistic topic models. Commun ACM 55(4):77–84

  5. Bosch A, Zisserman A, Munoz X (2006) Scene classification via PLSA. Proc 9th Eur Conf Comput Vis (ECCV’06) 3954:517–530

  6. Burer S, Monteiro R, Zhang Y (2002) Rank-two relaxation heuristics for max-cut and other binary quadratic programs. SIAM J Optim 12(2):503–521

  7. Carneiro G, Chan A, Moreno P et al (2007) Supervised learning of semantic classes for image annotation and retrieval. IEEE Trans Pattern Anal Mach Intell 29(3):394–410

  8. Chen Z, Fu H, Chi Z et al (2012) An adaptive recognition model for image annotation. IEEE Trans Syst Man Cybern Part C 42(6):1120–1127

  9. Cheng G, Guo L, Zhao T et al (2013) Automatic landslide detection from remote-sensing imagery using a scene classification method based on BoVW and PLSA. Int J Remote Sens 34(1):45–59

  10. Cilibrasi R, Vitányi P (2007) The Google similarity distance. IEEE Trans Knowl Data Eng 19(3):370–383

  11. Duygulu P, Barnard K, Freitas N et al (2002) Object recognition as machine translation: learning a lexicon for a fixed image vocabulary. Proc 7th Eur Conf Comput Vis (ECCV’02) 2353:97–112

  12. Ergul E, Arica N (2010) Scene classification using spatial pyramid of latent topics. In: Proceedings of the 20th international conference on pattern recognition (ICPR’10), pp 3603–3606

  13. Farahat A, Chen F (2006) Improving probabilistic latent semantic analysis with principal component analysis. In: Proceedings of the 11th conference of the European chapter of the association for computational linguistics (EACL’06), pp 105–112

  14. Fathian M, Tab F, Moradi K et al (2018) A learning automata framework based on relevance feedback for content-based image retrieval. Int J Mach Learn Cybern 9(9):1457–1472

  15. Fellbaum C (2010) WordNet. Theory Appl Ontol Comput Appl 2010:231–243

  16. Feng Z, Jin R, Jain A (2013) Large-scale image annotation by efficient and robust kernel metric learning. In: Proceedings of the 16th international conference on computer vision (ICCV’13), pp 1609–1616

  17. Feng S, Manmatha R, Lavrenko V (2004) Multiple Bernoulli relevance models for image and video annotation. In: Proceedings of the computer vision and pattern recognition (CVPR’04), pp 1002–1009

  18. Foumani S, Nickabadi A (2019) A probabilistic topic model using deep visual word representation for simultaneous image classification and annotation. J Vis Commun Image Represent 59:195–203

  19. Guillaumin M, Mensink T, Verbeek J et al (2009) Tagprop: discriminative metric learning in nearest neighbor models for image auto-annotation. In: Proceedings of the 12th international conference on computer vision (ICCV’09), pp 309–316

  20. Hofmann T (2001) Unsupervised learning by probabilistic latent semantic analysis. Mach Learn 42(1–2):177–196

  21. Hou Y (2015) Image annotation incorporating low-rankness, tag and visual correlation and inhomogeneous errors. In: Proceedings of the 11th international symposium on visual computing (ISVC’15), pp 71–81

  22. Huiskes M, Lew M (2008) The MIR flickr retrieval evaluation. In: Proceedings of the 1st international conference on multimedia information retrieval (MIR’08), pp 39–43

  23. Jeon J, Lavrenko V, Manmatha R (2003) Automatic image annotation and retrieval using cross-media relevance models. In: Proceedings of the 26th international ACM SIGIR conference on research and development in information retrieval (SIGIR’03), pp 119–126

  24. Jin Y, Jin K, Khan L et al (2008) The randomized approximating graph algorithm for image annotation refinement problem. In: Proceedings of the computer vision and pattern recognition workshop (CVPRW’08), pp 1–8

  25. Jin Y, Khan L, Prabhakaran B (2010) Knowledge based image annotation refinement. J Signal Process Syst 58(3):387–406

  26. Jin Y, Khan L, Wang L et al (2005) Image annotations by combining multiple evidence and wordnet. In: Proceedings of the 13th international conference on multimedia (MM’05), pp 706–715

  27. Lavrenko V, Manmatha R, Jeon J (2003) A model for learning the semantics of pictures. In: Advances in Neural Information Processing Systems 16 (NIPS’03), pp 553–560

  28. Lee S, De Neve W, Plataniotis K et al (2010) MAP-based image tag recommendation using a visual folksonomy. Pattern Recognit Lett 31(9):976–982

  29. Lee S, De Neve W, Ro Y (2010) Tag refinement in an image folksonomy using visual similarity and tag co-occurrence statistics. Signal Process Image Commun 25(10):761–773

  30. Li P, Cheng J, Li Z et al (2011) Correlated PLSA for image clustering. In: Proceedings of the 17th international conference on multimedia modeling (MMM’11), pp 307–316

  31. Li N, Luo W, Yang K et al (2018) Self-organizing weighted incremental probabilistic latent semantic analysis. Int J Mach Learn Cybern 9(12):1987–1998

  32. Li Z, Shi Z, Liu X et al (2010) Fusing semantic aspects for image annotation and retrieval. J Vis Commun Image Represent 21(8):798–805

  33. Li Z, Shi Z, Liu X et al (2011) Modeling continuous visual features for semantic image annotation and retrieval. Pattern Recognit Lett 32:516–523

  34. Li X, Snoek C, Worring M (2009) Learning social tag relevance by neighbor voting. IEEE Trans Multimed 11(7):1310–1322

  35. Liu D, Hua X, Yang L et al (2009) Tag ranking. In: Proceedings of the 18th international conference on world wide web (WWW’09), pp 351–360

  36. Liu J, Li M, Liu Q et al (2009) Image annotation via graph learning. Pattern Recognit 42(2):218–228

  37. Liu Z, Ma J (2011) Refining image annotation by graph partition and image search engine. J Comput Res Dev 48(7):1246–1254

  38. Liu J, Wang B, Li M et al (2007) Dual cross-media relevance model for image annotation. In: Proceedings of the 15th international conference on multimedia (MM’07), pp 605–614

  39. Liu Y, Xu D, Feng S et al (2010) A novel visual words definition algorithm of image patch based on contextual semantic information. Acta Electron Sin 38(5):1156–1161

  40. Liu Z, Zhang C, Chen C (2018) MMDF-LDA: an improved multi-modal latent Dirichlet allocation model for social image annotation. Expert Syst Appl 104:168–184

  41. Lu Z, Peng Y, Ip H (2010) Image categorization via robust PLSA. Pattern Recognit Lett 31(1):36–43

  42. Makadia A, Pavlovic V, Kumar S (2008) A new baseline for image annotation. In: Proceedings of the European Conference on Computer Vision (ECCV’08), pp 316–329

  43. Monay F, Gatica-Perez D (2003) On image auto-annotation with latent space models. In: Proceedings of the 11th international conference on multimedia (MM’03), pp 275–278

  44. Monay F, Gatica-Perez D (2004) PLSA-based image auto-annotation: constraining the latent space. In: Proceedings of the 12th international conference on multimedia (MM’04), pp 348–351

  45. Monay F, Gatica-Perez D (2007) Modeling semantic aspects for cross-media image indexing. IEEE Trans Pattern Anal Mach Intell 29(10):1802–1817

  46. Nikolopoulos S, Zafeiriou S, Patras I et al (2013) High order PLSA for indexing tagged images. Signal Process 93(8):2212–2228

  47. Romberg S, Lienhart R, Horster E (2012) Multimodal image retrieval: fusing modalities with multilayer multimodal PLSA. Int J Multimed Inf Retrieval 1(1):31–44

  48. Rui X, Li M, Li Z et al (2007) Bipartite graph reinforcement model for web image annotation. In: Proceedings of the 15th international conference on multimedia (MM’07), pp 585–594

  49. Sun L, Ge H, Yoshida S et al (2014) Support vector description of clusters for content-based image annotation. Pattern Recognit 47(3):1361–1374

  50. Tian D, Zhao X, Shi Z (2014) An efficient refining image annotation technique by combining probabilistic latent semantic analysis and random walk model. Intell Autom Soft Comput 20(3):335–345

  51. Tian D (2015) Exploiting PLSA model and conditional random field for refining image annotation. High Technol Lett 21(1):78–84

  52. Tian D, Zhang W, Zhao X et al (2013) Employing PLSA model and max-bisection for refining image annotation. In: Proceedings of the 20th international conference on image processing (ICIP’13), pp 3996–4000

  53. Tian D (2018) Research on PLSA model based semantic image analysis: a systematic review. J Inf Hiding Multimed Signal Process 9(5):1099–1113

  54. Wang C, Jing F, Zhang L et al (2006) Image annotation refinement using random walk with restarts. In: Proceedings of the 14th international conference on multimedia (MM’06), pp 647–650

  55. Wang C, Jing F, Zhang L et al (2007) Content-based image annotation refinement. In: Proceedings of the computer vision and pattern recognition (CVPR’07), pp 1–8

  56. Wang Z, Yi H, Wang J et al (2009) Hierarchical Gaussian mixture model for image annotation via PLSA. In: Proceedings of the 5th international conference on image and graphics (ICIG’09), pp 384–389

  57. Wang J, Zhou J, Xu H et al (2014) Image tag refinement by regularized latent Dirichlet allocation. Comput Vis Image Underst 124(7):61–70

  58. Xu H, Wang J, Hua X et al (2009) Tag refinement by regularized LDA. In: Proceedings of the 17th international conference on multimedia (MM’09), pp 573–576

  59. Zheng Y, Takiguchi T, Ariki Y (2011) Image annotation with concept level feature using PLSA + CCA. In: Proceedings of the 17th international conference on multimedia modeling (MMM’11), pp 454–464

  60. Zhou N, Cheung W, Qiu G et al (2011) A hybrid probabilistic model for unified collaborative and content based image tagging. IEEE Trans Pattern Anal Mach Intell 33(7):1281–1294

  61. Zhu J, Hoi S, Lyu M et al (2008) Near-duplicate keyframe retrieval by nonrigid image matching. In: Proceedings of the 16th international conference on multimedia (MM’08), pp 41–50

  62. Zhu G, Yan S, Ma Y (2010) Image tag refinement towards low-rank, content-tag prior and error sparsity. In: Proceedings of the 18th international conference on multimedia (MM’10), pp 461–470

Acknowledgements

The authors sincerely thank the editor and the anonymous reviewers for their valuable comments and insightful suggestions, which helped us to improve the paper. The authors also thank Prof. Xiaofei Zhao for stimulating discussions and helpful hints. This work was fully supported by the National Program on Key Basic Research Project (973 Program) (No. 2013CB329502), the National Natural Science Foundation of China (No. 61035003, No. 61202212), the Tianchenghuizhi Fund for Innovation and Promotion of Education (No. 2018A03036) and the Key R&D Program of Shaanxi Province of China (No. 2018GY-037).

Author information

Corresponding author

Correspondence to Dongping Tian.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Tian, D., Shi, Z. A two-stage hybrid probabilistic topic model for refining image annotation. Int. J. Mach. Learn. & Cyber. 11, 417–431 (2020). https://doi.org/10.1007/s13042-019-00983-w

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s13042-019-00983-w
