Abstract
Refining image annotation has become one of the core research topics in computer vision and pattern recognition because of its great potential for image retrieval. However, the field is still in its infancy, and current methods are not sophisticated enough to extract semantic concepts reliably from low-level image features alone. In this paper, we propose a two-stage hybrid probabilistic topic model to improve the quality of automatic image annotation. First, a probabilistic latent semantic analysis (PLSA) model with asymmetric modalities is learned to estimate the posterior probability of each annotation keyword, which establishes the image-to-word relation. Next, a label similarity graph is constructed from a weighted linear combination of label similarity and the visual similarity of the images associated with the corresponding labels. In this way, information from low-level visual features and high-level semantic concepts is integrated by taking both the word-to-word and image-to-image relations into account. Finally, rank-two relaxation heuristics are exploited to further mine the correlations among the candidate annotations and obtain the refined results, which play a critical role in semantic-based image retrieval. Extensive experiments show that the proposed model achieves not only superior annotation accuracy but also better retrieval performance.
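As an illustration of the graph-construction step described above, the weighted linear combination can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `combine_similarities`, the mixing weight `alpha`, the removal of self-loops, and the toy matrices are all hypothetical, and the abstract does not specify how either similarity is computed.

```python
import numpy as np


def combine_similarities(label_sim, visual_sim, alpha=0.5):
    """Blend two similarity matrices defined over the same candidate labels.

    `alpha` is a hypothetical mixing weight in [0, 1]; the abstract only
    says that the combination is linear and weighted.
    """
    label_sim = np.asarray(label_sim, dtype=float)
    visual_sim = np.asarray(visual_sim, dtype=float)
    if label_sim.ndim != 2 or label_sim.shape[0] != label_sim.shape[1]:
        raise ValueError("label_sim must be a square matrix")
    if label_sim.shape != visual_sim.shape:
        raise ValueError("both matrices must have the same shape")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")

    # Edge weight between labels i and j: convex combination of both cues.
    weights = alpha * label_sim + (1.0 - alpha) * visual_sim
    # Assumption: the graph has no self-loops, so the diagonal is zeroed.
    np.fill_diagonal(weights, 0.0)
    return weights


# Toy similarities for three candidate labels (made-up numbers).
label_sim = np.array([[1.0, 0.8, 0.1],
                      [0.8, 1.0, 0.2],
                      [0.1, 0.2, 1.0]])
visual_sim = np.array([[1.0, 0.6, 0.3],
                       [0.6, 1.0, 0.4],
                       [0.3, 0.4, 1.0]])
graph = combine_similarities(label_sim, visual_sim, alpha=0.5)
```

With `alpha` close to 1 the graph is dominated by word-to-word similarity, and with `alpha` close to 0 by image-to-image similarity. The resulting symmetric weight matrix is the kind of input on which a graph-partitioning heuristic could then operate.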
Notes
Here, "label" means the initial annotation generated by the PLSA model.
Downloaded from http://press.liacs.nl/mirflickr/dlform.php
Acknowledgements
The authors would like to sincerely thank the editor and anonymous reviewers for their valuable comments and insightful suggestions that have helped us to improve the paper. Also, the authors thank Prof. Xiaofei Zhao for stimulating discussions and helpful hints. In addition, this work is fully supported by the National Program on Key Basic Research Project (973 Program) (No. 2013CB329502), National Natural Science Foundation of China (No. 61035003, No. 61202212), Tianchenghuizhi Fund for Innovation and Promotion of Education (No. 2018A03036) and Key R&D Program of the Shaanxi Province of China (No. 2018GY-037).
Cite this article
Tian, D., Shi, Z. A two-stage hybrid probabilistic topic model for refining image annotation. Int. J. Mach. Learn. & Cyber. 11, 417–431 (2020). https://doi.org/10.1007/s13042-019-00983-w
DOI: https://doi.org/10.1007/s13042-019-00983-w