Abstract
This article proposes a general framework for image annotation that comprises salient object detection (SOD), feature extraction, feature selection, and multi-label classification. For SOD, it proposes Augmented-Gradient Vector Flow (A-GVF), which combines the benefits of GVF and Minimum Directional Contrast. The article also proposes controlling how much background information is included for annotation. It presents a comprehensive study of all major feature selection methods on four publicly available datasets and concludes by recommending Fisher’s method for reducing the dimension of the features. Moreover, the article proposes a set of features that most of the methods find to be strong discriminants. Using this reduced set for image annotation gives 3--4% better accuracy across all four datasets. Finally, the article proposes an improved multi-label classification algorithm, C-MLFE.
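As an illustration of the kind of feature ranking the abstract refers to, the following is a minimal sketch of the standard single-label Fisher score: for each feature, the between-class scatter divided by the within-class scatter. The abstract does not give the article's exact formulation or how it handles multiple labels per image, so the function names `fisher_scores` and `select_top_k`, the single-label assumption, and the small epsilon term are all assumptions made for this example only.

```python
from collections import defaultdict


def fisher_scores(X, y):
    """Return one Fisher score per feature (column) of X.

    X is a list of equal-length feature rows and y holds one class label
    per row. Single-label data is an assumption of this sketch.
    """
    n_features = len(X[0])

    # Group the rows by class label.
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)

    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        overall_mean = sum(column) / len(column)

        between = 0.0  # class-size-weighted spread of class means
        within = 0.0   # summed squared deviation inside each class
        for rows in groups.values():
            values = [row[j] for row in rows]
            class_mean = sum(values) / len(values)
            between += len(values) * (class_mean - overall_mean) ** 2
            within += sum((v - class_mean) ** 2 for v in values)

        # The epsilon avoids division by zero for features that are
        # constant within every class (an assumption of this sketch).
        scores.append(between / (within + 1e-12))
    return scores


def select_top_k(X, y, k):
    """Return the indices of the k highest-scoring features."""
    scores = fisher_scores(X, y)
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order[:k]


if __name__ == "__main__":
    # Toy data: feature 0 separates the two classes, feature 1 does not.
    X = [[0.0, 1.0], [0.1, 5.0], [1.0, 1.2], [1.1, 4.8]]
    y = [0, 0, 1, 1]
    print(fisher_scores(X, y))
    print(select_top_k(X, y, 1))
```

A higher score means the class means of a feature lie far apart relative to its spread within each class, so keeping only the top-ranked features is one simple way to reduce dimensionality before classification.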
Index Terms
- Design, Analysis, and Implementation of Efficient Framework for Image Annotation