ABSTRACT
In this paper, we investigate the problem of inferring the semantic concepts of images from community-contributed images and their associated noisy tags. To infer the concepts more accurately, we propose a novel sparse graph-based semi-supervised learning approach that harnesses labeled and unlabeled data simultaneously. The sparse graph, constructed by datum-wise one-vs-all sparse reconstructions of all samples, removes most of the concept-unrelated links among the data and is therefore more robust and discriminative than conventional graphs. More importantly, we propose an effective training label refinement strategy within this graph-based learning framework to handle the noise in the tags by introducing a dual regularization on both the quantity and the sparsity of the noise. In addition, we construct a compact, informative concept space with a small semantic gap and infer the semantic concepts in this space, which helps bridge the semantic gap. The relations among different concepts are inherently embedded in this space and aid concept inference. We conduct extensive experiments on a real-world community-contributed image database consisting of 55,615 Flickr images and their associated tags. The results demonstrate the effectiveness of the proposed approaches and the ability of our method to deal with noise in the tags. We further show that inferring semantic concepts from training data with noisy tags can achieve performance comparable to that obtained from training data with clean ground-truth labels.
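As a rough illustration of the two core steps named above (one-vs-all sparse reconstruction to build the graph, then graph-based propagation of labels), the following is a minimal sketch and not the authors' implementation. It assumes an l1-regularized least-squares reconstruction solved with plain ISTA, and it swaps in the standard closed-form local-and-global-consistency propagation for the paper's own learning formulation. The function names and the parameters `lam`, `n_iter`, and `alpha` are hypothetical, and the label refinement and concept-space steps are omitted.

```python
import numpy as np


def soft_threshold(v, t):
    """Element-wise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def sparse_graph(X, lam=0.1, n_iter=200):
    """Reconstruct each sample from all other samples under an l1 penalty.

    X is (n_samples, n_features). Returns a non-negative (n, n) weight
    matrix with a zero diagonal; row i holds the absolute reconstruction
    coefficients of sample i (assumed here to serve as edge weights).
    """
    n = X.shape[0]
    X = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    W = np.zeros((n, n))
    for i in range(n):
        others = np.delete(np.arange(n), i)
        D = X[others].T                      # dictionary: every other sample
        x = X[i]
        step = 1.0 / (np.linalg.norm(D, 2) ** 2 + 1e-12)
        w = np.zeros(n - 1)
        for _ in range(n_iter):              # ISTA iterations
            grad = D.T @ (D @ w - x)
            w = soft_threshold(w - step * grad, step * lam)
        W[i, others] = np.abs(w)
    return W


def propagate_labels(W, Y, alpha=0.9):
    """Closed-form propagation F = (I - alpha * S)^{-1} Y.

    W is an (n, n) non-negative weight matrix, Y is (n, n_classes) with
    one-hot rows for labeled samples and zero rows for unlabeled ones.
    """
    W = 0.5 * (W + W.T)                      # symmetrize the graph
    d = W.sum(axis=1)
    d[d == 0] = 1.0                          # guard isolated nodes
    S = W / np.sqrt(np.outer(d, d))          # symmetric normalization
    n = W.shape[0]
    return np.linalg.solve(np.eye(n) - alpha * S, Y)
```

Under these assumptions, a predicted concept for each sample would be read off as `propagate_labels(sparse_graph(X), Y).argmax(axis=1)`. The per-sample loop is quadratic in the number of samples, so a corpus of the size reported above would need a faster solver or a restricted candidate set.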
Index Terms
- Inferring semantic concepts from community-contributed images and noisy tags