ABSTRACT
Cross-modal matching methods match data from different modalities according to their similarities. Most existing methods utilize label information to reduce the semantic gap between modalities, but manually labeling large-scale data is time-consuming. This paper proposes a Self-Paced Cross-Modal Subspace Matching (SCSM) method for unsupervised multimodal data. We assume that multimodal data are pairwise and drawn from several semantic groups, which form hard pairwise constraints and soft semantic-group constraints, respectively. We then formulate unsupervised cross-modal matching as a non-convex joint feature learning and data grouping problem. Self-paced learning, which learns samples from 'easy' to 'complex', is further introduced to refine the grouping result. Moreover, a multimodal graph is constructed to preserve both inter-modality and intra-modality similarity. An alternating minimization method is employed to solve the non-convex optimization problem, and its convergence and computational complexity are analyzed. Experimental results on four multimodal databases show that SCSM outperforms state-of-the-art cross-modal subspace learning methods.
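To make the 'easy-to-complex' idea concrete, the sketch below shows the standard hard self-paced weighting rule from the self-paced learning literature (a sample is selected only while its current loss stays below an age-dependent threshold) inside a simple alternating loop on a toy regression problem. This is only an illustration of the general mechanism, not the SCSM objective itself; the toy data, the least-squares model update, and all variable names are illustrative assumptions.

```python
import numpy as np

def self_paced_weights(losses, lam):
    """Hard self-paced weighting: keep a sample (weight 1) only if its
    current loss is below the age-dependent threshold 1 / lam."""
    return (losses < 1.0 / lam).astype(float)

# Toy alternating loop: fix the model to select 'easy' samples, refit the
# model on the selection, then relax the threshold so harder samples join.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.05 * rng.normal(size=200)
y[:20] += 5.0                                   # a few deliberately 'hard' (outlier) samples

w, lam, mu = np.zeros(5), 2.0, 1.3
for _ in range(10):
    losses = (X @ w - y) ** 2                   # per-sample losses under the current model
    v = self_paced_weights(losses, lam)         # binary sample weights
    if v.sum() == 0:                            # avoid an empty selection in early rounds
        v = np.ones_like(v)
    Xv = X * v[:, None]
    w = np.linalg.lstsq(Xv.T @ X, Xv.T @ y, rcond=None)[0]  # weighted least-squares refit
    lam /= mu                                   # anneal: 1/lam grows, admitting harder samples
```

The same selection-then-refit mechanism can wrap any alternating minimization: here the model update is plain least squares, whereas SCSM alternates between feature learning and data grouping as described above.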