DOI: 10.1145/2911451.2911527
SIGIR '16 research article

Self-Paced Cross-Modal Subspace Matching

Published: 7 July 2016

ABSTRACT

Cross-modal matching methods match data from different modalities according to their similarities. Most existing methods exploit label information to reduce the semantic gap between modalities; however, manually labeling large-scale data is time-consuming. This paper proposes a Self-Paced Cross-Modal Subspace Matching (SCSM) method for unsupervised multimodal data. We assume that multimodal data come in pairs and are drawn from several semantic groups, which yield hard pairwise constraints and soft semantic-group constraints, respectively. We then formulate unsupervised cross-modal matching as a non-convex joint feature-learning and data-grouping problem. Self-paced learning, which learns samples from 'easy' to 'complex', is introduced to refine the grouping result. Moreover, a multimodal graph is constructed to preserve both inter-modality and intra-modality similarity. An alternating minimization method is employed to solve the non-convex optimization problem, and its convergence and computational complexity are analyzed. Experimental results on four multimodal databases show that SCSM outperforms state-of-the-art cross-modal subspace learning methods.
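To make the self-paced, alternating-minimization mechanism the abstract refers to concrete, the Python sketch below applies the standard hard self-paced regularizer to an assumed least-squares model: samples whose current loss falls below an age parameter lam are selected as "easy", the model is refit on them, and lam is annealed so harder samples are admitted in later rounds. This is a generic illustration of the self-paced learning scheme only, not the SCSM objective; the model, the function name self_paced_fit, and all step sizes and schedules are illustrative assumptions.

    import numpy as np

    # Hard self-paced regularizer: alternate between (a) selecting samples
    # whose current loss is below the age parameter lam and (b) refitting
    # the model on the selected samples; lam grows so that harder samples
    # are admitted in later rounds.
    def self_paced_fit(X, y, lam=0.5, mu=1.3, n_rounds=10, lr=0.01, n_inner=100):
        w = np.zeros(X.shape[1])
        for _ in range(n_rounds):
            losses = (X @ w - y) ** 2          # per-sample squared losses
            v = (losses < lam).astype(float)   # v_i = 1 if sample i is "easy"
            for _ in range(n_inner):           # gradient steps on easy samples
                grad = 2 * (v * (X @ w - y)) @ X / max(v.sum(), 1.0)
                w -= lr * grad
            lam *= mu                          # anneal the age parameter
        return w

    # Toy usage: linear regression with gross outliers, which stay excluded
    # while lam is small and only enter (if ever) in late rounds.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=200)
    y[:10] += 20.0
    print(self_paced_fit(X, y))

The same easy-to-complex selection idea carries over to the paper's non-convex grouping problem, where the sample weights are updated jointly with the subspace projections inside the alternating minimization loop.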


Published in

SIGIR '16: Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2016, 1296 pages
ISBN: 9781450340694
DOI: 10.1145/2911451

Copyright © 2016 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

SIGIR '16 paper acceptance rate: 62 of 341 submissions (18%). Overall acceptance rate: 792 of 3,983 submissions (20%).
