Composite Correlation Quantization for Efficient Multimodal Retrieval

Authors:
Mingsheng Long, Tsinghua University, Beijing, China
Yue Cao, Tsinghua University, Beijing, China
Jianmin Wang, Tsinghua University, Beijing, China
Philip S. Yu, Tsinghua University & University of Illinois at Chicago, Chicago, USA

SIGIR '16: Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, July 2016, Pages 579–588
https://doi.org/10.1145/2911451.2911493

Published: 07 July 2016

ABSTRACT

Efficient similarity retrieval from large-scale multimodal databases is pervasive in modern search engines and social networks. To support queries across content modalities, the system should enable cross-modal correlation and computation-efficient indexing. While hashing methods have shown great potential in achieving this goal, current attempts generally fail to learn isomorphic hash codes in a seamless scheme: they embed multiple modalities in a continuous isomorphic space and then separately threshold the embeddings into binary codes, which incurs a substantial loss of retrieval accuracy. In this paper, we approach seamless multimodal hashing by proposing a novel Composite Correlation Quantization (CCQ) model. Specifically, CCQ jointly finds correlation-maximal mappings that transform different modalities into an isomorphic latent space and learns composite quantizers that convert the isomorphic latent features into compact binary codes. An optimization framework is devised to preserve both intra-modal similarity and inter-modal correlation by minimizing both reconstruction and quantization errors; it can be trained on both paired and partially paired data in linear time. A comprehensive set of experiments shows the superior effectiveness and efficiency of CCQ against state-of-the-art hashing methods for both unimodal and cross-modal retrieval.
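
The two ingredients named in the abstract, a mapping of each modality into a shared latent space and a composite quantizer that turns latent features into compact codes, can be illustrated with a toy sketch. This is not the authors' algorithm: the abstract says CCQ learns the mappings and quantizers jointly, whereas here the projection matrices (`W_IMG`, `W_TXT`) and the codebooks are fixed by hand, and a simple greedy residual assignment stands in for whatever encoding procedure the paper uses. All names and numbers below are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]

def project(x: Vector, W: Matrix) -> Vector:
    """Map a modality-specific feature x into the shared latent space (z = W x)."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def encode(z: Vector, codebooks: List[Matrix]) -> List[int]:
    """Pick one codeword per codebook so that the sum of the picks approximates z.

    Assumption: greedy residual assignment, chosen for brevity.
    """
    residual = list(z)
    code = []
    for book in codebooks:
        k = min(range(len(book)), key=lambda i: sq_dist(residual, book[i]))
        code.append(k)
        residual = [r - c for r, c in zip(residual, book[k])]
    return code

def decode(code: List[int], codebooks: List[Matrix]) -> Vector:
    """Reconstruct the latent vector as the sum of the selected codewords."""
    out = [0.0] * len(codebooks[0][0])
    for k, book in zip(code, codebooks):
        out = [o + c for o, c in zip(out, book[k])]
    return out

# Hypothetical hand-set parameters: 3-d image features, 2-d text features,
# a 2-d shared latent space, and two codebooks with two codewords each.
W_IMG = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
W_TXT = [[2.0, 0.0], [0.0, 2.0]]
CODEBOOKS = [
    [[1.0, 0.0], [-1.0, 0.0]],
    [[0.0, 1.0], [0.0, -1.0]],
]

image_code = encode(project([0.9, 0.8, 0.3], W_IMG), CODEBOOKS)
paired_text_code = encode(project([0.5, 0.4], W_TXT), CODEBOOKS)
other_text_code = encode(project([-0.5, 0.4], W_TXT), CODEBOOKS)

print(image_code, paired_text_code, other_text_code)
```

In this toy setting the image and its paired text receive the same code while the unrelated text does not, which is the behavior a cross-modal index relies on. With M codebooks of K codewords each, a code of this form occupies M * log2(K) bits (2 bits here).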

Index Terms

Information systems > Information retrieval > Specialized information retrieval > Multimedia and multimodal retrieval

Published in
SIGIR '16: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval
July 2016
1296 pages
ISBN: 9781450340694
DOI: 10.1145/2911451
General Chairs:
Raffaele Perego, ISTI-CNR, Italy
Fabrizio Sebastiani, Qatar Computing Research Institute, HBKU, Qatar
Program Chairs:
Javed Aslam, Northeastern University, US
Ian Ruthven, University of Strathclyde, UK
Justin Zobel, University of Melbourne, Australia
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
correlation analysis
hashing
multimodal retrieval
quantization
Qualifiers
- research-article

Acceptance Rates
SIGIR '16 Paper Acceptance Rate: 62 of 341 submissions, 18%
Overall Acceptance Rate: 792 of 3,983 submissions, 20%