ABSTRACT
Multi-modal hashing is an effective technique for large-scale multimedia retrieval because it encodes heterogeneous multi-modal features into compact, similarity-preserving binary codes. Although great progress has been made, existing methods still suffer from several problems: 1) Existing methods simply adopt fixed modality combination weights in the online hashing process to generate query hash codes, a strategy that cannot adapt to variations across queries. 2) They either capture insufficient semantics (unsupervised methods) or incur high computation and storage costs (supervised methods that rely on a pair-wise semantic matrix). 3) They solve for the hash codes with either a relaxed optimization strategy, which causes significant quantization loss, or bit-by-bit discrete optimization, which consumes considerable computation time. To address these limitations, we propose Online Multi-modal Hashing with Dynamic Query-adaption (OMH-DQ). Specifically, a self-weighted fusion strategy adaptively preserves multi-modal feature information in the hash codes by exploiting the complementarity of the modalities. The hash codes are learned under the supervision of pair-wise semantic labels to enhance their discriminative capability, while avoiding the challenging symmetric similarity matrix factorization. Under this learning framework, the binary hash codes can be obtained directly with efficient operations and without quantization errors. Our method thus benefits from the semantic labels while avoiding high computational complexity. Moreover, to accurately capture query variations at the online retrieval stage, we design a parameter-free online hashing module that adaptively learns the query hash codes according to the dynamic query contents. Extensive experiments demonstrate the state-of-the-art performance of the proposed approach from various aspects.
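To make the contrast between fixed and query-adaptive modality weights concrete, the following is a minimal sketch and not the OMH-DQ update rules, which the abstract does not spell out. It assumes each modality of a query has already been projected into a shared r-dimensional real-valued space, and it swaps in a generic parameter-free reweighting rule (weight proportional to 1 / (2 * distance between the current code and that modality's projection)). All function names and the toy numbers are hypothetical.

```python
import math


def sign(v):
    # Map a real vector to a {-1, +1} code; zeros are mapped to +1.
    return [1 if x >= 0 else -1 for x in v]


def fuse(projections, weights):
    # Weighted sum of per-modality projections (all of equal length).
    r = len(projections[0])
    return [sum(w * p[i] for w, p in zip(weights, projections)) for i in range(r)]


def fixed_weight_code(projections, weights):
    # Baseline: the same modality weights are applied to every query.
    return sign(fuse(projections, weights))


def adaptive_code(projections, n_iter=10, eps=1e-12):
    # Assumed illustrative rule: alternate between binarizing the fused
    # vector and reweighting each modality by its closeness to the code.
    m = len(projections)
    weights = [1.0 / m] * m
    code = sign(fuse(projections, weights))
    for _ in range(n_iter):
        dists = [
            math.sqrt(sum((c - x) ** 2 for c, x in zip(code, p)))
            for p in projections
        ]
        raw = [1.0 / (2.0 * max(d, eps)) for d in dists]
        total = sum(raw)
        weights = [w / total for w in raw]  # normalize so the weights sum to 1
        new_code = sign(fuse(projections, weights))
        if new_code == code:  # code is stable, stop early
            break
        code = new_code
    return code, weights


def hamming(a, b):
    # Number of positions at which two codes differ.
    return sum(1 for x, y in zip(a, b) if x != y)


# Toy query: a confident "image" projection and a weak, noisy "text" projection.
image_proj = [0.9, -0.8, 0.7, -0.9]
text_proj = [-0.2, 0.1, 0.3, -0.1]

adaptive, learned_weights = adaptive_code([image_proj, text_proj])
fixed = fixed_weight_code([image_proj, text_proj], [0.1, 0.9])
```

In this toy query the adaptive rule assigns the larger weight to the confident modality, while fixed weights tuned for a different kind of query let the noisy modality flip two bits of the code. The resulting codes would then be compared against database codes by Hamming distance.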