Abstract
With the rapid development of big data and the Internet, cross-modal retrieval has become a popular research topic. Cross-modal hashing is an important research direction in cross-modal retrieval because of its high efficiency and small memory consumption. Recently, many unsupervised cross-modal hashing methods have achieved strong results on cross-modal retrieval tasks. However, how to narrow the heterogeneity gap between different modalities and how to generate more discriminative hash codes remain the main problems of unsupervised hashing. In this paper, we propose a novel unsupervised cross-modal hashing method, Pseudo-label Driven Deep Hashing, to solve the aforementioned problems. We introduce clustering into our model to obtain initial semantic information, called pseudo-labels, and we propose a novel adjusting method that uses the pseudo-labels to adjust the joint-semantic similarity matrix. We construct a similarity consistency loss function that focuses on the heterogeneity gap between different modalities, and a fine-tuning strategy for real values and binary codes that closes the gap between the real-valued space and the Hamming space. We conduct experiments on five datasets: three natural datasets, which have larger inter-class distances, and two medical datasets, which have smaller inter-class distances. The results demonstrate the superiority of our method over several unsupervised cross-modal hashing methods.
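To make the pseudo-label idea concrete, the following is a minimal sketch of clustering features into pseudo-labels and using them to adjust a joint similarity matrix. The abstract does not state the clustering features, the fusion weight, or the adjustment rule, so everything here is an illustrative assumption rather than the method of the paper: the function names, the plain k-means on concatenated image and text features, the equal-weight cosine fusion, and the rule that pulls same-cluster pairs toward 1 and different-cluster pairs toward -1 by a factor `alpha`.

```python
import numpy as np


def kmeans_pseudo_labels(features, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index (pseudo-label) per row."""
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centers from k distinct samples (assumed; not k-means++).
    centers = features[rng.choice(len(features), size=k, replace=False)].copy()
    labels = np.zeros(len(features), dtype=int)
    for _ in range(iters):
        # Squared Euclidean distance from every sample to every center: (n, k).
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = features[labels == c]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[c] = members.mean(axis=0)
    return labels


def cosine_similarity_matrix(x):
    """Pairwise cosine similarity of the rows of x, clipped to [-1, 1]."""
    x = np.asarray(x, dtype=float)
    norms = np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)
    normed = x / norms
    return np.clip(normed @ normed.T, -1.0, 1.0)


def joint_similarity(img_feats, txt_feats, weight=0.5):
    """Weighted fusion of the two intra-modal similarity matrices (assumed form)."""
    return (weight * cosine_similarity_matrix(img_feats)
            + (1.0 - weight) * cosine_similarity_matrix(txt_feats))


def adjust_with_pseudo_labels(sim, labels, alpha=0.3):
    """Hypothetical adjustment rule: raise same-cluster similarities toward 1
    and lower different-cluster similarities toward -1, with 0 <= alpha <= 1."""
    same = labels[:, None] == labels[None, :]
    return np.where(same, sim + alpha * (1.0 - sim), sim - alpha * (1.0 + sim))


# Usage with random stand-in features (real features would come from the networks).
rng = np.random.default_rng(0)
img_feats = rng.normal(size=(10, 16))
txt_feats = rng.normal(size=(10, 8))
labels = kmeans_pseudo_labels(np.hstack([img_feats, txt_feats]), k=3)
adjusted = adjust_with_pseudo_labels(joint_similarity(img_feats, txt_feats), labels)
```

The adjustment keeps every entry inside [-1, 1] and leaves the matrix symmetric, which is convenient if the adjusted matrix later serves as a target for similarities between hash codes.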
Data availability
The Wiki, MIRFlickr and NUS-WIDE data that support the findings of this study are available from UC San Diego (http://www.svcl.ucsd.edu/projects/crossmodal/), the LIACS Media Research Group (https://press.liacs.nl/mirflickr/mirdownload.html) and the National University of Singapore (https://lms.comp.nus.edu.sg/wp-content/uploads/2019/research/nuswide/NUS-WIDE.html), respectively. The FH and Ultrasound datasets generated and/or analysed during the current study are not publicly available due to research needs but are available from the corresponding author on reasonable request.
Acknowledgements
The authors would like to thank the anonymous reviewers for their help. This work was supported by the National Natural Science Foundation of China (Grant no. 62076044), the Chongqing Talent Plan (Grant no. cstc2022ycjh-bgzxm0160), the Natural Science Foundation of Chongqing, China (Grant no. cstc2019jcyj-zdxm0011) and the Chongqing Graduate Research Innovation Project of China (Grant no. CYS21307).
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests.
About this article
Cite this article
Zeng, X., Xu, K. & Xie, Y. Pseudo-label driven deep hashing for unsupervised cross-modal retrieval. Int. J. Mach. Learn. & Cyber. 14, 3437–3456 (2023). https://doi.org/10.1007/s13042-023-01842-5
DOI: https://doi.org/10.1007/s13042-023-01842-5