Pseudo-label driven deep hashing for unsupervised cross-modal retrieval

Zeng, XianHua; Xu, Ke; Xie, YiCai

doi:10.1007/s13042-023-01842-5

Original Article
Published: 11 May 2023

Volume 14, pages 3437–3456 (2023)

International Journal of Machine Learning and Cybernetics

Abstract

With the rapid development of big data and the Internet, cross-modal retrieval has become a popular research topic. Cross-modal hashing is an important research direction in cross-modal retrieval because of its high efficiency and small memory consumption. Recently, many unsupervised cross-modal hashing methods have achieved strong results on cross-modal retrieval tasks. However, narrowing the heterogeneity gap between different modalities and generating more discriminative hash codes remain the main problems of unsupervised hashing. In this paper, we propose a novel unsupervised cross-modal hashing method, Pseudo-label Driven Deep Hashing, to address these problems. We introduce clustering into our model to obtain initial semantic information, called pseudo-labels, and we propose a novel adjustment method that uses the pseudo-labels to adjust the joint-semantic similarity matrix. We construct a similarity consistency loss function that targets the heterogeneity gap between different modalities, and a fine-tuning strategy for real values and binary codes that closes the gap between the real-valued space and the Hamming space. We conduct experiments on five datasets: three natural datasets, which have larger inter-class distances, and two medical datasets, which have smaller inter-class distances. The results demonstrate the superiority of our method over several unsupervised cross-modal hashing methods.
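To make the pipeline described in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes pre-extracted image and text features, plain k-means as the clustering step, a weighted sum of per-modality cosine similarities as the joint-semantic similarity matrix, and a simple additive adjustment (raise the similarity of pairs that share a pseudo-label, lower it otherwise). All function names and the parameters alpha and boost are hypothetical; the abstract does not specify the actual adjustment rule or loss functions.

```python
import numpy as np


def cosine_similarity_matrix(features):
    """Pairwise cosine similarity between the rows of `features`."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    return unit @ unit.T


def kmeans_pseudo_labels(features, num_clusters, num_iters=20, seed=0):
    """Plain Lloyd k-means; returns one cluster index (pseudo-label) per row."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(features), size=num_clusters, replace=False)
    centers = features[start].astype(float)
    labels = np.zeros(len(features), dtype=int)
    for _ in range(num_iters):
        # Squared Euclidean distance from every sample to every center.
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(num_clusters):
            members = features[labels == j]
            if len(members) > 0:  # leave an empty cluster's center unchanged
                centers[j] = members.mean(axis=0)
    return labels


def adjusted_joint_similarity(img_feats, txt_feats, num_clusters,
                              alpha=0.5, boost=0.2):
    """Joint similarity adjusted by pseudo-labels (assumed additive rule)."""
    joint = (alpha * cosine_similarity_matrix(img_feats)
             + (1.0 - alpha) * cosine_similarity_matrix(txt_feats))
    # Assumption: cluster on concatenated features to get pseudo-labels.
    labels = kmeans_pseudo_labels(np.hstack([img_feats, txt_feats]),
                                  num_clusters)
    same_cluster = labels[:, None] == labels[None, :]
    adjusted = np.where(same_cluster, joint + boost, joint - boost)
    return np.clip(adjusted, -1.0, 1.0), labels


def binarize(real_values):
    """Map real-valued network outputs to hash codes in {-1, +1}."""
    return np.where(real_values >= 0, 1, -1)


def hamming_distances(codes_a, codes_b):
    """Hamming distance between {-1, +1} codes via the inner product."""
    num_bits = codes_a.shape[1]
    return (num_bits - codes_a @ codes_b.T) / 2
```

In a setup like this, the adjusted matrix would act as the training target for the similarity of the learned codes, and retrieval would rank database items by hamming_distances between a query code from one modality and the stored codes from the other.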

Data availability

The Wiki, MIRFlickr and NUS-WIDE data that support the findings of this study are available from UC San Diego (http://www.svcl.ucsd.edu/projects/crossmodal/), the LIACS Media Research Group (https://press.liacs.nl/mirflickr/mirdownload.html) and the National University of Singapore (https://lms.comp.nus.edu.sg/wp-content/uploads/2019/research/nuswide/NUS-WIDE.html), respectively. The FH and Ultrasound datasets generated and/or analysed during the current study are not publicly available due to research needs but are available from the corresponding author on reasonable request.

Acknowledgements

The authors would like to thank the anonymous reviewers for their help. This work was supported by the National Natural Science Foundation of China (Grant no. 62076044), the Chongqing Talent Plan (Grant no. cstc2022ycjh-bgzxm0160), the Natural Science Foundation of Chongqing, China (Grant no. cstc2019jcyj-zdxm0011) and the Chongqing Graduate Research Innovation Project of China (Grant no. CYS21307).

Author information

Authors and Affiliations

Chongqing Key Laboratory of Image Cognition, College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
XianHua Zeng, Ke Xu & YiCai Xie

Corresponding author

Correspondence to XianHua Zeng.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zeng, X., Xu, K. & Xie, Y. Pseudo-label driven deep hashing for unsupervised cross-modal retrieval. Int. J. Mach. Learn. & Cyber. 14, 3437–3456 (2023). https://doi.org/10.1007/s13042-023-01842-5

Received: 19 July 2022
Accepted: 16 April 2023
Published: 11 May 2023
Issue Date: October 2023
DOI: https://doi.org/10.1007/s13042-023-01842-5
