
Pseudo-label driven deep hashing for unsupervised cross-modal retrieval

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

With the rapid development of big data and the Internet, cross-modal retrieval has become a popular research topic. Cross-modal hashing is an important direction within cross-modal retrieval owing to its high efficiency and small memory footprint. Recently, many unsupervised cross-modal hashing methods have achieved strong results on cross-modal retrieval tasks. However, narrowing the heterogeneity gap between modalities and generating more discriminative hash codes remain the main challenges of unsupervised hashing. In this paper, we propose a novel unsupervised cross-modal hashing method, Pseudo-label Driven Deep Hashing, to address these problems. We introduce clustering into our model to obtain initial semantic information, called pseudo-labels, and we propose a novel adjustment method that uses the pseudo-labels to refine the joint-semantic similarity matrix. We construct a similarity consistency loss function that targets the heterogeneity gap between modalities, and a fine-tuning strategy for real values and binary codes that closes the gap between the real-valued space and the Hamming space. We conduct experiments on five datasets: three natural-image datasets with larger inter-class distances and two medical datasets with smaller inter-class distances. The results demonstrate the superiority of our method over several unsupervised cross-modal hashing baselines.
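The abstract's core idea of adjusting a joint-semantic similarity matrix with clustering pseudo-labels can be illustrated with a minimal sketch. The paper's exact adjustment rule is not given in the abstract, so the rule below (boosting same-cluster pairs and damping cross-cluster pairs by a hypothetical hyper-parameter `boost`) and the tiny k-means routine are illustrative assumptions, not the authors' method:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means (random init) used only to produce pseudo-labels."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # move each center to the mean of its assigned points
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels

def pseudo_label_adjusted_similarity(img_feats, txt_feats, n_clusters=10,
                                     boost=0.1):
    """Sketch of a pseudo-label-adjusted joint-semantic similarity matrix.

    Same-cluster pairs are boosted, cross-cluster pairs damped, and the
    result is clipped back to [-1, 1]; the actual rule in the paper may
    differ.
    """
    def cos_sim(x):
        x = x / np.linalg.norm(x, axis=1, keepdims=True)
        return x @ x.T

    # joint-semantic similarity: average of per-modality cosine similarities
    S = 0.5 * (cos_sim(img_feats) + cos_sim(txt_feats))

    # pseudo-labels from clustering the concatenated modality features
    labels = kmeans(np.hstack([img_feats, txt_feats]), n_clusters)

    same = labels[:, None] == labels[None, :]
    S = np.where(same, S + boost, S - boost)
    return np.clip(S, -1.0, 1.0)
```

The clipping keeps the adjusted matrix a valid similarity target for a hashing loss defined on cosine-like similarities in [-1, 1].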


Data availability

The Wiki, MIRFlickr and NUS-WIDE data that support the findings of this study are available from UC San Diego, the LIACS Media Research Group and the National University of Singapore at http://www.svcl.ucsd.edu/projects/crossmodal/, https://press.liacs.nl/mirflickr/mirdownload.html and https://lms.comp.nus.edu.sg/wp-content/uploads/2019/research/nuswide/NUS-WIDE.html, respectively. The FH and Ultrasound datasets generated and/or analysed during the current study are not publicly available due to research needs, but are available from the corresponding author on reasonable request.


Acknowledgements

The authors would like to thank the anonymous reviewers for their help. This work was supported by the National Natural Science Foundation of China (Grant no. 62076044), the Chongqing Talent Plan (Grant no. cstc2022ycjh-bgzxm0160), Natural Science Foundation of Chongqing, China (Grant no. cstc2019jcyj-zdxm0011) and Chongqing Graduate Research Innovation Project of China (Grant no. CYS21307).

Author information

Corresponding author

Correspondence to XianHua Zeng.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.



About this article

Cite this article

Zeng, X., Xu, K. & Xie, Y. Pseudo-label driven deep hashing for unsupervised cross-modal retrieval. Int. J. Mach. Learn. & Cyber. 14, 3437–3456 (2023). https://doi.org/10.1007/s13042-023-01842-5

