ABSTRACT
Cross-media hashing, which conducts cross-media retrieval by embedding data from different modalities into a common low-dimensional Hamming space, has attracted intensive attention in recent years. Existing cross-media hashing approaches aim only at learning hash functions that preserve the intra-modality and inter-modality correlations; they do not directly capture the underlying semantic information of the multi-modal data. In this paper we propose a discriminative coupled dictionary hashing (DCDH) method. In DCDH, the coupled dictionary for each modality is learned with side information (e.g., categories). As a result, the coupled dictionaries not only preserve the intra-similarity and inter-correlation among multi-modal data, but also contain dictionary atoms that are semantically discriminative (i.e., data from the same category are reconstructed by similar dictionary atoms). To perform fast cross-media retrieval, we learn hash functions that map data from the dictionary space to a low-dimensional Hamming space. In addition, we conjecture that a balanced representation is crucial in cross-media retrieval. We therefore introduce multi-view features for the relatively "weak" modalities into DCDH and extend it to multi-view DCDH (MV-DCDH) in order to enhance their representation capability. Experiments on two real-world data sets show that DCDH and MV-DCDH significantly outperform state-of-the-art methods on cross-media retrieval.
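The retrieval step described above (dictionary-space representation, then hash function, then search in Hamming space) can be illustrated with a minimal sketch. It is not the DCDH learning procedure: it assumes that images and texts have already been encoded as coefficient vectors over coupled dictionaries, and it substitutes random sign projections for the learned hash functions. All names (`hash_code`, `rank_by_hamming`, `projections`) and all numeric values are hypothetical.

```python
import random


def hash_code(coeffs, projections):
    """Map a dictionary-space vector to a binary code packed into an int.

    Each projection row yields one bit: 1 if the dot product is positive.
    ASSUMPTION: a linear projection followed by thresholding stands in for
    the learned hash functions; the abstract does not give their form.
    """
    bits = 0
    for row in projections:
        activation = sum(w * c for w, c in zip(row, coeffs))
        bits = (bits << 1) | (1 if activation > 0 else 0)
    return bits


def hamming(a, b):
    """Number of bit positions in which two packed codes differ."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query_code, database_codes):
    """Return database indices sorted by increasing Hamming distance."""
    return sorted(
        range(len(database_codes)),
        key=lambda i: hamming(query_code, database_codes[i]),
    )


rng = random.Random(0)
n_atoms, n_bits = 8, 16

# Hypothetical stand-in for learned hash functions: random projections.
projections = [[rng.gauss(0, 1) for _ in range(n_atoms)] for _ in range(n_bits)]

# Hypothetical sparse codes of two text documents over the coupled dictionary.
text_codes = [
    [1.0, 0.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # uses atoms 0 and 1
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.9, 1.1],  # uses atoms 6 and 7
]

# An image query reconstructed by the same atoms as the first text
# (here simply a scaled copy, so its sign pattern is identical).
image_query = [2.0 * c for c in text_codes[0]]

db_codes = [hash_code(c, projections) for c in text_codes]
query_code = hash_code(image_query, projections)
ranking = rank_by_hamming(query_code, db_codes)
```

Because the query activates the same atoms as the first text, both receive the same binary code and the first text is ranked first. Comparing codes costs only an XOR and a bit count, which is what makes search in Hamming space fast.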
Index Terms
- Discriminative coupled dictionary hashing for fast cross-media retrieval