research-article

Supervised hashing with latent factor models

Authors:
Peichao Zhang, Shanghai Jiao Tong University, Shanghai, China
Wei Zhang, Shanghai Jiao Tong University, Shanghai, China
Wu-Jun Li, Nanjing University, Nanjing, China
Minyi Guo, Shanghai Jiao Tong University, Shanghai, China

SIGIR '14: Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval, July 2014, Pages 173–182. https://doi.org/10.1145/2600428.2609600

Published: 03 July 2014

ABSTRACT

Due to its low storage cost and fast query speed, hashing has been widely adopted for approximate nearest neighbor search in large-scale datasets. Traditional hashing methods try to learn the hash codes in an unsupervised way, where the metric (Euclidean) structure of the training data is preserved. Very recently, supervised hashing methods, which try to preserve the semantic structure constructed from the semantic labels of the training points, have exhibited higher accuracy than unsupervised methods. In this paper, we propose a novel supervised hashing method, called latent factor hashing (LFH), to learn similarity-preserving binary codes based on latent factor models. An algorithm with a convergence guarantee is proposed to learn the parameters of LFH. Furthermore, a linear-time variant with stochastic learning is proposed for training LFH on large-scale datasets. Experimental results on two large datasets with semantic labels show that LFH can achieve higher accuracy than state-of-the-art methods with comparable training time.
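
To make the "low storage cost and fast query speed" of binary codes concrete, the following is a minimal sketch of Hamming-distance ranking over hash codes. It does not implement LFH itself: the 8-bit codes, the function names `hamming_distance` and `rank_by_hamming`, and the toy database are all illustrative assumptions, standing in for codes that a supervised method would learn from semantic labels.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bits in which two codes (stored as integers) differ."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, database: list) -> list:
    """Return database indices ordered by Hamming distance to the query."""
    return sorted(range(len(database)),
                  key=lambda i: hamming_distance(query, database[i]))


# Hypothetical 8-bit codes (assumption, not from the paper). A supervised
# hashing method aims to give semantically similar points nearby codes.
database = [0b10110010, 0b10110011, 0b01001101, 0b11110000]
query = 0b10110110

ranking = rank_by_hamming(query, database)
print(ranking)
```

Each code here fits in a single byte, and comparing two codes costs one XOR plus a bit count, which is why retrieval over binary codes is cheap in both memory and time.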

Index Terms

Supervised hashing with latent factor models
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Published in
SIGIR '14: Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval
July 2014
1330 pages
ISBN: 9781450322577
DOI: 10.1145/2600428
General Chairs: Shlomo Geva (Queensland University of Technology), Andrew Trotman (University of Dunedin)
Program Chairs: Peter Bruza (Queensland University of Technology), Charles L.A. Clarke (University of Waterloo), Kal Järvelin (University of Tampere)
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
big data
hashing
image retrieval
latent factor model
Acceptance Rates
SIGIR '14 Paper Acceptance Rate: 82 of 387 submissions, 21%
Overall Acceptance Rate: 792 of 3,983 submissions, 20%