DOI: 10.1145/2600428.2609600
research-article

Supervised hashing with latent factor models

Published: 03 July 2014

ABSTRACT

Due to its low storage cost and fast query speed, hashing has been widely adopted for approximate nearest neighbor search in large-scale datasets. Traditional hashing methods try to learn the hash codes in an unsupervised way, where the metric (Euclidean) structure of the training data is preserved. Very recently, supervised hashing methods, which try to preserve the semantic structure constructed from the semantic labels of the training points, have exhibited higher accuracy than unsupervised methods. In this paper, we propose a novel supervised hashing method, called latent factor hashing (LFH), to learn similarity-preserving binary codes based on latent factor models. An algorithm with a convergence guarantee is proposed to learn the parameters of LFH. Furthermore, a linear-time variant with stochastic learning is proposed for training LFH on large-scale datasets. Experimental results on two large datasets with semantic labels show that LFH can achieve higher accuracy than state-of-the-art methods with comparable training time.
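
To illustrate why binary codes give low storage cost and fast queries, the following is a minimal sketch of Hamming-distance ranking, the retrieval step that hashing methods in general rely on. It is not the LFH learning algorithm: the 8-bit codes are hand-picked placeholder values, and the names `hamming_distance` and `hamming_rank` are hypothetical helpers introduced only for this example.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two integer-packed binary codes differ."""
    # XOR sets a 1 wherever the codes disagree; count those 1 bits.
    return bin(a ^ b).count("1")


def hamming_rank(query: int, db: list) -> list:
    """Return database indices ordered by increasing Hamming distance to the query."""
    # sorted() is stable, so ties keep their original database order.
    return sorted(range(len(db)), key=lambda i: hamming_distance(query, db[i]))


# Placeholder 8-bit codes (assumed values, not produced by any learned model).
db_codes = [0b01101001, 0b11101001, 0b10010110]
query_code = 0b01101001

ranking = hamming_rank(query_code, db_codes)
distances = [hamming_distance(query_code, c) for c in db_codes]
print(ranking, distances)
```

Each code here fits in a single byte, and comparing two codes takes one XOR plus a bit count, which is the kind of saving the abstract refers to. A supervised method such as LFH would aim to learn codes for which a small Hamming distance corresponds to semantic similarity.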

    • Published in

      SIGIR '14: Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval
      July 2014
      1330 pages
      ISBN: 9781450322577
      DOI: 10.1145/2600428

      Copyright © 2014 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      SIGIR '14 paper acceptance rate: 82 of 387 submissions, 21%
      Overall acceptance rate: 792 of 3,983 submissions, 20%
