DOI: 10.1145/2647868.2654948
research-article

Deep Learning for Content-Based Image Retrieval: A Comprehensive Study

Published: 3 November 2014

ABSTRACT

Learning effective feature representations and similarity measures is crucial to the retrieval performance of a content-based image retrieval (CBIR) system. Despite decades of extensive research, it remains one of the most challenging open problems and considerably hinders the success of real-world CBIR systems. The key challenge has been attributed to the well-known "semantic gap" between the low-level image pixels captured by machines and the high-level semantic concepts perceived by humans. Among various techniques, machine learning has been actively investigated as a possible long-term direction for bridging the semantic gap. Inspired by recent successes of deep learning techniques in computer vision and other applications, in this paper we attempt to address an open problem: whether deep learning offers hope for bridging the semantic gap in CBIR, and how much improvement in CBIR tasks can be achieved by exploring state-of-the-art deep learning techniques for learning feature representations and similarity measures. Specifically, we investigate a deep learning framework for CBIR through an extensive set of empirical studies, examining a state-of-the-art deep learning method (convolutional neural networks) under varied settings. From these studies, we find some encouraging results and summarize several important insights for future research.
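To make the roles of a feature representation and a similarity measure concrete, the following is a minimal sketch of the retrieval step only. It is not the method evaluated in the paper: the feature vectors are hypothetical placeholder values standing in for whatever a learned model would produce, cosine similarity is assumed as one common choice of measure, and the names `cosine_similarity` and `rank_by_similarity` are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, gallery):
    """Return (image name, score) pairs sorted from most to least similar to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in gallery.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical 3-dimensional features; real learned features are far higher-dimensional.
gallery = {
    "img_a": [0.9, 0.1, 0.0],
    "img_b": [0.1, 0.8, 0.3],
    "img_c": [0.7, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]

ranking = rank_by_similarity(query, gallery)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this sketch, retrieval quality depends entirely on how well the feature vectors capture semantic content, which is the aspect the abstract identifies as the core difficulty.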


Published in

MM '14: Proceedings of the 22nd ACM international conference on Multimedia
November 2014
1310 pages
ISBN: 9781450330633
DOI: 10.1145/2647868

Copyright © 2014 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Acceptance Rates

MM '14 Paper Acceptance Rate: 55 of 286 submissions, 19%. Overall Acceptance Rate: 995 of 4,171 submissions, 24%.
