research-article

Deep Learning for Content-Based Image Retrieval: A Comprehensive Study

Authors:
Ji Wan, Institute of Computing Technology, CAS, Beijing, China
Dayong Wang, Nanyang Technological University, Singapore, Singapore
Steven Chu Hong Hoi, Singapore Management University, Singapore, Singapore
Pengcheng Wu, Nanyang Technological University, Singapore, Singapore
Jianke Zhu, Zhejiang University, Hangzhou, China
Yongdong Zhang, Institute of Computing Technology, CAS, Beijing, China
Jintao Li, Institute of Computing Technology, CAS, Beijing, China

MM '14: Proceedings of the 22nd ACM international conference on Multimedia, November 2014, pages 157–166. https://doi.org/10.1145/2647868.2654948

Published: 03 November 2014


ABSTRACT

Learning effective feature representations and similarity measures is crucial to the retrieval performance of a content-based image retrieval (CBIR) system. Despite decades of extensive research, it remains one of the most challenging open problems and considerably hinders the success of real-world CBIR systems. The key challenge has been attributed to the well-known "semantic gap" between the low-level image pixels captured by machines and the high-level semantic concepts perceived by humans. Among various techniques, machine learning has been actively investigated as a possible long-term direction for bridging the semantic gap. Inspired by the recent successes of deep learning in computer vision and other applications, in this paper we attempt to address an open problem: whether deep learning offers hope for bridging the semantic gap in CBIR, and how much improvement in CBIR tasks can be achieved by exploring state-of-the-art deep learning techniques for learning feature representations and similarity measures. Specifically, we investigate a deep learning framework for CBIR tasks through an extensive set of empirical studies that examine a state-of-the-art deep learning method (convolutional neural networks) under varied settings. From these empirical studies, we find some encouraging results and summarize several important insights for future research.
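The abstract frames CBIR as two coupled problems: representing each image as a feature vector, and measuring similarity between those vectors. As a rough illustration only, the sketch below assumes feature vectors have already been extracted (for instance, activations from one layer of a pretrained CNN; the three-dimensional vectors here are invented toy values, not real CNN outputs). It ranks a database by cosine similarity to a query and scores the ranking with average precision. Cosine similarity is a common fixed choice that stands in here for the learned similarity measures the paper studies, and the helper names (`retrieve`, `average_precision`, and so on) are hypothetical, not taken from the paper.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a feature vector to unit Euclidean length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [float(x) for x in vec]
    return [x / norm for x in vec]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of the two L2-normalized vectors (1.0 = same direction, 0.0 = orthogonal)."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def retrieve(query: Sequence[float],
             database: Dict[str, Sequence[float]],
             k: int = 5) -> List[Tuple[str, float]]:
    """Hypothetical helper: rank database images by cosine similarity to the query, keep the top k."""
    scored = [(img_id, cosine_similarity(query, feat)) for img_id, feat in database.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]


def average_precision(ranked_ids: Sequence[str], relevant: Set[str]) -> float:
    """Sum of precision@r at each rank r holding a relevant image, divided by the number of relevant images."""
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, img_id in enumerate(ranked_ids, start=1):
        if img_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


if __name__ == "__main__":
    # Toy "features" for illustration only; real CNN features often have thousands of dimensions.
    database = {
        "cat_1": [0.9, 0.1, 0.0],
        "cat_2": [0.8, 0.2, 0.1],
        "car_1": [0.0, 0.1, 0.9],
    }
    ranking = retrieve([1.0, 0.0, 0.0], database, k=3)
    ids = [img_id for img_id, _ in ranking]
    print(ids)  # ['cat_1', 'cat_2', 'car_1']
    print(average_precision(ids, {"cat_1", "cat_2"}))  # 1.0
```

Because ranking only depends on the similarity function, replacing `cosine_similarity` with a learned measure (for example, a bilinear or Mahalanobis form fitted to relevance labels) would leave `retrieve` unchanged; this is one convenient way to keep the feature and similarity choices separate when comparing settings.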

Index Terms

Deep Learning for Content-Based Image Retrieval: A Comprehensive Study
1. Computing methodologies
2. Information systems
  1. Information retrieval
    1. Document representation

Recommendations

Content-based image retrieval with compact deep convolutional features

Convolutional neural networks (CNNs) with deep learning have recently achieved a remarkable success with a superior performance in computer vision applications. Most of CNN-based methods extract image features at the last layer using a single CNN ...
Deep convolutional learning for Content Based Image Retrieval

In this paper we propose a model retraining method for learning more efficient convolutional representations for Content Based Image Retrieval. We employ a deep CNN model to obtain the feature representations from the activations of the convolutional ...
Content-based image retrieval by clustering
MIR '03: Proceedings of the 5th ACM SIGMM international workshop on Multimedia information retrieval

In a typical content-based image retrieval (CBIR) system, query results are a set of images sorted by feature similarities with respect to the query. However, images with high feature similarities to the query may be very different from the query in ...


Published in
MM '14: Proceedings of the 22nd ACM international conference on Multimedia
November 2014
1310 pages
ISBN:9781450330633
DOI:10.1145/2647868
General Chairs: Kien A. Hua (University of Central Florida, USA), Yong Rui (Microsoft Research, China), Ralf Steinmetz (Technische Universität Darmstadt, Germany)
Program Chairs: Alan Hanjalic (Delft University of Technology, Netherlands), Apostol (Paul) Natsev (Google, USA), Wenwu Zhu (Tsinghua University, China)
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
content-based image retrieval
convolutional neural networks
deep learning
feature representation

Acceptance Rates
MM '14 paper acceptance rate: 55 of 286 submissions, 19%. Overall acceptance rate: 995 of 4,171 submissions, 24%.

Article Metrics
- Total citations: 574
- Total downloads: 6,654
- Downloads (last 12 months): 275
- Downloads (last 6 weeks): 36