ABSTRACT
Convolutional Neural Networks (CNNs) are a powerful approach for extracting discriminative local descriptors for effective image search. Recent work adopts fine-tuning strategies to further improve the discriminative power of these descriptors. Taking a different approach, in this paper we propose a novel framework that achieves competitive retrieval performance. First, we propose several masking schemes, namely SIFT-mask, SUM-mask, and MAX-mask, to select a representative subset of local convolutional features and remove a large number of redundant features. We demonstrate that this effectively addresses the burstiness issue and improves retrieval accuracy. Second, we employ recent embedding and aggregating methods to further enhance feature discriminability. Extensive experiments demonstrate that our proposed framework achieves state-of-the-art retrieval accuracy.
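The abstract names the masking schemes without defining them, so the following is only a minimal sketch of how two of them might select local convolutional features. It assumes that SUM-mask keeps locations whose channel-summed activation exceeds the median over all locations, and that MAX-mask keeps locations where at least one feature map attains its spatial maximum. These rules, the function names `sum_mask`, `max_mask`, and `select_features`, and the nested-list layout `fmap[c][y][x]` are illustrative assumptions, not details confirmed by the text above.

```python
from statistics import median


def sum_mask(fmap):
    """fmap[c][y][x] holds the activations of C channels on an H x W grid.

    Assumed rule: keep a location when its channel-summed activation
    exceeds the median of those sums over all locations.
    """
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    sums = [[sum(fmap[c][y][x] for c in range(channels)) for x in range(width)]
            for y in range(height)]
    threshold = median(v for row in sums for v in row)
    return [[sums[y][x] > threshold for x in range(width)] for y in range(height)]


def max_mask(fmap):
    """Assumed rule: keep every location where at least one channel
    attains its spatial maximum."""
    height, width = len(fmap[0]), len(fmap[0][0])
    mask = [[False] * width for _ in range(height)]
    for channel in fmap:
        best = max(v for row in channel for v in row)
        for y in range(height):
            for x in range(width):
                if channel[y][x] == best:
                    mask[y][x] = True
    return mask


def select_features(fmap, mask):
    """Return the C-dimensional local descriptors at the kept locations."""
    return [[channel[y][x] for channel in fmap]
            for y, row in enumerate(mask)
            for x, keep in enumerate(row) if keep]


# Toy input: 2 channels on a 2 x 2 grid (hypothetical values).
toy = [
    [[5, 1], [0, 0]],
    [[0, 2], [0, 3]],
]
selected_by_sum = select_features(toy, sum_mask(toy))
selected_by_max = select_features(toy, max_mask(toy))
```

On this toy input the two assumed rules keep different subsets of the four locations, which illustrates how a mask can discard redundant local features before embedding and aggregation.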
Reproducibility Companion Paper: Selective Deep Convolutional Features for Image Retrieval
MM '20: Proceedings of the 28th ACM International Conference on Multimedia
In this companion paper, we first briefly summarize the contributions of our main manuscript, Selective Deep Convolutional Features for Image Retrieval, published in ACM MultiMedia 2017. In addition, we provide detailed instructions together with pre-...