Bregman pooling: feature-space local pooling for image classification

Najjar, Alameen; Ogawa, Takahiro; Haseyama, Miki

doi:10.1007/s13735-015-0086-z

Regular Paper
Published: 04 September 2015

Volume 4, pages 247–259 (2015)

International Journal of Multimedia Information Retrieval

Abstract

In this paper, we propose a novel feature-space local pooling method for the commonly adopted image classification architecture. Existing methods partition the feature space on the basis of visual appearance to obtain pooling bins; however, learning a more accurate partitioning that takes semantics into account boosts performance, even with a smaller number of bins. To this end, we propose partitioning the feature space over clusters of visual prototypes common to semantically similar images (i.e., images belonging to the same category). The clusters are obtained by Bregman co-clustering, applied offline to a subset of the training data. Because they are aware of the semantic context of the input image, our features are more discriminative than those pooled over appearance-based partitions. Experiments on four datasets (Caltech-101, Caltech-256, 15 Scenes, and 17 Flowers) covering three different classification tasks show that the proposed method outperforms previous feature-space local pooling methods while using lower-dimensional features. Moreover, when implemented within a spatial pyramid, our method achieves comparable results on three of the four datasets.
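The abstract describes feature-space local pooling only at a high level. The following is a minimal sketch of the generic mechanism, not the authors' implementation, and several of its choices are assumptions: codes come from Gaussian soft assignment to K prototypes (`beta` is a hypothetical bandwidth), a hypothetical `proto_to_bin` array stands in for the prototype clusters that the paper obtains by Bregman co-clustering, each descriptor is routed to the bin of its nearest prototype, and codes are max-pooled within each bin.

```python
import numpy as np


def soft_assign_codes(descriptors, prototypes, beta=1.0):
    """Return soft-assignment codes (N, K) and squared distances (N, K)."""
    # Squared Euclidean distance from every descriptor to every prototype
    d2 = ((descriptors[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    # Shift by the row minimum before exponentiating, for numerical stability
    w = np.exp(-beta * (d2 - d2.min(axis=1, keepdims=True)))
    return w / w.sum(axis=1, keepdims=True), d2


def local_pool(descriptors, prototypes, proto_to_bin, n_bins, beta=1.0):
    """Max-pool codes separately inside each feature-space bin, then concatenate.

    proto_to_bin is a hypothetical (K,) integer array assigning each prototype
    to a bin, standing in for clusters produced by an offline step.
    """
    codes, d2 = soft_assign_codes(descriptors, prototypes, beta)
    # Route each descriptor to the bin of its nearest prototype
    desc_bin = proto_to_bin[d2.argmin(axis=1)]
    pooled = np.zeros((n_bins, prototypes.shape[0]))
    for b in range(n_bins):
        members = codes[desc_bin == b]
        if len(members) > 0:  # empty bins stay all-zero
            pooled[b] = members.max(axis=0)
    return pooled.ravel()  # length n_bins * K


# Toy usage: 6 prototypes grouped into 3 bins, 50 random 4-D descriptors
rng = np.random.default_rng(0)
prototypes = rng.normal(size=(6, 4))
proto_to_bin = np.array([0, 0, 1, 1, 2, 2])
descriptors = rng.normal(size=(50, 4))
feat = local_pool(descriptors, prototypes, proto_to_bin, n_bins=3)
print(feat.shape)  # (18,)
```

In this sketch, K prototypes and P bins give a pooled vector of P × K entries, so the number of bins directly sets the feature dimensionality; this is why reaching good accuracy with fewer bins matters.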

Notes

Co-clustering quality is measured by calculating the Normalized Mutual Information (NMI) between the image labels and the obtained clusters; a higher NMI indicates better co-clustering.
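The note does not say how the NMI was normalized. A minimal sketch, assuming the common geometric-mean normalization NMI = I(Y;C) / sqrt(H(Y) H(C)), where Y is the image label and C the assigned cluster; the function name `nmi` is our own, and other normalizations (e.g., the arithmetic mean of the two entropies) give somewhat different values.

```python
import numpy as np


def nmi(labels, clusters):
    """Normalized mutual information I(Y;C) / sqrt(H(Y) H(C))."""
    # Map arbitrary label / cluster ids to 0..n-1
    _, y = np.unique(np.asarray(labels), return_inverse=True)
    _, c = np.unique(np.asarray(clusters), return_inverse=True)
    # Joint distribution p(y, c) from the contingency table
    joint = np.zeros((y.max() + 1, c.max() + 1))
    np.add.at(joint, (y, c), 1.0)
    joint /= joint.sum()
    py, pc = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0  # skip empty cells: 0 * log 0 is taken as 0
    mi = np.sum(joint[nz] * np.log(joint[nz] / np.outer(py, pc)[nz]))
    hy = -np.sum(py * np.log(py))  # every py, pc > 0 by construction
    hc = -np.sum(pc * np.log(pc))
    if hy == 0.0 or hc == 0.0:
        # Convention for a single label or a single cluster (choices vary)
        return 1.0 if hy == hc else 0.0
    return float(mi / np.sqrt(hy * hc))


print(nmi([0, 0, 1, 1], ["b", "b", "a", "a"]))  # perfect match up to relabeling
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))          # statistically independent
```

Because NMI is invariant to how clusters are numbered, it compares a co-clustering with the category labels without first matching cluster ids to classes.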

Acknowledgments

This work was partly supported by a Grant-in-Aid for Scientific Research (B) 25280036 from the Japan Society for the Promotion of Science (JSPS).

Author information

Authors and Affiliations

Laboratory of Media Dynamics, Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Hokkaido, 060-0814, Japan
Alameen Najjar, Takahiro Ogawa & Miki Haseyama

Corresponding author

Correspondence to Alameen Najjar.

About this article

Cite this article

Najjar, A., Ogawa, T. & Haseyama, M. Bregman pooling: feature-space local pooling for image classification. Int J Multimed Info Retr 4, 247–259 (2015). https://doi.org/10.1007/s13735-015-0086-z

Received: 09 April 2015
Revised: 31 July 2015
Accepted: 28 August 2015
Published: 04 September 2015
Issue Date: December 2015
DOI: https://doi.org/10.1007/s13735-015-0086-z
