Bregman pooling: feature-space local pooling for image classification

  • Regular Paper
International Journal of Multimedia Information Retrieval

Abstract

In this paper, we propose a novel feature-space local pooling method for the commonly adopted image classification architecture. Existing methods partition the feature space based on visual appearance to obtain pooling bins; however, learning a more accurate partitioning that takes semantics into account boosts performance even with a smaller number of bins. To this end, we propose partitioning the feature space over clusters of visual prototypes common to semantically similar images (i.e., images belonging to the same category). The clusters are obtained by Bregman co-clustering applied offline on a subset of the training data. Because the resulting features are aware of the semantic context of the input image, they have higher discriminative power than those pooled from appearance-based partitioning. Testing on four datasets (Caltech-101, Caltech-256, 15 Scenes, and 17 Flowers) belonging to three different classification tasks showed that the proposed method outperforms previous feature-space local pooling methods while using lower feature dimensionality. Moreover, when implemented within a spatial pyramid, our method achieves comparable results on three of the datasets used.
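To make the idea of pooling over feature-space bins concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each local descriptor has already been encoded as a non-negative code vector, that each descriptor is routed to a bin through its nearest visual prototype, and that a prototype-to-cluster map (here the hypothetical `word_cluster`) is available from some offline clustering step. The function name `local_pool` and all parameter names are illustrative.

```python
def local_pool(codes, nearest_word, word_cluster, n_bins):
    """Max-pool codes separately inside each feature-space bin, then concatenate.

    codes        : list of per-descriptor code vectors (assumed non-negative)
    nearest_word : index of the nearest visual prototype for each descriptor
    word_cluster : cluster (bin) id of each visual prototype (assumed given)
    n_bins       : number of pooling bins
    """
    k = len(codes[0])
    # One pooled vector per bin; zeros are a valid start for non-negative codes.
    pooled = [[0.0] * k for _ in range(n_bins)]
    for code, word in zip(codes, nearest_word):
        row = pooled[word_cluster[word]]  # bin this descriptor falls into
        for j, value in enumerate(code):
            if value > row[j]:
                row[j] = value            # element-wise max within the bin
    # The image-level feature is the concatenation of all bin vectors.
    return [value for row in pooled for value in row]


codes = [[1.0, 0.0], [0.5, 0.2], [0.0, 0.9]]
feature = local_pool(codes, nearest_word=[0, 0, 1], word_cluster=[0, 1], n_bins=2)
```

The resulting feature has `n_bins * k` dimensions, which is why a partitioning that performs well with fewer bins directly reduces feature dimensionality.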

Notes

  1. The Normalized Mutual Information (NMI) is calculated between image labels and the obtained clusters. A higher NMI means better co-clustering.
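As an illustration of the measure named in this note, NMI between two labelings can be sketched as follows. The note does not state which normalization is used, so this sketch assumes normalization by the arithmetic mean of the two entropies; the function names `entropy` and `nmi` are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(labels_true, labels_pred):
    """Mutual information normalized by the mean of the two entropies (assumed)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))  # co-occurrence counts
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    mi = 0.0
    for (t, p), c in joint.items():
        # p(t, p) * log( p(t, p) / (p(t) * p(p)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (count_true[t] * count_pred[p]))
    denom = 0.5 * (entropy(labels_true) + entropy(labels_pred))
    # Two constant labelings agree trivially; treat that case as NMI = 1.
    return mi / denom if denom > 0 else 1.0


same_partition = nmi([0, 0, 1, 1], [1, 1, 0, 0])  # identical up to relabeling
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])     # clusters carry no label info
```

Under this normalization the score is 1 when the clusters reproduce the image labels up to renaming and 0 when the two are statistically independent.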

Acknowledgments

This work was partly supported by Grant-in-Aid for Scientific Research (B) 25280036, Japan Society for the Promotion of Science (JSPS).

Author information

Corresponding author

Correspondence to Alameen Najjar.

About this article

Cite this article

Najjar, A., Ogawa, T. & Haseyama, M. Bregman pooling: feature-space local pooling for image classification. Int J Multimed Info Retr 4, 247–259 (2015). https://doi.org/10.1007/s13735-015-0086-z

  • DOI: https://doi.org/10.1007/s13735-015-0086-z
