
Understanding bag-of-words model: a statistical framework

  • Original Article

International Journal of Machine Learning and Cybernetics

Abstract

The bag-of-words model is one of the most popular representation methods for object categorization. The key idea is to quantize each extracted key point into one of the visual words, and then represent each image by a histogram of the visual words. For this purpose, a clustering algorithm (e.g., K-means) is generally used to generate the visual words. Although a number of studies have shown encouraging results of the bag-of-words representation for object categorization, the theoretical properties of the bag-of-words model remain largely unexplored, possibly because of the difficulty introduced by the heuristic clustering process. In this paper, we present a statistical framework that generalizes the bag-of-words representation. In this framework, the visual words are generated by a statistical process rather than by a clustering algorithm, while the empirical performance is competitive with clustering-based methods. A theoretical analysis based on statistical consistency is presented for the proposed framework. Moreover, based on the framework, we develop two algorithms that do not rely on clustering yet achieve competitive performance in object categorization when compared to clustering-based bag-of-words representations.
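The clustering-based pipeline summarized above (build a codebook of visual words with K-means, assign each key point descriptor to its nearest word, count) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes Euclidean distance, hard assignment, and descriptors given as plain numeric tuples, and all function names are hypothetical. For contrast, `sampled_codebook` draws visual words uniformly at random from the descriptors, a simple clustering-free baseline; it is not the statistical framework or either of the two algorithms proposed in the paper.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]  # one local descriptor (e.g., a SIFT vector); assumed representation


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor: Vector, codebook: List[Vector]) -> int:
    """Hard quantization: index of the closest visual word (ties go to the lower index)."""
    return min(range(len(codebook)), key=lambda k: sq_dist(descriptor, codebook[k]))


def kmeans_codebook(descriptors: List[Vector], k: int,
                    iters: int = 20, seed: int = 0) -> List[List[float]]:
    """Clustering-based codebook: plain Lloyd's K-means, initialized from k random descriptors."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(descriptors, k)]
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for d in descriptors:
            groups[nearest_word(d, centers)].append(d)
        for j, members in enumerate(groups):
            if members:  # keep the previous center if a cluster becomes empty
                dim = len(members[0])
                centers[j] = [sum(m[i] for m in members) / len(members) for i in range(dim)]
    return centers


def sampled_codebook(descriptors: List[Vector], k: int, seed: int = 0) -> List[List[float]]:
    """Clustering-free codebook: k descriptors drawn uniformly at random (illustrative baseline)."""
    rng = random.Random(seed)
    return [list(c) for c in rng.sample(descriptors, k)]


def bow_histogram(image_descriptors: List[Vector], codebook: List[Vector],
                  normalize: bool = True) -> List[float]:
    """Represent one image as a histogram of visual-word counts (L1-normalized by default)."""
    counts = [0.0] * len(codebook)
    for d in image_descriptors:
        counts[nearest_word(d, codebook)] += 1
    if normalize and image_descriptors:
        total = sum(counts)
        counts = [c / total for c in counts]
    return counts


if __name__ == "__main__":
    # Toy 2-D "descriptors" forming two well-separated groups.
    train = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    image = [(0.1, 0.2), (9.8, 10.1), (10.2, 9.9), (0.0, -0.1)]
    print(bow_histogram(image, kmeans_codebook(train, k=2)))
    print(bow_histogram(image, sampled_codebook(train, k=2)))
```

The resulting histograms are fixed-length vectors regardless of how many key points an image has, which is what lets a standard classifier (for example, a kernel SVM) be trained on them. Swapping `kmeans_codebook` for `sampled_codebook` changes only how the words are chosen; the quantization and counting steps stay the same.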


Fig. 1
Fig. 2



Acknowledgments

We want to thank the reviewers for helpful comments and suggestions. This research is partially supported by the National Fundamental Research Program of China (2010CB327903), the Jiangsu 333 High-Level Talent Cultivation Program and the National Science Foundation (IIS-0643494). Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agencies.

Author information


Correspondence to Zhi-Hua Zhou.


About this article

Cite this article

Zhang, Y., Jin, R. & Zhou, ZH. Understanding bag-of-words model: a statistical framework. Int. J. Mach. Learn. & Cyber. 1, 43–52 (2010). https://doi.org/10.1007/s13042-010-0001-0



  • DOI: https://doi.org/10.1007/s13042-010-0001-0
