DOI: 10.1145/1290082.1290111

Evaluating bag-of-visual-words representations in scene classification

Published: 24 September 2007

ABSTRACT

Based on keypoints extracted as salient image patches, an image can be described as a "bag of visual words", and this representation has been used in scene classification. The choice of dimension, selection, and weighting of visual words in this representation is crucial to classification performance but has not been thoroughly studied in previous work. Given the analogy between this representation and the bag-of-words representation of text documents, we apply techniques used in text categorization, including term weighting, stop word removal, and feature selection, to generate image representations that differ in the dimension, selection, and weighting of visual words. The impact of these representation choices on scene classification is studied through extensive experiments on the TRECVID and PASCAL collections. This study provides an empirical basis for designing visual-word representations that are likely to produce superior classification performance.
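As a rough illustration of how two of the text-categorization techniques named above might carry over to images, the sketch below applies tf-idf term weighting and a simple stop word removal to a matrix of visual-word counts (one row per image, one column per visual word). The function names, the particular tf-idf variant, and the rule of dropping the words with the highest document frequency are assumptions made for this example; the abstract does not specify which formulations the paper evaluates.

```python
import math
from typing import List


def document_frequency(counts: List[List[int]]) -> List[int]:
    """Number of images in which each visual word occurs at least once."""
    vocab_size = len(counts[0])
    return [sum(1 for row in counts if row[j] > 0) for j in range(vocab_size)]


def tf_idf(counts: List[List[int]]) -> List[List[float]]:
    """Weight an images-by-vocabulary count matrix with one common tf-idf variant."""
    n_images = len(counts)
    df = document_frequency(counts)
    # Words that occur in no image get an idf of 0 (assumed convention).
    idf = [math.log(n_images / d) if d > 0 else 0.0 for d in df]
    weighted = []
    for row in counts:
        total = sum(row)  # number of keypoints in this image
        weighted.append(
            [(c / total if total else 0.0) * w for c, w in zip(row, idf)]
        )
    return weighted


def remove_stop_words(counts: List[List[int]], n_stop: int) -> List[List[int]]:
    """Drop the n_stop visual words with the highest document frequency."""
    df = document_frequency(counts)
    # Rank words by document frequency, highest first; ties broken by index.
    ranked = sorted(range(len(df)), key=lambda j: (-df[j], j))
    stop = set(ranked[:n_stop])
    keep = [j for j in range(len(df)) if j not in stop]
    return [[row[j] for j in keep] for row in counts]


# Toy data: 3 images, vocabulary of 3 visual words (made up for illustration).
toy_counts = [[3, 0, 1], [2, 1, 0], [1, 0, 0]]
toy_weighted = tf_idf(toy_counts)
toy_pruned = remove_stop_words(toy_counts, n_stop=1)
```

In this toy data the first visual word appears in every image, so its idf is zero and it is also the word removed as a stop word; both steps discount words that do not help tell images apart.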


        Published in

        MIR '07: Proceedings of the International Workshop on Multimedia Information Retrieval
        September 2007
        343 pages
        ISBN: 9781595937780
        DOI: 10.1145/1290082

        Copyright © 2007 ACM


        Publisher

        Association for Computing Machinery

        New York, NY, United States
