Article · DOI: 10.1145/1282280.1282352

Towards optimal bag-of-features for object categorization and semantic video retrieval

Published: 9 July 2007

ABSTRACT

Bag-of-features (BoF) representations derived from local keypoints have recently shown promise for object and scene classification. Whether BoF can meet the reliability and scalability challenges of visual classification, however, remains uncertain, because its performance depends on many implementation choices. In this paper, we evaluate the factors that govern the performance of BoF: the choice of keypoint detector, kernel, vocabulary size, and weighting scheme. We offer practical insights into how to optimize performance by choosing a good keypoint detector and kernel. For the weighting scheme, we propose a novel soft-weighting method to assess the significance of a visual word to an image, and we show experimentally that it consistently outperforms other popular weighting methods. On both the PASCAL-2005 and TRECVID-2006 datasets, our BoF setting achieves performance competitive with state-of-the-art techniques. We also show that BoF is highly complementary to global features: by combining BoF with color and texture features, we obtain an improvement of 50% on the TRECVID-2006 dataset.
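The abstract names the weighting scheme and the kernel as two of the factors that govern BoF performance, but it does not give their formulas. The following is a minimal Python sketch, under stated assumptions, that contrasts hard term-frequency weighting with a soft-weighting scheme and then compares two histograms with a χ² RBF kernel. The cosine similarity, the rank-based halving of each keypoint's contribution over its `top_n` most similar visual words, the default `top_n=4`, and the χ² convention without a 1/2 factor are all assumptions for illustration; the function names are hypothetical, and this is not presented as the paper's exact method.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    # ASSUMPTION: cosine similarity between a keypoint descriptor and a visual word.
    # For non-negative descriptors (e.g. SIFT-like) this lies in [0, 1].
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def hard_tf(descriptors: List[Vector], vocab: List[Vector]) -> List[float]:
    # Baseline term frequency: each keypoint adds 1 to its single most similar word.
    hist = [0.0] * len(vocab)
    for d in descriptors:
        best = max(range(len(vocab)), key=lambda k: cosine(d, vocab[k]))
        hist[best] += 1.0
    return hist


def soft_weight(descriptors: List[Vector], vocab: List[Vector], top_n: int = 4) -> List[float]:
    # Hypothetical soft weighting: each keypoint contributes to its top_n most
    # similar words; the word at 0-based rank i receives similarity / 2**i.
    hist = [0.0] * len(vocab)
    for d in descriptors:
        sims = [cosine(d, w) for w in vocab]
        ranked = sorted(range(len(vocab)), key=lambda k: sims[k], reverse=True)
        for i, k in enumerate(ranked[:top_n]):
            hist[k] += sims[k] / (2 ** i)
    return hist


def chi2_rbf_kernel(x: Vector, y: Vector, gamma: float = 1.0) -> float:
    # exp(-gamma * chi2(x, y)) with chi2 = sum (a - b)^2 / (a + b); empty bins are skipped.
    chi2 = sum((a - b) ** 2 / (a + b) for a, b in zip(x, y) if a + b > 0)
    return math.exp(-gamma * chi2)


if __name__ == "__main__":
    vocab = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 3-word vocabulary
    descs = [[1.0, 0.0], [0.0, 1.0]]              # toy keypoint descriptors
    h = hard_tf(descs, vocab)
    s = soft_weight(descs, vocab)
    print(h)  # [1.0, 1.0, 0.0]
    print(s)  # the third bin is about 0.707 under soft weighting
    print(chi2_rbf_kernel(h, s))
```

In the toy run, the third visual word lies between both keypoints: hard assignment gives it nothing, while soft weighting credits it with a partial vote from each keypoint. This is the general intuition behind weighting schemes that go beyond counting one word per keypoint.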


Published in

CIVR '07: Proceedings of the 6th ACM international conference on Image and video retrieval
July 2007
655 pages
ISBN: 9781595937339
DOI: 10.1145/1282280

Copyright © 2007 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery
New York, NY, United States
