ABSTRACT
Bag-of-features (BoF) representations derived from local keypoints have recently shown promise for object and scene classification. Whether BoF can meet the reliability and scalability challenges of visual classification nevertheless remains uncertain, owing to the many implementation choices involved. In this paper, we evaluate the factors that govern the performance of BoF: the choice of detector, kernel, vocabulary size, and weighting scheme. We offer practical insights into how to optimize performance by choosing a good keypoint detector and kernel. For the weighting scheme, we propose a novel soft-weighting method to assess the significance of a visual word to an image. We show experimentally that the proposed soft-weighting scheme consistently outperforms other popular weighting methods. On both the PASCAL-2005 and TRECVID-2006 datasets, our BoF setting achieves performance competitive with state-of-the-art techniques. We also show that BoF is highly complementary to global features: incorporating BoF with color and texture features yields a 50% improvement on the TRECVID-2006 dataset.
Towards optimal bag-of-features for object categorization and semantic video retrieval