ABSTRACT
This paper addresses the challenge of Multimedia Event Detection (MED) by proposing a novel method for fusing high-level and low-level features based on collective classification. The method consists of three steps: training a classifier on low-level features; encoding high-level features into graphs; and diffusing the classifier scores over these graphs to obtain the final prediction. The final prediction is derived from multiple graphs, each of which corresponds to one high-level feature. The paper investigates two graph construction methods, using logarithmic and exponential loss functions respectively, and two collective classification algorithms, Gibbs sampling and Markov random walk. The theoretical analysis shows that the proposed method converges and is computationally scalable. The empirical analysis on the TRECVID 2011 Multimedia Event Detection dataset demonstrates its strong performance compared with state-of-the-art methods, with the added benefit of interpretability.
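The score-diffusion step can be illustrated with a minimal sketch. It assumes a Gaussian (exponential-kernel) affinity between samples in each high-level feature space, a row-normalized transition matrix P, and a random-walk-with-restart update f <- alpha * P f + (1 - alpha) * s0 that blends diffused scores with the low-level classifier scores s0. The final score averages the diffused scores over all graphs. The helper names (`build_graph`, `row_normalize`, `random_walk_diffuse`, `fuse`) and the parameters `sigma` and `alpha` are hypothetical. This is a generic label-propagation variant, not the paper's loss-based graph construction or its Gibbs sampling algorithm.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
Matrix = List[List[float]]


def build_graph(features: Sequence[Vector], sigma: float = 1.0) -> Matrix:
    """Dense affinity matrix over samples in one high-level feature space.

    Assumption: W[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)) with a zero
    diagonal (a Gaussian kernel, not the paper's loss-based construction).
    """
    n = len(features)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
                W[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return W


def row_normalize(W: Matrix) -> Matrix:
    """Turn affinities into a row-stochastic transition matrix P."""
    P = []
    for row in W:
        total = sum(row)
        P.append([w / total for w in row] if total > 0 else [0.0] * len(row))
    return P


def random_walk_diffuse(P: Matrix, s0: Vector, alpha: float = 0.5,
                        tol: float = 1e-9, max_iter: int = 1000) -> List[float]:
    """Iterate f <- alpha * P f + (1 - alpha) * s0 until the change is below tol."""
    f = list(s0)
    for _ in range(max_iter):
        new = [alpha * sum(p * fj for p, fj in zip(row, f)) + (1.0 - alpha) * s
               for row, s in zip(P, s0)]
        converged = max(abs(a - b) for a, b in zip(new, f)) < tol
        f = new
        if converged:
            break
    return f


def fuse(low_level_scores: Vector, high_level_views: Sequence[Sequence[Vector]],
         alpha: float = 0.5, sigma: float = 1.0) -> List[float]:
    """Diffuse low-level scores on one graph per high-level feature, then average."""
    if not high_level_views:
        return list(low_level_scores)
    diffused = [random_walk_diffuse(row_normalize(build_graph(view, sigma)),
                                    low_level_scores, alpha)
                for view in high_level_views]
    return [sum(d[i] for d in diffused) / len(diffused)
            for i in range(len(low_level_scores))]


# Example: samples 0 and 1 are close in the high-level space, so sample 1
# inherits part of sample 0's high low-level score; samples 2 and 3 do not.
scores = fuse([1.0, 0.0, 0.0, 0.0], [[[0.0], [0.1], [5.0], [5.1]]])
print([round(s, 3) for s in scores])
```

When every row of W has a positive sum, P is row-stochastic and alpha < 1, so each update is a contraction. The iteration therefore converges to (1 - alpha)(I - alpha P)^{-1} s0, and every diffused score stays a convex combination of the initial scores. The dense O(n^2) graph here is only for illustration; a scalable version would likely use a sparse neighborhood graph.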
Index Terms
- Leveraging high-level and low-level features for multimedia event detection