DOI: 10.1145/2393347.2393412
research-article

Leveraging high-level and low-level features for multimedia event detection

Published: 29 October 2012

ABSTRACT

This paper addresses Multimedia Event Detection by proposing a novel method, based on collective classification, for fusing high-level and low-level features. The method consists of three steps: training a classifier on low-level features; encoding high-level features into graphs; and diffusing the classifier scores over these graphs to obtain the final prediction, which is derived from multiple graphs, each corresponding to one high-level feature. The paper investigates two graph construction methods, using logarithmic and exponential loss functions respectively, and two collective classification algorithms, namely Gibbs sampling and Markov random walk. Theoretical analysis shows that the proposed method converges and is computationally scalable, and empirical analysis on the TRECVID 2011 Multimedia Event Detection dataset shows that it compares favorably with state-of-the-art methods, with the added benefit of interpretability.
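The abstract names the pipeline but not its equations. As a rough illustration of the diffusion step, here is a minimal sketch, assuming a generic random walk with restart: it propagates low-level classifier scores over one graph per high-level feature, then averages the per-graph results. Everything concrete here is an assumption rather than the authors' method: the Gaussian-kernel graph (`build_graph`, `sigma`) is a hypothetical stand-in for the paper's logarithmic and exponential loss-based graph construction, the restart weight `alpha` and the averaging rule in `fuse` are illustrative choices, and the Gibbs sampling variant is not shown.

```python
import math
from typing import List

Matrix = List[List[float]]


def build_graph(features: List[List[float]], sigma: float = 1.0) -> Matrix:
    """Hypothetical stand-in for graph construction: Gaussian-kernel edge
    weights between videos' high-level feature vectors. The paper instead
    builds its graphs from logarithmic or exponential loss functions."""
    n = len(features)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # no self-loops
                d2 = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
                w[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return w


def row_normalize(w: Matrix) -> Matrix:
    """Transition matrix P = D^-1 W of a Markov random walk on the graph.
    Rows with zero total weight (isolated videos) are left as all zeros."""
    p = []
    for row in w:
        total = sum(row)
        p.append([x / total for x in row] if total > 0 else [0.0] * len(row))
    return p


def random_walk_diffuse(p: Matrix, f0: List[float], alpha: float = 0.8,
                        tol: float = 1e-9, max_iter: int = 1000) -> List[float]:
    """Random walk with restart: f <- alpha * P f + (1 - alpha) * f0.
    Each row of P sums to at most 1, so for 0 <= alpha < 1 the update is a
    contraction in the max-norm and converges to a unique fixed point."""
    f = list(f0)
    for _ in range(max_iter):
        new = [alpha * sum(pij * fj for pij, fj in zip(row, f)) + (1.0 - alpha) * s
               for row, s in zip(p, f0)]
        if max(abs(a - b) for a, b in zip(new, f)) < tol:
            return new
        f = new
    return f


def fuse(low_level_scores: List[float],
         high_level_features: List[List[List[float]]],
         alpha: float = 0.8, sigma: float = 1.0) -> List[float]:
    """Diffuse the low-level classifier scores over one graph per high-level
    feature, then average the per-graph results (assumed combination rule)."""
    per_graph = [
        random_walk_diffuse(row_normalize(build_graph(feats, sigma)),
                            low_level_scores, alpha)
        for feats in high_level_features
    ]
    n = len(low_level_scores)
    return [sum(scores[i] for scores in per_graph) / len(per_graph) for i in range(n)]


if __name__ == "__main__":
    # Toy data: 4 videos, one high-level feature (a 1-D concept score per video).
    scores = [0.9, 0.2, 0.1, 0.8]
    concept = [[1.0], [0.95], [0.0], [0.05]]
    print(fuse(scores, [concept], alpha=0.8, sigma=0.2))
```

On the toy data, videos 0 and 1 share a concept profile, as do videos 2 and 3, so diffusion pulls each pair's scores together: video 1 rises from 0.2 to about 0.51 and video 3 falls from 0.8 to about 0.49. This is the kind of correction the high-level graphs are meant to supply; a real system would tune `alpha` and the graph construction rather than fix them as here.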


    Published in

      ACM Conferences
      MM '12: Proceedings of the 20th ACM international conference on Multimedia
      October 2012
      1584 pages
      ISBN:9781450310895
      DOI:10.1145/2393347

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      Overall Acceptance Rate: 995 of 4,171 submissions, 24%
