Leveraging high-level and low-level features for multimedia event detection

Authors:
Lu Jiang, Carnegie Mellon University, Pittsburgh, PA, USA
Alexander G. Hauptmann, Carnegie Mellon University, Pittsburgh, PA, USA
Guang Xiang, Carnegie Mellon University, Pittsburgh, PA, USA

MM '12: Proceedings of the 20th ACM international conference on Multimedia, October 2012, Pages 449–458. https://doi.org/10.1145/2393347.2393412

Published: 29 October 2012

ABSTRACT

This paper addresses the challenge of Multimedia Event Detection by proposing a novel method, based on collective classification, for fusing high-level and low-level features. The method consists of three steps: training a classifier on low-level features; encoding high-level features into graphs; and diffusing the classifier scores over the resulting graphs to obtain the final prediction. The final prediction is derived from multiple graphs, each of which corresponds to one high-level feature. The paper investigates two graph construction methods, using logarithmic and exponential loss functions respectively, and two collective classification algorithms, namely Gibbs sampling and Markov random walk. Theoretical analysis shows that the proposed method converges and is computationally scalable. Empirical analysis on the TRECVID 2011 Multimedia Event Detection dataset shows strong performance compared with state-of-the-art methods, with the added benefit of interpretability.
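The three steps in the abstract can be illustrated with a generic score-diffusion sketch. This is not the authors' algorithm: the abstract does not give the graph construction (the loss-based methods are not reproduced here) or the exact update rule. The sketch assumes a k-nearest-neighbor graph with Gaussian edge weights and a random walk with restart. The names `build_graph`, `diffuse`, `fuse`, and the parameters `k` and `alpha` are hypothetical.

```python
import math


def build_graph(features, k=2):
    """Row-normalized k-nearest-neighbor graph over videos.

    `features` holds one high-level feature vector per video. The
    exp(-squared distance) edge weight is an assumption of this sketch,
    not the paper's loss-based graph construction.
    """
    graph = []
    for i, fi in enumerate(features):
        weights = []
        for j, fj in enumerate(features):
            if i == j:
                continue
            dist2 = sum((a - b) ** 2 for a, b in zip(fi, fj))
            weights.append((math.exp(-dist2), j))
        top = sorted(weights, reverse=True)[:k]
        total = sum(w for w, _ in top)
        # Normalize so each node's outgoing weights sum to 1.
        graph.append({j: w / total for w, j in top})
    return graph


def diffuse(initial_scores, graph, alpha=0.5, iterations=50):
    """Random walk with restart over one graph.

    Each step blends a video's low-level classifier score with the
    weighted average of its neighbors' current scores.
    """
    scores = list(initial_scores)
    for _ in range(iterations):
        scores = [
            (1 - alpha) * initial_scores[i]
            + alpha * sum(w * scores[j] for j, w in graph[i].items())
            for i in range(len(scores))
        ]
    return scores


def fuse(initial_scores, feature_sets, k=2, alpha=0.5):
    """Average the diffused scores over one graph per high-level feature."""
    runs = [diffuse(initial_scores, build_graph(f, k), alpha) for f in feature_sets]
    return [sum(column) / len(runs) for column in zip(*runs)]


# Toy data: six videos, one 1-D high-level feature, two tight clusters.
concept = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
classifier_scores = [0.9, 0.8, 0.2, 0.1, 0.2, 0.7]
fused = fuse(classifier_scores, [concept])
```

In this toy run, video 2 sits among high-scoring neighbors and video 5 among low-scoring ones, so diffusion raises the former and lowers the latter relative to their initial classifier scores. Averaging over graphs is only one way to combine them; the abstract does not say how the paper derives the final prediction from multiple graphs.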

Index Terms

1. Information systems
  1. Information systems applications

Published in
MM '12: Proceedings of the 20th ACM international conference on Multimedia
October 2012
1584 pages
ISBN:9781450310895
DOI:10.1145/2393347
General Chairs: Noboru Babaguchi (Osaka University, Japan), Kiyoharu Aizawa (The University of Tokyo, Japan), John Smith (IBM, USA)
Program Chairs: Shin'ichi Satoh (National Institute of Informatics, Japan), Thomas Plagemann (University of Oslo, Norway), Xian-Sheng Hua (Microsoft, USA), Rong Yan (Facebook, USA)
Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
collective classification
feature fusion
multi-modal integration
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 995 of 4,171 submissions, 24%