Parallel neural networks for multimodal video genre classification

Montagnuolo, Maurizio; Messina, Alberto

doi:10.1007/s11042-008-0222-3

Published: 19 September 2008

Volume 41, pages 125–159 (2009)

Multimedia Tools and Applications

Maurizio Montagnuolo¹ &
Alberto Messina²

Abstract

Improvements in digital technology have made it possible to produce and distribute huge quantities of digital multimedia data. Tools for high-level multimedia documentation are becoming indispensable for efficiently accessing and retrieving desired content from such data. In this context, automatic genre classification provides a simple and effective way to describe multimedia content in a structured and easily understandable manner. In this article we propose a methodology for classifying the genre of television programmes. Features are extracted from four informative sources: visual-perceptual information (colour, texture and motion), structural information (shot length, shot distribution, shot rhythm, shot cluster duration and saturation), cognitive information (face properties, such as number, positions and dimensions) and aural information (transcribed text, sound characteristics). These features are used to train a parallel neural network system able to distinguish between seven video genres: football, cartoons, music, weather forecast, newscast, talk show and commercials. Experiments conducted on more than 100 hours of audiovisual material confirm the effectiveness of the proposed method, which reaches a classification accuracy of 95%.
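
The arrangement described in the abstract, one classifier per information source with the outputs merged into a single genre decision, can be sketched roughly as follows. This is only an illustration under stated assumptions: the feature-vector sizes, the hidden-layer width, the untrained random weights and the averaging rule in `combine` are all hypothetical, since the abstract does not specify the network topology or how the parallel outputs are fused.

```python
import math
import random

# The seven genres named in the abstract.
GENRES = ["football", "cartoons", "music", "weather forecast",
          "newscast", "talk show", "commercials"]

# Hypothetical feature-vector sizes per modality (not taken from the article).
MODALITY_DIMS = {"visual": 6, "structural": 5, "cognitive": 3, "aural": 4}


def softmax(scores):
    """Turn raw scores into values that are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class TinyMLP:
    """One-hidden-layer perceptron with random, untrained weights."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-1, 1) for _ in range(n_hidden)]
                   for _ in range(n_out)]

    def predict(self, features):
        hidden = [math.tanh(sum(w * x for w, x in zip(row, features)))
                  for row in self.w1]
        return softmax([sum(w * h for w, h in zip(row, hidden))
                        for row in self.w2])


def combine(per_modality_scores):
    """Average the genre score vectors (an assumed fusion rule)."""
    n = len(per_modality_scores)
    return [sum(column) / n for column in zip(*per_modality_scores)]


def classify(networks, features_by_modality):
    """Run every modality network and pick the genre with the top fused score."""
    scores = [networks[name].predict(features_by_modality[name])
              for name in networks]
    fused = combine(scores)
    best = max(range(len(GENRES)), key=fused.__getitem__)
    return GENRES[best], fused


rng = random.Random(0)
networks = {name: TinyMLP(dim, 8, len(GENRES), rng)
            for name, dim in MODALITY_DIMS.items()}
sample = {name: [rng.random() for _ in range(dim)]
          for name, dim in MODALITY_DIMS.items()}
genre, fused = classify(networks, sample)
print(genre, [round(p, 3) for p in fused])
```

Because the weights are random, the predicted genre is meaningless here; the point is only the data flow, in which each modality is scored independently and the decision is taken on the fused vector.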

Notes

Consider, for example, the YouTube video site—http://www.youtube.com/ (last accessed: May 13th, 2008), which allows users to upload, watch and share multimedia video files.
http://www.aber.ac.uk/media/Documents/intgenre/intgenre.html (last accessed: November 14th, 2007)
Available online at: http://www-nlpir.nist.gov/projects/t2002v/t2002v.html (last accessed: September 14th, 2007).
Richard Bellman coined the term “curse of dimensionality” to describe the difficulty of evaluating Probability Density Functions on high-dimensional feature spaces [2].
Available online at: http://www.nlog-project.org (Last accessed: September 28th, 2007).
See http://www.facedetection.com to find many available resources about the face detection task (last accessed: October 10th, 2007).
Using the Fraunhofer IIS Real Time Face Detector tool, http://www.iis.fraunhofer.de (last accessed: March 28th, 2008).
See Table 1 to recall the meaning of the acronyms for the programme surrogate features.
See http://www.museum.tv/archives/etv/T/htmlT/talkshows/talkshows.htm for an introduction to the history of TV talk shows (Last accessed: September 27th, 2007).
A demo version (valid for 60 days) of the RTFaceDetection library is available for download at: http://www.iis.fraunhofer.de/EN/bf/bv/kognitiv/biom/dd.jsp (last accessed: November 21st, 2007).
http://www.itc.it/irst (last accessed: May 13th, 2008).
http://leenissen.dk/fann (last accessed: May 13th, 2008).
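
The note on the "curse of dimensionality" can be made concrete with a little arithmetic. The sketch below assumes a plain histogram as the density estimator (an illustrative choice, not necessarily the estimator used in the article): with a fixed number of bins per axis, the number of cells grows exponentially with the number of feature dimensions, so a fixed sample budget leaves almost every cell empty. The function names and the sample count are hypothetical.

```python
def histogram_cells(bins_per_axis, dims):
    """Number of cells in a histogram with the same bin count on every axis."""
    return bins_per_axis ** dims


def samples_per_cell(n_samples, bins_per_axis, dims):
    """Average number of samples falling into each histogram cell."""
    return n_samples / histogram_cells(bins_per_axis, dims)


# One million samples, ten bins per axis, increasing dimensionality.
for dims in (1, 2, 5, 10):
    print(dims, histogram_cells(10, dims),
          samples_per_cell(1_000_000, 10, dims))
```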

References

Albiol A, Fullá MJCh, Albiol A, Torres L (2004) Commercials detection using HMMs. In: International workshop on image analysis for multimedia interactive services. Lisboa, Portugal
Bellman R (1961) Adaptive control processes: a guided tour. Princeton Univ. Press
Blum DW (1992) Method and apparatus for identifying and eliminating specific material from video signals. US Patent no. 5151788
Boggs J, Petrie DW (2006) The art of watching films with tutorial CD-ROM. McGraw-Hill
Brugnara F, Cettolo M, Federico M, Giuliani D (2000) A system for the segmentation and transcription of Italian radio news. In: RIAO, content-based multimedia information access. Paris, France
Ćalić J (2004) Highly efficient low-level feature extraction for video representation and retrieval. PhD thesis, University of London
Chellappa R, Wilson CL, Sirohey S (1995) Human and machine recognition of faces: a survey. Proc IEEE 83(5):705–740 (May)
Cheng W, Liu C, Wang X (2006) A rough set approach to video genre classification. In: 8th international conference on advanced concepts for intelligent vision systems (ACIVS’06). Antwerp, Belgium, pp 1210–1220 (September)
Covell M, Baluja S, Fink M (2006) Advertisement detection and replacement using acoustic and visual repetition. In: IEEE 8th workshop on multimedia signal processing (MMSP2006). Victoria, BC, pp 461–466 (October)
Cover T, Hart P (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory 13(1):21–27
Dimitrova N, Agnihotri L, Wei G (2000) Video classification based on HMM using text and faces. In: European conference on signal processing. Tampere, Finland
Dimitrova N, Jeannin S, Nesvadba J, McGee T, Agnihotri L, Mekenkam G (2002) Real time commercial detection using MPEG features. In: Proc. 9th int. conf. on information processing and management of uncertainty in knowledge-based systems (IPMU 2002). Annecy, France, pp 481–486 (Invited paper)
Dinh PQ, Dorai C, Venkatesh S (2002) Video genre categorization using audio wavelet coefficients. In: ACCV2002: the 5th Asian conference on computer vision. Melbourne, Australia (January)
Dorado A, Calic J, Izquierdo E (2004) A rule-based video annotation system. IEEE Trans Circuits Syst Video Technol 14(5):622–633
EBU-UER (2007) Escort 2007. Technical Review 3322, EBU
Fischer S, Lienhart R, Effelsberg W (1995) Automatic recognition of film genres. In: ACM multimedia 1995. San Francisco, CA, pp 295–304 (November)
Glasberg R, Samour A, Elazouzi K, Sikora T (2005) Cartoon-recognition using video & audio-descriptors. In: 13th European signal processing conference (EUSIPCO2005). Antalya, Turkey (September)
Goh KS, Miyahara K, Radhakrishan R, Xiong Z, Divakaran A (2004) Audio-visual event detection based on mining of semantic audio-visual labels. Technical Report 2004-008, Mitsubishi Electric Research Laboratory (MERL)
Ianeva TI, de Vries AP, Rohrig H (2003) Detecting cartoons: a case study in automatic video-genre classification. In: IEEE international conference on multimedia and expo (ICME’03), pp 449–452 (July)
Igel C, Hüsken M (2000) Improving the Rprop learning algorithm. In: Proceedings of the second international symposium on neural computation, NC2000
Jolliffe IT (2002) Principal component analysis. Springer
Liu Z, Huang J, Wang Y (1998) Classification of TV programs based on audio information using hidden Markov model. In: IEEE 2nd workshop on multimedia signal processing (MMSP ’98). Redondo Beach, CA, USA, pp 27–32 (December)
Liu Z, Huang J, Wang Y, Chen T (1997) Audio feature extraction and analysis for scene classification. In: IEEE workshop on multimedia signal processing (MMSP’97), pp 343–348
Lo Iacono A, Colamussi M (2005) Rai click—“I want my own TV”. Technical Review 303, EBU (July)
Messina A, Montagnuolo M (2008) Fuzzy mining of multimedia genre applied to television archives. In: IEEE international conference on multimedia and expo. Hannover, Germany, 23–26 June 2008
Messina A, Montagnuolo M (2008) Multimedia genre characterisation with fuzzy embedding classifiers. In: International workshop on ambient media delivery and interactive television (AMDIT2008). Quebec City, Canada (February)
Messina A, Montagnuolo M, Sapino ML (2006) Characterizing multimedia objects through multimodal content analysis and fuzzy fingerprints. In: IEEE international conference on signal-image technology and internet-based systems (SITIS’06). Hammamet, Tunisia (December)
Montagnuolo M, Messina A (2007) Automatic genre classification of TV programmes using Gaussian mixture models and neural networks. In: DEXA workshops. Regensburg, Germany, pp 99–103 (September)
Montagnuolo M, Messina A (2007) Multimedia knowledge representation for automatic annotation of broadcast TV archives. J Digit Inf Manag 5(2):67–74
Montagnuolo M, Messina A (2008) Multimodal genre analysis applied to digital television archives. In: Second international workshop on multimedia data mining and management (DEXA-MDMM’08). Turin, Italy, 2 September 2008
Novak AP (1988) Method and system for editing unwanted program material from broadcast signals. US Patent no. 4750213
Parnal S, Pizzi S (2003) TV anytime: a new standard. EBU diffusion online, 2003/33, August
Poli JP, Carrive J (2006) Improving program guides for reducing TV stream structuring problem to a simple alignment problem. In: CIMCA ’06: proceedings of the international conference on computational intelligence for modelling control and automation and international conference on intelligent agents web technologies and international commerce, p 31
Polikar R (2006) Ensemble based systems in decision making. IEEE Circuits Syst Mag 6(3):21–45
Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann Publishers Inc
Rabiner LR (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 77(2):257–286
Roach M, Mason JS, Pawlewski M (2001) Motion-based classification of cartoons. In: IEEE international symposium on intelligent multimedia, video and speech processing (ISIMP2001), pp 146–149
Roach MJ (2002) Video genre classification. PhD thesis, University of Wales Swansea
Roach MJ, Mason JSD, Pawlewski M (2001) Video genre classification using dynamics. In: IEEE international conference on acoustics, speech, and signal processing (ICASSP’01), pp 1557–1560
Rumelhart DE, Hinton GE, Williams RJ (1986) Learning internal representations by error propagation. In: Parallel distributed processing: volume 1: foundations. The MIT Press, pp 318–362
Safavian SR, Landgrebe DA (1991) A survey of decision tree classifier methodology. IEEE Trans Syst Man Cybern 21(3):660–674
Sánchez JM, Binefa X, Vitriá J, Radeva P (1999) Local color analysis for scene break detection applied to TV commercials recognition. In: VISUAL ’99: proceedings of the third international conference on visual information and information systems, pp 237–244
Satterwhite B, Marques O (2004) Automatic detection of television commercials. IEEE Potentials 23(2):9–12
Snoek C, Worring M (2005) Multimodal video indexing: a review of the state-of-the-art. Multimedia Tools and Applications 25(1):5–35
Swain MJ, Ballard DH (1991) Color indexing. Int J Comput Vis 7(1):11–32 (November)
Takagi S, Hattori S, Yokoyama K, Kodate A, Tominaga H (2003) Sports video categorizing method using camera motion parameters. In: IEEE 2003 international conference on multimedia and expo (ICME’03), pp 461–464 (July)
Tamura H, Mori S, Yamawaki T (1978) Texture features corresponding to visual perception. IEEE Trans Syst Man Cybern 8(6):460–473
Taskiran CM, Delp EJ (2001) Distribution of shot lengths for video analysis. In: Proceedings of SPIE, vol. 4676, pp 276–284
Taskiran CM, Pollak I, Bouman CA, Delp EJ (2003) Stochastic models of video structure for program genre detection. In: 8th international workshop on visual content processing and representation (VLBV 2003). Madrid, Spain, pp 84–92 (September)
Tekalp M (1995) Digital video processing. Prentice Hall
Tomasi C (2005) Estimating Gaussian mixture densities with EM—a tutorial. Technical report, Duke University
Truong BT, Venkatesh S, Dorai C (2000) Automatic genre identification for content-based video categorization. In: IEEE 15th international conference on pattern recognition (ICPR’00). IEEE Computer Society, pp 230–233
Vakkalanka S, Mohan CK, Kumaraswamy R, Yegnanarayana B (2005) Combining multiple evidence for video classification. In: IEEE international conference on intelligent sensing and information processing (ICISIP’05), pp 187–192 (January)
Vapnik VN (1999) The nature of statistical learning theory. Springer
Vasconcelos N, Lippman A (2000) Statistical models of video structure for content analysis and characterization. IEEE Trans Image Process 9(1):3–19
Vroomen JHM, Collier R, Mozziconacci S (1993) Duration and intonation in emotional speech. In: Eurospeech 1993, pp 577–580
Wang J, Xu C, Chang E (2006) Automatic sports video genre classification using pseudo-2D-HMM. In: IEEE 18th international conference on pattern recognition (ICPR’06), pp 778–781
Wickenberg-Bolin U, Göransson H, Fryknäs M, Gustafsson MG, Isaksson A (2006) Improved variance estimation of classification performance via reduction of bias caused by small sample size. BMC Bioinformatics 7:127
Xu LQ, Li Y (2003) Video classification using spatial-temporal features and PCA. In: IEEE international conference on multimedia and expo (ICME’03), pp 485–488 (July)
Yuan X, Lai W, Mei T, Hua XS, Wu XQ, Li S (2006) Automatic video genre categorization using hierarchical SVM. In: IEEE international conference on image processing (ICIP’06). Atlanta, GA, pp 2905–2908 (October)
Yuan Y, Song QB, Shen JY (2002) Automatic video classification using decision tree method. In: IEEE 1st international conference on machine learning and cybernetics, vol. 3. Beijing, pp 1153–1157
Zhiwen Y, Xingshe Z, Jianhua G, Zhiyi Y (2004) Fuzzy clustering for TV program classification. In: IEEE international conference on information technology: coding and computing (ICIT’04), pp 658–662 (April)

Author information

Authors and Affiliations

Department of Computer Science, University of Turin, Corso Svizzera 185, 10149, Turin, Italy
Maurizio Montagnuolo
Centre for Research and Technological Innovation, RAI Radiotelevisione Italiana, Corso Giambone 68, 10135, Turin, Italy
Alberto Messina

Corresponding author

Correspondence to Maurizio Montagnuolo.

Additional information

Maurizio Montagnuolo is a PhD student supported by EuriX s.r.l., Turin, Italy— http://www.eurixgroup.com.

About this article

Cite this article

Montagnuolo, M., Messina, A. Parallel neural networks for multimodal video genre classification. Multimed Tools Appl 41, 125–159 (2009). https://doi.org/10.1007/s11042-008-0222-3

Issue Date: January 2009
