ELVIS: Entertainment-led video summaries

Published: 27 August 2010

Abstract

Video summaries present the user with a condensed and succinct representation of the content of a video stream. Usually this is achieved by attaching degrees of importance to low-level image, audio and text features. However, video content elicits strong and measurable physiological responses in the user, which are potentially rich indicators of what video content is memorable to or emotionally engaging for an individual user. This article proposes a technique that exploits such physiological responses to a given video stream by a given user to produce Entertainment-Led VIdeo Summaries (ELVIS). ELVIS is made up of five analysis phases which correspond to the analyses of five physiological response measures: electro-dermal response (EDR), heart rate (HR), blood volume pulse (BVP), respiration rate (RR), and respiration amplitude (RA). Through these analyses, the temporal locations of the most entertaining video subsegments, as they occur within the video stream as a whole, are automatically identified. The effectiveness of the ELVIS technique is verified through a statistical analysis of data collected during a set of user trials. Our results show that ELVIS is more consistent than RANDOM, EDR, HR, BVP, RR and RA selections in identifying the most entertaining video subsegments for content in the comedy, horror/comedy, and horror genres. Subjective user reports also reveal that ELVIS video summaries are comparatively easy to understand, enjoyable, and informative.
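As a rough illustration of the general idea (not the ELVIS algorithm itself, whose details the abstract does not give), the following Python sketch ranks video subsegments by combining the five physiological measures named above. The z-score normalisation, the summing of absolute deviations, the function names, and the input layout (one value per subsegment per measure) are all assumptions made for this example.

```python
from statistics import mean, pstdev

# The five response measures named in the abstract.
SIGNALS = ("EDR", "HR", "BVP", "RR", "RA")


def zscores(values):
    """Standardise one signal so different units become comparable."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A flat signal carries no information about any subsegment.
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def rank_subsegments(responses, top_k):
    """Return the indices (in temporal order) of the top_k subsegments.

    responses: dict mapping a signal name to a list holding one value
    per subsegment. ASSUMPTION: a large deviation from the viewer's own
    mean, in either direction, is treated as a sign of engagement.
    """
    lengths = {len(values) for values in responses.values()}
    if len(lengths) != 1:
        raise ValueError("all signals must cover the same subsegments")
    n = lengths.pop()

    combined = [0.0] * n
    for values in responses.values():
        for i, z in enumerate(zscores(values)):
            combined[i] += abs(z)

    ranked = sorted(range(n), key=lambda i: combined[i], reverse=True)
    # Restore temporal order so the summary plays back chronologically.
    return sorted(ranked[:top_k])


# Hypothetical readings for four subsegments of one viewing session.
demo = {
    "EDR": [0.2, 0.3, 0.9, 0.2],
    "HR": [70, 72, 88, 71],
    "BVP": [1.0, 1.1, 0.4, 1.0],
    "RR": [14, 15, 20, 14],
    "RA": [0.5, 0.5, 0.8, 0.6],
}
selected = rank_subsegments(demo, top_k=1)
```

Here the third subsegment (index 2) stands out on every measure, so it is the one selected. An equal-weight sum is only one possible way to combine the measures; the abstract does not say how the five analysis phases are fused.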

Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 6, Issue 3
August 2010
203 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/1823746

Copyright © 2010 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 27 August 2010
• Accepted: 1 August 2009
• Revised: 1 November 2008
• Received: 1 June 2008

Qualifiers

• research-article
• Research
• Refereed
