Abstract
Video summaries present the user with a condensed and succinct representation of the content of a video stream. Usually this is achieved by attaching degrees of importance to low-level image, audio and text features. However, video content elicits strong and measurable physiological responses in the user, which are potentially rich indicators of what video content is memorable to or emotionally engaging for an individual user. This article proposes a technique that exploits such physiological responses to a given video stream by a given user to produce Entertainment-Led VIdeo Summaries (ELVIS). ELVIS is made up of five analysis phases which correspond to the analyses of five physiological response measures: electro-dermal response (EDR), heart rate (HR), blood volume pulse (BVP), respiration rate (RR), and respiration amplitude (RA). Through these analyses, the temporal locations of the most entertaining video subsegments, as they occur within the video stream as a whole, are automatically identified. The effectiveness of the ELVIS technique is verified through a statistical analysis of data collected during a set of user trials. Our results show that ELVIS is more consistent than RANDOM, EDR, HR, BVP, RR and RA selections in identifying the most entertaining video subsegments for content in the comedy, horror/comedy, and horror genres. Subjective user reports also reveal that ELVIS video summaries are comparatively easy to understand, enjoyable, and informative.
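The abstract does not specify how the five physiological analyses are carried out or combined, so the following is only a minimal illustrative sketch of the general idea, not the ELVIS method itself. It assumes each measure (EDR, HR, BVP, RR, RA) has already been reduced to one value per video subsegment, standardizes each measure, and ranks subsegments by their mean absolute deviation across measures. The function names, the z-score normalization, the averaging rule, and the sample data are all hypothetical.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list of per-subsegment values (assumed normalization)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A flat signal carries no information about any subsegment.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def rank_subsegments(responses, top_k):
    """Return indices of the top_k subsegments, in temporal order.

    responses maps a measure name to a list holding one value per
    subsegment; all lists are assumed to have the same length.
    """
    n = len(next(iter(responses.values())))
    standardized = {m: zscores(vals) for m, vals in responses.items()}
    # Assumed combination rule: mean absolute z-score across measures.
    combined = [
        mean(abs(standardized[m][i]) for m in standardized) for i in range(n)
    ]
    best = sorted(range(n), key=lambda i: combined[i], reverse=True)[:top_k]
    return sorted(best)


# Hypothetical per-subsegment values for one user and one video.
responses = {
    "EDR": [0.1, 0.1, 0.9, 0.1, 0.8, 0.1],
    "HR": [70, 71, 85, 70, 83, 70],
    "BVP": [1.0, 1.0, 0.4, 1.0, 0.5, 1.0],
    "RR": [14, 14, 20, 14, 19, 14],
    "RA": [0.5, 0.5, 0.9, 0.5, 0.8, 0.5],
}
print(rank_subsegments(responses, top_k=2))  # prints [2, 4]
```

Returning the selected indices in temporal order reflects the abstract's emphasis on where subsegments occur within the video stream as a whole; a real system would also need signal acquisition, baseline correction, and per-user calibration, none of which is shown here.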
Index Terms
- ELVIS: Entertainment-led video summaries