Abstract
This chapter focuses on methods and tools for video fragmentation and reverse search on the web. These technologies can assist journalists in dealing with fake news, which nowadays spreads rapidly via social media platforms and often relies on the reuse of a previously posted video from a past event with the intention of misleading viewers about a contemporary event. Fragmenting a video into visually and temporally coherent parts and extracting a representative keyframe for each fragment enables the provision of a complete and concise keyframe-based summary of the video. Compared with straightforward approaches that sample video frames at a constant step, the summary generated through video fragmentation and keyframe extraction is considerably more effective for discovering the video content and performing a fragment-level search for the video on the web. The chapter starts by explaining, in its introductory part, the nature and characteristics of this type of reuse-based fake news. It continues with an overview of existing approaches for the temporal fragmentation of single-shot videos into sub-shots (the most appropriate level of temporal granularity when dealing with user-generated videos) and of tools for performing reverse video search on the web. Subsequently, it describes two state-of-the-art methods for video sub-shot fragmentation: one relying on the assessment of visual coherence over sequences of frames, and one based on the identification of camera activity during the video recording. It then presents the InVID web application, which enables fine-grained (fragment-level) reverse search for near-duplicates of a given video on the web.
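The first of the two fragmentation strategies mentioned above can be illustrated with a minimal sketch. This is not the chapter's exact algorithm, only the general idea: assess the visual coherence of consecutive frames (here modelled as pre-computed grayscale histograms, a common compact frame descriptor), start a new sub-shot whenever coherence drops below a threshold, and pick the middle frame of each sub-shot as its representative keyframe. The `threshold` value and the histogram representation are illustrative assumptions.

```python
# Illustrative sketch of sub-shot fragmentation via visual coherence
# (not the chapter's exact method). Frames are modelled as grayscale
# histograms; in practice these would be computed from decoded frames.

def hist_distance(h1, h2):
    """L1 distance between two normalised histograms."""
    n1, n2 = sum(h1), sum(h2)
    return sum(abs(a / n1 - b / n2) for a, b in zip(h1, h2))

def fragment_and_keyframes(histograms, threshold=0.5):
    """Open a new sub-shot whenever the distance between consecutive
    frame histograms exceeds `threshold`; return (fragments, keyframes),
    where each keyframe is the middle frame index of its fragment."""
    boundaries = [0]
    for i in range(1, len(histograms)):
        if hist_distance(histograms[i - 1], histograms[i]) > threshold:
            boundaries.append(i)
    boundaries.append(len(histograms))
    fragments = [(boundaries[k], boundaries[k + 1] - 1)
                 for k in range(len(boundaries) - 1)]
    keyframes = [(start + end) // 2 for start, end in fragments]
    return fragments, keyframes

# Toy example: 6 dark frames followed by 6 bright frames.
dark, bright = [10, 0], [0, 10]
frags, keys = fragment_and_keyframes([dark] * 6 + [bright] * 6)
print(frags)  # [(0, 5), (6, 11)]
print(keys)   # [2, 8]
```

The resulting keyframe list is exactly the concise, complete summary the abstract refers to: one representative frame per visually coherent fragment, instead of a fixed-step sample that may miss short fragments or duplicate long ones.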
The chapter then reports the findings of a series of experimental evaluations of the aforementioned technologies, which indicate their ability to generate a concise and complete keyframe-based summary of the video content, and the usefulness of this fragment-level representation for fine-grained reverse video search on the web. Finally, it draws conclusions about the effectiveness of the presented technologies and outlines our plans for further advancing them.
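The fragment-level reverse search described above can likewise be sketched: once each sub-shot has a keyframe image, one reverse-image-search query is built per keyframe, so every fragment of the video can be traced on the web independently. The use of Google's `searchbyimage` endpoint here is only an example of the general approach; the InVID application may rely on other services, and the keyframe URLs are hypothetical.

```python
# Illustrative sketch of fragment-level reverse video search: build one
# reverse-image-search query per extracted keyframe. The Google
# "searchbyimage" endpoint is used purely as an example service.
from urllib.parse import quote

def reverse_search_urls(keyframe_urls):
    base = "https://www.google.com/searchbyimage?image_url="
    return [base + quote(url, safe="") for url in keyframe_urls]

# Hypothetical keyframe URLs, one per detected sub-shot.
queries = reverse_search_urls([
    "http://example.org/video1/keyframe_0002.jpg",
    "http://example.org/video1/keyframe_0008.jpg",
])
for q in queries:
    print(q)
```

Searching per keyframe rather than per video is what makes the search fine-grained: a reused fragment can be found even when the rest of the video is original material.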
Notes
- 13. Available at: http://www.invid-project.eu/verify/.
- 14. Some works reported in Sect. 3.2.1 use certain datasets (TRECVid 2007 rushes summarization, UT Ego, ADL and GTEA Gaze) that were designed for assessing the efficiency of methods targeting specific types of analysis, such as video rushes fragmentation [12] and the identification of everyday activities [6]; thus, ground-truth sub-shot fragmentation is not available for them.
- 16. Both of these approaches were implemented using the FFmpeg framework, available at: https://www.ffmpeg.org/.
- 17. Available at: http://www.invid-project.eu/verify/.
References
Kelm P, Schmiedeke S, Sikora T (2009) Feature-based video key frame extraction for low quality video sequences. In: 2009 10th workshop on image analysis for multimedia interactive services, pp 25–28. https://doi.org/10.1109/WIAMIS.2009.5031423
Cooray SH, Bredin H, Xu LQ, O’Connor NE (2009) An interactive and multi-level framework for summarising user generated videos. In: Proceedings of the 17th ACM international conference on multimedia, MM ’09. ACM, New York, NY, USA, pp 685–688. https://doi.org/10.1145/1631272.1631388
Mei T, Tang LX, Tang J, Hua XS (2013) Near-lossless semantic video summarization and its applications to video analysis. ACM Trans Multimed Comput Commun Appl 9(3):16:1–16:23. https://doi.org/10.1145/2487268.2487269
González-Díaz I, Martínez-Cortés T, Gallardo-Antolín A, Díaz-de María F (2015) Temporal segmentation and keyframe selection methods for user-generated video search-based annotation. Expert Syst Appl 42(1):488–502. https://doi.org/10.1016/j.eswa.2014.08.001
Lu Z, Grauman K (2013) Story-driven summarization for egocentric video. In: Proceedings of the 2013 IEEE conference on computer vision and pattern recognition, CVPR ’13. IEEE Computer Society, Washington, DC, USA, pp. 2714–2721. https://doi.org/10.1109/CVPR.2013.350
Xu J, Mukherjee L, Li Y, Warner J, Rehg JM, Singh V (2015) Gaze-enabled egocentric video summarization via constrained submodular maximization. In: 2015 IEEE conference on computer vision and pattern recognition (CVPR), pp 2235–2244. http://dblp.uni-trier.de/db/conf/cvpr/cvpr2015.html#XuMLWRS15
Karaman S, Benois-Pineau J, Dovgalecs V, Mégret R, Pinquier J, André-Obrecht R, Gaëstel Y, Dartigues JF (2014) Hierarchical hidden Markov model in detecting activities of daily living in wearable videos for studies of dementia. Multimed Tools Appl 69(3):743–771. https://doi.org/10.1007/s11042-012-1117-x
Chu WT, Chuang PC, Yu JY (2010) Video copy detection based on bag of trajectory and two-level approximate sequence matching. In: Proceedings of IPPR conference on computer vision, graphics, and image processing
Luo J, Papin C, Costello K (2009) Towards extracting semantically meaningful key frames from personal video clips: from humans to computers. IEEE Trans Circuits Syst Video Technol 19(2):289–301. https://doi.org/10.1109/TCSVT.2008.2009241
Dumont E, Merialdo B, Essid S, Bailer W et al (2008) Rushes video summarization using a collaborative approach. In: TRECVID 2008, ACM international conference on multimedia information retrieval 2008, October 27–November 01, 2008, Vancouver, BC, Canada. https://doi.org/10.1145/1463563.1463579. http://www.eurecom.fr/publication/2576
Liu Y, Liu Y, Ren T, Chan K (2008) Rushes video summarization using audio-visual information and sequence alignment. In: Proceedings of the 2nd ACM TRECVid video summarization workshop, TVS ’08. ACM, New York, NY, USA, pp. 114–118. https://doi.org/10.1145/1463563.1463584
Bai L, Hu Y, Lao S, Smeaton AF, O’Connor NE (2010) Automatic summarization of rushes video using bipartite graphs. Multimed Tools Appl 49(1):63–80. https://doi.org/10.1007/s11042-009-0398-1
Pan CM, Chuang YY, Hsu WH (2007) NTU TRECVID-2007 fast rushes summarization system. In: Proceedings of the international workshop on TRECVID video summarization, TVS ’07. ACM, New York, NY, USA, pp 74–78. https://doi.org/10.1145/1290031.1290045
Teyssou D, Leung JM, Apostolidis E, Apostolidis K, Papadopoulos S, Zampoglou M, Papadopoulou O, Mezaris V (2017) The InVID plug-in: web video verification on the browser. In: Proceedings of the first international workshop on multimedia verification, MuVer ’17. ACM, New York, NY, USA, pp 23–30. https://doi.org/10.1145/3132384.3132387
Ojutkangas O, Peltola J, Järvinen S (2012) Location based abstraction of user generated mobile videos. Springer, Berlin, Heidelberg, pp 295–306. https://doi.org/10.1007/978-3-642-30419-4_25
Kim JG, Chang HS, Kim J, Kim HM (2000) Efficient camera motion characterization for MPEG video indexing. In: 2000 IEEE international conference on multimedia and expo (ICME 2000), vol 2, pp 1171–1174. https://doi.org/10.1109/ICME.2000.871569
Durik M, Benois-Pineau J (2001) Robust motion characterisation for video indexing based on MPEG2 optical flow. In: International workshop on content-based multimedia indexing, CBMI01, pp 57–64
Nitta N, Babaguchi N (2013) Content analysis for home videos. ITE Trans Media Technol Appl 1(2):91–100. https://doi.org/10.3169/mta.1.91
Cooray SH, O’Connor NE (2010) Identifying an efficient and robust sub-shot segmentation method for home movie summarisation. In: 2010 10th international conference on intelligent systems design and applications, pp 1287–1292. https://doi.org/10.1109/ISDA.2010.5687086
Lowe DG (1999) Object recognition from local scale-invariant features. In: Proceedings of the 7th IEEE international conference on computer vision, vol 2, pp 1150–1157
Bay H, Ess A, Tuytelaars T, Gool LV (2008) Speeded-up robust features (SURF). Comput Vis Image Underst 110(3):346–359. https://doi.org/10.1016/j.cviu.2007.09.014
Bouguet JY (2001) Pyramidal implementation of the affine Lucas-Kanade feature tracker: description of the algorithm. Intel Corp 5(1–10):4
Apostolidis K, Apostolidis E, Mezaris V (2018) A motion-driven approach for fine-grained temporal segmentation of user-generated videos. In: Schoeffmann K, Chalidabhongse TH, Ngo CW, Aramvith S, O’Connor NE, Ho YS, Gabbouj M, Elgammal A (eds) MultiMedia modeling. Springer International Publishing, Cham, pp 29–41
Haller M et al (2007) A generic approach for motion-based video parsing. In: 15th European signal processing conference, pp 713–717
Abdollahian G, Taskiran CM, Pizlo Z, Delp EJ (2010) Camera motion-based analysis of user generated video. IEEE Trans Multimed 12(1):28–41. https://doi.org/10.1109/TMM.2009.2036286
Lan DJ, Ma YF, Zhang HJ (2003) A novel motion-based representation for video mining. In: Proceedings of the 2003 international conference on multimedia and expo (ICME ’03), vol 3, pp III-469–472. https://doi.org/10.1109/ICME.2003.1221350
Benois-Pineau J, Lovell BC, Andrews RJ (2013) Motion estimation in colour image sequences. Springer New York, NY, pp 377–395. https://doi.org/10.1007/978-1-4419-6190-7_11
Koprinska I, Carrato S (1998) Video segmentation of MPEG compressed data. In: 1998 IEEE international conference on electronics, circuits and systems, vol 2, pp 243–246. https://doi.org/10.1109/ICECS.1998.814872
Grana C, Cucchiara R (2006) Sub-shot summarization for MPEG-7 based fast browsing. In: Post-proceedings of the second Italian research conference on digital library management systems (IRCDL 2006), Padova, 27 Jan 2006, pp 80–84
Wang G, Seo B, Zimmermann R (2012) Motch: an automatic motion type characterization system for sensor-rich videos. In: Proceedings of the 20th ACM international conference on multimedia, MM ’12. ACM, New York, NY, USA, pp 1319–1320. https://doi.org/10.1145/2393347.2396462
Cricri F, Dabov K, Curcio IDD, Mate S, Gabbouj M (2011) Multimodal event detection in user generated videos. In: 2011 IEEE international symposium on multimedia, pp 263–270. https://doi.org/10.1109/ISM.2011.49
Ngo CW, Pong TC, Zhang HJ (2003) Motion analysis and segmentation through spatio-temporal slices processing. IEEE Trans Image Process 12(3):341–355. https://doi.org/10.1109/TIP.2003.809020
Ngo CW, Ma YF, Zhang HJ (2005) Video summarization and scene detection by graph modeling. IEEE Trans Circuits Syst Video Technol 15(2):296–305. https://doi.org/10.1109/TCSVT.2004.841694
Mohanta PP, Saha SK, Chanda B (2008) Detection of representative frames of a shot using multivariate Wald-Wolfowitz test. In: 2008 19th international conference on pattern recognition, pp 1–4. https://doi.org/10.1109/ICPR.2008.4761403
Omidyeganeh M, Ghaemmaghami S, Shirmohammadi S (2011) Video keyframe analysis using a segment-based statistical metric in a visually sensitive parametric space. IEEE Trans Image Process 20(10):2730–2737. https://doi.org/10.1109/TIP.2011.2143421
Guo Y, Xu Q, Sun S, Luo X, Sbert M (2016) Selecting video key frames based on relative entropy and the extreme studentized deviate test. Entropy 18(3):73. http://dblp.uni-trier.de/db/journals/entropy/entropy18.html#GuoXSLS16a
Kasutani E, Yamada A (2001) The MPEG-7 color layout descriptor: a compact image feature description for high-speed image/video segment retrieval. In: Proceedings of 2001 international conference on image processing (Cat. No.01CH37205), vol 1, pp 674–677. https://doi.org/10.1109/ICIP.2001.959135
Shi J et al (1994) Good features to track. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 593–600
Rublee E, Rabaud V, Konolige K, Bradski G (2011) ORB: an efficient alternative to SIFT or SURF. In: Proceedings of the IEEE international conference on computer vision (ICCV 2011), pp 2564–2571
Fischler MA, Bolles RC (1981) Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun ACM 24(6):381–395. https://doi.org/10.1145/358669.358692
Apostolidis E, Mezaris V (2014) Fast shot segmentation combining global and local visual descriptors. In: Proceedings of the 2014 IEEE international conference on acoustics, speech and signal processing, pp 6583–6587
Acknowledgements
The work reported in this chapter was supported by the EU’s Horizon 2020 research and innovation programme under grant agreements H2020-687786 InVID and H2020-732665 EMMA.
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Apostolidis, E., Apostolidis, K., Patras, I., Mezaris, V. (2019). Video Fragmentation and Reverse Search on the Web. In: Mezaris, V., Nixon, L., Papadopoulos, S., Teyssou, D. (eds) Video Verification in the Fake News Era. Springer, Cham. https://doi.org/10.1007/978-3-030-26752-0_3
DOI: https://doi.org/10.1007/978-3-030-26752-0_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26751-3
Online ISBN: 978-3-030-26752-0