Abstract
This chapter focuses on methods and tools for video fragmentation and reverse search on the web. These technologies can assist journalists in dealing with fake news, which nowadays spreads rapidly via social media platforms and often relies on the reuse of a previously posted video from a past event with the intention of misleading viewers about a contemporary event. Fragmenting a video into visually and temporally coherent parts and extracting a representative keyframe for each fragment enables the provision of a complete and concise keyframe-based summary of the video. Compared with straightforward approaches that sample video frames at a constant step, the summary generated through video fragmentation and keyframe extraction is considerably more effective for discovering the video content and performing a fragment-level search for the video on the web. The chapter starts by explaining, in its introductory part, the nature and characteristics of this type of reuse-based fake news. It continues with an overview of existing approaches for the temporal fragmentation of single-shot videos into sub-shots (the most appropriate level of temporal granularity when dealing with user-generated videos) and of tools for performing reverse video search on the web. Subsequently, it describes two state-of-the-art methods for video sub-shot fragmentation: one relying on the assessment of visual coherence over sequences of frames, and one based on the identification of camera activity during the video recording. It then presents the InVID web application, which enables fine-grained (fragment-level) reverse search for near-duplicates of a given video on the web.
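The first of the two fragmentation strategies mentioned above can be illustrated with a minimal sketch. This is not the chapter's exact algorithm, only the general idea: assess the visual coherence of consecutive frames (here modelled as pre-computed grayscale histograms, a common compact frame descriptor), start a new sub-shot whenever coherence drops below a threshold, and pick the middle frame of each sub-shot as its representative keyframe. The `threshold` value and the histogram representation are illustrative assumptions.

```python
# Illustrative sketch of sub-shot fragmentation via visual coherence
# (not the chapter's exact method). Frames are modelled as grayscale
# histograms; in practice these would be computed from decoded frames.

def hist_distance(h1, h2):
    """L1 distance between two normalised histograms."""
    n1, n2 = sum(h1), sum(h2)
    return sum(abs(a / n1 - b / n2) for a, b in zip(h1, h2))

def fragment_and_keyframes(histograms, threshold=0.5):
    """Open a new sub-shot whenever the distance between consecutive
    frame histograms exceeds `threshold`; return (fragments, keyframes),
    where each keyframe is the middle frame index of its fragment."""
    boundaries = [0]
    for i in range(1, len(histograms)):
        if hist_distance(histograms[i - 1], histograms[i]) > threshold:
            boundaries.append(i)
    boundaries.append(len(histograms))
    fragments = [(boundaries[k], boundaries[k + 1] - 1)
                 for k in range(len(boundaries) - 1)]
    keyframes = [(start + end) // 2 for start, end in fragments]
    return fragments, keyframes

# Toy example: 6 dark frames followed by 6 bright frames.
dark, bright = [10, 0], [0, 10]
frags, keys = fragment_and_keyframes([dark] * 6 + [bright] * 6)
print(frags)  # [(0, 5), (6, 11)]
print(keys)   # [2, 8]
```

The resulting keyframe list is exactly the concise, complete summary the abstract refers to: one representative frame per visually coherent fragment, instead of a fixed-step sample that may miss short fragments or duplicate long ones.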
The chapter then reports the findings of a series of experimental evaluations of the aforementioned technologies, which indicate their ability to generate a concise and complete keyframe-based summary of the video content, and the usefulness of this fragment-level representation for fine-grained reverse video search on the web. Finally, it draws conclusions about the effectiveness of the presented technologies and outlines our plans for further advancing them.
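The fragment-level reverse search described above can likewise be sketched: once each sub-shot has a keyframe image, one reverse-image-search query is built per keyframe, so every fragment of the video can be traced on the web independently. The use of Google's `searchbyimage` endpoint here is only an example of the general approach; the InVID application may rely on other services, and the keyframe URLs are hypothetical.

```python
# Illustrative sketch of fragment-level reverse video search: build one
# reverse-image-search query per extracted keyframe. The Google
# "searchbyimage" endpoint is used purely as an example service.
from urllib.parse import quote

def reverse_search_urls(keyframe_urls):
    base = "https://www.google.com/searchbyimage?image_url="
    return [base + quote(url, safe="") for url in keyframe_urls]

# Hypothetical keyframe URLs, one per detected sub-shot.
queries = reverse_search_urls([
    "http://example.org/video1/keyframe_0002.jpg",
    "http://example.org/video1/keyframe_0008.jpg",
])
for q in queries:
    print(q)
```

Searching per keyframe rather than per video is what makes the search fine-grained: a reused fragment can be found even when the rest of the video is original material.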
Notes
- 13. Available at: http://www.invid-project.eu/verify/.
- 14. Some works reported in Sect. 3.2.1 use certain datasets (TRECVid 2007 rushes summarization, UT Ego, ADL and GTEA Gaze) that were designed for assessing the efficiency of methods targeting specific types of analysis, such as video rushes fragmentation [12] and the identification of everyday activities [6]; thus, ground-truth sub-shot fragmentation is not available for them.
- 16. Both of these approaches were implemented using the FFmpeg framework, available at: https://www.ffmpeg.org/.
- 17. Available at: http://www.invid-project.eu/verify/.
References
Kelm P, Schmiedeke S, Sikora T (2009) Feature-based video key frame extraction for low quality video sequences. In: 2009 10th workshop on image analysis for multimedia interactive services, pp 25–28. https://doi.org/10.1109/WIAMIS.2009.5031423
Cooray SH, Bredin H, Xu LQ, O’Connor NE (2009) An interactive and multi-level framework for summarising user generated videos. In: Proceedings of the 17th ACM international conference on multimedia, MM ’09. ACM, New York, NY, USA, pp 685–688. https://doi.org/10.1145/1631272.1631388
Mei T, Tang LX, Tang J, Hua XS (2013) Near-lossless semantic video summarization and its applications to video analysis. ACM Trans Multimed Comput Commun Appl 9(3):16:1–16:23. https://doi.org/10.1145/2487268.2487269
González-Díaz I, Martínez-Cortés T, Gallardo-Antolín A, Díaz-de María F (2015) Temporal segmentation and keyframe selection methods for user-generated video search-based annotation. Expert Syst Appl 42(1):488–502. https://doi.org/10.1016/j.eswa.2014.08.001
Lu Z, Grauman K (2013) Story-driven summarization for egocentric video. In: Proceedings of the 2013 IEEE conference on computer vision and pattern recognition, CVPR ’13. IEEE Computer Society, Washington, DC, USA, pp. 2714–2721. https://doi.org/10.1109/CVPR.2013.350
Xu J, Mukherjee L, Li Y, Warner J, Rehg JM, Singh V (2015) Gaze-enabled egocentric video summarization via constrained submodular maximization. In: 2015 IEEE conference on computer vision and pattern recognition (CVPR), pp 2235–2244. http://dblp.uni-trier.de/db/conf/cvpr/cvpr2015.html#XuMLWRS15
Karaman S, Benois-Pineau J, Dovgalecs V, Mégret R, Pinquier J, André-Obrecht R, Gaëstel Y, Dartigues JF (2014) Hierarchical hidden Markov model in detecting activities of daily living in wearable videos for studies of dementia. Multimed Tools Appl 69(3):743–771. https://doi.org/10.1007/s11042-012-1117-x
Chu WT, Chuang PC, Yu JY (2010) Video copy detection based on bag of trajectory and two-level approximate sequence matching. In: Proceedings of IPPR conference on computer vision, graphics, and image processing
Luo J, Papin C, Costello K (2009) Towards extracting semantically meaningful key frames from personal video clips: from humans to computers. IEEE Trans Circuits Syst Video Technol 19(2):289–301. https://doi.org/10.1109/TCSVT.2008.2009241
Dumont E, Merialdo B, Essid S, Bailer W et al (2008) Rushes video summarization using a collaborative approach. In: TRECVID 2008, ACM international conference on multimedia information retrieval 2008, October 27–November 01, 2008, Vancouver, BC, Canada. https://doi.org/10.1145/1463563.1463579. http://www.eurecom.fr/publication/2576
Liu Y, Liu Y, Ren T, Chan K (2008) Rushes video summarization using audio-visual information and sequence alignment. In: Proceedings of the 2nd ACM TRECVid video summarization workshop, TVS ’08. ACM, New York, NY, USA, pp. 114–118. https://doi.org/10.1145/1463563.1463584
Bai L, Hu Y, Lao S, Smeaton AF, O’Connor NE (2010) Automatic summarization of rushes video using bipartite graphs. Multimed Tools Appl 49(1):63–80. https://doi.org/10.1007/s11042-009-0398-1
Pan CM, Chuang YY, Hsu WH (2007) NTU TRECVID-2007 fast rushes summarization system. In: Proceedings of the international workshop on TRECVID video summarization, TVS ’07. ACM, New York, NY, USA, pp 74–78. https://doi.org/10.1145/1290031.1290045
Teyssou D, Leung JM, Apostolidis E, Apostolidis K, Papadopoulos S, Zampoglou M, Papadopoulou O, Mezaris V (2017) The InVID plug-in: web video verification on the browser. In: Proceedings of the first international workshop on multimedia verification, MuVer ’17. ACM, New York, NY, USA, pp 23–30. https://doi.org/10.1145/3132384.3132387
Ojutkangas O, Peltola J, Järvinen S (2012) Location based abstraction of user generated mobile videos. Springer, Berlin, Heidelberg, pp 295–306. https://doi.org/10.1007/978-3-642-30419-4_25
Kim JG, Chang HS, Kim J, Kim HM (2000) Efficient camera motion characterization for MPEG video indexing. In: 2000 IEEE international conference on multimedia and expo (ICME 2000), vol 2, pp 1171–1174. https://doi.org/10.1109/ICME.2000.871569
Durik M, Benois-Pineau J (2001) Robust motion characterisation for video indexing based on MPEG2 optical flow. In: International workshop on content-based multimedia indexing, CBMI01, pp 57–64
Nitta N, Babaguchi N (2013) Content analysis for home videos. ITE Trans Media Technol Appl 1(2):91–100. https://doi.org/10.3169/mta.1.91
Cooray SH, O’Connor NE (2010) Identifying an efficient and robust sub-shot segmentation method for home movie summarisation. In: 2010 10th international conference on intelligent systems design and applications, pp 1287–1292. https://doi.org/10.1109/ISDA.2010.5687086
Lowe DG (1999) Object recognition from local scale-invariant features. In: Proceedings of the 7th IEEE international conference on computer vision, vol 2, pp 1150–1157
Bay H, Ess A, Tuytelaars T, Gool LV (2008) Speeded-up robust features (SURF). Comput Vis Image Underst 110(3):346–359. https://doi.org/10.1016/j.cviu.2007.09.014
Bouguet JY (2001) Pyramidal implementation of the affine Lucas-Kanade feature tracker: description of the algorithm. Intel Corp 5(1–10):4
Apostolidis K, Apostolidis E, Mezaris V (2018) A motion-driven approach for fine-grained temporal segmentation of user-generated videos. In: Schoeffmann K, Chalidabhongse TH, Ngo CW, Aramvith S, O’Connor NE, Ho YS, Gabbouj M, Elgammal A (eds) MultiMedia modeling. Springer International Publishing, Cham, pp 29–41
Haller M et al (2007) A generic approach for motion-based video parsing. In: 15th European signal processing conference, pp 713–717
Abdollahian G, Taskiran CM, Pizlo Z, Delp EJ (2010) Camera motion-based analysis of user generated video. IEEE Trans Multimed 12(1):28–41. https://doi.org/10.1109/TMM.2009.2036286
Lan DJ, Ma YF, Zhang HJ (2003) A novel motion-based representation for video mining. In: Proceedings of the 2003 international conference on multimedia and expo (ICME ’03), vol 3, pp III-469–472. https://doi.org/10.1109/ICME.2003.1221350
Benois-Pineau J, Lovell BC, Andrews RJ (2013) Motion estimation in colour image sequences. Springer New York, NY, pp 377–395. https://doi.org/10.1007/978-1-4419-6190-7_11
Koprinska I, Carrato S (1998) Video segmentation of MPEG compressed data. In: 1998 IEEE international conference on electronics, circuits and systems, vol 2, pp 243–246. https://doi.org/10.1109/ICECS.1998.814872
Grana C, Cucchiara R (2006) Sub-shot summarization for MPEG-7 based fast browsing. In: Post-proceedings of the second Italian research conference on digital library management systems (IRCDL 2006), Padova, 27 Jan 2006, pp 80–84
Wang G, Seo B, Zimmermann R (2012) Motch: an automatic motion type characterization system for sensor-rich videos. In: Proceedings of the 20th ACM international conference on multimedia, MM ’12. ACM, New York, NY, USA, pp 1319–1320. https://doi.org/10.1145/2393347.2396462
Cricri F, Dabov K, Curcio IDD, Mate S, Gabbouj M (2011) Multimodal event detection in user generated videos. In: 2011 IEEE international symposium on multimedia, pp 263–270. https://doi.org/10.1109/ISM.2011.49
Ngo CW, Pong TC, Zhang HJ (2003) Motion analysis and segmentation through spatio-temporal slices processing. IEEE Trans Image Process 12(3):341–355. https://doi.org/10.1109/TIP.2003.809020
Ngo CW, Ma YF, Zhang HJ (2005) Video summarization and scene detection by graph modeling. IEEE Trans Circuits Syst Video Technol 15(2):296–305. https://doi.org/10.1109/TCSVT.2004.841694
Mohanta PP, Saha SK, Chanda B (2008) Detection of representative frames of a shot using multivariate Wald-Wolfowitz test. In: 2008 19th international conference on pattern recognition, pp 1–4. https://doi.org/10.1109/ICPR.2008.4761403
Omidyeganeh M, Ghaemmaghami S, Shirmohammadi S (2011) Video keyframe analysis using a segment-based statistical metric in a visually sensitive parametric space. IEEE Trans Image Process 20(10):2730–2737. https://doi.org/10.1109/TIP.2011.2143421
Guo Y, Xu Q, Sun S, Luo X, Sbert M (2016) Selecting video key frames based on relative entropy and the extreme studentized deviate test. Entropy 18(3):73. http://dblp.uni-trier.de/db/journals/entropy/entropy18.html#GuoXSLS16a
Kasutani E, Yamada A (2001) The MPEG-7 color layout descriptor: a compact image feature description for high-speed image/video segment retrieval. In: Proceedings of 2001 international conference on image processing (Cat. No.01CH37205), vol 1, pp 674–677. https://doi.org/10.1109/ICIP.2001.959135
Shi J et al (1994) Good features to track. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 593–600
Rublee E, Rabaud V, Konolige K, Bradski G (2011) ORB: an efficient alternative to SIFT or SURF. In: Proceedings of the IEEE international conference on computer vision (ICCV 2011), pp 2564–2571
Fischler MA, Bolles RC (1981) Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun ACM 24(6):381–395. https://doi.org/10.1145/358669.358692
Apostolidis E, Mezaris V (2014) Fast shot segmentation combining global and local visual descriptors. In: Proceedings of the 2014 IEEE international conference on acoustics, speech and signal processing, pp 6583–6587
Acknowledgements
The work reported in this chapter was supported by the EU’s Horizon 2020 research and innovation programme under grant agreements H2020-687786 InVID and H2020-732665 EMMA.
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Apostolidis, E., Apostolidis, K., Patras, I., Mezaris, V. (2019). Video Fragmentation and Reverse Search on the Web. In: Mezaris, V., Nixon, L., Papadopoulos, S., Teyssou, D. (eds) Video Verification in the Fake News Era. Springer, Cham. https://doi.org/10.1007/978-3-030-26752-0_3
DOI: https://doi.org/10.1007/978-3-030-26752-0_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26751-3
Online ISBN: 978-3-030-26752-0