
Video Fragmentation and Reverse Search on the Web

Chapter in: Video Verification in the Fake News Era

Abstract

This chapter focuses on methods and tools for video fragmentation and reverse search on the web. These technologies can assist journalists dealing with fake news—which nowadays spreads rapidly via social media platforms—that reuses a previously posted video from a past event with the intention of misleading viewers about a contemporary event. Fragmenting a video into visually and temporally coherent parts and extracting a representative keyframe for each defined fragment yields a complete and concise keyframe-based summary of the video. In contrast to straightforward approaches that sample video frames at a constant step, the summary generated through video fragmentation and keyframe extraction is considerably more effective for exploring the video content and performing a fragment-level search for the video on the web. The chapter starts by explaining the nature and characteristics of this type of reuse-based fake news in its introductory part, and continues with an overview of existing approaches for the temporal fragmentation of single-shot videos into sub-shots (the most appropriate level of temporal granularity when dealing with user-generated videos) and of tools for performing reverse search of a video on the web. Subsequently, it describes two state-of-the-art methods for video sub-shot fragmentation—one relying on the assessment of visual coherence over sequences of frames, and another based on the identification of camera activity during video recording—and presents the InVID web application, which enables fine-grained (fragment-level) reverse search for near-duplicates of a given video on the web. The chapter then reports the findings of a series of experimental evaluations of the above technologies, which demonstrate their ability to generate a concise and complete keyframe-based summary of the video content, and the usefulness of this fragment-level representation for fine-grained reverse video search on the web. Finally, it draws conclusions about the effectiveness of the presented technologies and outlines our future plans for further advancing them.
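To make the visual-coherence idea mentioned above concrete, the following minimal Python/OpenCV sketch illustrates the general principle of coherence-based sub-shot detection: sampled consecutive frames are compared via colour-histogram similarity, and a new sub-shot starts whenever the similarity drops below a threshold. This is an illustrative simplification, not the chapter's actual method; the function name and the `sim_threshold` and `step` values are assumptions made for the example.

```python
import cv2

def fragment_video(path, sim_threshold=0.85, step=5):
    """Return sub-shot fragments as (start_frame, end_frame) pairs.

    Hypothetical illustration of visual-coherence-based fragmentation:
    a fragment ends when the HSV colour-histogram correlation between
    two sampled frames falls below `sim_threshold`.
    """
    cap = cv2.VideoCapture(path)
    fragments, start, idx = [], 0, 0
    prev_hist = None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:  # sample every `step`-th frame for speed
            hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
            # 2D histogram over hue and saturation channels
            hist = cv2.calcHist([hsv], [0, 1], None, [32, 32],
                                [0, 180, 0, 256])
            cv2.normalize(hist, hist)
            if prev_hist is not None:
                sim = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL)
                if sim < sim_threshold:  # coherence drop -> new sub-shot
                    fragments.append((start, idx - 1))
                    start = idx
            prev_hist = hist
        idx += 1
    cap.release()
    fragments.append((start, max(idx - 1, 0)))
    return fragments
```

A keyframe-based summary could then be formed by picking, for instance, the middle frame of each returned fragment; the chapter's methods use more sophisticated criteria than this sketch.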


Notes

  1. https://citizenevidence.amnestyusa.org/.

  2. https://tineye.com/.

  3. http://karmadecay.com/.

  4. https://berify.com/.

  5. http://www.revimg.com/.

  6. http://www.videntifier.com.

  7. https://citizenevidence.amnestyusa.org/.

  8. https://tineye.com/.

  9. http://karmadecay.com/.

  10. https://berify.com/.

  11. http://www.revimg.com/.

  12. http://www.videntifier.com.

  13. Available at: http://www.invid-project.eu/verify/.

  14. Some works reported in Sect. 3.2.1 use certain datasets (TRECVid 2007 rushes summarization, UT Ego, ADL and GTEA Gaze) that were designed for assessing the efficiency of methods targeting specific types of analysis, such as video rushes fragmentation [12] and the identification of everyday activities [6]; thus, ground-truth sub-shot fragmentation is not available for them.

  15. https://mklab.iti.gr/results/annotated-dataset-for-sub-shot-segmentation-evaluation.

  16. Both of these approaches were implemented using the FFmpeg framework, available at https://www.ffmpeg.org/ (see the sketch after this list).

  17. Available at: http://www.invid-project.eu/verify/.
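Note 16 states that both fragmentation approaches were built on top of FFmpeg, but the chapter does not specify the decoding pipeline. As an illustration only, the following Python snippet shows one common way to obtain a video's frames for such analysis by invoking the standard ffmpeg command-line tool; the output filename pattern and the frame rate are arbitrary choices for the example.

```python
import subprocess

def decode_frames(video_path, out_pattern="frame_%04d.jpg", fps=1.0):
    """Dump `fps` frames per second of `video_path` to JPEG files
    using the ffmpeg CLI (must be installed and on the PATH)."""
    subprocess.run(
        ["ffmpeg", "-i", video_path, "-vf", f"fps={fps}", out_pattern],
        check=True,
    )

# Example usage (hypothetical input file):
decode_frames("input.mp4")
```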

References

  1. Kelm P, Schmiedeke S, Sikora T (2009) Feature-based video key frame extraction for low quality video sequences. In: 2009 10th workshop on image analysis for multimedia interactive services, pp 25–28. https://doi.org/10.1109/WIAMIS.2009.5031423

  2. Cooray SH, Bredin H, Xu LQ, O’Connor NE (2009) An interactive and multi-level framework for summarising user generated videos. In: Proceedings of the 17th ACM international conference on multimedia, MM ’09. ACM, New York, NY, USA, pp 685–688. https://doi.org/10.1145/1631272.1631388

  3. Mei T, Tang LX, Tang J, Hua XS (2013) Near-lossless semantic video summarization and its applications to video analysis. ACM Trans Multimed Comput Commun Appl 9(3):16:1–16:23. https://doi.org/10.1145/2487268.2487269

  4. González-Díaz I, Martínez-Cortés T, Gallardo-Antolín A, Díaz-de María F (2015) Temporal segmentation and keyframe selection methods for user-generated video search-based annotation. Expert Syst Appl 42(1):488–502. https://doi.org/10.1016/j.eswa.2014.08.001

  5. Lu Z, Grauman K (2013) Story-driven summarization for egocentric video. In: Proceedings of the 2013 IEEE conference on computer vision and pattern recognition, CVPR ’13. IEEE Computer Society, Washington, DC, USA, pp 2714–2721. https://doi.org/10.1109/CVPR.2013.350

  6. Xu J, Mukherjee L, Li Y, Warner J, Rehg JM, Singh V (2015) Gaze-enabled egocentric video summarization via constrained submodular maximization. In: 2015 IEEE conference on computer vision and pattern recognition (CVPR), pp 2235–2244. http://dblp.uni-trier.de/db/conf/cvpr/cvpr2015.html#XuMLWRS15

  7. Karaman S, Benois-Pineau J, Dovgalecs V, Mégret R, Pinquier J, André-Obrecht R, Gaëstel Y, Dartigues JF (2014) Hierarchical hidden Markov model in detecting activities of daily living in wearable videos for studies of dementia. Multimed Tools Appl 69(3):743–771. https://doi.org/10.1007/s11042-012-1117-x

  8. Chu WT, Chuang PC, Yu JY (2010) Video copy detection based on bag of trajectory and two-level approximate sequence matching. In: Proceedings of the IPPR conference on computer vision, graphics, and image processing

  9. Luo J, Papin C, Costello K (2009) Towards extracting semantically meaningful key frames from personal video clips: from humans to computers. IEEE Trans Circuits Syst Video Technol 19(2):289–301. https://doi.org/10.1109/TCSVT.2008.2009241

  10. Dumont E, Merialdo B, Essid S, Bailer W et al (2008) Rushes video summarization using a collaborative approach. In: TRECVID 2008, ACM international conference on multimedia information retrieval, 27 October–01 November 2008, Vancouver, BC, Canada. https://doi.org/10.1145/1463563.1463579. http://www.eurecom.fr/publication/2576

  11. Liu Y, Liu Y, Ren T, Chan K (2008) Rushes video summarization using audio-visual information and sequence alignment. In: Proceedings of the 2nd ACM TRECVid video summarization workshop, TVS ’08. ACM, New York, NY, USA, pp 114–118. https://doi.org/10.1145/1463563.1463584

  12. Bai L, Hu Y, Lao S, Smeaton AF, O’Connor NE (2010) Automatic summarization of rushes video using bipartite graphs. Multimed Tools Appl 49(1):63–80. https://doi.org/10.1007/s11042-009-0398-1

  13. Pan CM, Chuang YY, Hsu WH (2007) NTU TRECVID-2007 fast rushes summarization system. In: Proceedings of the international workshop on TRECVID video summarization, TVS ’07. ACM, New York, NY, USA, pp 74–78. https://doi.org/10.1145/1290031.1290045

  14. Teyssou D, Leung JM, Apostolidis E, Apostolidis K, Papadopoulos S, Zampoglou M, Papadopoulou O, Mezaris V (2017) The InVID plug-in: web video verification on the browser. In: Proceedings of the first international workshop on multimedia verification, MuVer ’17. ACM, New York, NY, USA, pp 23–30. https://doi.org/10.1145/3132384.3132387

  15. Ojutkangas O, Peltola J, Järvinen S (2012) Location based abstraction of user generated mobile videos. Springer, Berlin, Heidelberg, pp 295–306. https://doi.org/10.1007/978-3-642-30419-4_25

  16. Kim JG, Chang HS, Kim J, Kim HM (2000) Efficient camera motion characterization for MPEG video indexing. In: 2000 IEEE international conference on multimedia and expo (ICME 2000), vol 2, pp 1171–1174. https://doi.org/10.1109/ICME.2000.871569

  17. Durik M, Benois-Pineau J (2001) Robust motion characterisation for video indexing based on MPEG2 optical flow. In: International workshop on content-based multimedia indexing, CBMI’01, pp 57–64

  18. Nitta N, Babaguchi N (2013) Content analysis for home videos (invited paper). ITE Trans Media Technol Appl 1(2):91–100. https://doi.org/10.3169/mta.1.91

  19. Cooray SH, O’Connor NE (2010) Identifying an efficient and robust sub-shot segmentation method for home movie summarisation. In: 2010 10th international conference on intelligent systems design and applications, pp 1287–1292. https://doi.org/10.1109/ISDA.2010.5687086

  20. Lowe DG (1999) Object recognition from local scale-invariant features. In: Proceedings of the 7th IEEE international conference on computer vision, vol 2, pp 1150–1157

  21. Bay H, Ess A, Tuytelaars T, Gool LV (2008) Speeded-up robust features (SURF). Comput Vis Image Underst 110(3):346–359. https://doi.org/10.1016/j.cviu.2007.09.014

  22. Bouguet JY (2001) Pyramidal implementation of the affine Lucas-Kanade feature tracker: description of the algorithm. Intel Corp 5(1–10):4

  23. Apostolidis K, Apostolidis E, Mezaris V (2018) A motion-driven approach for fine-grained temporal segmentation of user-generated videos. In: Schoeffmann K, Chalidabhongse TH, Ngo CW, Aramvith S, O’Connor NE, Ho YS, Gabbouj M, Elgammal A (eds) MultiMedia modeling. Springer International Publishing, Cham, pp 29–41

  24. Haller M et al (2007) A generic approach for motion-based video parsing. In: 15th European signal processing conference, pp 713–717

  25. Abdollahian G, Taskiran CM, Pizlo Z, Delp EJ (2010) Camera motion-based analysis of user generated video. IEEE Trans Multimed 12(1):28–41. https://doi.org/10.1109/TMM.2009.2036286

  26. Lan DJ, Ma YF, Zhang HJ (2003) A novel motion-based representation for video mining. In: Proceedings of the 2003 international conference on multimedia and expo (ICME ’03), vol 3, pp III-469–472. https://doi.org/10.1109/ICME.2003.1221350

  27. Benois-Pineau J, Lovell BC, Andrews RJ (2013) Motion estimation in colour image sequences. Springer, New York, NY, pp 377–395. https://doi.org/10.1007/978-1-4419-6190-7_11

  28. Koprinska I, Carrato S (1998) Video segmentation of MPEG compressed data. In: 1998 IEEE international conference on electronics, circuits and systems, vol 2, pp 243–246. https://doi.org/10.1109/ICECS.1998.814872

  29. Grana C, Cucchiara R (2006) Sub-shot summarization for MPEG-7 based fast browsing. In: Post-proceedings of the second Italian research conference on digital library management systems (IRCDL 2006), Padova, 27 Jan 2006, pp 80–84

  30. Wang G, Seo B, Zimmermann R (2012) Motch: an automatic motion type characterization system for sensor-rich videos. In: Proceedings of the 20th ACM international conference on multimedia, MM ’12. ACM, New York, NY, USA, pp 1319–1320. https://doi.org/10.1145/2393347.2396462

  31. Cricri F, Dabov K, Curcio IDD, Mate S, Gabbouj M (2011) Multimodal event detection in user generated videos. In: 2011 IEEE international symposium on multimedia, pp 263–270. https://doi.org/10.1109/ISM.2011.49

  32. Ngo CW, Pong TC, Zhang HJ (2003) Motion analysis and segmentation through spatio-temporal slices processing. IEEE Trans Image Process 12(3):341–355. https://doi.org/10.1109/TIP.2003.809020

  33. Ngo CW, Ma YF, Zhang HJ (2005) Video summarization and scene detection by graph modeling. IEEE Trans Circuits Syst Video Technol 15(2):296–305. https://doi.org/10.1109/TCSVT.2004.841694

  34. Mohanta PP, Saha SK, Chanda B (2008) Detection of representative frames of a shot using multivariate Wald-Wolfowitz test. In: 2008 19th international conference on pattern recognition, pp 1–4. https://doi.org/10.1109/ICPR.2008.4761403

  35. Omidyeganeh M, Ghaemmaghami S, Shirmohammadi S (2011) Video keyframe analysis using a segment-based statistical metric in a visually sensitive parametric space. IEEE Trans Image Process 20(10):2730–2737. https://doi.org/10.1109/TIP.2011.2143421

  36. Guo Y, Xu Q, Sun S, Luo X, Sbert M (2016) Selecting video key frames based on relative entropy and the extreme studentized deviate test. Entropy 18(3):73. http://dblp.uni-trier.de/db/journals/entropy/entropy18.html#GuoXSLS16a

  37. Kasutani E, Yamada A (2001) The MPEG-7 color layout descriptor: a compact image feature description for high-speed image/video segment retrieval. In: Proceedings of 2001 international conference on image processing, vol 1, pp 674–677. https://doi.org/10.1109/ICIP.2001.959135

  38. Shi J et al (1994) Good features to track. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 593–600

  39. Rublee E, Rabaud V, Konolige K, Bradski G (2011) ORB: an efficient alternative to SIFT or SURF. In: Proceedings of the IEEE international conference on computer vision (ICCV 2011), pp 2564–2571

  40. Fischler MA, Bolles RC (1981) Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun ACM 24(6):381–395. https://doi.org/10.1145/358669.358692

  41. Apostolidis E, Mezaris V (2014) Fast shot segmentation combining global and local visual descriptors. In: Proceedings of the 2014 IEEE international conference on acoustics, speech and signal processing, pp 6583–6587

Download references

Acknowledgements

The work reported in this chapter was supported by the EU's Horizon 2020 research and innovation programme under grant agreements H2020-687786 InVID and H2020-732665 EMMA.

Author information


Corresponding author

Correspondence to Evlampios Apostolidis.


Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Apostolidis, E., Apostolidis, K., Patras, I., Mezaris, V. (2019). Video Fragmentation and Reverse Search on the Web. In: Mezaris, V., Nixon, L., Papadopoulos, S., Teyssou, D. (eds) Video Verification in the Fake News Era. Springer, Cham. https://doi.org/10.1007/978-3-030-26752-0_3

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-26752-0_3


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-26751-3

  • Online ISBN: 978-3-030-26752-0

  • eBook Packages: Computer Science (R0)
