Research of Methods of Searching Key Frames in Video Flow with the Use of Neural Networks for Search Systems – Herald of Khmelnytskyi National University

RESEARCH OF METHODS OF SEARCHING KEY FRAMES IN VIDEO FLOW WITH THE USE OF NEURAL NETWORKS FOR SEARCH SYSTEMS

Pages: 55–60. Issue: No. 3, 2022 (309)
Authors:
MELNYKOVA N. I.
Lviv Polytechnic National University
https://orcid.org/0000-0002-2114-3436
e-mail: melnykovanatalia@gmail.com
POBEREIKO P. B.
Lviv Polytechnic National University
https://orcid.org/0000-0002-8884-1255
e-mail: pobereyko.petro26@gmail.com
Nataliia MELNYKOVA, Petro POBEREIKO
Lviv Polytechnic National University
DOI: https://www.doi.org/10.31891/2307-5732-2022-309-3-55-60
Abstract (translated from Ukrainian)
The paper presents a comparative analysis of current research on video content analysis and, on that basis, establishes that methods for identifying key frames in a video stream are effective for analyzing such data. Especially valuable are methods for comparing frames (fragments) and finding matches between them, namely: sequence search methods (detecting objects or particular actions in frames); classification methods (determining the content of frames and assigning them to categories); frame decoding methods (describing the characteristics of a particular image); and methods for detecting anomalies in the video stream (finding objects or symbols that are unique to a fragment relative to the others). Particularly promising are methods based on machine learning, which model temporal dependencies of variable range using convolutional neural networks and functions with special "attention" mechanisms. It is shown that the development of these methods drives the rapid development of information systems that can successfully analyze video content and identify its original source.
Keywords: key frames, neural networks, unsupervised learning, similarity measure

Extended abstract in English

The paper presents a comparative analysis of current research on the analysis of video content and, on that basis, establishes that methods for finding keyframes in a video stream are effective for analyzing such data. The analysis shows that the choice of a method for processing visual data is determined by the structure of that data. To simplify the analysis, the methods were therefore divided into the following categories: sequential comparison, global comparison, clustering-based methods, and methods that use events or objects. Especially valuable are methods for comparing frames (fragments) and finding matches between them, namely: sequence search methods (detecting objects or particular actions in frames); classification methods (determining the content of frames and assigning them to categories); frame decoding methods (describing the characteristics of a particular image); and methods for detecting anomalies in the video stream (finding objects or symbols that are unique to a fragment relative to the others). The paper shows that the most effective of the considered methods are those based on artificial intelligence and machine learning, and it contrasts deep learning methods with conventional methods in terms of their differences and efficiency. Particularly promising are methods that model temporal dependencies of variable range using convolutional neural networks and functions with special attention mechanisms. Methods that embed an Actor-Critic model in a generative adversarial network have also proved effective. It is shown that the development of these methods contributes to the rapid development of information systems that can successfully analyze video content and recognize its origin.
Keywords: keyframes, neural network, unsupervised learning, similarity measure, generative adversarial networks
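
As an illustration of the sequential comparison category mentioned in the abstract, the following is a minimal sketch rather than the authors' method. It assumes each frame is a flat list of 8-bit grayscale pixel values and uses a normalized histogram distance as the similarity measure; the function names, the bin count, and the threshold of 0.5 are all hypothetical choices made for this example.

```python
from typing import List, Sequence


def histogram(frame: Sequence[int], bins: int = 8, max_value: int = 256) -> List[float]:
    """Return a normalized intensity histogram for a flat list of pixel values."""
    counts = [0] * bins
    for value in frame:
        # Map the pixel value to a bin index, clamping to the last bin.
        counts[min(value * bins // max_value, bins - 1)] += 1
    total = len(frame) or 1  # avoid division by zero for an empty frame
    return [count / total for count in counts]


def histogram_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Half the L1 distance: 0.0 for identical histograms, 1.0 for disjoint ones."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def select_keyframes(frames: Sequence[Sequence[int]], threshold: float = 0.5) -> List[int]:
    """Sequential comparison: compare each frame with the last selected keyframe.

    A frame becomes a new keyframe when its histogram differs from the
    current reference by more than `threshold` (an assumed value).
    """
    if not frames:
        return []
    keyframes = [0]  # the first frame is always taken as a keyframe
    reference = histogram(frames[0])
    for index in range(1, len(frames)):
        current = histogram(frames[index])
        if histogram_distance(reference, current) > threshold:
            keyframes.append(index)
            reference = current  # later frames are compared with the new keyframe
    return keyframes


# Toy "video": a dark shot (two frames), a bright shot (two frames), then dark again.
dark = [10] * 16
dark_noisy = [10] * 15 + [40]
bright = [240] * 16
video = [dark, dark_noisy, bright, bright, dark]

print(select_keyframes(video))  # [0, 2, 4]
```

In this toy input the slightly noisy second frame stays below the threshold, while each change between the dark and bright shots produces a new keyframe. The deep learning methods surveyed in the paper replace such hand-crafted histograms with learned features.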

