A Novel Lightweight Audio-visual Saliency Model for Videos

Published: 27 February 2023

Abstract

Audio information has not been considered an important factor in visual attention models, despite many psychological studies showing its importance to the human visual perception system. Because existing visual attention models utilize only visual information, their performance is limited, and they also incur high computational complexity due to the limited information available. To overcome these problems, we propose a lightweight audio-visual saliency (LAVS) model for video sequences. To the best of our knowledge, this article is the first attempt to exploit audio cues in an efficient deep-learning model for video saliency estimation. First, spatial-temporal visual features are extracted by a lightweight receptive field block (RFB) with bidirectional ConvLSTM units. Then, audio features are extracted by an improved lightweight environment sound classification model. Subsequently, deep canonical correlation analysis (DCCA) captures the correspondence between the audio and spatial-temporal visual features, yielding a spatial-temporal auditory saliency. Lastly, the spatial-temporal visual and auditory saliency maps are fused to obtain the final audio-visual saliency map. Extensive comparative experiments and ablation studies validate the performance of the LAVS model in terms of effectiveness and complexity.
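
To make the pipeline concrete, the sketch below illustrates the correspondence-and-fusion idea from the abstract under stated assumptions: the feature dimensions, the pooling, and the fusion rule are illustrative placeholders, and scikit-learn's linear CCA stands in for the paper's deep CCA (DCCA) module. It is not the authors' implementation.

```python
# Minimal sketch of audio-visual correspondence and fusion (assumptions noted above).
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)

# Hypothetical per-frame features: 16 frames, 128-D visual and 64-D audio descriptors.
visual_feats = rng.standard_normal((16, 128))  # e.g., pooled RFB + bidirectional ConvLSTM output
audio_feats = rng.standard_normal((16, 64))    # e.g., pooled sound-classification features

# Project both modalities into a shared 8-D space that maximizes their correlation
# (linear CCA here; the paper learns this mapping with deep CCA).
cca = CCA(n_components=8, max_iter=1000)
vis_c, aud_c = cca.fit_transform(visual_feats, audio_feats)

# Per-frame audio-visual agreement: cosine similarity in the shared space.
agreement = np.sum(vis_c * aud_c, axis=1) / (
    np.linalg.norm(vis_c, axis=1) * np.linalg.norm(aud_c, axis=1) + 1e-8
)

# Fuse: modulate a placeholder spatial-temporal visual saliency volume by the
# per-frame agreement to obtain an audio-visual saliency volume.
visual_saliency = rng.random((16, 32, 32))  # frames x height x width
weights = (agreement - agreement.min()) / (np.ptp(agreement) + 1e-8)
audio_visual_saliency = visual_saliency * weights[:, None, None]
print(audio_visual_saliency.shape)  # (16, 32, 32)
```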


    • Published in

      ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 4
      July 2023, 263 pages
      ISSN: 1551-6857
      EISSN: 1551-6865
      DOI: 10.1145/3582888
      • Editor: Abdulmotaleb El Saddik

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 27 February 2023
      • Online AM: 16 December 2022
      • Accepted: 7 December 2022
      • Revised: 5 November 2022
      • Received: 1 May 2022
      Published in TOMM Volume 19, Issue 4
