ABSTRACT
Blind assessment of video quality remains challenging even in the deep learning era. The limited number of samples in existing databases is insufficient to learn a good feature extractor for video quality assessment (VQA), while manually annotating a larger database with subjective perception scores is labor-intensive and time-consuming. To alleviate this difficulty, we first collect 3589 high-quality video clips as references and build a large VQA dataset. The dataset contains more than 300K samples degraded by various distortion types arising from compression and transmission errors, and provides weak labels for each distorted sample via several full-reference VQA algorithms. To learn effective representations from the weakly labeled data, we alleviate the bias of any single weak label (i.e., a single source of knowledge) by learning from multiple heterogeneous knowledge sources. To this end, we propose a novel no-reference VQA (NR-VQA) method with HEterogeneous Knowledge Ensemble (HEKE). Compared with learning from a single knowledge source, HEKE can theoretically reach a lower infimum and, owing to the heterogeneity, learn richer representations. Extensive experimental results show that the proposed HEKE outperforms existing NR-VQA methods and achieves state-of-the-art performance. The source code will be available at https://github.com/Sissuire/BVQA-HEKE.
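The core idea sketched in the abstract can be illustrated with a toy experiment: each training sample carries several weak quality labels (one per full-reference metric), and a shared representation feeds one regression head per knowledge source, so no single metric's bias dominates. The following is a minimal synthetic sketch of that training scheme, not the authors' implementation; all data, dimensions, and names are hypothetical stand-ins.

```python
import numpy as np

# Hypothetical sketch of heterogeneous-knowledge ensemble training:
# every sample has K weak labels, each a differently-noised view of the
# same latent quality, mimicking disagreement between FR-VQA metrics.
rng = np.random.default_rng(0)
N, D, K = 256, 16, 3                  # samples, feature dim, knowledge sources

X = rng.normal(size=(N, D))           # stand-in for extracted video features
true_w = rng.normal(size=D)
quality = X @ true_w                  # latent "true" perceptual quality
noise_scale = np.array([0.3, 0.5, 0.8])
Y = quality[:, None] + rng.normal(size=(N, K)) * noise_scale  # K weak labels

W = np.zeros((D, K))                  # one linear head per knowledge source
for _ in range(200):
    residual = X @ W - Y              # (N, K) errors against all weak labels
    W -= 0.05 * X.T @ residual / N    # average the K per-head MSE gradients

# At inference the heads are ensembled, averaging out per-metric bias.
ensemble = (X @ W).mean(axis=1)
corr = np.corrcoef(ensemble, quality)[0, 1]
```

Because each head fits a different noisy view of the same signal, averaging the heads cancels much of the per-label noise, which is the intuition behind learning from multiple heterogeneous weak labels rather than a single one.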
Index Terms
- No-Reference Video Quality Assessment with Heterogeneous Knowledge Ensemble