Multi-level Wavelet-Based Generative Adversarial Network for Perceptual Quality Enhancement of Compressed Video

Wang, Jianyi; Deng, Xin; Xu, Mai; Chen, Congyong; Song, Yuhang

doi:10.1007/978-3-030-58568-6_24

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12359)

Included in the following conference series:

European Conference on Computer Vision


Abstract

The past few years have witnessed rapid progress in video quality enhancement via deep learning. Existing methods mainly focus on enhancing the objective quality of compressed video while ignoring its perceptual quality. In this paper, we focus on enhancing the perceptual quality of compressed video. Our main observation is that enhancing the perceptual quality mostly relies on recovering the high-frequency sub-bands in the wavelet domain. Accordingly, we propose a novel generative adversarial network (GAN) based on the multi-level wavelet packet transform (WPT) to enhance the perceptual quality of compressed video, which we call the multi-level wavelet-based GAN (MW-GAN). In MW-GAN, we first apply motion compensation with a pyramid architecture to obtain temporal information. Then, we propose a wavelet reconstruction network with wavelet-dense residual blocks (WDRB) to recover the high-frequency details. In addition, the adversarial loss of MW-GAN is applied via WPT to further encourage the recovery of high-frequency details in video frames. Experimental results demonstrate the superiority of our method.
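To make the notion of wavelet-packet sub-bands concrete, the following is a minimal sketch of a multi-level 2-D wavelet packet transform. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which wavelet or how many levels MW-GAN uses, so the Haar wavelet is assumed here, and the function names (`haar_split`, `haar_merge`, `wavelet_packet`, `inverse_wavelet_packet`) are hypothetical. Unlike an ordinary wavelet pyramid, which only re-splits the low-frequency band, the packet transform re-splits every band, so `levels` levels yield `4 ** levels` sub-bands.

```python
import numpy as np


def haar_split(x):
    """One 2-D orthonormal Haar analysis step (assumed wavelet, see lead-in).

    Expects even height and width. Returns four half-size sub-bands:
    the low-frequency band first, then three high-frequency detail bands.
    """
    a, b = x[0::2, 0::2], x[0::2, 1::2]  # top-left, top-right of each 2x2 block
    c, d = x[1::2, 0::2], x[1::2, 1::2]  # bottom-left, bottom-right
    low = (a + b + c + d) / 2.0
    det_cols = (a - b + c - d) / 2.0     # left column minus right column
    det_rows = (a + b - c - d) / 2.0     # top row minus bottom row
    det_diag = (a - b - c + d) / 2.0     # diagonal difference
    return low, det_cols, det_rows, det_diag


def haar_merge(low, det_cols, det_rows, det_diag):
    """Inverse of haar_split: rebuild the full-size array from four sub-bands."""
    out = np.empty((2 * low.shape[0], 2 * low.shape[1]), dtype=float)
    out[0::2, 0::2] = (low + det_cols + det_rows + det_diag) / 2.0
    out[0::2, 1::2] = (low - det_cols + det_rows - det_diag) / 2.0
    out[1::2, 0::2] = (low + det_cols - det_rows - det_diag) / 2.0
    out[1::2, 1::2] = (low - det_cols - det_rows + det_diag) / 2.0
    return out


def wavelet_packet(x, levels):
    """Split every sub-band at every level, giving 4 ** levels sub-bands."""
    bands = [np.asarray(x, dtype=float)]
    for _ in range(levels):
        bands = [sub for band in bands for sub in haar_split(band)]
    return bands


def inverse_wavelet_packet(bands):
    """Merge consecutive groups of four sub-bands until one array remains."""
    bands = list(bands)
    while len(bands) > 1:
        bands = [haar_merge(*bands[i:i + 4]) for i in range(0, len(bands), 4)]
    return bands[0]


# Usage on a synthetic 8x8 "frame" (random data, for illustration only).
rng = np.random.default_rng(0)
frame = rng.random((8, 8))
bands = wavelet_packet(frame, levels=2)      # 16 sub-bands of size 2x2
restored = inverse_wavelet_packet(bands)     # reconstruction of the frame
```

Because the assumed Haar step is orthonormal, the transform is invertible and preserves signal energy, so a loss computed on the sub-bands can weight the high-frequency bands separately without discarding any information from the frame.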

Acknowledgement

This work was supported by the NSFC under Project 61876013, Project 61922009, and Project 61573037.

Author information

Authors and Affiliations

School of Electronic and Information Engineering, Beihang University, Beijing, China
Jianyi Wang & Mai Xu
School of Cyber Science and Technology, Beihang University, Beijing, China
Xin Deng
College of Software, Beihang University, Beijing, China
Congyong Chen
Hangzhou Innovation Institute, Beihang University, Zhejiang, China
Mai Xu
Department of Computer Science, University of Oxford, Oxford, UK
Yuhang Song

Corresponding author

Correspondence to Mai Xu.

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Andrea Vedaldi
Graz University of Technology, Graz, Austria
Horst Bischof
University of Freiburg, Freiburg im Breisgau, Germany
Thomas Brox
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jan-Michael Frahm

About this paper

Cite this paper

Wang, J., Deng, X., Xu, M., Chen, C., Song, Y. (2020). Multi-level Wavelet-Based Generative Adversarial Network for Perceptual Quality Enhancement of Compressed Video. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12359. Springer, Cham. https://doi.org/10.1007/978-3-030-58568-6_24

DOI: https://doi.org/10.1007/978-3-030-58568-6_24
Published: 13 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58567-9
Online ISBN: 978-3-030-58568-6
eBook Packages: Computer Science, Computer Science (R0)
