Abstract
Recent years have seen rapid progress in video quality enhancement via deep learning. Existing methods mainly focus on enhancing the objective quality of compressed video while ignoring its perceptual quality. In this paper, we focus on enhancing the perceptual quality of compressed video. Our main observation is that enhancing the perceptual quality mostly relies on recovering the high-frequency sub-bands in the wavelet domain. Accordingly, we propose a novel generative adversarial network (GAN) based on the multi-level wavelet packet transform (WPT), called multi-level wavelet-based GAN (MW-GAN), to enhance the perceptual quality of compressed video. In MW-GAN, we first apply motion compensation with a pyramid architecture to obtain temporal information. Then, we propose a wavelet reconstruction network with wavelet-dense residual blocks (WDRB) to recover the high-frequency details. In addition, the adversarial loss of MW-GAN is added via WPT to further encourage the recovery of high-frequency details in video frames. Experimental results demonstrate the superiority of our method.
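To make the notion of wavelet sub-bands concrete, the following is a minimal sketch of a multi-level 2-D wavelet packet transform. It is illustrative only and is not the MW-GAN implementation: the abstract does not say which wavelet filter is used, so an orthonormal Haar filter is assumed here, and the function names `haar_step` and `wavelet_packet` as well as the LL/LH/HL/HH ordering are hypothetical choices made for this example.

```python
def haar_step(img):
    """One level of a 2-D orthonormal Haar transform (assumed filter).

    `img` is a list of rows with even height and width. Returns four
    half-size sub-bands: LL (low frequency) and LH, HL, HH (high frequency).
    """
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, len(img), 2):
        r_ll, r_lh, r_hl, r_hh = [], [], [], []
        for j in range(0, len(img[0]), 2):
            a, b = img[i][j], img[i][j + 1]          # top row of the 2x2 block
            c, d = img[i + 1][j], img[i + 1][j + 1]  # bottom row of the block
            r_ll.append((a + b + c + d) / 2.0)  # average: low-pass in both axes
            r_lh.append((a - b + c - d) / 2.0)  # difference across columns
            r_hl.append((a + b - c - d) / 2.0)  # difference across rows
            r_hh.append((a - b - c + d) / 2.0)  # diagonal difference
        ll.append(r_ll)
        lh.append(r_lh)
        hl.append(r_hl)
        hh.append(r_hh)
    return ll, lh, hl, hh


def wavelet_packet(img, levels):
    """Multi-level wavelet packet transform.

    Unlike the plain wavelet transform, which only re-decomposes LL,
    the packet transform decomposes every sub-band at every level,
    giving 4**levels sub-bands in total.
    """
    bands = [img]
    for _ in range(levels):
        bands = [sub for band in bands for sub in haar_step(band)]
    return bands


def energy(band):
    """Sum of squared coefficients in one sub-band."""
    return sum(v * v for row in band for v in row)


if __name__ == "__main__":
    frame = [[float((3 * r + 5 * c) % 7) for c in range(8)] for r in range(8)]
    bands = wavelet_packet(frame, levels=2)
    low, high = energy(bands[0]), sum(energy(b) for b in bands[1:])
    print(f"{len(bands)} sub-bands, low-band energy {low:.2f}, "
          f"high-band energy {high:.2f}")
```

In this sketch the first returned band is the lowest-frequency one and the remaining bands carry the high-frequency detail that the abstract identifies as important for perceptual quality. Because the assumed Haar filter is orthonormal, the total energy of the sub-bands equals the energy of the input frame.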
Acknowledgement
This work was supported by the NSFC under Project 61876013, Project 61922009, and Project 61573037.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, J., Deng, X., Xu, M., Chen, C., Song, Y. (2020). Multi-level Wavelet-Based Generative Adversarial Network for Perceptual Quality Enhancement of Compressed Video. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12359. Springer, Cham. https://doi.org/10.1007/978-3-030-58568-6_24
DOI: https://doi.org/10.1007/978-3-030-58568-6_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58567-9
Online ISBN: 978-3-030-58568-6
eBook Packages: Computer Science (R0)