Research Article | Open Access

DeepFovea: neural reconstruction for foveated rendering and video compression using learned statistics of natural videos

Published: 08 November 2019

Abstract

In order to provide an immersive visual experience, modern displays require head mounting, high image resolution, low latency, and a high refresh rate. This poses a challenging computational problem. On the other hand, the human visual system can consume only a tiny fraction of this video stream due to the drastic acuity loss in peripheral vision. Foveated rendering and compression can save computation by reducing image quality in the periphery. However, this can cause noticeable artifacts in the periphery or, if done conservatively, yields only modest savings. In this work, we explore a novel foveated reconstruction method that employs recent advances in generative adversarial neural networks. We reconstruct a plausible peripheral video from a small fraction of pixels provided every frame. The reconstruction is done by finding the closest matching video to this sparse input stream of pixels on a learned manifold of natural videos. Our method is more efficient than state-of-the-art foveated rendering, while providing a visual experience with no noticeable quality degradation. We conducted a user study to validate our reconstruction method and compare it against existing foveated rendering and video compression techniques. Our method is fast enough to drive gaze-contingent head-mounted displays in real time on modern hardware. We plan to publish the trained network to establish a new quality bar for foveated rendering and compression and to encourage follow-up research.
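The abstract describes reconstructing peripheral video from a small fraction of pixels provided every frame. As a minimal illustrative sketch (not the authors' released code), the snippet below builds a gaze-contingent binary sampling mask whose density falls off with eccentricity, the kind of sparse per-frame input such a reconstruction network would consume. The fovea radius, falloff shape, and peripheral density are assumed parameters for illustration, not values from the paper.

```python
# Illustrative sketch only: a gaze-contingent sparse sampling mask.
# All density parameters below are assumptions, not the paper's values.
import numpy as np

def foveated_mask(height, width, gaze_xy, fovea_radius=60.0,
                  peripheral_density=0.05, rng=None):
    """Boolean mask: dense near the gaze point, sparse in the periphery.

    gaze_xy: (x, y) gaze position in pixels.
    fovea_radius: radius (pixels) sampled at full density (assumed).
    peripheral_density: asymptotic sampled fraction far from the gaze (assumed).
    """
    rng = np.random.default_rng() if rng is None else rng
    ys, xs = np.mgrid[0:height, 0:width]
    ecc = np.hypot(xs - gaze_xy[0], ys - gaze_xy[1])  # eccentricity in pixels
    # Sampling probability: 1.0 inside the fovea, decaying exponentially
    # toward the peripheral floor outside it (falloff shape is an assumption).
    falloff = np.exp(-np.maximum(ecc - fovea_radius, 0.0) / (2.0 * fovea_radius))
    prob = peripheral_density + (1.0 - peripheral_density) * falloff
    return rng.random((height, width)) < prob

if __name__ == "__main__":
    mask = foveated_mask(720, 1280, gaze_xy=(640, 360))
    # Only a small fraction of pixels survives; a reconstruction network
    # would in-paint the rest from the learned video statistics.
    print(f"sampled fraction: {mask.mean():.3f}")
```

In a real system, this mask would be regenerated every frame from fresh stochastic samples and the current eye-tracker reading, so the sparse input stream remains aligned with the viewer's gaze.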


Supplemental Material

a212-kaplanyan.mp4 (MP4, 466.4 MB)

