Research Article | Open Access

DeepFovea: neural reconstruction for foveated rendering and video compression using learned statistics of natural videos

Published: 08 November 2019

Abstract

In order to provide an immersive visual experience, modern displays require head mounting, high image resolution, low latency, and a high refresh rate. This poses a challenging computational problem. On the other hand, the human visual system can consume only a tiny fraction of this video stream due to the drastic acuity loss in peripheral vision. Foveated rendering and compression can save computation by reducing image quality in the periphery. However, this can cause noticeable artifacts in the periphery or, if done conservatively, yields only modest savings. In this work, we explore a novel foveated reconstruction method that employs recent advances in generative adversarial neural networks. We reconstruct a plausible peripheral video from a small fraction of pixels provided every frame. The reconstruction is done by finding the closest matching video to this sparse input stream of pixels on a learned manifold of natural videos. Our method is more efficient than state-of-the-art foveated rendering, while providing a visual experience with no noticeable quality degradation. We conducted a user study to validate our reconstruction method and compare it against existing foveated rendering and video compression techniques. Our method is fast enough to drive gaze-contingent head-mounted displays in real time on modern hardware. We plan to publish the trained network to establish a new quality bar for foveated rendering and compression and to encourage follow-up research.
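The abstract describes reconstructing peripheral video from a small fraction of pixels provided every frame. As a minimal illustrative sketch (not the authors' released code), the snippet below builds a gaze-contingent binary sampling mask whose density falls off with eccentricity, the kind of sparse per-frame input such a reconstruction network would consume. The fovea radius, falloff shape, and peripheral density are assumed parameters for illustration, not values from the paper.

```python
# Illustrative sketch only: a gaze-contingent sparse sampling mask.
# All density parameters below are assumptions, not the paper's values.
import numpy as np

def foveated_mask(height, width, gaze_xy, fovea_radius=60.0,
                  peripheral_density=0.05, rng=None):
    """Boolean mask: dense near the gaze point, sparse in the periphery.

    gaze_xy: (x, y) gaze position in pixels.
    fovea_radius: radius (pixels) sampled at full density (assumed).
    peripheral_density: asymptotic sampled fraction far from the gaze (assumed).
    """
    rng = np.random.default_rng() if rng is None else rng
    ys, xs = np.mgrid[0:height, 0:width]
    ecc = np.hypot(xs - gaze_xy[0], ys - gaze_xy[1])  # eccentricity in pixels
    # Sampling probability: 1.0 inside the fovea, decaying exponentially
    # toward the peripheral floor outside it (falloff shape is an assumption).
    falloff = np.exp(-np.maximum(ecc - fovea_radius, 0.0) / (2.0 * fovea_radius))
    prob = peripheral_density + (1.0 - peripheral_density) * falloff
    return rng.random((height, width)) < prob

if __name__ == "__main__":
    mask = foveated_mask(720, 1280, gaze_xy=(640, 360))
    # Only a small fraction of pixels survives; a reconstruction network
    # would in-paint the rest from the learned video statistics.
    print(f"sampled fraction: {mask.mean():.3f}")
```

In a real system, this mask would be regenerated every frame from fresh stochastic samples and the current eye-tracker reading, so the sparse input stream remains aligned with the viewer's gaze.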


Supplemental Material

a212-kaplanyan.mp4 (MP4, 466.4 MB)

