DOI: 10.1145/1730804.1730817
Research article, I3D conference proceedings

FreePipe: a programmable parallel rendering architecture for efficient multi-fragment effects

Published: 19 February 2010

ABSTRACT

In the past decade, modern GPUs have provided increasing programmability through vertex, geometry, and fragment shaders. However, many classical problems still cannot be solved efficiently with the current graphics pipeline, in which some stages remain fixed-function units on the chip. In particular, multi-fragment effects, and order-independent transparency especially, require a programmable blending stage, which makes them difficult to render in a single geometry pass. In this paper we present FreePipe, a programmable parallel rendering system that runs entirely on current graphics hardware and delivers performance comparable to that of the traditional graphics pipeline. Within this framework, we develop two schemes that exploit CUDA atomic operations to render multi-fragment effects efficiently in a single geometry pass. Both schemes achieve significant speedups over state-of-the-art methods based on the traditional graphics pipeline.

