Abstract
This article presents Glift, an abstraction and generic template library for defining complex, random-access graphics processor (GPU) data structures. Like modern CPU data structure libraries, Glift lets GPU programmers separate algorithms from data structure definitions, which greatly simplifies algorithm development and makes data structures reusable and interchangeable. We characterize a large body of previously published GPU data structures in terms of our abstraction and present several new GPU data structures: a stack, a quadtree, and an octree. These structures are explained using simple Glift concepts and implemented using reusable Glift components. We also describe two applications of these structures not previously demonstrated on GPUs: adaptive shadow maps and octree three-dimensional paint. Finally, we show that our example Glift data structures perform comparably to handwritten implementations while requiring only a fraction of the programming effort.
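The separation of algorithms from data structure definitions described above can be illustrated with a small CPU-side C++ sketch. This is not the Glift API: every name here (`RowMajorTranslator`, `TiledTranslator`, `VirtualArray2D`, `sumAll`) is hypothetical, and the idea of factoring a container into a flat storage buffer plus an interchangeable address-mapping policy is an assumption chosen for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical policy: maps a 2D address to an offset in a flat buffer, row by row.
struct RowMajorTranslator {
    std::size_t width, height;
    std::size_t size() const { return width * height; }
    std::size_t translate(std::size_t x, std::size_t y) const { return y * width + x; }
};

// Hypothetical alternative policy: stores the same 2D domain in square tiles.
// Assumes width and height are multiples of tile.
struct TiledTranslator {
    std::size_t width, height, tile;
    std::size_t size() const { return width * height; }
    std::size_t translate(std::size_t x, std::size_t y) const {
        std::size_t tilesPerRow = width / tile;
        std::size_t tileIndex = (y / tile) * tilesPerRow + (x / tile);
        std::size_t inTile = (y % tile) * tile + (x % tile);
        return tileIndex * tile * tile + inTile;
    }
};

// Container defined once; the memory layout is chosen by the Translator policy.
template <typename T, typename Translator>
class VirtualArray2D {
public:
    explicit VirtualArray2D(Translator t) : translator_(t), storage_(t.size()) {}

    std::size_t width() const { return translator_.width; }
    std::size_t height() const { return translator_.height; }

    T& at(std::size_t x, std::size_t y) {
        assert(x < width() && y < height());
        return storage_[translator_.translate(x, y)];
    }
    const T& at(std::size_t x, std::size_t y) const {
        assert(x < width() && y < height());
        return storage_[translator_.translate(x, y)];
    }

private:
    Translator translator_;
    std::vector<T> storage_;
};

// Algorithm written only against width(), height(), and at();
// it never sees how elements are laid out in memory.
template <typename Array>
long sumAll(const Array& a) {
    long total = 0;
    for (std::size_t y = 0; y < a.height(); ++y)
        for (std::size_t x = 0; x < a.width(); ++x)
            total += a.at(x, y);
    return total;
}

// Helper that writes the same logical contents into any layout.
template <typename Array>
void fillPattern(Array& a) {
    for (std::size_t y = 0; y < a.height(); ++y)
        for (std::size_t x = 0; x < a.width(); ++x)
            a.at(x, y) = static_cast<int>(x + 10 * y);
}
```

In this sketch, `sumAll` and `fillPattern` work unchanged on a `VirtualArray2D<int, RowMajorTranslator>` and a `VirtualArray2D<int, TiledTranslator>`, which is the kind of interchangeability the abstract refers to, shown here in plain C++ rather than in GPU shader code.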
Index Terms
- Glift: Generic, efficient, random-access GPU data structures