DOI: 10.1145/1964179.1964189
Research Article

Floating-point data compression at 75 Gb/s on a GPU

Published: 5 March 2011

ABSTRACT

Numeric simulations often generate large amounts of data that need to be stored or sent to other compute nodes. This paper investigates whether GPUs are powerful enough to make real-time data compression and decompression possible in such environments, that is, whether they can operate at the 32- or 40-Gb/s throughput of emerging network cards. The fastest parallel CPU-based floating-point data compression algorithm operates below 20 Gb/s on eight Xeon cores, which is significantly slower than the network speed and thus insufficient for compression to be practical in high-end networks. As a remedy, we have created the highly parallel GFC compression algorithm for double-precision floating-point data. This algorithm is specifically designed for GPUs. It compresses at a minimum of 75 Gb/s, decompresses at 90 Gb/s and above, and can therefore improve internode communication throughput on current and upcoming networks by fully saturating the interconnection links with compressed data.
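The abstract does not spell out GFC's encoding, but fast lossless floating-point compressors in this family typically work by predicting each double from its predecessor and storing only the low-order bytes of the residual. The following is a minimal, sequential Python sketch of that general residual-plus-leading-zero-byte scheme; it is an illustration of the technique class, not the actual GFC algorithm, which is warp-parallel and differs in detail. All function names here are illustrative.

```python
import struct

def compress(values):
    """Delta-encode IEEE 754 doubles and drop leading zero bytes.

    Each value is XOR-ed with its predecessor; when consecutive values
    are numerically close, the high-order bytes of the residual are
    zero and can be omitted.  A 1-byte header records how many payload
    bytes follow (0..8).
    """
    out = bytearray()
    prev = 0
    for v in values:
        bits = struct.unpack('<Q', struct.pack('<d', v))[0]
        residual = bits ^ prev  # small when the stream is smooth
        prev = bits
        payload = residual.to_bytes(8, 'big').lstrip(b'\x00')
        out.append(len(payload))  # header: payload length
        out.extend(payload)
    return bytes(out)

def decompress(blob, count):
    """Exact inverse of compress(): rebuild count doubles from blob."""
    values, prev, pos = [], 0, 0
    for _ in range(count):
        n = blob[pos]
        pos += 1
        prev ^= int.from_bytes(blob[pos:pos + n], 'big')
        pos += n
        values.append(struct.unpack('<d', struct.pack('<Q', prev))[0])
    return values
```

On smooth numeric data the residuals carry many leading zero bytes, so the stream shrinks; a GPU version would process many such chunks concurrently, which is what lets this style of compressor reach tens of Gb/s.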


Published in:
GPGPU-4: Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units
March 2011, 101 pages
ISBN: 9781450305693
DOI: 10.1145/1964179

            Copyright © 2011 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Overall acceptance rate: 57 of 129 submissions, 44%
