ABSTRACT
The future of high-performance computing is likely to rely on the ability to efficiently exploit huge amounts of parallelism. One way of taking advantage of this parallelism is to formulate problems as "embarrassingly parallel" Monte-Carlo simulations, which allow applications to achieve a linear speedup over multiple computational nodes, without requiring a super-linear increase in inter-node communication. However, such applications rely on a cheap supply of high-quality random numbers, particularly for the three main maximum-entropy distributions: uniform, used as a general source of randomness; Gaussian, for discrete-time simulations; and exponential, for discrete-event simulations. In this paper we look at four different types of platform: conventional multi-core CPUs (Intel Core2); GPUs (NVidia GTX 200); FPGAs (Xilinx Virtex-5); and Massively Parallel Processor Arrays (Ambric AM2000). For each platform we determine the most appropriate algorithm for generating each type of number, then calculate the peak generation rate and estimated power efficiency for each device.
A comparison of CPUs, GPUs, FPGAs, and massively parallel processor arrays for random number generation