ABSTRACT
The future of high-performance computing is likely to rely on the ability to efficiently exploit huge amounts of parallelism. One way of taking advantage of this parallelism is to formulate problems as "embarrassingly parallel" Monte-Carlo simulations, which allow applications to achieve a linear speedup over multiple computational nodes, without requiring a super-linear increase in inter-node communication. However, such applications rely on a cheap supply of high-quality random numbers, particularly for the three main maximum-entropy distributions: uniform, used as a general source of randomness; Gaussian, for discrete-time simulations; and exponential, for discrete-event simulations. In this paper we look at four different types of platform: conventional multi-core CPUs (Intel Core2); GPUs (NVidia GTX 200); FPGAs (Xilinx Virtex-5); and Massively Parallel Processor Arrays (Ambric AM2000). For each platform we determine the most appropriate algorithm for generating each type of number, then calculate the peak generation rate and estimated power efficiency for each device.
A comparison of CPUs, GPUs, FPGAs, and massively parallel processor arrays for random number generation