skip to main content
research-article
Free Access

FinPar: A Parallel Financial Benchmark

Published:27 June 2016Publication History
Skip Abstract Section

Abstract

Commodity many-core hardware is now mainstream, but parallel programming models are still lagging behind in efficiently utilizing the application parallelism. There are (at least) two principal reasons for this. First, real-world programs often take the form of a deeply nested composition of parallel operators, but mapping the available parallelism to the hardware requires a set of transformations that are tedious to do by hand and beyond the capability of the common user. Second, the best optimization strategy, such as what to parallelize and what to efficiently sequentialize, is often sensitive to the input dataset and therefore requires multiple code versions that are optimized differently, which also raises maintainability problems.

This article presents three array-based applications from the financial domain that are suitable for gpgpu execution. Common benchmark-design practice has been to provide the same code for the sequential and parallel versions that are optimized for only one class of datasets. In comparison, we document (1) all available parallelism via nested map-reduce functional combinators, in a simple Haskell implementation that closely resembles the original code structure, (2) the invariants and code transformations that govern the main trade-offs of a data-sensitive optimization space, and (3) report target cpu and multiversion gpgpu code together with an evaluation that demonstrates optimization trade-offs and other difficulties. We believe that this work provides useful insight into the language constructs and compiler infrastructure capable of expressing and optimizing such applications, and we report in-progress work in this direction.

References

  1. Mehdi Amini, Fabien Coelho, Francois Irigoin, and Ronan Keryell. 2011. Static compilation analysis for host-accelerator communication optimization. In Proceedings of the Conference on Languages and Compilers for Parallel Computing (LCPC’11). 237--251.Google ScholarGoogle Scholar
  2. Patrick Bahr, Jost Berthold, and Martin Elsman. 2015. Certified symbolic management of financial multi-party contracts. In Proceedings of the ACM SIGPLAN International Conference on Functional Programming (ICFP’15). Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Erik Barendsen and Sjaak Smetsers. 1993. Conventional and uniqueness typing in graph rewrite systems. In Foundations of Software Technology and Theoretical Computer Science. Lecture Notes in Computer Science, Vol. 761. Springer, 41--51. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Basel Committee on Banking Supervision. 2010. Basel III: A Global Regulatory Framework for More Resilient Banks and Banking Systems. Bank for International Settlements, Basel, Switzerland.Google ScholarGoogle Scholar
  5. M. M. Baskaran, J. Ramanujam, and P. Sadayappan. 2010. Automatic C-to-CUDA code generation for affine programs. In Proceedings of the International Conference on Compiler Construction (CC’10). 244--263. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Nathan Bell and Jared Hoberock. 2011. Thrust: A productivity-oriented library for CUDA. In GPU Computing Gems Jade Edition, W.-M. W. Hwu (Ed.). Morgan Kaufmann, San Francisco, CA.Google ScholarGoogle Scholar
  7. Lars Bergstrom and John Reppy. 2012. Nested data-parallelism on the GPU. In Proceedings of the 17th ACM SIGPLAN International Conference on Functional Programming (ICFP’12). 247--258. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. R. S. Bird. 1987. An introduction to the theory of lists. In Proceedings of the NATO Advanced Study on Logic of Programming and Calculi of Discrete Design. 5--42. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. F. Black and M. Scholes. 1973. The pricing of options and corporate liabilities. Journal of Political Economy 81, 3, 637--654.Google ScholarGoogle ScholarCross RefCross Ref
  10. Guy Blelloch. 1996. Programming parallel algorithms. Communications of the ACM 39, 3, 85--97. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Guy E. Blelloch. 1989. Scans as primitive parallel operations. IEEE Transactions on Computers 38, 11, 1526--1538. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Guy E. Blelloch. 1990. Prefix Sums and Their Applications. Carnegie Mellon University, Pittsburgh, PA.Google ScholarGoogle Scholar
  13. Guy E. Blelloch, Jonathan C. Hardwick, Jay Sipelstein, Marco Zagha, and Siddhartha Chatterjee. 1994. Implementation of a portable nested data-parallel language. Journal of Parallel and Distributed Computing 21, 1, 4--14. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Cajo J. Braak. 2006. A Markov chain Monte Carlo version of the genetic algorithm differential evolution: Easy Bayesian computing for real parameter spaces. Statistics and Computing 16, 3, 239--249. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Paul Bratley and Bennett L. Fox. 1988. Algorithm 659 implementing Sobol’s quasirandom sequence generator. ACM Transactions on Mathematical Software 14, 1, 88--100. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Richard P. Brent. 1973. Algorithms for Minimization without Derivatives. Prentice Hall.Google ScholarGoogle Scholar
  17. Damiano Brigo and Fabio Mercurio. 2006. Interest Rate Models—Theory and Practice: With Smile, Inflation and Credit (2nd ed.). Springer.Google ScholarGoogle Scholar
  18. Manuel M. T. Chakravarty, Gabriele Keller, Sean Lee, Trevor L. McDonell, and Vinod Grover. 2011. Accelerating Haskell array codes with multicore GPUs. In Proceedings of the 6th Workshop on Aspects of Multicore Programming (DAMP’11). 3--14. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Y. Chicha, M. Lloyd, C. Oancea, and S. M. Watt. 2004. Parametric polymorphism for computer algebra software components. In Proceedings of the International Symposium on Symbolic and Numeric Algorithms for Scientific Computing. 119--130.Google ScholarGoogle Scholar
  20. Koen Claessen, Mary Sheeran, and Bo Joel Svensson. 2012. Expressive array constructs in an embedded GPU kernel programming language. In Proceedings of the 7th Workshop on Declarative Aspects and Applications of Multicore Programming (DAMP’12). 21--30. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. J. Crank and P. Nicolson. 1947. A practical method for numerical evaluation of solutions of partial differential equations of the heat-conduction type. Mathematical Proceedings of the Cambridge Philosophical Society 43, 1, 50--67.Google ScholarGoogle ScholarCross RefCross Ref
  22. Francis Dang, Hao Yu, and Lawrence Rauchwerger. 2002. The R-LRPD test: Speculative parallelization of partially parallel loops. In Proceedings of the International Parallel and Distributed Processing Symposium (PDPS’02). 20--29. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Christophe Dubach, Perry Cheng, Rodric Rabbah, David F. Bacon, and Stephen J. Fink. 2012. Compiling a high-level language for GPUs. In Proceedings of the International Conference on Programming Language Design and Implementation (PLDI’12). 1--12. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Daniel Egloff. 2011. Pricing financial derivatives with high performance finite difference solvers on GPUs. In GPU Computing Gems Jade Edition, W.-M. W. Hwu (Ed.). Morgan Kaufmann, San Francisco, CA, 309--322.Google ScholarGoogle Scholar
  25. V. Elango, F. Rastello, L.-N. Pouchet, J. Ramanujam, and P. Sadayappan. 2015. On characterizing the data access complexity of programs. In Proceedings of the 42nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL’15). ACM, New York, NY, 567--580. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Martin Elsman and Martin Dybdal. 2014. Compiling a subset of APL into a typed intermediate language. In Proceedings of the 1st International Workshop on Libraries, Languages, and Compilers for Array Programming (ARRAY’14). ACM, New York, NY. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Martin Elsman and Anders Schack-Nielsen. 2014. Typelets—a rule-based evaluation model for dynamic, statically typed user interfaces. In Proceedings of the International Symposium on Practical Aspects of Declarative Languages (PADL’14). Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Paul Feautrier. 1991. Dataflow analysis of array and scalar references. International Journal of Parallel Programming 20, 1, 23--54.Google ScholarGoogle ScholarCross RefCross Ref
  29. Michael Flænø Werk, Joakim Ahnfelt-Rønne, and Ken Friis Larsen. 2012. An embedded DSL for stochastic processes: Research article. In Proceedings of the 1st ACM SIGPLAN Workshop on Functional High-Performance Computing (FHPC’12). ACM, New York, NY, 93--102. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. M. B. Giles, G. R. Mudalige, Z. Sharif, G. Markall, and P. H. J. Kelly. 2011. Performance analysis and optimisation of the OP2 framework on many-core architectures. ACM SIGMETRICS Performance Evaluation Review 38, 4, 9--15. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Paul Glasserman. 2004. Monte Carlo Methods in Financial Engineering. Springer, New York, NY.Google ScholarGoogle Scholar
  32. Clemens Grelck and Sven-Bodo Scholz. 2006. SAC: A functional array language for efficient multithreaded execution. International Journal of Parallel Programming 34, 4, 383--427. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Jing Guo, Jeyarajan Thiyagalingam, and Sven-Bodo Scholz. 2011. Breaking the GPU programming barrier with the auto-parallelising SAC compiler. In Proceedings of the 6th Workshop on Declarative Aspects of Multicore Programming (DAMP’11). ACM, New York, NY, 15--24. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. G. Hains and L. M. R. Mullin. 1993. Parallel functional programming with arrays. Computer Journal 36, 3, 238--245.Google ScholarGoogle ScholarCross RefCross Ref
  35. Mary W. Hall, Saman P. Amarasinghe, Brian R. Murphy, Shih-Wei Liao, and Monica S. Lam. 2005. Interprocedural parallelization analysis in SUIF. ACM Transactions on Programming Languages and Systems. 27, 4, 662--731. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Troels Henriksen. 2014. Exploiting Functional Invariants to Optimise Parallelism: A Dataflow Approach. Master’s Thesis. DIKU, Copenhagen, Denmark.Google ScholarGoogle Scholar
  37. Troels Henriksen, Martin Elsman, and Cosmin Eugen Oancea. 2014. Size slicing—a hybrid approach to size inference in Futhark. In Proceedings of the 3rd ACM SIGPLAN Workshop on Functional High-Performance Computing (FHPC’14). ACM, New York, NY, 31--42. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Troels Henriksen and Cosmin Eugen Oancea. 2013. A T2 graph-reduction approach to fusion. In Proceedings of the 2nd ACM SIGPLAN Workshop on Functional High-Performance Computing (FHPC’13). ACM, New York, NY, 47--58. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Troels Henriksen and Cosmin Eugen Oancea. 2014. Bounds checking: An instance of hybrid analysis. In Proceedings of the ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming (ARRAY’14). ACM, New York, NY, 88. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Roger W. Hockney. 1965. A fast direct solution of Poisson’s equation using Fourier analysis. Journal of the ACM 12, 1, 95--113. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. J. Hull. 2009. Options, Futures and Other Derivatives. Prentice Hall.Google ScholarGoogle Scholar
  42. Kenneth E. Iverson. 1962. A Programming Language. John Wiley & Sons. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Ajay Joshi, Aashish Phansalkar, Lieven Eeckhout, and Lizy Kurian John. 2006. Measuring benchmark similarity using inherent program characteristics. IEEE Transactios on Computers 6, 769--782. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. M. S. Joshi. 2010. Graphical Asian options. Wilmott Journal 2, 2, 97--107.Google ScholarGoogle ScholarCross RefCross Ref
  45. Hee-Seok Kim, Shengzhao Wu, Li-Wen Chang, and Wen-Mei W. Hwu. 2011. A scalable tridiagonal solver for GPUs. In Proceedings of the International Conference on Parallel Processing (ICPP’11). IEEE, Los Alamitos, CA, 444--453. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. A. Lee, C. Yau, M. B. Giles, A. Doucet, and C. C. Holmes. 2010. On the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Journal of Computational and Graphical Statistics 19, 4, 769--789.Google ScholarGoogle ScholarCross RefCross Ref
  47. Seyong Lee, Seung-Jai Min, and Rudolf Eigenmann. 2009. OpenMP to GPGPU: A compiler framework for automatic translation and optimization. In Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’09). 101--110. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Yuan Lin and David Padua. 2000. Analysis of irregular single-indexed arrays and its applications in compiler optimizations. In Proceedings of the International Conference on Compiler Construction. 202--218. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Frederik M. Madsen and Andrzej Filinski. 2013. Towards a streaming model for nested data parallelism. In Proceedings of the 2nd ACM SIGPLAN Workshop on Functional High-Performance Computing. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Geoffrey Mainland and Greg Morrisett. 2010. Nikola: Embedding compiled GPU functions in Haskell. In Proceedings of the 3rd ACM International Symposium on Haskell. 67--78. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Robin Milner, Mads Tofte, Robert Harper, and David MacQueen. 1997. The Definition of Standard ML (Revised). MIT Press, Cambridge, MA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. Claus Munk. 2007. Introduction to the Numerical Solution of Partial Differential Equations in Finance. Retrieved May 10, 2016, from http://mit.econ.au.dk/vip_htm/cmunk/noter/pdenote.pdf.Google ScholarGoogle Scholar
  53. Donald Nguyen, Andrew Lenharth, and Keshav Pingali. 2014. Deterministic Galois: On-demand, portable and parameterless. In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’14). Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. Fredrik Nord and Erwin Laure. 2011. Monte Carlo option pricing with graphics processing units. In Proceedings of the International Conference on Parallel Computing (ParCo’11).Google ScholarGoogle Scholar
  55. Cosmin Oancea, Christian Andreetta, Jost Berthold, Alain Frisch, and Fritz Henglein. 2012. Financial software on GPUs: Between Haskell and Fortran. In Proceedings of the Workshop on Functional High-Performance Computing (FHPC’12). ACM, New York, NY, 61--72. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. Cosmin E. Oancea and Lawrence Rauchwerger. 2015. Scalable conditional induction variable (CIV) analysis. In Proceedings of the International Symposium on Code Generation and Optimization (CGO’15). Google ScholarGoogle Scholar
  57. Cosmin E. Oancea, Jason W. A. Selby, Mark Giesbrecht, and Stephen M. Watt. 2005. Distributed models of thread level speculation. In Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA’05), Vol. 5. 920--927.Google ScholarGoogle Scholar
  58. C. E. Oancea and S. M. Watt. 2005. Domains and expressions: An interface between two approaches to computer algebra. In Proceedings of the 2005 International Symposium on Symbolic and Algebraic Computation (ISSAC’05). ACM, New York, NY, 261--269. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. L.-N. Pouchet, U. Bondhugula, C. Bastoul, A. Cohen, J. Ramanujam, P. Sadayappan, and N. Vasilache. 2011. Loop transformations: Convexity, pruning and optimization. In Proceedings of the 38th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL’11). ACM, New York, NY, 549--562. Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. James Reinders. 2007. Intel Threading Building Blocks: Outfitting C++ for Multi-Core Processor Parallelism. O’Reilly Media. Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. Shane Ryoo, Christopher I. Rodrigues, Sara S. Baghsorkhi, Sam S. Stone, David B. Kirk, and Wen-Mei W. Hwu. 2008. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. In Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’08). 73--82. Google ScholarGoogle ScholarDigital LibraryDigital Library
  62. Standard Performance Evaluation Corporation. 2014. SPEC ACCEL. Retrieved May 10, 2016, from https://www.spec.org/accel/.Google ScholarGoogle Scholar
  63. N. M. Steen, G. D. Byrne, and E. M. Gelbard. 1969. Gaussian quadratures for the integrals ∫0 exp( − x2)f(x)dx and ∫b0 exp( − x2)f(x)dx. Mathematics of Computation 23, 661--671.Google ScholarGoogle Scholar
  64. Rainer Storn and Kenneth Price. 1997. Differential evolution—a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization 11, 4, 341--359. Google ScholarGoogle ScholarDigital LibraryDigital Library
  65. Sain-Zee Ueng, Melvin Lathara, Sara S. Baghsorkhi, and Wen-Mei W. Hwu. 2008. CUDA-lite: Reducing GPU programming complexity. In Proceedings of the 21st International Workshop on Languages and Compilers for Parallel Computing (LCPC’08). 1--15. Google ScholarGoogle ScholarDigital LibraryDigital Library
  66. Jin Wang and Sudhakar Yalamanchili. 2014. Characterization and analysis of dynamic parallelism in unstructured GPU applications. In Proceedings of the IEEE International Symposium on Workload Characterization (IISWC’14). 51--60.Google ScholarGoogle ScholarCross RefCross Ref
  67. David Watkins. 1991. Fundamentals of Matrix Computations. Wiley, New York, NY. Google ScholarGoogle ScholarDigital LibraryDigital Library
  68. M. J. Wichura. 1988. Algorithm AS 241: The percentage points of the normal distribution. Journal of the Royal Statistical Society: Series C (Applied Statistics) 37, 3, 477--484.Google ScholarGoogle ScholarCross RefCross Ref
  69. Yi Yang, Ping Xiang, Jingfei Kong, and Huiyang Zhou. 2010. A GPGPU compiler for memory optimization and parallelism management. In Proceedings of the ACM SIGPLAN 2010 Conference on Programming Language Design and Implementation (PLDI’10). 86--97. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. FinPar: A Parallel Financial Benchmark

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        • Published in

          cover image ACM Transactions on Architecture and Code Optimization
          ACM Transactions on Architecture and Code Optimization  Volume 13, Issue 2
          June 2016
          200 pages
          ISSN:1544-3566
          EISSN:1544-3973
          DOI:10.1145/2952301
          Issue’s Table of Contents

          Copyright © 2016 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 27 June 2016
          • Revised: 1 February 2016
          • Accepted: 1 February 2016
          • Received: 1 August 2015
          Published in taco Volume 13, Issue 2

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article
          • Research
          • Refereed

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader