FinPar: A Parallel Financial Benchmark

Authors:
Christian Andreetta, Nordea Capital Markets, Copenhagen, Denmark
Vivien Bégot, LexiFi
Jost Berthold, University of Copenhagen
Martin Elsman, University of Copenhagen, Copenhagen, Denmark
Fritz Henglein, University of Copenhagen, Copenhagen, Denmark
Troels Henriksen, University of Copenhagen, Copenhagen, Denmark
Maj-Britt Nordfang, University of Copenhagen, Copenhagen, Denmark
Cosmin E. Oancea, University of Copenhagen, Copenhagen, Denmark

ACM Transactions on Architecture and Code Optimization, Volume 13, Issue 2, Article No.: 18, pp 1–27. https://doi.org/10.1145/2898354

Published: 27 June 2016

Abstract

Commodity many-core hardware is now mainstream, but parallel programming models are still lagging behind in efficiently utilizing the application parallelism. There are (at least) two principal reasons for this. First, real-world programs often take the form of a deeply nested composition of parallel operators, but mapping the available parallelism to the hardware requires a set of transformations that are tedious to do by hand and beyond the capability of the common user. Second, the best optimization strategy, such as what to parallelize and what to efficiently sequentialize, is often sensitive to the input dataset and therefore requires multiple code versions that are optimized differently, which also raises maintainability problems.

This article presents three array-based applications from the financial domain that are suitable for GPGPU execution. Common benchmark-design practice has been to provide the same code for the sequential and parallel versions, optimized for only one class of datasets. In comparison, we (1) document all available parallelism via nested map-reduce functional combinators, in a simple Haskell implementation that closely resembles the original code structure, (2) document the invariants and code transformations that govern the main trade-offs of a data-sensitive optimization space, and (3) report target CPU and multiversion GPGPU code together with an evaluation that demonstrates optimization trade-offs and other difficulties. We believe that this work provides useful insight into the language constructs and compiler infrastructure capable of expressing and optimizing such applications, and we report in-progress work in this direction.
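
The nested map-reduce style named in point (1) can be illustrated with a small sketch. The article's own implementation is in Haskell; the Python below is only an illustrative stand-in. The names `map_reduce`, `payoff`, `payoff_sequential`, and `price`, as well as the toy averaged-price payoff, are assumptions introduced here and are not taken from the benchmark.

```python
from functools import reduce
from operator import add


def map_reduce(f, op, neutral, xs):
    """Apply f to every element of xs, then combine the results with op.

    Corresponds to `reduce op neutral (map f xs)` in combinator notation.
    """
    return reduce(op, map(f, xs), neutral)


def payoff(path, strike):
    """Inner level: a map-reduce over the prices of one path (hypothetical payoff)."""
    total = map_reduce(lambda p: p, add, 0.0, path)
    return max(total / len(path) - strike, 0.0)


def payoff_sequential(path, strike):
    """Same inner computation with the reduction written as a sequential loop."""
    acc = 0.0
    for p in path:
        acc += p
    return max(acc / len(path) - strike, 0.0)


def price(paths, strike, inner=payoff):
    """Outer level: a map-reduce over independent paths; `inner` selects the version."""
    total = map_reduce(lambda path: inner(path, strike), add, 0.0, paths)
    return total / len(paths)


if __name__ == "__main__":
    paths = [[100.0, 110.0, 120.0], [90.0, 95.0, 100.0]]
    print(price(paths, 100.0))                           # nested map-reduce version
    print(price(paths, 100.0, inner=payoff_sequential))  # inner level sequentialized
```

Both versions compute the same value. The first keeps the inner reduction visible as parallelism, while the second sequentializes it, which is the kind of choice the abstract describes as sensitive to the input dataset.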

Index Terms

FinPar: A Parallel Financial Benchmark
1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Source code generation
  2. Software organization and properties
    1. Extra-functional properties
      1. Software performance

Published in
ACM Transactions on Architecture and Code Optimization Volume 13, Issue 2
June 2016
200 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/2952301
Editor:
Koen De Bosschere
Ghent University
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 27 June 2016
- Revised: 1 February 2016
- Accepted: 1 February 2016
- Received: 1 August 2015
Author Tags
Data-parallel functional language
fission
fusion
strength reduction
Qualifiers
- research-article
- Research
- Refereed