MultiMLton: A multicore-aware runtime for Standard ML

K. C. Sivaramakrishnan, Lukasz Ziarek, and Suresh Jagannathan

doi:10.1017/S0956796814000161

Published online by Cambridge University Press: 18 June 2014

K. C. Sivaramakrishnan: Purdue University, West Lafayette, IN, USA (e-mail: chandras@purdue.edu)
Lukasz Ziarek: SUNY Buffalo, NY, USA (e-mail: lziarek@buffalo.edu)
Suresh Jagannathan: Purdue University, West Lafayette, IN, USA (e-mail: suresh@cs.purdue.edu)

Abstract

MultiMLton is an extension of the MLton compiler and runtime system that targets scalable multicore architectures. It provides specific support for ACML, a derivative of Concurrent ML that allows for the construction of composable asynchronous events. To manage asynchrony effectively, the runtime must efficiently handle potentially large numbers of lightweight, short-lived threads, many of which are created specifically to deal with the implicit concurrency introduced by asynchronous events. Scalability also demands that the runtime minimize global coordination. MultiMLton therefore implements a split-heap memory manager that allows mutators and collectors running on different cores to operate mostly independently. More significantly, MultiMLton exploits the premise that ACML programs have a surfeit of available concurrency to realize a new collector design that completely eliminates the need for read barriers, a source of significant overhead in other managed runtimes. These two symbiotic features, a thread design specifically tailored to support asynchronous communication and a memory manager that exploits lightweight concurrency to greatly reduce barrier overheads, are MultiMLton's key novelties. In this article, we describe the rationale, design, and implementation of these features, and we provide experimental results that justify our design decisions over a range of parallel benchmarks and different multicore architectures, including an 864-core Azul Vega 3 and a 48-core non-coherent Intel SCC (Single-chip Cloud Computer).
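A split-heap design of this kind rests on a heap invariant: no object in the shared heap may point into a core's local heap, so each core can collect its local heap without coordinating with the others. The following is a minimal, simplified Python model of that invariant, not MultiMLton's actual runtime code. The names `Obj`, `SHARED`, `lift`, `write_barrier`, `read`, and `invariant_holds` are hypothetical. On each store, the write barrier checks whether a local object is about to be referenced from a shared object, and if so it first lifts everything reachable from that object into the shared heap. Because this model retags objects in place rather than copying them, it deliberately leaves out the harder problem the abstract refers to: avoiding read barriers when lifted objects are really copied and stale local references may remain.

```python
from dataclasses import dataclass, field
from typing import Dict, List

SHARED = -1  # hypothetical owner id for the shared heap; cores use ids >= 0


@dataclass(eq=False)
class Obj:
    owner: int  # id of the core whose local heap holds this object, or SHARED
    fields: Dict[str, "Obj"] = field(default_factory=dict)
    payload: object = None


def is_local(o: Obj) -> bool:
    return o.owner != SHARED


def lift(root: Obj) -> None:
    """Move `root` and everything reachable from it into the shared heap.
    This model retags objects in place; a real collector would copy them."""
    stack = [root]
    while stack:
        o = stack.pop()
        if o.owner == SHARED:
            # Already shared. If every store goes through write_barrier,
            # its children are shared too, so there is nothing to traverse.
            continue
        o.owner = SHARED
        stack.extend(o.fields.values())


def write_barrier(target: Obj, name: str, value: Obj) -> None:
    """Store `value` into `target.name` while preserving the invariant that
    no shared object points into a local heap."""
    if target.owner == SHARED and is_local(value):
        lift(value)
    target.fields[name] = value


def read(target: Obj, name: str) -> Obj:
    # Plain load, no read barrier. This model never copies objects, so no
    # stale copies or forwarding pointers exist that a read would need to check.
    return target.fields[name]


def invariant_holds(objs: List[Obj]) -> bool:
    """True if no shared object in `objs` points to a local object."""
    return all(
        not is_local(child)
        for o in objs
        if o.owner == SHARED
        for child in o.fields.values()
    )


# Usage: a two-cell list built in core 0's local heap is published
# through a shared mutable cell.
cell = Obj(owner=SHARED)
head = Obj(owner=0, payload=1)
tail = Obj(owner=0, payload=2)
write_barrier(head, "next", tail)      # local -> local: no lifting
write_barrier(cell, "contents", head)  # shared -> local: lifts head and tail
print(head.owner, tail.owner)          # prints "-1 -1"
```

Lifting the whole reachable graph eagerly keeps the barrier simple at the cost of copying work on every publication. That is why the amount of data crossing heap boundaries, and the cost of fixing up references to it, dominates the design of collectors in this family.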

Type: Articles
Information: Journal of Functional Programming, Volume 24, Issue 6: Run-Time Systems and Target Platforms for Functional Languages, November 2014, pp. 613–674

DOI: https://doi.org/10.1017/S0956796814000161
Copyright: Copyright © Cambridge University Press 2014
