Performance and energy task migration model for heterogeneous clusters

The Journal of Supercomputing

Abstract

This article presents a set of linear regression models that predict the impact of task migration on different objectives, such as performance and energy consumption. The models make it possible to establish whether, at a given moment, migrating a task is profitable in terms of performance or energy consumption. They can also be used to determine the best node to migrate a task to, depending on the objective. The models use a small set of easily measurable parameters. They have been validated against a small heterogeneous cluster that uses the Slurm resource manager. The models capture the tendencies observed in the experimental results, with average relative errors below 3.5% in execution time and 2.5% in energy consumption.
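To make the idea of a migration decision driven by a linear regression concrete, the following is a minimal sketch and not the authors' model. It assumes, purely for illustration, that migration overhead grows linearly with a checkpoint size and that the remaining run time is the remaining work divided by a node speed. The function names, parameters, and calibration numbers are all hypothetical. The same structure could be applied to energy by fitting joules instead of seconds.

```python
# Illustrative sketch only: the parameters (checkpoint size, node speed,
# remaining work) and the data below are assumptions, not taken from the article.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def time_after_migration(remaining_work, dest_speed, size_mb, model):
    """Predicted overhead of moving the task plus the run time on the destination."""
    slope, intercept = model
    return intercept + slope * size_mb + remaining_work / dest_speed


def best_node(remaining_work, current_speed, size_mb, node_speeds, model):
    """Return (node, predicted_time); node is None when not migrating is best."""
    best = (None, remaining_work / current_speed)
    for node, speed in node_speeds.items():
        predicted = time_after_migration(remaining_work, speed, size_mb, model)
        if predicted < best[1]:
            best = (node, predicted)
    return best


# Hypothetical calibration runs: checkpoint size (MB) vs. measured overhead (s).
sizes_mb = [100.0, 200.0, 400.0, 800.0]
overheads_s = [3.0, 5.0, 9.0, 17.0]
model = fit_line(sizes_mb, overheads_s)

nodes = {"fast": 25.0, "slow": 8.0}  # hypothetical node speeds (work units per second)
long_task = best_node(1000.0, 10.0, 400.0, nodes, model)
short_task = best_node(10.0, 10.0, 400.0, nodes, model)
```

Under these assumed numbers, the long task is predicted to finish sooner on the faster node despite the migration overhead, while the short task is best left where it is because the overhead exceeds its remaining run time.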

Notes

  1. The code is available at https://github.com/dmtcp.

Acknowledgements

This work has been supported by the Spanish Science and Technology Commission under contract PID2019-105660RB-C22 and the European HiPEAC Network of Excellence.

Author information

Corresponding author

Correspondence to Esteban Stafford.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Stafford, E., Bosque, J.L. Performance and energy task migration model for heterogeneous clusters. J Supercomput 77, 10053–10064 (2021). https://doi.org/10.1007/s11227-021-03663-1

  • DOI: https://doi.org/10.1007/s11227-021-03663-1
