Performance and energy task migration model for heterogeneous clusters

Stafford, Esteban; Bosque, José Luis

doi:10.1007/s11227-021-03663-1

Published: 23 February 2021

Volume 77, pages 10053–10064, (2021)

The Journal of Supercomputing

Abstract

This article presents a set of linear regression models that predict the impact of task migration on different objectives, such as performance and energy consumption. The models make it possible to establish whether, at a given moment, migrating a task is profitable in terms of performance or energy consumption. They can also be used to determine the best node to migrate a task to, depending on the objective. The models use a small set of easily measurable parameters. They have been validated on a small heterogeneous cluster managed by the Slurm resource manager. The models capture the tendencies observed in the experimental results, with average relative errors below 3.5% in execution time and 2.5% in energy consumption.
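
As a rough illustration of the idea described in the abstract, the sketch below fits one simple least-squares line per candidate node and uses the predictions to decide whether migrating a task is worthwhile. It is not the authors' model. The single predictor (remaining run time on the current node), the node names, the function names, and all numbers are made-up assumptions for illustration; the parameters actually used in the paper are not given in this abstract.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must contain at least two distinct values")
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def predict(model, x):
    """Evaluate a fitted (a, b) model at x."""
    a, b = model
    return a + b * x


def best_node(models, x, stay_cost):
    """Return the node with the lowest predicted cost, or None if
    staying on the current node is predicted to be at least as cheap."""
    node, cost = min(
        ((name, predict(m, x)) for name, m in models.items()),
        key=lambda pair: pair[1],
    )
    return node if cost < stay_cost else None


# Hypothetical training data (invented for this sketch):
# x = remaining run time on the current node, in seconds
# y = measured time to finish after migrating to the target node,
#     including the migration overhead, in seconds
remaining = [10, 20, 30, 40]
models = {
    "fast_node": fit_line(remaining, [9, 14, 19, 24]),
    "slow_node": fit_line(remaining, [16, 28, 40, 52]),
}

# A task with 50 s left: migrate only if some node beats staying put.
print(best_node(models, 50, stay_cost=50))
# A task with 6 s left: the fixed migration overhead dominates.
print(best_node(models, 6, stay_cost=6))
```

The same structure would apply to an energy objective by fitting the lines against measured energy instead of time. A model with several predictors would replace `fit_line` with a multiple regression, but the decision step would remain a comparison of predicted costs.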

Notes

The code is available at https://github.com/dmtcp.

Acknowledgements

This work has been supported by the Spanish Science and Technology Commission under contract PID2019-105660RB-C22 and the European HiPEAC Network of Excellence.

Author information

Authors and Affiliations

Department of Computer Science and Electronics, University of Cantabria, Santander, Spain
Esteban Stafford & José Luis Bosque

Corresponding author

Correspondence to Esteban Stafford.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Stafford, E., Bosque, J.L. Performance and energy task migration model for heterogeneous clusters. J Supercomput 77, 10053–10064 (2021). https://doi.org/10.1007/s11227-021-03663-1

Accepted: 28 January 2021
Published: 23 February 2021
Issue Date: September 2021
DOI: https://doi.org/10.1007/s11227-021-03663-1
