Abstract
Datacenters often combine sets of nodes with different capabilities, which leads to suboptimal resource utilization. One of the best ways to improve utilization is to balance the load while taking the heterogeneity of these clusters into account. This article presents a novel way of expressing computational capacity that is better suited to heterogeneous clusters, and advocates task migration to further improve utilization. The experimental evaluation shows that both proposals are advantageous: they improve the utilization of heterogeneous clusters and reduce the makespan to 16.7% and 17.1%, respectively.
Notes
The code is available at https://github.com/dmtcp.
Acknowledgements
This work has been supported by the Spanish Science and Technology Commission under contracts TIN2016-76635-C2-2-R and TIN2016-81840-REDT (CAPAP-H6 network) and the European HiPEAC Network of Excellence.
Cite this article
Stafford, E., Bosque, J.L. Improving utilization of heterogeneous clusters. J Supercomput 76, 8787–8800 (2020). https://doi.org/10.1007/s11227-020-03175-4
DOI: https://doi.org/10.1007/s11227-020-03175-4