Abstract
Runtime systems that automate the execution of applications on distributed cyberinfrastructures need to make scheduling decisions. Researchers have proposed many scheduling algorithms, but most of them are designed based on analytical models and assumptions that may not hold in practice. The literature is thus rife with algorithms that have been evaluated only within the scope of their underlying assumptions but whose practical effectiveness is unclear. It is thus difficult for developers to decide which algorithm to implement in their runtime systems.
To obviate the above difficulty, we propose an approach by which the runtime system executes, throughout application execution, simulations of this very execution. Each simulation is for a different algorithm in a scheduling algorithm portfolio, and the best algorithm is selected based on simulation results. The main objective of this work is to evaluate the feasibility and potential merit of this portfolio scheduling approach, even in the presence of simulation inaccuracy, when compared to the traditional one-algorithm approach. We perform this evaluation via a case study in the context of scientific workflows. Our main finding is that portfolio scheduling can outperform the best one-algorithm approach even in the presence of relatively large simulation inaccuracies.
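The selection step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the toy simulator, the three ordering rules in `PORTFOLIO`, the `error` parameter, and all function names are assumptions made for illustration only. It simulates each algorithm in a small portfolio on a set of independent tasks, optionally perturbs task work to mimic simulation inaccuracy, and picks the algorithm with the lowest simulated makespan. A runtime system following the approach would repeat this selection throughout application execution rather than once.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical representation: a scheduling "algorithm" is a rule that orders tasks.
Algorithm = Callable[[Sequence[float]], List[float]]


def simulate_makespan(tasks: Sequence[float], speeds: Sequence[float],
                      order: Algorithm, error: float = 0.0,
                      rng: random.Random = None) -> float:
    """Toy simulation: list-schedule independent tasks on workers.

    Each task, taken in the order chosen by `order`, is placed on the worker
    that would finish it earliest. `error` is the maximum relative
    perturbation applied to task work, standing in for simulation inaccuracy.
    """
    rng = rng or random.Random(0)
    ready = [0.0] * len(speeds)  # time at which each worker becomes idle
    for work in order(tasks):
        noisy_work = work * (1.0 + rng.uniform(-error, error))
        finish = [ready[w] + noisy_work / speeds[w] for w in range(len(speeds))]
        best = min(range(len(speeds)), key=finish.__getitem__)
        ready[best] = finish[best]
    return max(ready)


def select_algorithm(portfolio: Dict[str, Algorithm], tasks: Sequence[float],
                     speeds: Sequence[float], error: float = 0.0,
                     seed: int = 0) -> Tuple[str, Dict[str, float]]:
    """Simulate every algorithm in the portfolio and return the best one."""
    results = {
        name: simulate_makespan(tasks, speeds, alg, error, random.Random(seed))
        for name, alg in portfolio.items()
    }
    return min(results, key=results.get), results


# Illustrative portfolio of three simple ordering rules (an assumption).
PORTFOLIO: Dict[str, Algorithm] = {
    "fifo": lambda tasks: list(tasks),
    "shortest_first": lambda tasks: sorted(tasks),
    "longest_first": lambda tasks: sorted(tasks, reverse=True),
}

if __name__ == "__main__":
    best, results = select_algorithm(PORTFOLIO, [2, 3, 7, 2, 4, 9], [1.0, 1.0])
    print(best, results)
```

Calling `select_algorithm` with a nonzero `error` (for example `error=0.2`) gives a rough feel for how inaccurate simulations can change which algorithm is selected.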
This manuscript has been authored in part by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The publisher acknowledges the US government license to provide public access under the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
Acknowledgments
This work is funded by NSF contracts #2106059 and #2106147: “Collaborative Research: OAC Core: Simulation-driven runtime resource management for distributed workflow applications”; and partially funded by NSF contracts #2103489 and #2103508. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. Finally, we thank the NSF Chameleon Cloud for providing time grants to access their resources.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Casanova, H., Wong, Y.C., Pottier, L., da Silva, R.F. (2023). On the Feasibility of Simulation-Driven Portfolio Scheduling for Cyberinfrastructure Runtime Systems. In: Klusáček, D., Corbalán, J., Rodrigo, G.P. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2022. Lecture Notes in Computer Science, vol 13592. Springer, Cham. https://doi.org/10.1007/978-3-031-22698-4_1
DOI: https://doi.org/10.1007/978-3-031-22698-4_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22697-7
Online ISBN: 978-3-031-22698-4
eBook Packages: Computer Science (R0)