Scheduling dynamic workloads in multi-tenant scientific workflow as a service platforms

https://doi.org/10.1016/j.future.2017.05.009

Highlights

  • A dynamic and scalable algorithm to schedule multiple workflows is presented.

  • The algorithm is designed for multi-tenant Workflow as a Service platforms.

  • It aims to minimize the total cost of leased resources while meeting the individual deadline of workflows.

  • The use of containers is proposed to address resource usage inefficiencies.

Abstract

With the advent of cloud computing and the availability of data collected from increasingly powerful scientific instruments, workflows have become a prevailing means to achieve significant scientific advances at an increased pace. Emerging Workflow as a Service (WaaS) platforms offer scientists a simple, easily accessible, and cost-effective way of deploying their applications in the cloud at anytime and from anywhere. They are multi-tenant frameworks and are designed to manage the execution of a continuous workload of heterogeneous workflows. To achieve this, they leverage the compute, storage, and network resources offered by Infrastructure as a Service (IaaS) providers. Hence, at any given point in time, a WaaS platform should be capable of efficiently scheduling an arbitrarily large number of workflows with different characteristics and quality of service requirements. As a result, we propose a resource provisioning and scheduling strategy designed specifically for WaaS environments. The algorithm is scalable and dynamic to adapt to changes in the environment and workload. It leverages containers to address resource utilization inefficiencies and aims to minimize the overall cost of leasing the infrastructure resources while meeting the deadline constraint of each individual workflow. To the best of our knowledge, this is the first approach that explicitly addresses VM sharing in the context of WaaS by modeling the use of containers in the resource provisioning and scheduling heuristics. Our simulation results demonstrate its responsiveness to environmental uncertainties, its ability to meet deadlines, and its cost-efficiency when compared to a state-of-the-art algorithm.

Introduction

Workflows are defined by a set of computational tasks with dependencies between them and are a commonly used application model in computational science. They enable the analysis of data in a structured and distributed manner and have been successfully used to make significant scientific advances in various fields such as biology, physics, medicine, and astronomy [1]. Their importance is highlighted in today's big data era as they offer an efficient way of processing and extracting knowledge from the data produced by increasingly powerful tools such as telescopes, particle accelerators, and gravitational wave detectors. Hence, it is common for scientific workflows to be large-scale, data- and compute-intensive applications that are deployed on distributed environments in order to produce results in a reasonable amount of time.

The emergence of cloud computing has brought with it several advantages for the deployment of scientific workflows. In particular, Infrastructure as a Service (IaaS) clouds allow Workflow Management Systems (WMSs) to access a virtually infinite pool of resources that can be acquired, configured, and used as needed and are charged on a pay-per-use basis. IaaS providers offer virtualized compute resources called Virtual Machines (VMs) for lease. They have a predefined CPU, memory, storage, and bandwidth capacity and different resource bundles (i.e., VM types) are available at varying prices. They can be elastically acquired and released and are generally charged per time frame, or billing period. While VMs deliver the compute power, IaaS clouds also offer storage and networking services, providing the necessary infrastructure for the execution of workflow applications.
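The billing-period pricing model described above has a direct consequence for cost: a VM used for any fraction of a period is charged for the full period. A minimal sketch of this cost computation (the function name and the hourly price are illustrative assumptions, not values from the paper):

```python
import math

# Illustrative only: cost of a leased VM under the per-billing-period
# model, where any partial period is charged in full (e.g., classic
# per-hour IaaS pricing).
def lease_cost(runtime_seconds: float, billing_period_seconds: float,
               price_per_period: float) -> float:
    # Round the runtime up to a whole number of billing periods.
    periods = math.ceil(runtime_seconds / billing_period_seconds)
    return periods * price_per_period

# A VM used for 61 minutes under hourly billing pays for 2 full hours.
print(lease_cost(61 * 60, 3600, 0.10))  # -> 0.2
```

This rounding is why schedulers that reuse the idle remainder of an already-paid billing period can lower the total cost without affecting deadlines.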

Scheduling algorithms tailored for scientific workflows are crucial in taking advantage of the benefits offered by clouds and they have been widely studied in recent years. To achieve this, they need to focus not only on the task-to-resource mapping but also on deciding the number and type of resources to use throughout the execution of the workflow (i.e., resource provisioning). The majority of existing approaches focus on generating resource provisioning and scheduling plans for a single instance of a workflow. They assume application and resource models in which a single user submits a single workflow for execution to a WMS. The WMS is then responsible for provisioning the required resources and mapping tasks to them so that the workflow execution is completed within the Quality of Service (QoS) constraints. While this is a valid model, as the adoption of cloud computing becomes more widespread among the scientific community, new application models are emerging.

In particular, Workflow as a Service (WaaS) is an emerging concept in which the execution of workflows is offered as a service to scientists. WaaS can be classified as an offering either at the Platform as a Service or Software as a Service layer as providers make use of compute, storage, and network resources offered by IaaS vendors to fulfill requests sent to a multi-tenant WMS. Workflows submitted to such a WMS belong to different users and are not necessarily related to each other; they may vary in structure, size, input data, application, and QoS requirements among other features. As a result, schedulers should be able to process a workload of workflows with different configurations that are continuously arriving for execution (without assuming that the number and type of workflows are known in advance). A high-level overview of a WaaS platform is depicted in Fig. 1. Frameworks realizing this service model are beginning to appear in the literature. For example, Filgueira et al. [2] present a data-intensive workflow as a service model that enables the easy composition and deployment of stream-based workflow applications on cloud platforms using containers. Similarly, Skyport [3] is an execution environment capable of managing the execution of multiple workflows in clouds by leveraging Docker containers to address software deployment problems and resource utilization inefficiencies. Other examples include the middleware described by Esteves and Veiga [4] and the architecture presented by Wang et al. [5].

Scientific workflows are generally composed of tasks of different types. In practical terms, all tasks of the same type run the same software program; that is, they perform the same set of computations potentially on different data sets. This means that different task types require different software components for their execution. Virtualization allows the execution environment of these tasks to be easily customized. For instance, hardware-level virtualization can be used in such a way that the operating system, software packages, and directory structures, among others, can all be tailored for a specific task and stored as a VM image. This image can then be easily used to deploy VMs capable of executing the task or tasks they were designed for. This is the model considered by the majority of existing workflow scheduling algorithms for clouds. They focus on efficiently leasing and releasing VMs with specific characteristics in order to fulfill a set of QoS requirements and in general assume that all VMs can be deployed using a single VM image that contains all of the software required to execute any workflow task. This assumption is realistic and reasonable when considering the scheduling of a single workflow but not when scheduling multiple workflows from different users.

The main reason for this is the impracticality of tailoring a single VM image to support the execution of different tasks from different workflows (e.g., consider the size of the image and the incompatibility between software components required by different tasks from different workflows). WaaS platforms can adopt different approaches to circumvent this issue. One option is to execute each individual workflow, or a set of related workflows, on its own set of dedicated VMs; however, this may result in an inefficient use of resources and higher costs. Another option that addresses this issue is to combine the use of VMs and containers, which are a form of operating system-level virtualization. Containers allow applications to be packaged and configured by providing a virtual environment that has its own CPU, memory, block I/O, and network space. By allowing each task or workflow to have a corresponding container image, a VM can be reused to run tasks belonging to different workflows by launching the corresponding container when a task is due to start its execution on that resource. In this way, resource utilization is maximized by reducing the wastage of idle time slots on leased VMs.
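The VM-sharing idea above can be sketched as follows. This is only an illustration of the concept, not the paper's implementation; the class names, fields, and container image tags are hypothetical:

```python
# Hypothetical sketch of container-based VM sharing: one VM runs tasks
# from different workflows by switching to the container image each
# task type requires, instead of leasing a dedicated VM per workflow.
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    workflow_id: str
    container_image: str  # packages the software this task type needs

class VM:
    def __init__(self, vm_id: str):
        self.vm_id = vm_id
        self.current_image = None
        self.executed = []

    def run(self, task: Task):
        # Launch the task's container only if a different image is
        # currently loaded; consecutive tasks of the same type reuse
        # the running container.
        if task.container_image != self.current_image:
            self.current_image = task.container_image
        self.executed.append((task.workflow_id, task.name))

vm = VM("vm-1")
vm.run(Task("align", "wf-A", "genomics:1.0"))
vm.run(Task("project", "wf-B", "montage:3.3"))  # same VM, other workflow
print(len(vm.executed))  # -> 2
```

The point of the sketch is that the idle time slot on `vm-1` after `wf-A`'s task is filled by a task from an unrelated workflow, which is exactly the utilization gain the paragraph describes.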

In addition to having a well-defined VM sharing model, algorithms tailored for WaaS platforms should be dynamic as they have no knowledge of the arriving workflows. They should also be scalable and capable of making decisions quickly as the number of tasks that need to be processed at any given point in time may be very large. Another important factor that should be taken into consideration is the efficient auto-scaling and management of VMs in order to increase their utilization as a cost controlling mechanism while still being able to satisfy the QoS requirements of individual workflows. This will potentially result in lower costs for users and higher profit for providers. Finally, algorithms should also address common challenges derived from the resource model offered by clouds such as the abundance and heterogeneity of resources, uncertainties derived from performance variation, VM provisioning delays, and billing period pricing models.

In response to these requirements, we propose EPSM, an Elastic resource Provisioning and Scheduling algorithm for Multiple workflows designed for WaaS platforms. It considers containers to address resource utilization inefficiencies and aims to minimize the overall cost of leasing resources while meeting the independent deadline constraint of workflows continuously arriving for execution. Although there are some existing algorithms designed to schedule multiple workflows, they either explicitly or implicitly assume that each workflow, or workflow type in some cases (i.e., same workflow but different number of tasks), has its own designated resources. To the best of our knowledge, this is the first approach that explicitly addresses VM sharing in the context of WaaS by modeling the use of containers in the resource provisioning and scheduling heuristics. Furthermore, the algorithm is dynamic and scalable and our simulation results demonstrate its responsiveness to environmental uncertainties, its ability to meet deadlines, and its cost-efficiency when compared to a state-of-the-art algorithm.

Section snippets

Related work

The majority of algorithms in the literature focus on optimizing the execution of a single workflow with its own QoS requirements. Hence, the resources are used exclusively for the execution of a single application belonging to a single user. Most of these algorithms aim to minimize the total execution cost while meeting a deadline constraint. Examples include IC-PCP and IC-PCPD2 [6], EIPR [7], TB [8], the approach proposed by Dziok et al. [9], and CCA [10]. To achieve this,

Application and resource models

This work is designed to schedule a continuous workload of scientific workflows submitted by users to a WaaS provider. The workflows may have different characteristics such as application type, number of tasks, input data, and deadline. The WaaS provider leases resources from a public IaaS vendor to fulfill the users’ requests and its goal is to minimize the total cost of renting infrastructure resources while meeting the deadline constraint of each of the submitted workflow applications.
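The constraint-and-objective structure of this model can be written down compactly. The sketch below is a plain restatement, not the paper's formal notation; all names and values are hypothetical:

```python
# Illustrative WaaS scheduling objective: minimize total resource cost
# subject to every submitted workflow finishing by its own deadline.
def meets_deadlines(finish_times: dict, deadlines: dict) -> bool:
    # Each workflow has an individual deadline that must be met.
    return all(finish_times[w] <= deadlines[w] for w in deadlines)

def total_cost(vm_lease_costs: list) -> float:
    # The provider's objective: minimize the sum of all VM lease costs.
    return sum(vm_lease_costs)

finish = {"montage-1": 95.0, "ligo-7": 40.0}   # hypothetical, seconds
due    = {"montage-1": 100.0, "ligo-7": 50.0}
print(meets_deadlines(finish, due))  # -> True
```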

The EPSM algorithm

We propose EPSM, a dynamic heuristic-based algorithm that makes resource provisioning and scheduling decisions to satisfy the deadline of individual workflows while minimizing the cost of leasing VMs. Its simplicity was a main design goal to facilitate its implementation in real-world WaaS frameworks and to ensure its scalability with respect to the number of workflows and tasks. Overall, the algorithm maintains a pool of resources which is scaled in and out based on the current requirements of
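The scale-in/scale-out behavior described above can be sketched minimally. This is not the EPSM heuristic itself, only the elastic pool idea, with hypothetical names:

```python
# Hypothetical sketch of elastic pool management: lease VMs when
# pending demand exceeds capacity, release idle VMs to control cost.
def scale_pool(pool_size: int, demand: int, idle_vms: int) -> int:
    if demand > pool_size:
        # Scale out: lease enough VMs to cover the current demand.
        pool_size += demand - pool_size
    elif idle_vms > 0:
        # Scale in: release idle VMs so unused capacity is not billed.
        pool_size -= idle_vms
    return pool_size

print(scale_pool(3, 5, 0))  # -> 5 (scale out)
print(scale_pool(5, 2, 2))  # -> 3 (scale in)
```

In a real platform the scale-in decision would also account for the billing period (releasing a VM mid-period saves nothing), which ties back to the pricing model discussed in the introduction.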

Performance evaluation

The performance of our proposal was evaluated using five well-known workflows from different scientific areas. The Montage application from the astronomy field is used to generate custom mosaics of the sky based on a set of input images. Most of its tasks are characterized by being I/O intensive while not requiring much CPU processing capacity. The Ligo workflow from the astrophysics domain is used to detect gravitational waves. It is composed mostly of CPU intensive tasks with high memory

Conclusions and future work

WaaS platforms are emerging with the vision of providing scientists with the ability to deploy their applications for execution in the cloud in a simple and cost-effective manner. They have the potential to revolutionize the way in which scientific workflows are processed by offering a utility-like service that can be accessed on-demand by anyone and from anywhere. An important aspect, as is for any multi-tenant cloud-based framework, is to efficiently manage the execution of workflows

Maria A. Rodriguez is a Post-Doctoral Research Fellow in the Cloud Computing and Distributed Systems (CLOUDS) Laboratory in the Department of Computing and Information Systems, The University of Melbourne, Australia. Her research interests include resource management and scheduling in clouds and scientific computing.

References (37)

  • T. Dziok et al.

    Adaptive multi-level workflow scheduling with uncertain task estimates

  • A. Deldari et al.

CCA: a deadline-constrained workflow scheduling algorithm for multicore resources on the cloud

    J. Supercomput.

    (2016)
  • P. Maechling et al.

SCEC CyberShake workflows: automating probabilistic seismic hazard analysis calculations

  • E. Deelman et al.

    The cost of doing science on the cloud: the montage example

  • J.-S. Vöckler et al.

    Experiences using cloud computing for a scientific workflow application

  • I. Pietri et al.

    Energy-constrained provisioning for scientific workflow ensembles

  • Q. Jiang et al.

    Executing large scale scientific workflow ensembles in public clouds

  • P. Bryk et al.

    Storage-aware algorithms for scheduling of workflow ensembles in clouds

    J. Grid Comput.

    (2016)


Rajkumar Buyya is a Professor of Computer Science and Software Engineering and Director of the Cloud Computing and Distributed Systems (CLOUDS) Laboratory at the University of Melbourne, Australia. He is also the founding CEO of Manjrasoft, a spin-off company of the University, commercializing its innovations in Cloud Computing. He has authored over 400 publications and four textbooks. He is one of the highly cited authors in computer science and software engineering worldwide.
