Cost optimization approaches for scientific workflow scheduling in cloud and grid computing: A review, classifications, and open issues

https://doi.org/10.1016/j.jss.2015.11.023

Highlights

  • The relevant cost optimization approaches of workflow scheduling are surveyed.

  • An extensive analysis of scientific workflow scheduling aspects and parameters is provided.

  • Recommendations for service consumers and service providers are presented.

  • Future perspectives on the cost-based workflow scheduling approaches are offered.

Abstract

Workflow scheduling in scientific computing systems is among the most challenging problems because it must satisfy user-defined quality of service requirements while minimizing the workflow execution cost. Several cost optimization approaches have been proposed to improve the economic aspect of Scientific Workflow Scheduling (SWFS) in cloud and grid computing. To date, the literature lacks a comprehensive review of approaches that support cost optimization in the context of SWFS in cloud and grid computing. Furthermore, guidelines and analysis for understanding the cost optimization of SWFS approaches are not well explored in the current literature. This paper analyzes the problem of cost optimization in SWFS by extensively surveying existing SWFS approaches in cloud and grid computing and by providing a classification of the cost optimization aspects and parameters of SWFS. Moreover, it provides a classification of cost-based metrics, categorized into monetary and temporal cost parameters based on various scheduling stages. We believe that our findings will help researchers and practitioners select the most appropriate cost optimization approach considering the identified aspects and parameters. In addition, we highlight potential future research directions in this ongoing area of research.

Introduction

Efficient resource utilization remains a key issue in parallel and distributed computing environments. To resolve this issue, an organization needs to find the most suitable allocation of an application’s tasks to the available computational resources. This notion is generally referred to as scheduling (Wang, Chang, Lo, Lee, 2013, Liu, Jin, Chen, Liu, Yuan, Yang, 2010a). The optimal scheduling problem is known to be NP-complete (Wu, Liu, Ni, Yuan, Yang, 2013b, Bittencourt, Madeira, 2011, Ramakrishnan, Chase, Gannon, Nurmi, Wolski, 2011). No proposed scheduling approach can achieve an optimal solution within polynomial time, especially when scheduling large-size tasks (Abrishami, Naghibzadeh, 2013, Yu, Buyya, 2006b). Users can employ different available computational resources to execute the tasks in an efficient manner. However, currently limited computational resources cannot satisfy users’ demands (e.g., strict service completion deadlines and vast amounts of required storage) because of the tremendous increase in the complexity and size of today’s applications. Consequently, users need to determine an appropriate computational environment that provides the required storage space and computational resources for processing large-scale complex applications.

Grid computing and cloud computing resources can provide an optimal solution that can meet the user’s requirements by providing scalable and flexible solutions for considered applications (Wu et al., 2013b). The cloud computing based task scheduling differs from the grid computing based scheduling in the following two ways:

  • Resource sharing: Cloud computing offers advanced services by sharing resources using the virtualization notion with the help of internet technologies. Consequently, it supports real-time allocation to fully utilize the available resources while improving elasticity of cloud services. Thus, the scheduler in a cloud workflow system needs to consider the virtualization infrastructure (e.g., virtual services and virtual machines) to efficiently facilitate the computational processes. In contrast, grid computing allows allocating a large cluster of resources in a shared mode. Therefore, it supports batch processing and resources will be available once they are released by other users.

  • Cost of resource usage: Cloud computing provides a flexible costing mechanism that considers the user’s requirements (i.e. pay-as-you-go and on-demand services). On the other hand, grid computing follows a quota strategy to determine the accumulated cost of requested services (Foster et al., 2008). Therefore, grid computing has no flexible costing mechanism comparable to that of cloud computing.

In the literature, researchers have categorized task-scheduling strategies into two main categories: (i) job-based, and (ii) workflow-based (Liu, Chen, Wu, Ni, Yuan, Yang, 2010b, Deng, Kong, Song, Ren, Yuan, 2011a, Czarnul, 2013, Ma, Gong, Zou, 2009, Viana, de Oliveira, Mattoso, 2011). Job-based scheduling usually focuses on scheduling a set of independent tasks to be executed in a sequential or parallel manner (Sharif, Taheri, Zomaya, Nepal, 2013, Varalakshmi, Ramaswamy, Balasubramanian, Vijaykumar, 2011). In contrast, workflow-based scheduling (or global task scheduling) aims at mapping and managing the execution of inter-dependent tasks (i.e. tasks with precedence constraints) on shared resources for applications with higher complexity (Kaur et al., 2011). A workflow can be defined as the multiple steps or activities that are necessary to complete a submitted task. The components of these activities can be any executable instances (e.g. load sets, report sets, programs, and data) with different structures (e.g. process, pipeline, data distribution, data aggregation, and data redistribution). Workflow scheduling has attracted more attention from researchers than job scheduling, since workflow-based scheduling is able to efficiently determine an optimal solution for large and complex applications by considering precedence constraints between potential tasks. Motivated by this, we focused on reviewing workflow-based scheduling in cloud and grid computing. Workflow-based scheduling is commonly represented using a Directed Acyclic Graph (DAG) model (Wu, Liu, Ni, Yuan, Yang, 2013b, Talukder, Kirley, Buyya, 2009, Pandey, Wu, Guru, Buyya, 2010, Yu, Buyya, 2006a, Liu, Ni, Wu, Yuan, Chen, Yang, 2011b). The DAG is usually represented by:

$$DAG = \{T, E\}$$

where $T$ (the vertex set) is a set of tasks (a task can be any program that the user would like to execute in a workflow application) and $E$ is a set of directed edges between the vertices:

$$T = \{t_0, \ldots, t_n\}, \qquad E = \{e_1, \ldots, e_m\}$$

Note that each edge in $E$ represents a data dependency. For instance, if there is a directed edge $e$ (i.e. $e \in E$) connecting $t_i$ and $t_j$ (denoted as $t_i \to t_j$), then $t_i$ is considered the parent and $t_j$ the child. The input data of task $t_j$ depends on the data produced by the parent task $t_i$. Similarly, the complete path from $t_0$ to $t_n$ can be represented as:

$$(t_0 \to t_1), (t_1 \to t_2), \ldots, (t_{n-2} \to t_{n-1}), (t_{n-1} \to t_n)$$
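The DAG notation above can be made concrete with a minimal Python sketch. The four-task workflow, the helper names `build_dag` and `topological_order`, and the use of Kahn's algorithm are illustrative assumptions, not part of the reviewed approaches; the sketch only shows how parent/child relations and a precedence-respecting execution order follow from T and E.

```python
from collections import deque

def build_dag(tasks, edges):
    """Return parent and child adjacency maps for a workflow DAG = {T, E}."""
    parents = {t: [] for t in tasks}
    children = {t: [] for t in tasks}
    for ti, tj in edges:  # edge ti -> tj: ti is the parent, tj is the child
        children[ti].append(tj)
        parents[tj].append(ti)
    return parents, children

def topological_order(tasks, edges):
    """Kahn's algorithm: an order in which every parent precedes its children."""
    parents, children = build_dag(tasks, edges)
    remaining = {t: len(parents[t]) for t in tasks}  # unfinished parents per task
    ready = deque(t for t in tasks if remaining[t] == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for c in children[t]:
            remaining[c] -= 1
            if remaining[c] == 0:
                ready.append(c)
    if len(order) != len(tasks):
        raise ValueError("edges contain a cycle, so this is not a DAG")
    return order

# Hypothetical workflow: t0 fans out to t1 and t2, which join at t3.
tasks = ["t0", "t1", "t2", "t3"]
edges = [("t0", "t1"), ("t0", "t2"), ("t1", "t3"), ("t2", "t3")]
order = topological_order(tasks, edges)
print(order)  # ['t0', 't1', 't2', 't3']
```

Any scheduler that respects the precedence constraints must dispatch tasks in some such order; which valid order is chosen, and on which resource each task runs, is where the surveyed approaches differ.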

In order to execute workflow tasks in cloud and grid computing, the tasks must be mapped to a set of heterogeneous resources, which in the cloud commonly take the form of a set of Virtual Machines (VMs):

$$VM = \{vm_0, \ldots, vm_k\}$$

Furthermore, it is crucial to consider the computational cost (in terms of time) of executing the workflow tasks on available heterogeneous VMs along with the communication cost between these VMs.
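As a rough illustration of how computation and communication costs interact, the following Python sketch evaluates one fixed task-to-VM mapping. Everything in it is a hypothetical assumption made for the example: the function name `evaluate_schedule`, the execution and transfer times (in hours), the per-hour prices, and the simplification that a VM is billed only for the time it spends executing tasks (no billing-period rounding, no data transfer fees).

```python
def evaluate_schedule(order, parents, exec_time, transfer_time, mapping, price):
    """Return (makespan, monetary_cost) for a fixed task-to-VM mapping.

    order:         tasks listed so that parents come before children
    parents:       task -> list of parent tasks
    exec_time:     (task, vm) -> execution time in hours
    transfer_time: (parent, child) -> hours needed to move data between two VMs
    mapping:       task -> vm
    price:         vm -> price per hour (hypothetical pay-as-you-go rate)
    """
    finish = {}
    vm_free = {vm: 0.0 for vm in price}  # time at which each VM becomes idle
    cost = 0.0
    for t in order:
        vm = mapping[t]
        data_ready = 0.0
        for p in parents[t]:
            # Communication cost applies only when parent and child use different VMs.
            comm = 0.0 if mapping[p] == vm else transfer_time[(p, t)]
            data_ready = max(data_ready, finish[p] + comm)
        start = max(vm_free[vm], data_ready)
        finish[t] = start + exec_time[(t, vm)]
        vm_free[vm] = finish[t]
        cost += exec_time[(t, vm)] * price[vm]
    return max(finish.values()), cost

# Hypothetical diamond-shaped workflow on two VMs.
order = ["t0", "t1", "t2", "t3"]
parents = {"t0": [], "t1": ["t0"], "t2": ["t0"], "t3": ["t1", "t2"]}
exec_time = {("t0", "vm0"): 1.0, ("t1", "vm0"): 2.0,
             ("t2", "vm1"): 1.0, ("t3", "vm0"): 1.0}
transfer_time = {("t0", "t1"): 0.5, ("t0", "t2"): 0.5,
                 ("t1", "t3"): 0.5, ("t2", "t3"): 0.5}
mapping = {"t0": "vm0", "t1": "vm0", "t2": "vm1", "t3": "vm0"}
price = {"vm0": 1.0, "vm1": 2.0}

makespan, cost = evaluate_schedule(order, parents, exec_time, transfer_time, mapping, price)
print(makespan, cost)  # 4.0 6.0
```

In this toy case, placing t2 on the second VM lets it run in parallel with t1 at the price of two data transfers and a higher hourly rate, which is the kind of temporal versus monetary trade-off the reviewed approaches try to optimize.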

Traditionally, the information technology staff manually executes workflow tasks, which requires knowledge about resource availability and the estimated starting time for each workflow task (Ranaldo, Zimeo, 2009, Wang, Chang, Lo, Lee, 2013, Miu, Missier, 2012). It is necessary to automate and optimize the workflow scheduling process in order to achieve an efficient Workflow Management System (WfMS). A WfMS defines, manages, and executes workflows on available computing resources, where the workflow execution order is driven by a computer representation of the workflow logic. The WfMS can be implemented for different purposes including process management, process re-design/optimization, system integration, achieving flexibility, and improving maintainability. The main stages of any WfMS are modeling stage, instantiation stage and execution stage (as depicted in Fig. 1) (Liu, 2012). In the modeling stage, scientific processes are redesigned based on cloud workflow specifications which should contain the task definitions, tasks structural representation (e.g. DAG), and user-defined QoS requirements. The cloud workflow service provider will negotiate with the service consumer to finalize Service Level Agreement (SLA). In the instantiation stage, the WfMS selects and reserves the suitable cloud services (from private cloud, public cloud, and hybrid cloud) based on the SLA in order to execute workflow activities as well as satisfy the defined QoS requirements. Finally, at the execution stage, the cloud workflow execution scheduler coordinates the data and control flows according to the workflow specifications obtained from the modeling stage, and employs the candidate software services reserved at the instantiation stage to execute all the workflow activities. The workflow scheduler (i.e. workflow engine) plays a crucial role in scheduling and allocating the given tasks to the available resources by considering their dependencies as modeled using a DAG.

WfMS in cloud and grid computing must have the ability to handle requests from different application domains such as business workflow applications and scientific workflow applications. The business workflow application (also referred to as transaction intensive workflow) has been defined by the Workflow Management Coalition (WfMC) as the automation of a business process, in whole or part, during which documents, information or tasks are passed from one participant to another for action, according to a set of procedural rules (e.g. bank transactions and insurance claim applications) (Wieczorek, Hoheisel, Prodan, 2009, de Oliveira, Ocaña, Baião, Mattoso, 2012, Chandrakumar, 2013, Frincu, Genaud, Gossa, 2014, Poola, Garg, Buyya, Yang, Ramamohanarao, 2014). Conversely, a Scientific Workflow Application (SWFA) (also known as data and computational intensive scientific workflow) mostly implies data flows together with task execution (Wieczorek, Hoheisel, Prodan, 2009, Ma, Gong, Zou, 2009, Malawski, Figiela, Bubak, Deelman, Nabrzyski, 2014, Tolosana-Calasanz, Bañares, Pham, Rana, 2012), including input scripts (scientific program or data), which can be used to produce, analyze and visualize output results. It can provide interactive tools to help scientists better execute their own workflows and view results in real time. In addition, the SWFA simplifies the process for scientists to reuse the same workflows and provides them with an easy-to-use environment to track and share the output results virtually. Thus, SWFAs have been used in different scientific applications including weather forecasting, bioinformatics, geoinformatics, cheminformatics, biomedical informatics, and astrophysics (Wu, Liu, Ni, Yuan, Yang, 2013b, Malawski, Juve, Deelman, Nabrzyski, 2012). To execute SWFA data, high performance resources, such as supercomputers, need to be delivered by the service provider (i.e. infrastructure as a service) (Wu, Liu, Ni, Yuan, Yang, 2013b, Yan, Luo, Hu, Li, Zhang, 2013, Deelman, Juve, Rynge, Voeckler, Berriman, 2013, Bittencourt, Madeira, 2013, Malawski, Juve, Deelman, Nabrzyski, 2012). Therefore, WfMSs using cloud and grid services enable scientists to define multi-stage computational and data processing pipelines that can be executed as resources with predefined quality of service. Consequently, the scheduling process can automate complex analyses, improve application performance, and reduce the time required to obtain the desired results (Sharif, Taheri, Zomaya, Nepal, 2013, Czarnul, 2013, Malik, Nazir, Qureshi, Khan, 2013, Barrett, Howley, Duggan, 2011, Hameed, Khoshkbarforoushha, Ranjan, Jayaraman, Kolodziej, Balaji, Zeadally, Malluhi, Tziritas, Vishnu, 2014). Inspired by this, we surveyed the studies that focused on Scientific Workflow Scheduling (SWFS) in cloud and grid computing.

One of the most challenging problems with SWFS in cloud and grid computing is to optimize the cost of workflow execution (Abrishami, Naghibzadeh, 2013, Yu, Buyya, 2006a, Li, Su, Cheng, Song, Ma, Wang, 2015). The cost optimization challenge of SWFS in cloud computing is a multi-objective cost-aware problem that requires consideration of three main aspects: (i) different users which usually compete for resources within the cloud or grid computing to satisfy QoS constraints, (ii) the inter-dependencies among workflow tasks, and (iii) high communication cost due to the inter-dependencies between the tasks (i.e. data needs to be transferred from one resource to another). However, considering all cost optimization problem related aspects makes the SWFS process more complicated and also requires a high amount of computational resources in terms of computational time (Rahman, Li, Palit, 2011, Senna, Bittencourt, Madeira, 2012, Grandinetti, Pisacane, Sheikhalishahi, 2013, Ma, Gong, Zou, 2009). Inspired by this, a significant number of SWFS approaches have been proposed in the literature, focusing on reducing the overall execution cost of SWFS (Wu, Liu, Ni, Yuan, Yang, 2013b, Zheng, Sakellariou, 2013, Sahar Adabi, Ali, 2014, Szabo, Sheng, Kroeger, Zhang, Yu, 2014a).

The main aim of this paper is to analyze the cost optimization problem in SWFS by extensively surveying the state-of-the-art SWFS approaches in cloud and grid computing. To achieve this aim, we targeted three main objectives: (1) to classify cost optimization approaches based on the relevant aspects; (2) to classify cost parameters into monetary cost (Lingfang, Veeravalli, Xiaorong, 2012, Byun, Kee, Kim, Maeng, 2011a, Liu, 2011, Netjinda, Sirinaovakul, Achalakul, 2012) and temporal cost (Wieczorek, Hoheisel, Prodan, 2008, Liu, Ni, Wu, Yuan, Chen, Yang, 2011b, Liu, Chen, Wu, Ni, Yuan, Yang, 2010b, Li, Su, Cheng, Huang, Zhang, 2011) parameters based on scheduling stages (i.e. pre-scheduling, during scheduling, and post-scheduling); and (3) to identify the correlation between the cost parameters and their profitability to service consumers and service providers. Therefore, classification is used as the survey method to identify and analyze the cost aspects and parameters of SWFS. To achieve the aforementioned objectives, the following research questions were formulated:

RQ 1: What are the relevant cost optimization approaches for the scientific workflow scheduling problem? Answering RQ 1 helps researchers to identify the relevant cost optimization approaches for the SWFS problem. It also provides a clearer understanding of the strengths of the underlying optimization and the limitations of all reviewed approaches (Section 2).

RQ 2: What are the main classifications affecting the cost optimization of scientific workflow scheduling? Answering RQ 2 helps researchers to understand the overall classification of the cost optimization aspect. The first classification emphasizes the scheduling aspects by focusing on cost optimization approaches (Section 3). The second classification categorizes the cost optimization parameters (cost metrics) into two groups, namely monetary and temporal cost parameters. Moreover, the second classification is extended to divide the reviewed approaches into two groups: (i) profitability for service consumers, and (ii) profitability for service providers (Section 4).

RQ 3: What are the main aspects of cost optimization SWFS approaches? Answering RQ 3 helps researchers to identify the important cost optimization aspects of the SWFS problem in cloud and grid computing (Section 3).

RQ 4: What are the main cost optimization parameters of SWFS in cloud and grid computing, and how do monetary and temporal cost parameters affect the profitability of cost optimization of SWFS? Answering RQ 4 helps researchers to identify the relevant cost optimization parameters based on the purpose of the model, which might be beneficial to service consumers and/or service providers. Additionally, the results of applying the classifications to the reviewed cost optimization models are obtained, providing a complete relation between monetary and temporal costs based on the scheduling stage (Sections 4 and 5).

SWFS challenges have gained more attention since the emergence of cloud computing. A significant number of SWFS approaches (Grounds, Antonio, Muehring, 2009, Wu, Liu, Ni, Yuan, Yang, 2013b, Bittencourt, Madeira, 2011, Yuan, Li, Wang, Zhu, 2009, de Oliveira, Viana, Ogasawara, Ocaña, Mattoso, 2013, Lin, Wu, 2013, Viana, de Oliveira, Mattoso, 2011, Chitra, Madhusudhanan, Sakthidharan, Saravanan, 2014) have been proposed that focus on the cost optimization challenge because of its direct impact on the profitability of service consumers and service providers for business and scientific workflows. Cost optimization plays an important role in different aspects of the proposed cost optimization based SWFS approaches, such as computing environment, optimization method, structural representation, scheduling technique, and workload type.

Many review studies addressed the SWFS challenge in grid and cloud computing. Yu and Buyya (2005) reviewed a number of grid computing workflow management systems. In contrast, Prodan and Wieczorek (2010) and Wieczorek et al. (2009) devised a classification of multi-criteria problems for SWFS. Their devised taxonomy classifies the multi-criteria into four different aspects based on the workflow structure (i.e. cost aggregation method, intra-dependence, optimization direction, and interdependence). On the other hand, some review studies focused on optimizing multi-objective criteria from the cost aspect while considering other constraints such as network bandwidth, storage requirements, energy efficiency, robustness, and fault-tolerance. To the best of our knowledge, no cost-specific review has been conducted that completely covers the cost optimization SWFS approaches in cloud and grid computing. Conversely, other works have emphasized the SWFS problem in cloud computing and grid computing (Shenai, 2012, Bardsiri, Hashemi, 2012, Tilak, Patil, 2012). Singh and Singh (2013) reviewed various SWFS algorithms and compared the algorithms according to their type, objective criteria, and environment. In contrast, some reviews (Arya, Verma, 2014, Liu, Zhang, Lin, Qin, 2014b, Fakhfakh, Kacem, Kacem, 2014) only focused on workflow scheduling in cloud computing.

In order to fully cover the body of knowledge of cost optimization problem of workflow scheduling in cloud and grid computing, it is necessary to provide a complete review regarding SWFS challenges, aspects, and parameters. Our prior work focused on cost-aware SWFS challenges (Alkhanak et al., 2015). We have classified the challenges in cloud workflow scheduling by focusing more on scheduling objectives and functionalities of workflow system architecture as well as the QoS challenges. The current work aims at reviewing important aspects and parameters compared to challenges presented in our previous work. As a result, it provides a complete body of knowledge of cost optimization of SWFS in cloud computing. The following are significant differences in our conducted works:

  • Our prior work reviewed approaches on a broader workflow scheduling domain for different types of workflow applications. However, the current work is specifically focused on reviewing approaches for scientific workflow applications.

  • Both works focused on cost optimization in the area of workflow scheduling in the cloud computing. However, in our previous paper, we focused only on the cost challenges (i.e. system performance, system functionality and QoS). In contrast, the current work determines more precisely the cost optimization aspects and parameters specifically for SWFS.

  • In our previous paper, we classified the profitability of cost challenges based on two viewpoints: (i) service consumers, and (ii) service providers. However, the current work classifies the profitability perspective based on the cost parameters of SWFS in cloud and grid computing.

  • The previous work explicitly reported QoS constraints of cost-aware workflow scheduling. In contrast, the current work focuses on classifying the QoS constraints based on two levels of consideration: (i) activity level, and (ii) workflow level.

To the best of our knowledge, no work has focused on providing a comprehensive classification of cost optimization aspects of SWFS. Moreover, none of the researchers has focused thoroughly on comparing cost optimization SWFS approaches in cloud and grid computing environments. A comprehensive taxonomy is required to provide an in-depth understanding of cost optimization aspects, parameters, constraints, and opportunities that can be useful for future researchers.

Motivated by this, we extensively review cost aspects and parameters in SWFS, which would help researchers and scientists by providing a thorough overview of current state-of-the-art works that is beneficial in optimizing the cloud services of SWFS. The following are newly targeted objectives in the current work:

  • Focusing on cost optimization approaches (not challenges as provided in our prior work) specifically in SWFS domain in cloud and grid computing.

  • Comparing existing cost optimization SWFS approaches.

  • Classifying SWFS aspects and cost parameters in cloud and grid computing.

  • Highlighting potential future research issues in the context of SWFS aspects and cost parameters.

It is important to note that this paper extensively reviews cost optimization SWFS approaches in cloud and grid computing. We have followed a systematic literature review methodology to select most suitable papers in this area of research. Moreover, the adopted methodology helped us to avoid missing any important paper(s), which otherwise would probably be missed out if we used a simple survey strategy for selection of papers. First of all, we formulated an initial set of research questions based on our research experience and discussions with field experts. Then the formulated research questions were refined throughout the literature review to be presented in an unambiguous manner. The systematic literature review methodology was adopted by following the guidelines suggested by Kitchenham et al. (2009) to ensure the selection of suitable papers.

We have widely searched various digital library sources to obtain a large pool of relevant potential papers. The main goal of using the selected digital libraries was to make sure that we do not miss out any of the relevant papers (as recommended by Dieste et al., 2009) rather than just focusing on workshop proceedings, conference proceedings and journals. The following are the selected digital libraries that have been covered:

The initial search criteria were devised by formulating a search query (as shown below) and picking important terms, keywords and their synonyms based on our formulated research objectives (i.e. papers should focus on cost optimization aspects and parameters for SWFS). We joined the terms using logical operators, including (AND) and (OR) to formulate our search query and searched on the title field. Note that we refined the query based on the searching facility and conditions provided by the selected digital libraries.

[(workflow scheduling) AND ((cost optimization) AND (parameters) OR (metrics) OR (aspects)) AND ((cloud computing) OR (grid computing)) AND ((approaches) OR (models) OR (algorithms)) AND ((profitability) AND (service consumer) OR (service provider) OR (Utility provider))]

Based on the suggestions of Kitchenham et al. (2009), we only considered and included the papers written in English. The earliest selected primary study was published in 2004. Therefore, we set the start year to 2003 in order to confirm that related studies within this area of research would be included, and the last date was set to 2015. The initial search resulted in collecting 1043 potential papers.

This section presents the inclusion/exclusion criteria, which have been used to select most relevant papers. Based on the objectives of this review, we formulated the following inclusion criteria:

  • The papers targeting the main focus of our review including cost-optimization problems, parameters, and aspects in SWFS were selected for initial evaluation.

  • The papers written in English were considered, mainly because English is regarded as a standard language in the research community. Moreover, to the best of our knowledge, the majority of reputable journals accept only papers written in English.

  • The publication period from 2004 to 2015 was considered, since research on scheduling in cloud and grid computing emerged in 2004. The objective was to provide the most up-to-date view of this field of research.

  • The papers published in peer-reviewed journals, conferences, and workshop proceedings were selected since they have gone through quality evaluation process (i.e. peer-reviewed by field experts).

  • If a paper is published in two different venues with more or less similar contributions, then the latest and most complete version of the paper is included. For instance, when a paper published in a conference is later extended and published in a journal, we included only the journal version.

Similarly, we formulated and adopted the following exclusion criteria in order to exclude irrelevant papers:

  • Grey literature (e.g., work in progress, workshop reports, and technical reports) was excluded due to the lack of technical details. Moreover, grey literature may not have gone through a peer-review process.

  • The papers which do not cover cost optimization problems, parameters, and aspects in SWFS in grid and cloud computing were excluded. This is because they do not focus on the defined objectives of this review.

  • The papers written in non-English languages were excluded.

  • Duplicate papers found from different selected digital libraries were manually excluded to avoid reporting similar results.

  • Papers published before 2004 were excluded since they do not cover grid and cloud computing.

  • Surveys, systematic literature reviews, and mapping studies were excluded since they do not present any new approach focusing on cost optimization problems in SWFS.

  • Extended abstracts and short papers were excluded due to the lack of technical details.

In order to critically investigate potential papers, we involved three researchers by adopting a three-stage paper selection strategy, as shown in Fig. 2.

Stage 1: First, the 1043 potential papers were checked by title to remove duplicates. We observed a large number of irrelevant papers caused by ambiguous terms. For instance, the term scheduling is also related to project management and to other computational environments (e.g. utility computing, parallel computing). Similarly, the term workflow is related to different types of workflow applications that are outside the scope of this survey (e.g. business workflow applications). After Stage 1, 317 papers remained.

Stage 2: In this stage, the abstracts of the remaining papers were checked against the formulated research objectives. We classified the papers of various types (e.g. original work, survey, systematic literature review, mapping study, and empirical study) into two dimensions based on their research focus: cost optimization aspects and cost optimization parameters. This left us with 147 papers that target these dimensions.

Stage 3: In this stage, participating researchers read the full contents of the 147 papers. Finally, they selected 34 papers, which satisfy the defined inclusion/exclusion criteria. We used the final selected papers to create a classification based on the considered aspects and parameters of cost optimization approaches.
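The selection funnel described in the three stages can be summarized with a few lines of arithmetic. The sketch below uses only the paper counts reported above; the stage labels and the helper name `retention_rates` are our own shorthand.

```python
# Paper counts reported for the three-stage selection strategy.
stages = [
    ("initial search", 1043),
    ("Stage 1: title screening", 317),
    ("Stage 2: abstract screening", 147),
    ("Stage 3: full-text reading", 34),
]

def retention_rates(stages):
    """Percentage of papers kept at each stage relative to the previous one."""
    rates = []
    for (_, before), (name, after) in zip(stages, stages[1:]):
        rates.append((name, round(100.0 * after / before, 1)))
    return rates

rates = retention_rates(stages)
overall = round(100.0 * stages[-1][1] / stages[0][1], 1)

for name, pct in rates:
    print(f"{name}: {pct}% retained")
print(f"overall: {overall}% of the initial pool")
```

Roughly 30.4% of the initial pool survived title screening, 46.4% of those survived abstract screening, and 23.1% of those survived full-text reading, so about 3.3% of the 1043 initial papers were finally selected.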

This paper is organized as follows: Section 2 presents the relevant reviewed cost optimization SWFS approaches. Section 3 presents the main aspects of cost optimization SWFS approaches in cloud and grid computing. Furthermore, it provides an in-depth discussion for each of these aspects. Section 4 categorizes the cost parameters into monetary and temporal cost parameters. Special emphasis is given to the mathematical models used to calculate cost, which may possibly affect a particular parameter of scientific workflow scheduling system. The obtained results and findings are discussed in Section 5. Finally, Section 6 concludes the review and presents useful open issues for future research.

Reviewed cost optimization SWFS approaches

This section reviews relevant cost optimization Scientific Workflow Scheduling (SWFS) approaches in cloud and grid computing. Table 1 presents the existing approaches, strengths of the underlying optimization, and limitations for all considered approaches.

Cost optimization aspects of SWFS

One of the major aims of SWFS is to reduce the overall cost of running the cloud workflow system (Wu et al., 2013b) especially for complex jobs like scientific workflow. The execution cost must be taken into consideration when scheduling tasks into the resources (Yu, Buyya, 2006b, Yu, Buyya, 2006a). The running cost of an application is minimized through the cost optimization scheduling policies (Salehi and Buyya, 2010). Therefore, several aspects need to be considered while scheduling the

Cost optimization parameters of SWFS

This section critically analyzes the devised classifications for cost optimization parameters of SWFS in cloud and grid computing. A complete discussion on the sub-classification of cost parameters including the monetary cost and temporal cost is presented in Sections 4.1.1 and 4.2.1. Finally, the section provides the correlations between the surveyed cost optimization SWFS approaches and the profitability by extracting their association with cost optimization parameters.

After analyzing the

Discussion and open issues

In this section, we discuss the presented results and findings from the plethora of the current state-of-the-art cost optimization SWFS approaches and devised cost optimization aspects, parameters and experimental assessment for SWFS approaches. The following sections explain the main observations that we have extracted from our analyses.

Conclusion

The cost optimization of Scientific Workflow Scheduling (SWFS) especially in cloud and grid computing remains an important challenge for both service consumers and service providers. The current work analyzes the cost optimization problem for SWFS in cloud and grid computing. After careful selection of the relevant papers in this field of study, we identified the cost optimization aspects of SWFS that should be considered while scheduling workflows in cloud and grid computing. We introduced two

Acknowledgment

This work is developed within the framework of the research project with reference RG114-12ICT, supported by University of Malaya. We would like to thank all the following researchers: Saif Ur Rehman Khan, Saeid Abolfazli, and Zohre Sanaei from University of Malaya, Malaysia, for their valuable feedback.

References (146)

  • ShiJ. et al.

    A budget and deadline aware scientific workflow resource provisioning and scheduling mechanism for cloud

    Proceedings of the 2014 IEEE 18th International Conference on Computer Supported Cooperative Work in Design (CSCWD)

    (2014)
  • Amazon EC2 instance types. URL: https://aws.amazon.com/ec2/instance-types (last accessed...
  • AbrishamiS. et al.

    Deadline-constrained workflow scheduling algorithms for infrastructure as a service clouds

    Future Gener. Comput. Syst.

    (2013)
  • AbrishamiS. et al.

    Deadline-constrained workflow scheduling in software as a service cloud

    Sci. Iran.

    (2012)
  • AfzalA. et al.

    QoS-constrained stochastic workflow scheduling in enterprise and scientific grids

    Proceedings of the 7th IEEE/ACM International Conference on Grid Computing

    (2006)
  • AlhamazaniK. et al.

    An overview of the commercial cloud monitoring tools: research dimensions, design issues, and state-of-the-art

    Computing

    (2014)
  • AlkhanakE.N. et al.

    Cost-aware challenges for workflow scheduling approaches in cloud computing environments: taxonomy and opportunities

    Future Gener. Comput. Syst.

    (2015)
  • ArabnejadH. et al.

    List scheduling algorithm for heterogeneous systems by an optimistic cost table

    IEEE Trans. Parallel Distrib. Syst.

    (2014)
  • AryaL.K. et al.

    Workflow scheduling algorithms in cloud environment – a survey

    Proceedings of 2014 International Conference on Recent Advances in Engineering and Computational Sciences (RAECS)

    (2014)
  • BardsiriA.K. et al.

    A review of workflow scheduling in cloud computing environment

    Int. J. Comput. Manag. Res.

    (2012)
  • BarrettE. et al.

    A learning architecture for scheduling workflow applications in the cloud

    Proceedings of the 2011 Ninth IEEE European Conference on Web Services (ECOWS)

    (2011)
  • BittencourtL.F. et al.

    Scheduling in hybrid clouds

    IEEE Commun. Mag.

    (2012)
  • BittencourtL.F. et al.

    HCOC: a cost optimization algorithm for workflow scheduling in hybrid clouds

    J. Internet Serv. Appl.

    (2011)
  • BittencourtL.F. et al.

    Using time discretization to schedule scientific workflows in multiple cloud providers

    Proceedings of 2013 IEEE Sixth International Conference on Cloud Computing (CLOUD)

    (2013)
  • BittencourtL.F. et al.

    Scheduling service workflows for cost optimization in hybrid clouds

    Proceedings of 2010 International Conference on Network and Service Management (CNSM)

    (2010)
  • BjorkqvistM. et al.

    Cost-driven service provisioning in hybrid clouds

    Proceedings of 2012 5th IEEE International Conference on Service-Oriented Computing and Applications (SOCA)

    (2012)
  • BrobergJ. et al.

    MetaCDN: harnessing storage clouds for high performance content delivery

    J. Netw. Comput. Appl.

    (2009)
  • ByunE.-K. et al.

    Cost optimized provisioning of elastic resources for application workflows

    Future Gener. Comput. Syst.

    (2011)
  • ChandrakumarB.

    Time synchronization on cognitive radio ad hoc networks: a bio-inspired approach

    Computing

    (2013)
  • ChenW.-n. et al.

    An ant colony optimization algorithm for the time-varying workflow scheduling problem in grids

    Proceedings of IEEE Congress on Evolutionary Computation, CEC’09

    (2009)
  • ChenW.-N. et al.

    An ant colony optimization approach to a grid workflow scheduling problem with various QoS requirements

    IEEE Trans. Syst. Man Cybern. Part C: Appl. Rev.

    (2009)
  • ChitraS. et al.

    Local Minima Jump PSO for Workflow Scheduling in Cloud Computing Environments

    (2014)
  • ChoudharyV. et al.

    An approach to improve task scheduling in a decentralized cloud computing environment

    Int. J. Comput. Technol. Appl.

    (2012)
  • ChunlinL. et al.

    QoS based resource scheduling by computational economy in computational grid

    Inf. Process. Lett.

    (2006)
  • CzarnulP.

    Modeling, run-time optimization and execution of distributed workflow applications in the JEE-based beesycluster environment

    J. Supercomput.

    (2013)
  • DeelmanE. et al.

    Comparing FutureGrid, Amazon EC2, and Open Science Grid for scientific workflows

    Comput. Sci. Eng.

    (2013)
  • DeelmanE. et al.

    Pegasus: a framework for mapping complex scientific workflows onto distributed systems

    Sci. Program.

    (2005)
  • DelavarA.G. et al.

    A goal-oriented workflow scheduling in heterogeneous distributed systems

    Int. J. Comput. Appl.

    (2012)
  • DengK. et al.

    A weighted k-means clustering based co-scheduling strategy towards efficient execution of scientific workflows in collaborative cloud environments

    Proceedings of 2011 IEEE Ninth International Conference on Dependable, Autonomic and Secure Computing (DASC)

    (2011)
  • de OliveiraD. et al.

    Dimensioning the virtual cluster for parallel scientific workflows in clouds

    Proceedings of the 4th ACM workshop on Scientific cloud computing

    (2013)
  • DongF.

    Workflow Scheduling Algorithms in the Grid

    (2009)
  • DuttaK. et al.

    Cost-based decision-making in middleware virtualization environments

    Eur. J. Oper. Res.

    (2011)
  • FakhfakhF. et al.

    Workflow scheduling in cloud computing: a survey

    Proceedings of 2014 IEEE 18th International Enterprise Distributed Object Computing Conference Workshops and Demonstrations (EDOCW)

    (2014)
  • FidaA.

    Workflow Scheduling for Service Oriented Cloud Computing

    (2008)
  • FosterI. et al.

    Cloud computing and grid computing 360-degree compared

    Proceedings of Grid Computing Environments Workshop, GCE’08

    (2008)
  • FrincuM.E. et al.

    On the efficiency of several VM provisioning strategies for workflows with multi-threaded tasks on clouds

    Computing

    (2014)
  • GargS.K. et al.

    Time and cost trade-off management for scheduling parallel applications on utility grids

    Future Gener. Comput. Syst.

    (2010)
  • GenezT.A. et al.

    Workflow scheduling for SaaS/PaaS cloud providers considering two SLA levels

    Proceedings of 2012 IEEE Network Operations and Management Symposium (NOMS)

    (2012)
    Ehab Nabiel Alkhanak is a Ph.D. candidate and research assistant in the Department of Software Engineering, Faculty of Computer Science and Information Technology, University of Malaya. He received his Master of Computer Science degree in 2009 from University of Malaya. He is an active member of Software Requirements, Architecture & Reusability Engineering lab. His research interests include workflow scheduling, scientific workflow application, service-oriented architecture, web-services, and cloud computing.

    Sai Peck Lee is a professor at Faculty of Computer Science and Information Technology, University of Malaya. She obtained her Master of Computer Science from University of Malaya, her Diplôme d’Études Approfondies (D.E.A.) in Computer Science from Université Pierre et Marie Curie (Paris VI) and her Ph.D. degree in Computer Science from Université Panthéon-Sorbonne (Paris I). Her current research interests in Software Engineering include Object-Oriented Techniques and CASE tools, Software Reuse, Requirements Engineering, Application and Persistence Frameworks, Information Systems and Database Engineering. She has published an academic book, a few book chapters as well as more than 100 papers in various local and international conferences and journals. She has been an active member in the reviewer committees and program committees of several local and international conferences. She is currently in several Experts Referee Panels, both locally and internationally.

    Reza Rezaei has a Ph.D. in Software Engineering, from Faculty of Computer Science and Information Technology, University of Malaya. His research interests are interoperability, cloud computing, and service oriented architecture.

    Reza M. Parizi is a consummate technologist with an entrepreneurial spirit. He has applied his insights and expertise to a host of innovative and technology-driven projects across the start-up, IT, and education industries. He received a Ph.D. in software engineering in 2012 and M.Sc. and B.Sc. degrees in Software Engineering and Computer Science in 2008 and 2005, respectively. He has more than 5 years of working experience in industrial software development and project management. His research interests are software engineering, cloud engineering, software testing and quality assurance, big data, software traceability, object- and aspect-oriented programming, software development tools, and empirical studies. He has published several research papers in reputable scientific journals and conferences and also has two copyrights to his credit. Dr. Parizi is now applying his skills as an academician and researcher for the School of Engineering and Computing Sciences at New York Institute of Technology (NYIT), Nanjing campus, China.