Maintenance reliability estimation for a cloud computing network with nodes failure

https://doi.org/10.1016/j.eswa.2011.04.230

Abstract

The cloud computing network (CCN) has become a new paradigm for businesses and clients as information technology has developed. To guarantee that the CCN keeps a good quality of service (QoS), maintenance is necessary when the CCN falls into a state in which it cannot provide enough capacity to meet demand d. In the CCN, edges and nodes have various capacities due to failure, partial failure, or maintenance; thus, the CCN has several possible states. This paper proposes an algorithm to estimate the performance of a CCN with node failure under a maintenance budget. To this end, the maintenance reliability is developed to measure the capability of the CCN to send d units of data from the cloud to the client through multiple paths under the maintenance budget and time constraints. Furthermore, the system supervisor can afterwards conduct a sensitivity analysis to investigate and improve the most important part of a large CCN.

Highlights

► An estimated maintenance reliability is proposed to evaluate the performance of a cloud computing network.
► A maintenance model is proposed to guarantee that the network keeps a sufficient capacity.
► The node failure case is considered in this paper.
► Data are transmitted through multiple paths to shorten the transmission time.

Introduction

In recent decades, information technology has developed rapidly. Because many applications need large amounts of resources (computing resources, storage capacity, or network bandwidth), cloud computing was developed to meet these enormous requirements. In a cloud computing paradigm, information is processed or stored by servers on the internet and cached temporarily on clients (Hewitt, 2008). All the resources are provided by powerful servers, which can be depicted as the “cloud” in a cloud computing network (CCN). For a stable network environment, internet service providers must guarantee that the CCN keeps a good quality of service (QoS) and satisfies their customers/clients at all times.

For a practical CCN, the capacity of each edge (physical lines, fiber optics, or coaxial cables) and node (servers or switches) should be treated as stochastic due to failure, partial failure, or maintenance. Thus, each edge/node has several possible capacities or states (Chen and Lin, 2009; Lin, 2004; Lin, 2007; Lin, 2010; Jane et al., 1993; Xue, 1985). To guarantee that the CCN keeps a stable QoS, it should be maintained when it falls into a state in which the cloud cannot provide enough capacity to fulfill the client’s demand d. Thus, the maintenance budget should be considered. According to Yeh’s (2004) definition, the maintenance cost is the overall cost of restoring a network from its failed state back to its original state, where the failed state is one in which the network sends less than the given d units of data. That is, the edges/nodes in the CCN should be recovered to their highest capacities when only d units of data can be sent.

Furthermore, the transmission time needed to send data from the cloud to the client is another important issue. When data are transmitted through a CCN, it is necessary to select the path with the shortest delay to minimize the transmission time (Bodin et al., 1982; Golden and Magnanti, 1977). However, the flow of data transmission is not considered in these works. To find a path that sends the given amount of data from the cloud to the client with minimum transmission time, Chen and Chin (1990) proposed a version of the shortest path problem called the quickest path problem. In this problem, each edge has both a capacity and a lead time, and both are assumed to be deterministic (Chen and Chin, 1990; Hung and Chen, 1992; Martins and Santos, 1997). Variants of the quickest path problem, such as the constrained quickest path problem (Chen and Hung, 1994; Chen and Tang, 1998), the first k quickest paths problem (Clímaco et al., 2007; Pascoal et al., 2005), and the all-pairs quickest path problem (Chen and Hung, 1993; Lee, 1993), were subsequently proposed. To shorten the transmission time, the data can be transmitted through k (k ≥ 2) disjoint minimal paths (MPs) simultaneously, where an MP is a path whose proper subsets are no longer paths. However, these studies assume perfect nodes.

In the CCN, nodes play the role of servers/switches and, like edges, may fail due to unexpected malfunctions. Therefore, failure, maintenance action, and transmission time need to be considered for nodes as well. Aggarwal, Gupta, and Misra (1975) proposed the concept that the failure of a node implies the failure of the edges incident from it. Based on this concept, further related works transformed the original network with node failure into a conventional network with perfect nodes (Lin, 2001; Lin, 2004; Lin, 2007). With node failure, we present an algorithm to estimate the probability that the CCN can send d units of data from the cloud to the client under both maintenance budget B and time constraint T. This probability is named the maintenance reliability herein. A bounding approach is first proposed to generate two sets of capacity vectors, {UB-MPs} and {LB-MPs}, where a UB-MP is a minimal capacity vector satisfying d and T, while an LB-MP is a minimal capacity vector satisfying d, B, and T. The estimate of the maintenance reliability is then derived in terms of UB-MPs and LB-MPs by the Recursive Sum of Disjoint Products (RSDP) algorithm. The remainder of this paper is organized as follows. Notations and assumptions are described in Section 2. The CCN model and the maintenance reliability are described in Section 3. An algorithm to generate the UB-MPs and LB-MPs is proposed in Section 4. An example presented in Section 5 illustrates the algorithm and how the maintenance reliability may be calculated.

Notations and assumptions

Let $G = (N, E, W, C, L)$ denote a CCN with a cloud $S_{cloud}$ and a client $S_{client}$, where $E = \{e_i \mid i = 1, 2, \ldots, n\}$ represents the set of edges, $N = \{e_i \mid i = n + 1, n + 2, \ldots, n + r\}$ represents the set of nodes, $W = \{W_i \mid i = 1, 2, \ldots, n + r\}$ with $W_i$ the maximal capacity of $e_i$, $C = \{c_i \mid i = 1, 2, \ldots, n + r\}$ with $c_i$ the unit maintenance cost of $e_i$, and $L = \{l_i \mid i = 1, 2, \ldots, n + r\}$ with $l_i$ the lead time of $e_i$. Suppose the data can be transmitted through $P_1, P_2, \ldots, P_k$ simultaneously, where $P_m$ is the mth MP for $m = 1, 2, \ldots, k$. The capacity vector $X = (x_1, x_2, \ldots, x_{n+r})$ is

The CCN model and maintenance reliability

The maintenance cost is calculated in terms of the amount of capacity that each edge/node needs to have restored. The total cost to recover the edges/nodes in a CCN from the state X is

$$TC(X) = \sum_{i:\, e_i \in \bigcup_{m=1}^{k} P_m} c_i (W_i - x_i),$$

where $c_i (W_i - x_i)$ is the maintenance cost for $e_i$ on any MP to recover from the current capacity $x_i$ to its highest capacity $W_i$. For instance, given the current capacity vector X = (1, 0, 1, 1, 0, 0, 1, 1), the maximal capacity vector W = (3, 3, 3, 1, 2, 4, 5, 4), and the unit maintenance cost C = (25, 15, 25, 40, 
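The total-cost formula can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: the function name, the 0-based indexing, and the four-component data are hypothetical and are not the paper's example. Components lying on no MP contribute nothing, and a component shared by several MPs is counted once.

```python
from typing import Iterable, Sequence


def total_maintenance_cost(
    x: Sequence[int],
    w: Sequence[int],
    c: Sequence[float],
    paths: Iterable[Iterable[int]],
) -> float:
    """TC(X): cost of restoring every component on some MP to its maximal capacity."""
    # Union of component indices over all minimal paths (shared components counted once).
    on_some_path = set()
    for path in paths:
        on_some_path.update(path)
    return sum(c[i] * (w[i] - x[i]) for i in on_some_path)


# Hypothetical data (assumption for illustration only).
W = [3, 2, 4, 2]          # maximal capacities W_i
C = [10, 20, 5, 50]       # unit maintenance costs c_i
X = [1, 2, 3, 0]          # current capacities x_i
PATHS = [[0, 1], [0, 2]]  # two MPs; component 3 lies on neither

print(total_maintenance_cost(X, W, C, PATHS))  # 10*2 + 20*0 + 5*1 = 25
```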

The algorithm to generate UB-MPs and LB-MPs

All UB-MPs and LB-MPs can be generated by the following steps.

  • Step 0.

    [Initialization] Set ΦUB = ∅, ΦLB = ∅, and j = 0.

  • Step 1.

    Find the largest assigned demand $\bar{d}_m$ such that $\sum_{i:\, e_i \in P_m} l_i + \left\lceil \bar{d}_m \big/ \min_{i:\, e_i \in P_m} W_i \right\rceil \le T$.

  • Step 2.

    [Generation of feasible demand vector d] Generate all non-negative integer solutions of $\sum_{m=1}^{k} d_m = d$ where $d_m \le \bar{d}_m$, $m = 1, 2, \ldots, k$.

  • Step 3.

    [Generation of UB-MPs] For each demand vector d, do the following steps.

    • 3.1

      Find the minimal capacity $v_m$ of $P_m$ such that $d_m$ units of data can be sent through $P_m$ under T, m = 1, 2, …, k. That is, find the
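Steps 1, 2, and 3.1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes that sending d_m units over P_m takes the path's total lead time plus ⌈d_m / capacity⌉ time units, that lead times and T are integers, and that every integer capacity up to the smallest maximal capacity on the path is attainable. The function names and the two-path data are hypothetical.

```python
import math
from typing import Iterator, Sequence, Tuple


def largest_assigned_demand(lead_times: Sequence[int], max_caps: Sequence[int], T: int) -> int:
    """Step 1: largest d with sum(lead_times) + ceil(d / min(max_caps)) <= T."""
    slack = math.floor(T - sum(lead_times))  # time units left after the lead time
    return max(0, min(max_caps) * slack)


def feasible_demand_vectors(d: int, caps: Sequence[int]) -> Iterator[Tuple[int, ...]]:
    """Step 2: all non-negative integer (d_1..d_k) with sum d and d_m <= caps[m]."""
    def extend(prefix: Tuple[int, ...], remaining: int, m: int) -> Iterator[Tuple[int, ...]]:
        if m == len(caps) - 1:
            # The last path must carry whatever demand is left.
            if remaining <= caps[m]:
                yield prefix + (remaining,)
            return
        for dm in range(min(remaining, caps[m]) + 1):
            yield from extend(prefix + (dm,), remaining - dm, m + 1)

    if caps:
        yield from extend((), d, 0)


def minimal_capacity(d_m: int, lead_times: Sequence[int], max_caps: Sequence[int], T: int) -> int:
    """Step 3.1: smallest capacity v letting d_m units traverse the path within T."""
    if d_m == 0:
        return 0  # an unused path needs no capacity
    lead = sum(lead_times)
    for v in range(1, min(max_caps) + 1):
        if lead + -(-d_m // v) <= T:  # -(-a // b) is integer ceil(a / b)
            return v
    raise ValueError("d_m units cannot be sent through this path within T")


# Hypothetical two-path instance: (lead times, maximal capacities) per MP.
P1 = ([1, 2], [3, 4])
P2 = ([2, 2], [2, 5])
T, d = 6, 5

bounds = [largest_assigned_demand(l, w, T) for l, w in (P1, P2)]
print(bounds)                                    # [9, 4]
print(list(feasible_demand_vectors(d, bounds)))  # [(1, 4), (2, 3), (3, 2), (4, 1), (5, 0)]
print(minimal_capacity(5, *P1, T), minimal_capacity(4, *P2, T))  # 2 2
```

The closed form in `largest_assigned_demand` follows from ⌈d / W⌉ ≤ s being equivalent to d ≤ W·s for a non-negative integer s; a linear search is used in `minimal_capacity` only for readability.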

An illustrative example

A random network with 12 edges and 6 nodes subject to failure, shown in Fig. 1, is used to illustrate the solution process. In this example, each edge consists of several OC-18 (Optical Carrier 18) lines, and each line has two possible capacities, 1 Gbps (gigabits per second) and 0 bps. Since the lines are provided by different suppliers, the edges' capacities follow different probability distributions. The capacity, lead time, and per-unit maintenance cost of each edge are shown in Table 1.
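As a rough illustration of how an edge built from two-state lines yields a multistate capacity, the sketch below computes a capacity distribution under a simplifying assumption that is not taken from the paper: the lines of one edge fail independently and share a common availability `p_up`, so the number of working lines is binomial. Table 1 is not reproduced here and may specify different distributions; the function name and the numbers are hypothetical.

```python
from math import comb
from typing import List


def edge_capacity_distribution(lines: int, p_up: float) -> List[float]:
    """Return [Pr(capacity = j Gbps) for j = 0..lines] for independent two-state lines."""
    return [
        comb(lines, j) * p_up ** j * (1.0 - p_up) ** (lines - j)
        for j in range(lines + 1)
    ]


# Hypothetical edge: 3 lines of 1 Gbps each, every line available with probability 0.9.
dist = edge_capacity_distribution(3, 0.9)
for capacity, prob in enumerate(dist):
    print(f"{capacity} Gbps: {prob:.3f}")
```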

In this example,

Summary

For a practical CCN, the capacity of each edge and node should be treated as stochastic due to failure, partial failure, or maintenance. That is, failure, maintenance action, and transmission time need to be considered for nodes as well as edges. In this paper, when the CCN falls into the failed state, in which it cannot provide enough capacity to satisfy the client’s requirements, maintenance action should be taken on each edge/node to keep a good QoS. Moreover, the transmission time that

References (27)

  • M.M.B. Pascoal et al.

    An algorithm for ranking quickest simple paths

    Computers and Operations Research

    (2005)
  • W.C. Yeh

    Multistate network reliability evaluation under the maintenance cost constraint

    International Journal of Production Economics

    (2004)
  • K.K. Aggarwal et al.

    A simple method for reliability evaluation of a communication system

    IEEE Transactions on Communications

    (1975)