On the design of reactive approach with flexible checkpoint interval to tolerate faults in cloud computing systems

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

Cloud computing systems are inherently unstable, which raises the likelihood of failures. Because the size of a cloud computing system also varies over time, failures become a common occurrence. Failures strongly affect cloud performance and the benefits expected by both customers and providers. Fault tolerance is therefore an essential challenge for cloud providers, who must mitigate the effects of failures while keeping the Service Level Agreement (SLA) satisfied. Checkpointing is one of the best-known reactive fault tolerance techniques used in distributed computing. However, it can incur considerable overheads that depend on the checkpoint interval applied, and these overheads degrade the performance of the cloud. In this paper, a reactive fault tolerance approach based on checkpointing is proposed and evaluated with the aim of achieving better performance. The approach applies a flexible checkpoint interval to reduce overheads. Simulation experiments indicate superior performance of the approach in terms of power consumption, response time, monetary cost and cloud capacity.
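The dependence of checkpoint overhead on the interval can be illustrated with a small sketch. The abstract does not state the rule the proposed approach uses to adjust the interval, so the code below is not the authors' method. It assumes the classical first-order model, in which the wasted fraction of time is roughly the checkpoint cost divided by the interval plus the interval divided by twice the mean time between failures (MTBF), and it uses Young's approximation for the interval that minimises this quantity. The function names, the MTBF estimator and the numeric values are all hypothetical.

```python
import math


def overhead_fraction(interval, checkpoint_cost, mtbf):
    """First-order estimate of the fraction of time wasted (assumed model).

    checkpoint_cost / interval : time spent writing checkpoints
    interval / (2 * mtbf)      : expected rework after a failure, assuming
                                 half an interval is lost on average
    """
    return checkpoint_cost / interval + interval / (2.0 * mtbf)


def young_interval(checkpoint_cost, mtbf):
    """Young's first-order approximation of the overhead-minimising interval."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def estimate_mtbf(failure_times, fallback):
    """Hypothetical estimator: mean gap between observed failure timestamps.

    Returns `fallback` until at least two failures have been observed.
    """
    if len(failure_times) < 2:
        return fallback
    return (failure_times[-1] - failure_times[0]) / (len(failure_times) - 1)


def flexible_interval(failure_times, checkpoint_cost, fallback_mtbf):
    """Recompute the checkpoint interval from the failures seen so far."""
    mtbf = estimate_mtbf(failure_times, fallback_mtbf)
    return young_interval(checkpoint_cost, mtbf)


if __name__ == "__main__":
    cost = 60.0  # seconds per checkpoint (illustrative value)
    # No failures yet: the interval is derived from the fallback MTBF.
    print(flexible_interval([], cost, fallback_mtbf=7200.0))
    # Frequent failures shorten the interval, so less work is lost per failure.
    print(flexible_interval([0.0, 600.0, 1500.0], cost, fallback_mtbf=7200.0))
```

Under this model, an interval that is too short spends most of its time writing checkpoints, while one that is too long loses more work on each failure. An interval that follows the observed failure rate moves between these two extremes.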

Acknowledgements

This work was supported by King Saud University, Deanship of Scientific Research, Community College Research Unit.

Author information

Corresponding author

Correspondence to Mohammed Amoon.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Amoon, M., El-Bahnasawy, N., Sadi, S. et al. On the design of reactive approach with flexible checkpoint interval to tolerate faults in cloud computing systems. J Ambient Intell Human Comput 10, 4567–4577 (2019). https://doi.org/10.1007/s12652-018-1139-y

  • DOI: https://doi.org/10.1007/s12652-018-1139-y
