On the design of reactive approach with flexible checkpoint interval to tolerate faults in cloud computing systems

Amoon, Mohammed; El-Bahnasawy, Nirmeen; Sadi, Samy; Wagdi, Manar

doi:10.1007/s12652-018-1139-y

Original Research
Published: 15 November 2018

Volume 10, pages 4567–4577 (2019)
Journal of Ambient Intelligence and Humanized Computing

Mohammed Amoon¹,² (ORCID: orcid.org/0000-0003-1704-7211),
Nirmeen El-Bahnasawy²,
Samy Sadi³ &
Manar Wagdi²

417 Accesses
17 Citations

Abstract

The likelihood of failures rises in cloud computing systems as a result of their unstable nature. Additionally, the size of a cloud computing system varies over time, so failures become a common occurrence. Failures have a high impact on cloud performance and on the expected benefits for both customers and providers. Fault tolerance is an essential challenge facing cloud providers, who must mitigate the effects of failures while keeping the Service Level Agreement (SLA) satisfied. Checkpointing is one of the best-known reactive fault tolerance techniques used in distributed computing. However, it can incur considerable overheads that depend on the checkpoint interval applied, and these overheads degrade the performance of the cloud. In this paper, a reactive fault tolerance approach based on checkpointing is proposed and evaluated with the aim of achieving better performance. The approach applies a flexible checkpoint interval to reduce overheads. Simulation experiments indicate superior performance of the approach in terms of power consumption, response time, monetary cost and cloud capacity.
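The abstract states that checkpointing overhead depends on the interval, but it does not say how the flexible interval is chosen. Purely to illustrate the trade-off, and not as the authors' method, the Python sketch below uses Young's well-known first-order approximation: with checkpoint cost C and mean time between failures (MTBF) M, the approximate overhead fraction C/T + T/(2M) is minimised at T = sqrt(2CM). The class FlexibleCheckpointer, its smoothing weight alpha and the moving-average MTBF estimate are hypothetical choices made only for this example.

```python
import math


def young_interval(checkpoint_cost: float, mtbf: float) -> float:
    """First-order optimal checkpoint interval T = sqrt(2 * C * M) (Young's approximation)."""
    if checkpoint_cost <= 0 or mtbf <= 0:
        raise ValueError("checkpoint_cost and mtbf must be positive")
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def overhead_fraction(interval: float, checkpoint_cost: float, mtbf: float) -> float:
    """Approximate fraction of time lost: checkpoint writes (C/T) plus expected
    rework after failures (on average T/2 per failure, i.e. T/(2M) per unit time)."""
    return checkpoint_cost / interval + interval / (2.0 * mtbf)


class FlexibleCheckpointer:
    """Hypothetical flexible-interval policy (illustrative, not the paper's algorithm).

    Keeps an exponential moving average of observed inter-failure times as the
    MTBF estimate and recomputes the checkpoint interval after every failure.
    """

    def __init__(self, checkpoint_cost: float, initial_mtbf: float, alpha: float = 0.3):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.checkpoint_cost = checkpoint_cost
        self.mtbf = initial_mtbf
        self.alpha = alpha  # weight given to the newest observation (assumed value)

    @property
    def interval(self) -> float:
        # Recomputed from the current MTBF estimate each time it is read.
        return young_interval(self.checkpoint_cost, self.mtbf)

    def record_failure(self, time_since_last_failure: float) -> float:
        """Update the MTBF estimate with a new observation and return the new interval."""
        self.mtbf = (1.0 - self.alpha) * self.mtbf + self.alpha * time_since_last_failure
        return self.interval


if __name__ == "__main__":
    # Example: 60 s to write a checkpoint, initial MTBF estimate of one hour.
    cp = FlexibleCheckpointer(checkpoint_cost=60.0, initial_mtbf=3600.0)
    print(f"initial interval: {cp.interval:.1f} s")
    for gap in (900.0, 1200.0, 600.0):  # failures arriving faster than expected
        print(f"after failure gap {gap:.0f} s -> interval {cp.record_failure(gap):.1f} s")
```

Recomputing T whenever the MTBF estimate changes is one simple way to make the interval flexible: frequent failures shrink the interval (less work lost per failure), while a stable period lengthens it (fewer checkpoint writes).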

Acknowledgements

This work was supported by King Saud University, Deanship of Scientific Research, Community College Research Unit.

Author information

Authors and Affiliations

Department of Computer Science, RCC, King Saud University, P. O. Box: 28095-11437, Riyadh, Saudi Arabia
Mohammed Amoon
Computer Science and Engineering Department, Faculty of Electronic Engineering, Menoufia University, Menouf, Egypt
Mohammed Amoon, Nirmeen El-Bahnasawy & Manar Wagdi
Department of Computer Science, University of Oran 1 Ahmed Ben Bella, Oran, Algeria
Samy Sadi

Corresponding author

Correspondence to Mohammed Amoon.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Amoon, M., El-Bahnasawy, N., Sadi, S. et al. On the design of reactive approach with flexible checkpoint interval to tolerate faults in cloud computing systems. J Ambient Intell Human Comput 10, 4567–4577 (2019). https://doi.org/10.1007/s12652-018-1139-y

Received: 22 July 2018
Accepted: 12 November 2018
Published: 15 November 2018
Issue Date: November 2019
DOI: https://doi.org/10.1007/s12652-018-1139-y
