Scheduling and checkpointing optimization algorithm for Byzantine fault tolerance in cloud clusters

Chinnathambi, Sathya; Santhanam, Agilan; Rajarathinam, Jeyarani; Senthilkumar, M.

doi:10.1007/s10586-018-2375-9

Published: 16 March 2018

Volume 22, pages 14637–14650 (2019)

Cluster Computing

Sathya Chinnathambi¹, Agilan Santhanam², Jeyarani Rajarathinam¹ & M. Senthilkumar³

504 Accesses
23 Citations

Abstract

Cloud computing distinguishes itself from other distributed computing paradigms by offering services on demand, without geographical restrictions. This has revolutionized computing by serving a wide array of customers, from casual users to highly business-oriented industries. In spite of its capabilities, cloud computing still struggles to handle a wide array of faults, which costs it credibility. Among these faults, Byzantine faults pose a serious challenge to fault tolerance mechanisms, because they often go undetected at the initial stage and can easily propagate to other VMs before they are detected. For such reasons, some mission-critical applications, such as air traffic control and online banking, still stay away from the cloud. Moreover, if a Byzantine fault is not detected and tolerated at the initial stage, applications such as big data analytics can go completely wrong despite hours of computation performed by the entire cloud. Our previous work therefore proposed a fool-proof Byzantine fault detection mechanism; as a continuation, this work designs a scheduling algorithm (WSSS) and a checkpoint optimization algorithm (TCC) to tolerate and eliminate Byzantine faults before they have any impact. The WSSS algorithm keeps track of the performance of the servers that make up the virtual clusters, so that the best-performing server can be allocated to mission-critical applications. To this end, WSSS ranks the servers based on a counter that monitors every virtual node (VN) for timing and performance failures. The TCC algorithm generalizes the region likely to be prone to Byzantine errors by monitoring delay variation, and starts new VNs from the previous checkpoint. Moreover, it can stretch the state interval for well-performing, error-free VNs in an effort to minimize the space, time, and cost overheads caused by checkpointing.
The analysis is performed by plotting state transitions and through CloudSim-based simulation. The results show that TCC reduces fault tolerance overhead exponentially and that WSSS allocates virtual resources effectively.
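The abstract describes WSSS and TCC only at a high level, so the Python sketch below is an illustration under our own assumptions, not the authors' algorithms. It models the two ideas the abstract names: a per-server failure counter, accumulated from the timing and performance failures observed on each VN and used to rank servers (WSSS-style), and a checkpoint interval that is stretched while a VN's delays stay stable and reset when delay variation rises (TCC-style). The names `VirtualNode`, `rank_servers`, `best_server`, and `next_checkpoint_interval`, the use of the coefficient of variation as the delay-variation measure, the doubling rule, and all threshold values are hypothetical; restarting a VN from its previous checkpoint is not modeled.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class VirtualNode:
    """Hypothetical per-VN monitoring record (illustrative, not from the paper)."""
    name: str
    server: str
    time_failures: int = 0         # e.g. missed deadlines or heartbeats
    performance_failures: int = 0  # e.g. results below expected performance

    @property
    def failure_count(self) -> int:
        return self.time_failures + self.performance_failures


def rank_servers(nodes: list[VirtualNode]) -> list[str]:
    """WSSS-style sketch: sum VN failure counters per server, fewest failures first."""
    counters: dict[str, int] = {}
    for vn in nodes:
        counters[vn.server] = counters.get(vn.server, 0) + vn.failure_count
    # Ties are broken by server name so the ranking is deterministic.
    return sorted(counters, key=lambda s: (counters[s], s))


def best_server(nodes: list[VirtualNode]) -> str:
    """Server allotted to a mission-critical application (assumed: top of the ranking)."""
    return rank_servers(nodes)[0]


def next_checkpoint_interval(
    current: float,
    delay_samples: list[float],
    base: float = 10.0,                # hypothetical minimum interval (arbitrary time units)
    max_interval: float = 160.0,       # hypothetical cap on stretching
    variation_threshold: float = 0.2,  # hypothetical coefficient-of-variation limit
) -> float:
    """TCC-style sketch: stretch the interval for a stable VN, reset it when delays vary."""
    if not delay_samples:
        return current  # no evidence either way: keep the current interval
    avg = mean(delay_samples)
    variation = pstdev(delay_samples) / avg if avg > 0 else 0.0
    if variation > variation_threshold:
        # High delay variation is treated as a sign of a possibly
        # Byzantine-error-prone region: fall back to frequent checkpoints.
        return base
    # Stable, error-free behaviour: double the interval, up to the cap.
    return min(current * 2, max_interval)
```

For example, with VN failure counters totalling 3 on server `s1`, 0 on `s2`, and 1 on `s3`, `rank_servers` returns `['s2', 's3', 's1']`, so `best_server` picks `s2`. Doubling the interval for stable VNs is an assumed policy in the spirit of exponential backoff; it trades fewer checkpoints (lower space, time, and cost overhead) for more recomputation if a fault appears late in a long interval.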

References

Meroufel, B., Belalem, G.: Adaptive time based coordinated checkpointing for cloud computing workflows. Scalable Comput. 15(2), 153–168 (2014)
Bala, A.: Scrutinize: fault monitoring for preventing system failure in cloud computing. In: International Journal of Innovations & Advancement in Computer Science (IJIACS), vol. 4 (2015)
Andrzejak, A., Kondo, D., Yi, S.: Decision model for cloud computing under SLA constraints. Inria Research Report (2010)
Yi, S., Kondo, D., Andrzejak, A.: Reducing costs of spot instances via checkpointing in the amazon elastic compute cloud. In: Third IEEE International Conference on Cloud Computing (2010)
Buyya, R., Ranjan, R., Calheiros, R.N.: InterCloud: utility-oriented federation of cloud computing environments for scaling of application services. In: International Conference on Algorithms and Architectures for Parallel Processing, vol. 6081, pp. 13–31. Springer, Berlin (2010)
Zhou, A., Wang, S., Cheng, B., et al.: Cloud service reliability enhancement via virtual machine placement optimization. IEEE Trans. Service Comput. 10(6), 902–913 (2017)
Brandt, J., Gentile, A., Mayo, J., Pebay, P., Roe, D., Thompson, D., Wong, M.: Resource monitoring and management with OVIS to enable HPC in cloud computing environments. In: IEEE International Symposium on Parallel & Distributed Processing (2009)
Masud, M.A.H., Huang, X.: An e-learning system architecture based on cloud computing. Int. J. Inf. Commun. Eng. 6(2) (2012)
Dillon, T., Wu, C., Chang, E.: Cloud computing: issues and challenges. In: 24th IEEE International Conference on Advanced Information Networking and Application, pp. 27–33 (2010)
Butt, S., Lagar-Cavilla, H.A., Srivastava, A., Ganapathy, V.: Self-service cloud computing. In: Proceedings of the 19th ACM Conference on Computer and Communications Security (2012)
Chinnathambi, S., Santhanam, A.: Enhancing Byzantine fault tolerance using integrated detection in cloud systems. In: IndoSys - Indian Symposium on Computer Systems Research (2017)
Duan, S., Peisert, S., Levitt, K.N.: hBFT: speculative Byzantine fault tolerance with minimum cost. IEEE Trans. Dependable Secure Comput. (TDSC) 12(1), 58–70 (2015)
Saikia, L.P., Devi, Y.L.: Fault tolerance techniques and algorithms in cloud computing. Int. J. Comput. Sci. Commun. Netw. 4(1), 1–8 (2014)
Liu, Y., Wei, W.: A replication-based mechanism for fault tolerance in MapReduce framework. Hindawi Mathematical Problems in Engineering (2015)
Zaidi, T., Rampratap: Modeling for fault tolerance in cloud computing environment. J. Comput. Sci. Appl. 4(1), 9–13 (2016)
Bala, A., Chana, I.: Fault tolerance-challenges, techniques and implementation in cloud computing. In: IJCSI International Journal of Computer Science Issues, vol. 9, no. 1 (2012)
Essa, Y.M.: A survey of cloud computing fault tolerance: techniques and implementation. Int. J. Comput. Appl. 138(13), 34–38 (2016). https://doi.org/10.5120/ijca2016909055
Sathya, C., Agilan, S.: Design of check pointing algorithm for fault tolerance virtual machine. Perspectivas em Ciencia da Informacao 22(2), 269 (2017)
Ismail, L., Barua, R.: Implementation and performance evaluation of a distributed conjugate gradient method in a cloud computing environment. Software. 43(3), 281–304 (2010)
Ji, C., Li, Y., Qiu, W., Awada, U., Li, K.: Big data processing in cloud computing environments. In: International Symposium on Pervasive Systems, Algorithms and Networks (2012)
Yi, S., Andrzejak, A., Kondo, D.: Monetary cost-aware checkpointing and migration on amazon cloud spot instances. IEEE Trans. Serv. Comput. 5(4), 512–524 (2011)
Hashem, I.A.T., Yaqoob, I., Anuar, N.B., Mokhtar, S., Gani, A., Khan, S.U.: The rise of “big data” on cloud computing: review and open research issues. Inform. Syst. 47, 98–115 (2015)
Dong, Z., Liu, N., Rojas-Cessa, R.: Greedy scheduling of tasks with time constraints for energy-efficient cloud-computing data centers. J. Cloud Comput. 4(1), 5 (2015)
Jung, D., Chin, S., Chung, K., Yu, H., Gil, J.: JoonMin: an efficient checkpointing scheme using price history of spot instances in cloud computing environment. In: IFIP International Conference on Network and Parallel Computing, vol. 6985, pp. 185–200. Springer, Berlin (2011)
Zhou, B., Buyya, R.: A group-based fault tolerant mechanism for heterogeneous mobile clouds. In: Proceedings of the 14th EAI International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services (2017)
Jhawar, R., Piuri, V., Santambrogio, M.: A comprehensive conceptual system-level approach to fault tolerance in cloud computing. In: IEEE International Systems Conference (SysCon) (2012)
Tang, Z., Qi, L., Cheng, Z., Li, K., Khan, S.U., Li, K.: An energy-efficient task scheduling algorithm in DVFS-enabled cloud environment. J. Grid Comput. 13(1) (2015)
Liu, H., Jin, H., Xu, C.Z., Liao, X.: Performance and energy modeling for live migration of virtual machines. In: Proceedings of the 20th International Symposium on High Performance Distributed Computing, pp. 171–182 (2011)
Voorsluys, W., Broberg, J., Venugopal, S., Buyya, R.: Cost of virtual machine live migration in clouds: a performance evaluation. In: Proceedings of the 1st International Conference on Cloud Computing, pp. 254–265 (2009)
Zhang, F., Cao, J., Hwang, K., Wu, C.: Ordinal optimized scheduling of scientific workflows in elastic compute clouds. In: Third IEEE International Conference on Cloud Computing Technology and Science, pp. 9–17 (2011)
Chowdhury, M.R., Mahmud, M.R., Rahman, R.M.: Implementation and performance analysis of various VM placement strategies in CloudSim. J. Cloud Comput. (2015)

Author information

Authors and Affiliations

Department of Computing, Coimbatore Institute of Technology, Coimbatore, Tamil Nadu, 641014, India
Sathya Chinnathambi & Jeyarani Rajarathinam
Department of Physics, Coimbatore Institute of Technology, Coimbatore, Tamil Nadu, 641014, India
Agilan Santhanam
Department of Computer Science, Government Arts College, Udumalpet, Tamil Nadu, 642126, India
M. Senthilkumar

Corresponding author

Correspondence to Sathya Chinnathambi.

About this article

Cite this article

Chinnathambi, S., Santhanam, A., Rajarathinam, J. et al. Scheduling and checkpointing optimization algorithm for Byzantine fault tolerance in cloud clusters. Cluster Comput 22 (Suppl 6), 14637–14650 (2019). https://doi.org/10.1007/s10586-018-2375-9

Received: 05 February 2018
Revised: 21 February 2018
Accepted: 06 March 2018
Published: 16 March 2018
Issue Date: November 2019
DOI: https://doi.org/10.1007/s10586-018-2375-9
