Abstract
Traditional Machine Learning and Deep Learning pipelines (data acquisition, preparation, model training and evaluation) require considerable computational resources and time to produce even a simple prediction model, especially when implemented on a single machine. Intuitively, the computational demand grows further with the management of Big Data and the training of complex models. Thus, a paradigm shift from a single machine to a Big Data-oriented approach is required to make traditional Machine Learning and Deep Learning techniques fit for Big Data. In particular, the need emerges for developing and deploying Big Data Analytics infrastructures on clusters of machines. In this context, the main features and principles of Distributed Deep Learning frameworks are discussed here. The main contribution of this paper is a systematic review of proposed solutions, aimed at investigating under a unifying lens their foundational elements, functional features and capabilities, despite the inherent fragmentation of the literature. To this end, we conducted a literature search in Scopus and Google Scholar. This review also compares Distributed Deep Learning approaches along more technical facets: implemented parallelism techniques, supported hardware, model parameter sharing modalities, computation modalities for stochastic gradient descent, and compatibility with other frameworks.
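To give a concrete flavour of the data-parallel stochastic gradient descent computation that the compared frameworks distribute, the following is a minimal single-machine sketch (plain NumPy, with simulated workers and a hypothetical `local_gradient` helper, not taken from any reviewed framework): each worker computes a gradient on its own data shard, the gradients are averaged, mimicking the effect of an all-reduce or parameter-server aggregation, and every worker then applies the same synchronous update.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_gradient(w, X, y):
    """Least-squares gradient computed on one worker's data shard."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

# Synthetic linear problem: y = X @ w_true
w_true = np.array([2.0, -1.0])
X = rng.normal(size=(400, 2))
y = X @ w_true

# Shard the dataset across 4 simulated workers (data parallelism).
shards = list(zip(np.array_split(X, 4), np.array_split(y, 4)))

w = np.zeros(2)   # model replica, identical on every worker
lr = 0.1
for step in range(200):
    # Each worker computes a gradient on its shard.
    grads = [local_gradient(w, Xs, ys) for Xs, ys in shards]
    # Averaging plays the role of an all-reduce; all replicas stay in sync.
    w -= lr * np.mean(grads, axis=0)

print(w)  # converges close to w_true
```

Real frameworks replace the in-process averaging with network communication (parameter servers or ring all-reduce) and may relax the synchronization barrier, which is exactly the design axis along which the reviewed approaches differ.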
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Berloco, F., Bevilacqua, V., Colucci, S. (2022). A Systematic Review of Distributed Deep Learning Frameworks for Big Data. In: Huang, D.S., Jo, K.H., Jing, J., Premaratne, P., Bevilacqua, V., Hussain, A. (eds) Intelligent Computing Methodologies. ICIC 2022. Lecture Notes in Computer Science, vol 13395. Springer, Cham. https://doi.org/10.1007/978-3-031-13832-4_21
Print ISBN: 978-3-031-13831-7
Online ISBN: 978-3-031-13832-4