Abstract
Traditional Machine Learning and Deep Learning pipelines (data acquisition, preparation, model training and evaluation) require considerable computational resources and time to produce even a simple prediction model, especially when implemented on a single machine. Intuitively, the computational demand grows further with the management of Big Data and the training of complex models. Thus, a paradigm shift from a single machine to a Big Data-oriented approach is required to make traditional Machine Learning and Deep Learning techniques fit for Big Data. In particular, the need emerges for developing and deploying Big Data Analytics infrastructures on clusters of machines. In this context, the main features and principles of Distributed Deep Learning frameworks are discussed here. The main contribution of this paper is a systematic review of proposed solutions, aimed at investigating under a unifying lens their foundational elements, functional features and capabilities, despite the inherent fragmentation of the literature. To this end, we conducted a literature search in Scopus and Google Scholar. This review also compares Distributed Deep Learning approaches along more technical facets: implemented parallelism techniques, supported hardware, model parameter sharing modalities, computation modalities for stochastic gradient descent, and compatibility with other frameworks.
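To give a concrete flavour of the data-parallel stochastic gradient descent computation that the compared frameworks distribute, the following is a minimal single-machine sketch (plain NumPy, with simulated workers and a hypothetical `local_gradient` helper, not taken from any reviewed framework): each worker computes a gradient on its own data shard, the gradients are averaged, mimicking the effect of an all-reduce or parameter-server aggregation, and every worker then applies the same synchronous update.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_gradient(w, X, y):
    """Least-squares gradient computed on one worker's data shard."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

# Synthetic linear problem: y = X @ w_true
w_true = np.array([2.0, -1.0])
X = rng.normal(size=(400, 2))
y = X @ w_true

# Shard the dataset across 4 simulated workers (data parallelism).
shards = list(zip(np.array_split(X, 4), np.array_split(y, 4)))

w = np.zeros(2)   # model replica, identical on every worker
lr = 0.1
for step in range(200):
    # Each worker computes a gradient on its shard.
    grads = [local_gradient(w, Xs, ys) for Xs, ys in shards]
    # Averaging plays the role of an all-reduce; all replicas stay in sync.
    w -= lr * np.mean(grads, axis=0)

print(w)  # converges close to w_true
```

Real frameworks replace the in-process averaging with network communication (parameter servers or ring all-reduce) and may relax the synchronization barrier, which is exactly the design axis along which the reviewed approaches differ.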
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Berloco, F., Bevilacqua, V., Colucci, S. (2022). A Systematic Review of Distributed Deep Learning Frameworks for Big Data. In: Huang, D.S., Jo, K.H., Jing, J., Premaratne, P., Bevilacqua, V., Hussain, A. (eds) Intelligent Computing Methodologies. ICIC 2022. Lecture Notes in Computer Science, vol 13395. Springer, Cham. https://doi.org/10.1007/978-3-031-13832-4_21
Print ISBN: 978-3-031-13831-7
Online ISBN: 978-3-031-13832-4