
A Systematic Review of Distributed Deep Learning Frameworks for Big Data

  • Conference paper
Intelligent Computing Methodologies (ICIC 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13395)


Abstract

Traditional Machine Learning and Deep Learning workflows (data acquisition, preparation, model training and evaluation) require considerable computational resources and time to produce even a simple prediction model, especially when implemented on a single machine. Intuitively, the computational demand grows further when Big Data must be managed and complex models must be trained. A paradigm shift from a single machine to a Big Data-oriented approach is therefore required to make traditional Machine Learning and Deep Learning techniques fit for Big Data. In particular, the need emerges to develop and deploy Big Data Analytics Infrastructures on clusters of machines. In this context, the main features and principles of Distributed Deep Learning frameworks are discussed here. The main contribution of this paper is a systematic review of the proposed solutions, aimed at investigating, under a unifying lens, their foundational elements, functional features and capabilities, despite the inherent fragmentation of the literature. To this end, we conducted a literature search in Scopus and Google Scholar. The review also compares Distributed Deep Learning approaches according to more technical facets: implemented parallelism techniques, supported hardware, model parameter sharing modalities, computation modalities for stochastic gradient descent, and compatibility with other frameworks.
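To make these comparison facets concrete, the sketch below illustrates synchronous data-parallel training, one of the parallelism and stochastic-gradient-descent computation modalities the review examines. It is a minimal illustration, not code from the paper: it uses PyTorch's DistributedDataParallel wrapper, and the linear model, random data shards, backend choice and hyperparameters are placeholder assumptions. Each worker process holds a full model replica, computes gradients on its own mini-batch, and the gradients are averaged with an all-reduce before every optimizer step.

```python
# Minimal sketch (illustrative only) of synchronous data-parallel SGD
# with PyTorch DistributedDataParallel on a single node with two workers.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def train(rank: int, world_size: int) -> None:
    # One process per worker; "gloo" works on CPU-only machines,
    # "nccl" would be the usual choice for GPU clusters.
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    model = nn.Linear(10, 1)              # local model replica (placeholder model)
    ddp_model = DDP(model)                # synchronizes gradients via all-reduce
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    for _ in range(100):
        x = torch.randn(32, 10)           # stand-in for this worker's data shard
        y = torch.randn(32, 1)
        optimizer.zero_grad()
        loss = loss_fn(ddp_model(x), y)
        loss.backward()                   # gradients averaged across workers here
        optimizer.step()                  # every replica applies the same update

    dist.destroy_process_group()


if __name__ == "__main__":
    # Rendezvous settings for a single-node, two-process demo run.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    mp.spawn(train, args=(2,), nprocs=2)
```

An asynchronous alternative, by contrast, would push gradients to a parameter server without waiting for the other workers, trading gradient staleness for higher throughput; this is one of the parameter-sharing and computation modalities along which the review contrasts the surveyed frameworks.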



Author information


Corresponding author

Correspondence to Francesco Berloco.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Berloco, F., Bevilacqua, V., Colucci, S. (2022). A Systematic Review of Distributed Deep Learning Frameworks for Big Data. In: Huang, D.-S., Jo, K.-H., Jing, J., Premaratne, P., Bevilacqua, V., Hussain, A. (eds.) Intelligent Computing Methodologies. ICIC 2022. Lecture Notes in Computer Science (LNAI), vol. 13395. Springer, Cham. https://doi.org/10.1007/978-3-031-13832-4_21


  • DOI: https://doi.org/10.1007/978-3-031-13832-4_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-13831-7

  • Online ISBN: 978-3-031-13832-4

  • eBook Packages: Computer Science, Computer Science (R0)
