
Abstract

The scientific community is experiencing unprecedented volumes of data generated by cutting-edge science facilities. Soon, facilities will produce up to 1 PB/s, forcing scientists to rely on more autonomous techniques to learn from the data. Adopting machine learning methods, such as deep learning techniques, in large-scale workflows shifts the workflows' computational and I/O patterns. These changes often involve iterative processes and model architecture searches, in which datasets are analyzed multiple times, in different formats and with different model configurations, in order to find accurate, reliable, and efficient learning models. This shift changes I/O patterns at both the application level and the system level, and it brings new challenges for HPC I/O teams, since these patterns constitute more complex I/O workloads. In this paper we discuss the I/O patterns of emerging analytical codes that rely on machine learning algorithms and highlight the challenges in designing efficient I/O transfers for such workflows. We comment on how to leverage data access patterns to fetch the required input data more efficiently, in the format and order dictated by the needs of the application, and how to optimize the data path between collaborating processes. We motivate our work and show performance gains with a case study of medical applications.
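The kind of access-pattern-aware fetching the abstract alludes to can be sketched as a prefetching loader that overlaps reads with downstream compute, delivering samples in the order the training loop will consume them. The sketch below is illustrative only; the function and variable names (`prefetching_loader`, `epoch_order`, the toy tile dataset) are hypothetical and not taken from the paper.

```python
import threading
import queue

def prefetching_loader(read_fn, access_order, depth=2):
    """Yield samples in the order the model will consume them,
    overlapping the next reads with downstream compute."""
    buf = queue.Queue(maxsize=depth)   # bounded buffer caps memory use
    SENTINEL = object()

    def producer():
        for key in access_order:
            buf.put(read_fn(key))      # e.g. a file read or object-store GET
        buf.put(SENTINEL)              # signal end of the epoch

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buf.get()
        if item is SENTINEL:
            break
        yield item

# Toy example: pretend each "read" fetches one tile of a whole-slide image.
dataset = {i: f"tile-{i}" for i in range(6)}
epoch_order = [4, 1, 5, 0, 3, 2]       # order dictated by the training loop
loaded = list(prefetching_loader(dataset.get, epoch_order))
print(loaded)                          # tiles arrive in the requested order
```

Because the producer thread runs ahead by up to `depth` items, the next read is already in flight while the consumer processes the current sample; in a real workflow `read_fn` would be the expensive I/O call whose latency this hides.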

This manuscript has been authored in part by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).



Acknowledgements

This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This work was partially funded by the Center of Advanced Systems Understanding (CASUS), which is financed by Germany’s Federal Ministry of Education and Research (BMBF) and by the Saxon Ministry for Science, Culture and Tourism (SMWK) with tax funds on the basis of the budget approved by the Saxon State Parliament.


Corresponding author

Correspondence to Ana Gainaru.


Copyright information

© 2022 Springer Nature Switzerland AG

About this paper


Cite this paper

Gainaru, A. et al. (2022). Understanding and Leveraging the I/O Patterns of Emerging Machine Learning Analytics. In: Nichols, J., et al. Driving Scientific and Engineering Discoveries Through the Integration of Experiment, Big Data, and Modeling and Simulation. SMC 2021. Communications in Computer and Information Science, vol 1512. Springer, Cham. https://doi.org/10.1007/978-3-030-96498-6_7


  • DOI: https://doi.org/10.1007/978-3-030-96498-6_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-96497-9

  • Online ISBN: 978-3-030-96498-6

  • eBook Packages: Computer Science (R0)
