
Abstract

The scientific community is experiencing unprecedented volumes of data generated by cutting-edge science facilities. Soon, facilities will produce up to 1 PB/s, forcing scientists to rely on more autonomous techniques to learn from the data. Adopting machine learning methods, such as deep learning techniques, in large-scale workflows shifts the workflows' computational and I/O patterns. These changes often involve iterative processes and model architecture searches, in which datasets are analyzed multiple times, in different formats and with different model configurations, in order to find accurate, reliable, and efficient learning models. This shift changes I/O patterns at both the application level and the system level, and it brings new challenges for HPC I/O teams, since these patterns constitute more complex I/O workloads. In this paper we discuss the I/O patterns of emerging analytical codes that rely on machine learning algorithms and highlight the challenges in designing efficient I/O transfers for such workflows. We comment on how to leverage data access patterns to fetch the required input data more efficiently, in the format and order dictated by the needs of the application, and how to optimize the data path between collaborating processes. We motivate our work and show performance gains with a case study of medical applications.
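The kind of access-pattern-aware fetching the abstract alludes to can be sketched as a prefetching loader that overlaps reads with downstream compute, delivering samples in the order the training loop will consume them. The sketch below is illustrative only; the function and variable names (`prefetching_loader`, `epoch_order`, the toy tile dataset) are hypothetical and not taken from the paper.

```python
import threading
import queue

def prefetching_loader(read_fn, access_order, depth=2):
    """Yield samples in the order the model will consume them,
    overlapping the next reads with downstream compute."""
    buf = queue.Queue(maxsize=depth)   # bounded buffer caps memory use
    SENTINEL = object()

    def producer():
        for key in access_order:
            buf.put(read_fn(key))      # e.g. a file read or object-store GET
        buf.put(SENTINEL)              # signal end of the epoch

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buf.get()
        if item is SENTINEL:
            break
        yield item

# Toy example: pretend each "read" fetches one tile of a whole-slide image.
dataset = {i: f"tile-{i}" for i in range(6)}
epoch_order = [4, 1, 5, 0, 3, 2]       # order dictated by the training loop
loaded = list(prefetching_loader(dataset.get, epoch_order))
print(loaded)                          # tiles arrive in the requested order
```

Because the producer thread runs ahead by up to `depth` items, the next read is already in flight while the consumer processes the current sample; in a real workflow `read_fn` would be the expensive I/O call whose latency this hides.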

This manuscript has been authored in part by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).



Acknowledgements

This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This work was partially funded by the Center of Advanced Systems Understanding (CASUS), which is financed by Germany’s Federal Ministry of Education and Research (BMBF) and by the Saxon Ministry for Science, Culture and Tourism (SMWK) with tax funds on the basis of the budget approved by the Saxon State Parliament.


Corresponding author

Correspondence to Ana Gainaru.


Copyright information

© 2022 Springer Nature Switzerland AG

About this paper


Cite this paper

Gainaru, A. et al. (2022). Understanding and Leveraging the I/O Patterns of Emerging Machine Learning Analytics. In: Nichols, J., et al. Driving Scientific and Engineering Discoveries Through the Integration of Experiment, Big Data, and Modeling and Simulation. SMC 2021. Communications in Computer and Information Science, vol 1512. Springer, Cham. https://doi.org/10.1007/978-3-030-96498-6_7


  • DOI: https://doi.org/10.1007/978-3-030-96498-6_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-96497-9

  • Online ISBN: 978-3-030-96498-6

  • eBook Packages: Computer Science (R0)
