Abstract
The scientific community is currently experiencing unprecedented amounts of data generated by cutting-edge science facilities. Soon facilities will be producing up to 1 PB/s, which will force scientists to use more autonomous techniques to learn from the data. The adoption of machine learning methods, such as deep learning techniques, in large-scale workflows comes with a shift in the workflow’s computational and I/O patterns. These changes often include iterative processes and model architecture searches, in which datasets are analyzed multiple times in different formats with different model configurations in order to find accurate, reliable, and efficient learning models. This shift in behavior changes I/O patterns at the application level as well as at the system level. These changes also bring new challenges for HPC I/O teams, since the resulting I/O workloads are more complex. In this paper, we discuss the I/O patterns experienced by emerging analytical codes that rely on machine learning algorithms and highlight the challenges in designing efficient I/O transfers for such workflows. We comment on how to leverage the data access patterns to fetch the required input data more efficiently, in the format and order the application needs, and on how to optimize the data path between collaborative processes. We motivate our work and show performance gains with a case study of medical applications.
This manuscript has been authored in part by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
Acknowledgements
This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This work was partially funded by the Center of Advanced Systems Understanding (CASUS), which is financed by Germany’s Federal Ministry of Education and Research (BMBF) and by the Saxon Ministry for Science, Culture and Tourism (SMWK) with tax funds on the basis of the budget approved by the Saxon State Parliament.
© 2022 Springer Nature Switzerland AG
Cite this paper
Gainaru, A. et al. (2022). Understanding and Leveraging the I/O Patterns of Emerging Machine Learning Analytics. In: Nichols, J., et al. Driving Scientific and Engineering Discoveries Through the Integration of Experiment, Big Data, and Modeling and Simulation. SMC 2021. Communications in Computer and Information Science, vol 1512. Springer, Cham. https://doi.org/10.1007/978-3-030-96498-6_7
DOI: https://doi.org/10.1007/978-3-030-96498-6_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-96497-9
Online ISBN: 978-3-030-96498-6
eBook Packages: Computer Science (R0)