Abstract
Due to the growth of data and the widespread use of Machine Learning (ML) by non-experts, automation and scalability are becoming key issues for ML. This paper presents an automated and scalable ML framework that requires minimal human input. We designed the framework for the domain of telecommunications risk management, which often requires non-ML-experts to continuously update supervised learning models trained on huge amounts of data. Thus, the framework uses Automated Machine Learning (AutoML) to select and tune the ML models and distributed ML to handle Big Data. The framework comprises the following modules: task detection (to detect classification or regression), data preprocessing, feature selection, model training, and deployment. In this paper, we focus the experiments on the model training module. We first analyzed the capabilities of eight AutoML tools: Auto-Gluon, Auto-Keras, Auto-Sklearn, Auto-Weka, H2O AutoML, Rminer, TPOT, and TransmogrifAI. Then, to select the tool for model training, we benchmarked the only two tools that support distributed ML (H2O AutoML and TransmogrifAI). The experiments used three real-world datasets from the telecommunications domain (churn, event forecasting, and fraud detection), provided by an analytics company, and allowed us to measure the computational effort and predictive capability of the AutoML tools. Both tools obtained high-quality results and showed no substantial predictive differences. Nevertheless, the analytics company selected H2O AutoML for the model training module, since it was considered a more mature technology with a richer feature set (e.g., integration with more platforms). After choosing H2O AutoML for ML training, we selected the technologies for the remaining components of the architecture (e.g., data preprocessing and web interface).
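The task detection module mentioned above decides whether a dataset calls for classification or regression before any model is trained. The paper does not spell out the detection rule, so the sketch below is only a plausible heuristic, assuming that non-numeric or low-cardinality targets indicate classification; the function name `detect_task` and the `max_classes` threshold are illustrative, not the authors' implementation.

```python
def detect_task(target_values, max_classes=10):
    """Guess the supervised task for a target column.

    Returns 'classification' or 'regression'. The rule is a common
    heuristic, not the exact one used by the framework.
    """
    distinct = set(target_values)
    # Non-numeric labels (e.g., churn / no_churn) can only be classified.
    if not all(isinstance(v, (int, float)) for v in distinct):
        return "classification"
    # Few distinct numeric values (e.g., 0/1 fraud flags) suggest classes.
    if len(distinct) <= max_classes:
        return "classification"
    # Many distinct numeric values suggest a continuous target.
    return "regression"

print(detect_task(["churn", "no_churn", "churn"]))  # classification
print(detect_task([0, 1, 1, 0]))                    # classification
print(detect_task([float(v) for v in range(100)]))  # regression
```

In a full pipeline, the detected task would then drive which AutoML leaderboard metric is used (e.g., AUC for classification, MAE for regression).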
Acknowledgements
This work was executed under the project IRMDA - Intelligent Risk Management for the Digital Age, Individual Project, NUP: POCI-01-0247-FEDER-038526, co-funded by the Incentive System for Research and Technological Development, from the Thematic Operational Program Competitiveness of the national framework program - Portugal2020.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Ferreira, L., Pilastri, A., Martins, C., Santos, P., Cortez, P. (2021). A Scalable and Automated Machine Learning Framework to Support Risk Management. In: Rocha, A.P., Steels, L., van den Herik, J. (eds) Agents and Artificial Intelligence. ICAART 2020. Lecture Notes in Computer Science(), vol 12613. Springer, Cham. https://doi.org/10.1007/978-3-030-71158-0_14
DOI: https://doi.org/10.1007/978-3-030-71158-0_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-71157-3
Online ISBN: 978-3-030-71158-0