
Temporal stability in predictive process monitoring


Abstract

Predictive process monitoring is concerned with the analysis of events produced during the execution of a business process in order to predict as early as possible the final outcome of an ongoing case. Traditionally, predictive process monitoring methods are optimized with respect to accuracy. However, in environments where users make decisions and take actions in response to the predictions they receive, it is equally important to optimize the stability of the successive predictions made for each case. To this end, this paper defines a notion of temporal stability for binary classification tasks in predictive process monitoring and evaluates existing methods with respect to both temporal stability and accuracy. We find that methods based on XGBoost and LSTM neural networks exhibit the highest temporal stability. We then show that temporal stability can be enhanced by hyperparameter-optimizing random forests and XGBoost classifiers with respect to inter-run stability. Finally, we show that time series smoothing techniques can further enhance temporal stability at the expense of slightly lower accuracy.
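
As a rough illustration of the two ingredients named in the abstract (a stability measure over the successive predictions made for one case, and smoothing of that prediction series), the Python sketch below shows one plausible formulation. The function names, the volatility measure, and the smoothing parameter alpha are illustrative assumptions, not the exact definitions used in the paper.

```python
import numpy as np

def prediction_volatility(scores):
    """Mean absolute change between successive prediction scores
    produced for one ongoing case (lower means more temporally stable)."""
    scores = np.asarray(scores, dtype=float)
    return float(np.abs(np.diff(scores)).mean())

def exponential_smoothing(scores, alpha=0.8):
    """Single exponential smoothing of a case's prediction series:
    s_t = alpha * y_t + (1 - alpha) * s_{t-1}."""
    smoothed = [float(scores[0])]
    for y in scores[1:]:
        smoothed.append(alpha * float(y) + (1 - alpha) * smoothed[-1])
    return np.array(smoothed)

# Toy example: smoothing damps fluctuations in the prediction series,
# at the cost of reacting more slowly to genuine changes in the outcome.
raw = [0.4, 0.7, 0.3, 0.8, 0.75, 0.9]
print(prediction_volatility(raw))                         # volatility of raw scores
print(prediction_volatility(exponential_smoothing(raw)))  # lower after smoothing
```

The slower reaction of the smoothed series mirrors the accuracy/stability trade-off noted at the end of the abstract.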

Notes

  1. Inter-run stability refers to the MSPD metric introduced in Liu et al. (2017): \(\mathit{MSPD}(f) = 2\,\mathbb{E}_{x_i}\left[\mathrm{Var}(f(x_i)) - \mathrm{Cov}(f_j(x_i), f_k(x_i))\right]\), where \(\mathbb{E}_{x_i}\) is the expectation over all validation data, f is a mapping from a sample \(x_i\) to a label \(y_i\) on a given run, \(\mathrm{Var}(f(x_i))\) is the variance of the predictions of a single data point over the model runs, and \(\mathrm{Cov}(f_j(x_i), f_k(x_i))\) is the covariance of the predictions of a single data point over two model runs (see the sketch after this list).

  2. Production log: https://data.4tu.nl/repository/uuid:68726926-5ac5-4fab-b873-ee76ea412399, other logs: https://data.4tu.nl/repository/collection:event_logs_real.

  3. Preprocessed data: https://github.com/irhete/stability-predictive-monitoring.

  4. http://scikit-learn.org/.

  5. https://github.com/fchollet/keras/.

  6. http://www.deeplearning.net/software/theano/.
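
To make footnote 1 concrete, the sketch below computes MSPD from a matrix of prediction scores collected over several training runs of the same model. It is a minimal sketch under the assumption that the covariance term is averaged over all distinct pairs of runs; the variable names are illustrative, not taken from the paper or from Liu et al. (2017).

```python
import numpy as np

def mspd(preds):
    """MSPD per footnote 1: 2 * E_x[ Var(f(x)) - Cov(f_j(x), f_k(x)) ].

    preds: array of shape (n_runs, n_samples), where preds[j, i] is the
    prediction f_j(x_i) of run j for validation sample x_i.
    """
    n_runs, n_samples = preds.shape
    per_point = np.empty(n_samples)
    for i in range(n_samples):
        centred = preds[:, i] - preds[:, i].mean()
        var = (centred ** 2).mean()                  # Var(f(x_i)) over runs
        # Cov(f_j(x_i), f_k(x_i)) averaged over distinct run pairs (assumption)
        pair_products = np.outer(centred, centred)
        cov = (pair_products.sum() - np.trace(pair_products)) / (n_runs * (n_runs - 1))
        per_point[i] = var - cov
    return 2.0 * per_point.mean()                    # expectation over validation points

# Example: 5 training runs of one classifier, 4 validation points.
rng = np.random.default_rng(0)
print(mspd(rng.random((5, 4))))
```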

References

  • Bergstra J, Bengio Y (2012) Random search for hyper-parameter optimization. J Mach Learn Res 13:281–305

  • Bousquet O, Elisseeff A (2002) Stability and generalization. J Mach Learn Res 2:499–526

  • Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140

  • Breiman L (2001) Random forests. Mach Learn 45(1):5–32

  • Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 785–794

  • de Leoni M, van der Aalst WM, Dees M (2016) A general process mining framework for correlating, predicting and clustering dynamic behavior based on event logs. Inf Syst 56:235–257

  • Di Francescomarino C, Dumas M, Maggi FM, Teinemaa I (2017) Clustering-based predictive process monitoring. IEEE Trans Serv Comput. https://doi.org/10.1109/TSC.2016.2645153

  • Dumas M, La Rosa M, Mendling J, Reijers HA (2013) Fundamentals of business process management. Springer, Berlin

  • Elisseeff A, Evgeniou T, Pontil M (2005) Stability of randomized learning algorithms. J Mach Learn Res 6:55–79

  • Evermann J, Rehse JR, Fettke P (2017) Predicting process behaviour using deep learning. Decis Support Syst 100:129–140

  • Fernández-Delgado M, Cernadas E, Barro S, Amorim D (2014) Do we need hundreds of classifiers to solve real world classification problems? J Mach Learn Res 15(1):3133–3181

  • Guo C, Pleiss G, Sun Y, Weinberger KQ (2017) On calibration of modern neural networks. arXiv preprint arXiv:1706.04599

  • Lakshmanan GT, Duan S, Keyser PT, Curbera F, Khalaf R (2010) Predictive analytics for semi-structured case oriented business processes. In: International conference on business process management. Springer, Berlin, pp 640–651

  • Leontjeva A, Conforti R, Di Francescomarino C, Dumas M, Maggi FM (2015) Complex symbolic sequence encodings for predictive monitoring of business processes. In: International conference on business process management. Springer, Berlin, pp 297–313

  • Lin YF, Chen HH, Tseng VS, Pei J, et al (2015) Reliable early classification on multivariate time series with numerical and categorical attributes. In: PAKDD (1), pp 199–211

  • Liu CB, Chamberlain BP, Little DA, Cardoso (2017) Generalising random forest parameter optimisation to include stability and cost. In: Joint European conference on machine learning and knowledge discovery in databases. Springer, Berlin, pp 102–113

  • Maggi FM, Di Francescomarino C, Dumas M, Ghidini C (2014) Predictive monitoring of business processes. In: International conference on advanced information systems engineering. Springer, Berlin, pp 457–472

  • Marquez-Chamorro AE, Resinas M, Ruiz-Cortes A (2017) Predictive monitoring of business processes: a survey. IEEE Trans Serv Comput

  • Metzger A, Leitner P, Ivanovic D, Schmieders E, Franklin R, Carro M, Dustdar S, Pohl K (2015) Comparing and combining predictive business process monitoring techniques. IEEE Trans Syst Man Cybern Syst 45(2):276–290

  • Mori U, Mendiburu A, Keogh E, Lozano JA (2017) Reliable early classification of time series based on discriminating the classes over time. Data Min Knowl Discov 31(1):233–263

  • Niculescu-Mizil A, Caruana R (2005) Predicting good probabilities with supervised learning. In: Proceedings of the 22nd international conference on machine learning, ACM, pp 625–632

  • Olson RS, La Cava W, Mustahsan Z, Varik A, Moore JH (2018) Data-driven advice for applying machine learning to bioinformatics problems. Pac Symp Biocomput 23:192–203

  • Osborne J (2013) Dealing with missing or incomplete data: debunking the myth of emptiness. In: Best practices in data cleaning: a complete guide to everything you need to do before and after collecting your data. Sage, Thousand Oaks, pp 105–138

  • Parrish N, Anderson HS, Gupta MR, Hsiao DY (2013) Classifying with confidence from incomplete information. J Mach Learn Res 14(1):3561–3589

  • Platt J et al (1999) Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Adv Large Margin Classif 10(3):61–74

  • Polato M, Sperduti A, Burattin A, de Leoni M (2014) Data-aware remaining time prediction of business process instances. In: International joint conference on neural networks (IJCNN), IEEE, pp 816–823

  • Rogge-Solti A, Weske M (2013) Prediction of remaining service execution time using stochastic Petri nets with arbitrary firing delays. In: International conference on service-oriented computing (ICSOC). Springer, Berlin, pp 389–403

  • Santos T, Kern R (2016) A literature survey of early time series classification and deep learning. In: Proceedings of the 1st international workshop on science, application and methods in industry 4.0 co-located with i-KNOW 2016. CEUR workshop proceedings, vol 1793. CEUR-WS.org

  • Schafer JL, Graham JW (2002) Missing data: our view of the state of the art. Psychol Methods 7(2):147

  • Senderovich A, Di Francescomarino C, Ghidini C, Jorbina K, Maggi FM (2017) Intra and inter-case features in predictive process monitoring: a tale of two dimensions. In: International conference on business process management. Springer, Berlin, pp 306–323

  • Tax N, Verenich I, La Rosa M, Dumas M (2017) Predictive business process monitoring with LSTM neural networks. In: International conference on advanced information systems engineering. Springer, Berlin, pp 477–492

  • Teinemaa I, Dumas M, La Rosa M, Maggi FM (2017) Outcome-oriented predictive process monitoring: review and benchmark. arXiv preprint arXiv:1707.06766

  • van der Aalst WM (2016) Process mining: data science in action. Springer, Berlin

  • van Dongen BF, Crooy RA, van der Aalst WM (2008) Cycle time prediction: when will this case finally be finished? In: OTM Confederated International Conferences "On the Move to Meaningful Internet Systems". Springer, pp 319–336

  • Xing Z, Pei J, Dong G, Yu PS (2008) Mining sequence classifiers for early prediction. In: Proceedings of the 2008 SIAM international conference on data mining, SIAM, pp 644–655

  • Xing Z, Pei J, Philip SY (2012) Early classification on time series. Knowl Inf Syst 31(1):105–127

Acknowledgements

This research was partly funded by the Estonian Research Council (Grant IUT20-55).

Author information

Corresponding author

Correspondence to Irene Teinemaa.

Additional information

Responsible editors: Jesse Davis, Elisa Fromont, Derek Greene, Björn Bringmann.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

See Tables 5, 6, 7, 8, 9, 10 and Figs. 7, 8, 9, 10.

Table 5 Hyperparameters and distributions used in optimization via random search
Table 6 Optimized hyperparameters (RF)
Table 7 Optimized hyperparameters for single classifiers (XGBoost)
Table 8 Optimized hyperparameters for multiclassifiers (XGBoost)
Table 9 Optimized hyperparameters (LSTM)
Table 10 Optimized hyperparameters (combined inter-run stability and AUC)
Fig. 7 Case length histograms for positive and negative classes
Fig. 8 Prediction accuracy on long cases only
Fig. 9 Prediction accuracy on original (not truncated) traces
Fig. 10 Temporal stability on original (not truncated) traces

Cite this article

Teinemaa, I., Dumas, M., Leontjeva, A. et al. Temporal stability in predictive process monitoring. Data Min Knowl Disc 32, 1306–1338 (2018). https://doi.org/10.1007/s10618-018-0575-9

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10618-018-0575-9
