
Temporal stability in predictive process monitoring


Abstract

Predictive process monitoring is concerned with the analysis of events produced during the execution of a business process in order to predict as early as possible the final outcome of an ongoing case. Traditionally, predictive process monitoring methods are optimized with respect to accuracy. However, in environments where users make decisions and take actions in response to the predictions they receive, it is equally important to optimize the stability of the successive predictions made for each case. To this end, this paper defines a notion of temporal stability for binary classification tasks in predictive process monitoring and evaluates existing methods with respect to both temporal stability and accuracy. We find that methods based on XGBoost and LSTM neural networks exhibit the highest temporal stability. We then show that temporal stability can be enhanced by hyperparameter-optimizing random forests and XGBoost classifiers with respect to inter-run stability. Finally, we show that time series smoothing techniques can further enhance temporal stability at the expense of slightly lower accuracy.
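
As a rough illustration of the two ingredients named in the abstract (a stability measure over the successive predictions made for one case, and smoothing of that prediction series), the Python sketch below shows one plausible formulation. The function names, the volatility measure, and the smoothing parameter alpha are illustrative assumptions, not the exact definitions used in the paper.

```python
import numpy as np

def prediction_volatility(scores):
    """Mean absolute change between successive prediction scores
    produced for one ongoing case (lower means more temporally stable)."""
    scores = np.asarray(scores, dtype=float)
    return float(np.abs(np.diff(scores)).mean())

def exponential_smoothing(scores, alpha=0.8):
    """Single exponential smoothing of a case's prediction series:
    s_t = alpha * y_t + (1 - alpha) * s_{t-1}."""
    smoothed = [float(scores[0])]
    for y in scores[1:]:
        smoothed.append(alpha * float(y) + (1 - alpha) * smoothed[-1])
    return np.array(smoothed)

# Toy example: smoothing damps fluctuations in the prediction series,
# at the cost of reacting more slowly to genuine changes in the outcome.
raw = [0.4, 0.7, 0.3, 0.8, 0.75, 0.9]
print(prediction_volatility(raw))                         # volatility of raw scores
print(prediction_volatility(exponential_smoothing(raw)))  # lower after smoothing
```

The slower reaction of the smoothed series mirrors the accuracy/stability trade-off noted at the end of the abstract.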

Notes

  1. Inter-run stability refers to the MSPD metric introduced in Liu et al. (2017): \(\mathit{MSPD}(f) = 2\,\mathbb{E}_{x_i}\left[\mathrm{Var}(f(x_i)) - \mathrm{Cov}(f_j(x_i), f_k(x_i))\right]\), where \(\mathbb{E}_{x_i}\) is the expectation over all validation data, f is a mapping from a sample \(x_i\) to a label \(y_i\) on a given run, \(\mathrm{Var}(f(x_i))\) is the variance of the predictions of a single data point over the model runs, and \(\mathrm{Cov}(f_j(x_i), f_k(x_i))\) is the covariance of the predictions of a single data point over two model runs (see the sketch after this list).

  2. Production log: https://data.4tu.nl/repository/uuid:68726926-5ac5-4fab-b873-ee76ea412399, other logs: https://data.4tu.nl/repository/collection:event_logs_real.

  3. Preprocessed data: https://github.com/irhete/stability-predictive-monitoring.

  4. http://scikit-learn.org/.

  5. https://github.com/fchollet/keras/.

  6. http://www.deeplearning.net/software/theano/.
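
To make footnote 1 concrete, the sketch below computes MSPD from a matrix of prediction scores collected over several training runs of the same model. It is a minimal sketch under the assumption that the covariance term is averaged over all distinct pairs of runs; the variable names are illustrative, not taken from the paper or from Liu et al. (2017).

```python
import numpy as np

def mspd(preds):
    """MSPD per footnote 1: 2 * E_x[ Var(f(x)) - Cov(f_j(x), f_k(x)) ].

    preds: array of shape (n_runs, n_samples), where preds[j, i] is the
    prediction f_j(x_i) of run j for validation sample x_i.
    """
    n_runs, n_samples = preds.shape
    per_point = np.empty(n_samples)
    for i in range(n_samples):
        centred = preds[:, i] - preds[:, i].mean()
        var = (centred ** 2).mean()                  # Var(f(x_i)) over runs
        # Cov(f_j(x_i), f_k(x_i)) averaged over distinct run pairs (assumption)
        pair_products = np.outer(centred, centred)
        cov = (pair_products.sum() - np.trace(pair_products)) / (n_runs * (n_runs - 1))
        per_point[i] = var - cov
    return 2.0 * per_point.mean()                    # expectation over validation points

# Example: 5 training runs of one classifier, 4 validation points.
rng = np.random.default_rng(0)
print(mspd(rng.random((5, 4))))
```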

References

  • Bergstra J, Bengio Y (2012) Random search for hyper-parameter optimization. J Mach Learn Res 13:281–305

  • Bousquet O, Elisseeff A (2002) Stability and generalization. J Mach Learn Res 2:499–526

  • Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140

  • Breiman L (2001) Random forests. Mach Learn 45(1):5–32

  • Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 785–794

  • de Leoni M, van der Aalst WM, Dees M (2016) A general process mining framework for correlating, predicting and clustering dynamic behavior based on event logs. Inf Syst 56:235–257

  • Di Francescomarino C, Dumas M, Maggi FM, Teinemaa I (2017) Clustering-based predictive process monitoring. IEEE Trans Serv Comput. https://doi.org/10.1109/TSC.2016.2645153

  • Dumas M, La Rosa M, Mendling J, Reijers HA (2013) Fundamentals of business process management. Springer, Berlin

  • Elisseeff A, Evgeniou T, Pontil M (2005) Stability of randomized learning algorithms. J Mach Learn Res 6:55–79

  • Evermann J, Rehse JR, Fettke P (2017) Predicting process behaviour using deep learning. Decis Support Syst 100:129–140

  • Fernández-Delgado M, Cernadas E, Barro S, Amorim D (2014) Do we need hundreds of classifiers to solve real world classification problems? J Mach Learn Res 15(1):3133–3181

  • Guo C, Pleiss G, Sun Y, Weinberger KQ (2017) On calibration of modern neural networks. arXiv preprint arXiv:1706.04599

  • Lakshmanan GT, Duan S, Keyser PT, Curbera F, Khalaf R (2010) Predictive analytics for semi-structured case oriented business processes. In: International conference on business process management. Springer, Berlin, pp 640–651

  • Leontjeva A, Conforti R, Di Francescomarino C, Dumas M, Maggi FM (2015) Complex symbolic sequence encodings for predictive monitoring of business processes. In: International conference on business process management. Springer, Berlin, pp 297–313

  • Lin YF, Chen HH, Tseng VS, Pei J, et al (2015) Reliable early classification on multivariate time series with numerical and categorical attributes. In: PAKDD (1), pp 199–211

  • Liu CB, Chamberlain BP, Little DA, Cardoso (2017) Generalising random forest parameter optimisation to include stability and cost. In: Joint European conference on machine learning and knowledge discovery in databases. Springer, Berlin, pp 102–113

  • Maggi FM, Di Francescomarino C, Dumas M, Ghidini C (2014) Predictive monitoring of business processes. In: International conference on advanced information systems engineering. Springer, Berlin, pp 457–472

  • Marquez-Chamorro AE, Resinas M, Ruiz-Cortes A (2017) Predictive monitoring of business processes: a survey. IEEE Trans Serv Comput

  • Metzger A, Leitner P, Ivanovic D, Schmieders E, Franklin R, Carro M, Dustdar S, Pohl K (2015) Comparing and combining predictive business process monitoring techniques. IEEE Trans Syst Man Cybern Syst 45(2):276–290

  • Mori U, Mendiburu A, Keogh E, Lozano JA (2017) Reliable early classification of time series based on discriminating the classes over time. Data Min Knowl Discov 31(1):233–263

  • Niculescu-Mizil A, Caruana R (2005) Predicting good probabilities with supervised learning. In: Proceedings of the 22nd international conference on machine learning, ACM, pp 625–632

  • Olson RS, La Cava W, Mustahsan Z, Varik A, Moore JH (2018) Data-driven advice for applying machine learning to bioinformatics problems. Pac Symp Biocomput 23:192–203

  • Osborne J (2013) Dealing with missing or incomplete data: debunking the myth of emptiness. In: Best practices in data cleaning: a complete guide to everything you need to do before and after collecting your data. Sage, Thousand Oaks, pp 105–138

  • Parrish N, Anderson HS, Gupta MR, Hsiao DY (2013) Classifying with confidence from incomplete information. J Mach Learn Res 14(1):3561–3589

  • Platt J et al (1999) Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Adv Large Margin Classif 10(3):61–74

  • Polato M, Sperduti A, Burattin A, de Leoni M (2014) Data-aware remaining time prediction of business process instances. In: International joint conference on neural networks (IJCNN), IEEE, pp 816–823

  • Rogge-Solti A, Weske M (2013) Prediction of remaining service execution time using stochastic Petri nets with arbitrary firing delays. In: International conference on service-oriented computing (ICSOC). Springer, Berlin, pp 389–403

  • Santos T, Kern R (2016) A literature survey of early time series classification and deep learning. In: Proceedings of the 1st international workshop on science, application and methods in industry 4.0 co-located with i-KNOW 2016. CEUR workshop proceedings, vol 1793. CEUR-WS.org

  • Schafer JL, Graham JW (2002) Missing data: our view of the state of the art. Psychol Methods 7(2):147

  • Senderovich A, Di Francescomarino C, Ghidini C, Jorbina K, Maggi FM (2017) Intra and inter-case features in predictive process monitoring: a tale of two dimensions. In: International conference on business process management. Springer, Berlin, pp 306–323

  • Tax N, Verenich I, La Rosa M, Dumas M (2017) Predictive business process monitoring with LSTM neural networks. In: International conference on advanced information systems engineering. Springer, Berlin, pp 477–492

  • Teinemaa I, Dumas M, La Rosa M, Maggi FM (2017) Outcome-oriented predictive process monitoring: review and benchmark. arXiv preprint arXiv:1707.06766

  • van der Aalst WM (2016) Process mining: data science in action. Springer, Berlin

  • van Dongen BF, Crooy RA, van der Aalst WM (2008) Cycle time prediction: when will this case finally be finished? In: OTM Confederated International Conferences "On the Move to Meaningful Internet Systems". Springer, pp 319–336

  • Xing Z, Pei J, Dong G, Yu PS (2008) Mining sequence classifiers for early prediction. In: Proceedings of the 2008 SIAM international conference on data mining, SIAM, pp 644–655

  • Xing Z, Pei J, Philip SY (2012) Early classification on time series. Knowl Inf Syst 31(1):105–127

Acknowledgements

This research was partly funded by the Estonian Research Council (Grant IUT20-55).

Author information

Corresponding author

Correspondence to Irene Teinemaa.

Additional information

Responsible editors: Jesse Davis, Elisa Fromont, Derek Greene, Björn Bringmann.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

See Tables 5, 6, 7, 8, 9, 10 and Figs. 7, 8, 9, 10.

Table 5 Hyperparameters and distributions used in optimization via random search
Table 6 Optimized hyperparameters (RF)
Table 7 Optimized hyperparameters for single classifiers (XGBoost)
Table 8 Optimized hyperparameters for multiclassifiers (XGBoost)
Table 9 Optimized hyperparameters (LSTM)
Table 10 Optimized hyperparameters (combined inter-run stability and AUC)
Fig. 7 Case length histograms for positive and negative classes
Fig. 8 Prediction accuracy on long cases only
Fig. 9 Prediction accuracy on original (not truncated) traces
Fig. 10 Temporal stability on original (not truncated) traces

Cite this article

Teinemaa, I., Dumas, M., Leontjeva, A. et al. Temporal stability in predictive process monitoring. Data Min Knowl Disc 32, 1306–1338 (2018). https://doi.org/10.1007/s10618-018-0575-9

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10618-018-0575-9
