Best strategy to win a match: an analytical approach using hybrid machine learning-clustering-association rule framework

  • Original Research
  • Annals of Operations Research

Abstract

One of the significant challenges in the sports industry is identifying the factors that influence match results and quantifying their relative weight. To make sound recommendations to team management and players, prediction models are needed that both forecast the match outcome and quantify the importance of each factor. A second requirement is to identify talented, emerging players and to analyse how the important factors are associated with a match-winning outcome. This paper formulates a hybrid machine learning-clustering-association rule framework and implements it for cricket, one of the most popular sports in the world, watched by billions. We predict the match outcome for One Day Internationals (ODIs) and Twenty20s (T20s), the fifty-over and twenty-over formats of cricket respectively, using state-of-the-art machine learning algorithms: Random Forest, Gradient Boosting, and deep neural networks. Variable importance is computed using machine learning techniques and then statistically validated through a regression model. Emerging talented players are identified by clustering. Association rules are generated to determine the best possible winning outcome. The results show that environmental conditions are as crucial in determining a match result as internal quantitative factors. The model is thus helpful to both team management and players for improving their winning strategy, and for discovering emerging players to form an unbeatable team.
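The association-rule step mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy match records, the item names (`toss_won`, `bat_first`, `home`, `win`), the thresholds, and the helper names `support` and `win_rules` are hypothetical and do not come from the paper. The sketch enumerates candidate antecedents by brute force and is not necessarily the mining algorithm the authors used.

```python
from itertools import combinations

# Hypothetical toy data: each match is the set of conditions observed for one
# team, plus the item "win" when that team won. Not the paper's dataset.
matches = [
    {"toss_won", "bat_first", "home", "win"},
    {"toss_won", "bat_first", "win"},
    {"toss_won", "home", "win"},
    {"bat_first", "home"},
    {"toss_won"},
    {"home", "win"},
    {"bat_first"},
    {"toss_won", "bat_first", "home", "win"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def win_rules(transactions, outcome="win", min_support=0.25,
              min_confidence=0.7, max_len=2):
    """Return rules `antecedent -> outcome` as (antecedent, support, confidence, lift).

    support    = P(antecedent and outcome)
    confidence = P(outcome | antecedent)
    lift       = confidence / P(outcome)
    """
    items = sorted(set().union(*transactions) - {outcome})
    base_rate = support({outcome}, transactions)
    rules = []
    for size in range(1, max_len + 1):
        for antecedent in combinations(items, size):
            sup_antecedent = support(antecedent, transactions)
            sup_rule = support(set(antecedent) | {outcome}, transactions)
            # Skip antecedents that never occur or rules that are too rare.
            if sup_antecedent == 0 or sup_rule < min_support:
                continue
            confidence = sup_rule / sup_antecedent
            if confidence >= min_confidence:
                rules.append((antecedent, sup_rule, confidence,
                              confidence / base_rate))
    # Strongest association first; ties broken by support, then by name.
    return sorted(rules, key=lambda r: (-r[3], -r[1], r[0]))


if __name__ == "__main__":
    for antecedent, sup, conf, lift in win_rules(matches):
        print(f"{' & '.join(antecedent)} -> win  "
              f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

A lift above 1 means the antecedent conditions co-occur with a win more often than the overall win rate alone would suggest. On real match data the antecedent items would be derived from the factors the prediction models rank as important, and an Apriori-style search would replace the brute-force enumeration once the number of items grows.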

Figs. 1–31 are not reproduced in this preview. Figs. 3–6 cite ESPNcricinfo statistics as their source: https://stats.espncricinfo.com/ci/engine/stats/index.html

Notes

  1. Retrieved from: https://stats.espncricinfo.com/ci/engine/stats/index.html.

Author information

Corresponding author

Correspondence to Ajay Kumar.

About this article

Cite this article

Srivastava, P.R., Eachempati, P., Kumar, A. et al. Best strategy to win a match: an analytical approach using hybrid machine learning-clustering-association rule framework. Ann Oper Res 325, 319–361 (2023). https://doi.org/10.1007/s10479-022-04541-6

  • DOI: https://doi.org/10.1007/s10479-022-04541-6