Abstract
Water quality is an important issue because of its relationship to humans and other living organisms, and predicting water quality parameters is essential for better management of water resources. The decision tree is a data mining method that creates rules for classifying and predicting data using a tree structure. The purpose of this study is to use data mining techniques to investigate and predict soluble phosphorus and dissolved oxygen in Lake Erie. To achieve this purpose, the Classification And Regression Tree (CART) model is compared with the Chi-squared Automatic Interaction Detector (CHAID) model, and the Quick Unbiased Efficient Statistical Trees (QUEST) model is compared with the C5 model. The models are compared and reviewed to assess their applicability for identifying water quality parameters. The results show that decision tree methods, with the help of hydrochemical parameters, can classify and predict water quality with high accuracy and in a short time. The data set contains 327 records. Model accuracy is checked using the difference between the observed and the predicted data. In the prediction of dissolved oxygen, 214 cases with the CART model and 185 cases with the CHAID model differ by less than 2 units from the observed data. For phosphorus, 245 cases with the CART model and 237 cases with the CHAID model differ by less than 0.2 units from the observed data. Therefore, the CART model is more accurate. The C5 algorithm correctly predicts the group number for 256 phosphorus cases and 230 dissolved oxygen cases. The results show that the CART model is better than the CHAID model in predicting data, and the C5 model is better than the QUEST model in predicting group numbers.
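The accuracy check described in the abstract, counting the cases whose prediction falls within a fixed tolerance of the observed value, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names and the sample values are hypothetical, and the percentages assume that each reported count is taken over all 327 records.

```python
def count_within_tolerance(observed, predicted, tolerance):
    """Count cases whose absolute prediction error is below `tolerance`."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return sum(abs(o - p) < tolerance for o, p in zip(observed, predicted))


def share_of_total(count, total):
    """Express a count as a percentage of the total number of records."""
    return 100.0 * count / total


# Hypothetical dissolved oxygen values, for illustration only (not study data).
observed = [8.1, 6.4, 9.0, 7.2, 5.5]
predicted = [7.5, 9.1, 8.8, 4.9, 5.6]
hits = count_within_tolerance(observed, predicted, tolerance=2.0)
print(f"{hits} of {len(observed)} hypothetical cases are within 2 units")

# Counts reported in the abstract, shown as a share of the 327 records.
# Assumption: every count is taken over the full data set.
TOTAL_RECORDS = 327
reported = {
    "CART, dissolved oxygen (< 2 units)": 214,
    "CHAID, dissolved oxygen (< 2 units)": 185,
    "CART, phosphorus (< 0.2 units)": 245,
    "CHAID, phosphorus (< 0.2 units)": 237,
}
for label, count in reported.items():
    print(f"{label}: {count}/{TOTAL_RECORDS} = "
          f"{share_of_total(count, TOTAL_RECORDS):.1f}%")
```

Reading the counts as shares of the data set makes the CART and CHAID results easier to compare across the two parameters, since the tolerance differs between dissolved oxygen and phosphorus.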
Data availability
All data used in this study will be made available upon reasonable request.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Gorgan-Mohammadi, F., Rajaee, T. & Zounemat-Kermani, M. Decision tree models in predicting water quality parameters of dissolved oxygen and phosphorus in lake water. Sustain. Water Resour. Manag. 9, 1 (2023). https://doi.org/10.1007/s40899-022-00776-0
DOI: https://doi.org/10.1007/s40899-022-00776-0