Analysing the accuracy of machine learning techniques to develop an integrated influent time series model: case study of a sewage treatment plant, Malaysia

  • Research Article
Environmental Science and Pollution Research

Abstract

A sewage treatment plant treats sewage to acceptable standards before it is discharged into receiving waters. To design and operate such plants, it is necessary to measure and predict the influent flow rate. In this research, the influent flow rate of a sewage treatment plant (STP) was modelled and predicted using three time series algorithms: autoregressive integrated moving average (ARIMA), nonlinear autoregressive network (NAR) and support vector machine (SVM) regression. To evaluate the models’ accuracy, the root mean square error (RMSE) and coefficient of determination (R²) were calculated as initial assessment measures, while relative error (RE), peak flow criterion (PFC) and low flow criterion (LFC) were calculated as final evaluation measures to demonstrate the detailed accuracy of the selected models. An integrated model was developed based on the individual models’ prediction ability for low, average and peak flow. The initial assessment showed that the ARIMA model was the least accurate and the NAR model was the most accurate. The RE results also show that the SVM model produced errors above 10% or below −10% more frequently than the NAR model. The influent was also forecast up to 44 weeks ahead by the NAR and SVM models. The graphical results indicate that the NAR model made better predictions than the SVM model. The final evaluation of NAR and SVM demonstrated that SVM made better predictions at peak flow, whereas NAR fit well for the low and average inflow ranges. The integrated model therefore uses the NAR model for low and average influent and the SVM model for peak inflow.
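As a rough illustration of the evaluation measures and the integrated model named in the abstract, the following Python sketch computes RMSE, R² and RE and switches between two forecasts by flow regime. It is not the authors' implementation. Several details are assumptions because the abstract does not state them: R² is taken as 1 − SS_res/SS_tot, RE is signed as (predicted − observed)/observed in percent, the `peak_threshold` parameter is hypothetical, and the flow regime is decided from the NAR forecast. PFC and LFC are omitted because their formulas are not given here.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between observed and predicted flows."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def relative_errors(observed, predicted):
    """Signed relative error in percent (sign convention is an assumption)."""
    return [100.0 * (p - o) / o for o, p in zip(observed, predicted)]

def share_outside_band(errors, band=10.0):
    """Fraction of relative errors above +band or below -band percent."""
    return sum(1 for e in errors if abs(e) > band) / len(errors)

def integrated_forecast(nar_pred, svm_pred, peak_threshold):
    """Use the SVM forecast for peak flow and the NAR forecast otherwise.

    peak_threshold is a hypothetical cut-off, and deciding the regime from
    the NAR forecast is an assumption made for this sketch.
    """
    return [s if n >= peak_threshold else n
            for n, s in zip(nar_pred, svm_pred)]
```

Comparing `share_outside_band` for two models mirrors the abstract's comparison of how often each model's RE falls outside the ±10% band.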

Acknowledgments

The authors would like to acknowledge the financial support from the Ministry of Education and University of Malaya for grants FRGS (FP016-2014A) and UMRG (FL001-13SUS), respectively. We would also like to acknowledge the Water Research Centre of University of Malaya for their support. We appreciate the cooperation given by relevant parties and companies for providing the necessary data and assistance. We are most grateful and would like to thank the reviewers for their valuable suggestions, which have led to substantial improvement of the article.

Author information

Corresponding author

Correspondence to Faridah Othman.

Additional information

Responsible editor: Marcus Schulz

About this article

Cite this article

Ansari, M., Othman, F., Abunama, T. et al. Analysing the accuracy of machine learning techniques to develop an integrated influent time series model: case study of a sewage treatment plant, Malaysia. Environ Sci Pollut Res 25, 12139–12149 (2018). https://doi.org/10.1007/s11356-018-1438-z

  • DOI: https://doi.org/10.1007/s11356-018-1438-z
