A comparison of methods to avoid overfitting in neural networks training in the case of catchment runoff modelling
Highlights
► Comparison of different methods to avoid ANN overfitting to data. ► Noise injection outperforms the early stopping and optimized approximation algorithm methods. ► Different results were obtained using Evolutionary Computation-based and gradient-based algorithms for ANN training. ► Rainfall–runoff modelling by means of MLP ANNs provides reasonable results.
Introduction
During the last 20 years artificial neural networks (ANNs) (Haykin, 1999) have become very popular in various scientific disciplines (Paliwar and Kumar, 2009, Wen et al., 2009, Al-Garni, 2010). Within the field of hydrology, different Artificial Intelligence methods, including ANNs, have also gained much popularity (Maier and Dandy, 2000, Cheng et al., 2002, Cheng et al., 2008, Muttil and Chau, 2006, Lin et al., 2006, Piotrowski et al., 2007, Maier et al., 2010, Acharya et al., 2012, Huo et al., 2012, Nourani et al., 2012). ANN applications to rainfall–runoff modelling are plentiful and include: ASCE Task Committee, 2000, Solomatine, 2003, Cherkassky et al. (2006), Dawson et al., 2006, Piotrowski et al., 2006, Solomatine and Ostfeld, 2008, Wu et al., 2009, Siou et al., 2011 and Wu and Chau (2011). Among different ANN types, multi-layer perceptron neural networks (MLPs) are especially popular due to their simplicity, relatively low number of parameters, clear biological inspirations and the ongoing debate on whether they may be considered universal approximators (Hecht-Nielsen, 1987, Girosi and Poggio, 1989, Nakamura et al., 1993, Braun and Griebel, 2009). MLPs are also of special interest in the present paper.
Recently Wang et al. (2009) and Elshorbagy et al. (2010) presented comparisons of various Artificial Intelligence techniques for rainfall–runoff forecasting, encouraging the search for novel methods to improve ANN training and the selection of their different features. Apart from choosing the neural network type, the successful application of a neural network to a particular problem requires the determination of a model architecture (which defines the number of parameters), an optimization algorithm and a method to avoid overfitting. However, in practice ANNs are frequently used off-hand without discussing such details, which may have a significant impact on model performance.
The present paper is a continuation of the Piotrowski and Napiorkowski (2011) study, which aimed at choosing the best optimization method for training MLPs applied to daily catchment runoff forecasting in colder climate zones. The main objective of the current paper is the comparison of different methods to avoid overfitting when MLPs are applied to a similar task. We pay special attention to the noise injection approach, which is rarely considered in hydrological applications. The performance of a new method called the optimized approximation algorithm, and of early stopping, the most popular approach to deal with overfitting, is also studied in detail. However, the methods to avoid overfitting cannot be compared or discussed apart from the ANN architecture and training algorithms. For instance, some optimization algorithms perform poorly in uncertain environments (Jin and Branke, 2005), of which neural network training with noise injection may be an example. This emphasizes the importance of properly coupling methods to avoid overfitting with training algorithms. On the other hand, some optimization methods may quickly converge to good solutions for a simple ANN architecture with a small number of parameters, but perform poorly for more complicated ones with more parameters. One may note that different ANN architectures mean different numbers of parameters and different fitness landscapes, hence formally different problems. It is well known that the performance of optimization algorithms depends on the problem. This was verified empirically (e.g. Epitropakis et al., 2011); it was also proved (Wolpert and Macready, 1997, Wolpert and Macready, 2005) that under certain assumptions the performance of any two algorithms averaged over all possible problems (fitness landscapes) is equal.
Of course, in practice few people may be interested in all problems, but the proof presented in Wolpert and Macready (1997) carries an important warning: an optimization algorithm applied to a novel task can fail even if it was successful in solving some other problems. In the present paper, two training methods are first chosen based on previous findings in the literature: one gradient-based and one based on Evolutionary Computation (EC). Then the optimal set of input variables and MLP architecture for each training algorithm is found experimentally. Finally, three different methods to avoid ANN overfitting are compared for the chosen ANN architectures and training algorithms. Below we very briefly introduce the main features of MLP training algorithms, architectures and methods to avoid overfitting.
A number of studies have addressed the application of different optimization algorithms to ANN training for various regression problems. The most popular optimization approaches are gradient-based methods, among which the Levenberg–Marquardt (LM) algorithm (Press et al., 2006) is considered one of the most efficient (see e.g. Adamowski and Karapataki, 2010). In a few studies EC methods were applied to the same problems – with various opinions on their performance. Some papers suggested an advantage of EC algorithms over gradient-based methods for ANN training (Sexton and Gupta, 2000, Jain and Srinivasulu, 2004, Martinez-Estudillo et al., 2006, Chau, 2006, Zhang et al., 2007, Huang et al., 2009), but a number of other studies showed that EC approaches are at least not better than gradient-based algorithms in terms of ANN model performance and are, of course, much slower (Mandischer, 2002; Ilonen et al., 2003; Socha and Blum, 2007). Motivated by this diversity of opinions, the authors of the present study conducted a detailed survey of recently developed EC methods from two “families” – Differential Evolution (DE) (Storn and Price, 1995) and Particle Swarm Optimization (Kennedy and Eberhart, 1995). Eight EC algorithms were compared with the LM method for MLP training with the early stopping approach for rainfall–runoff modelling of the Annapolis River, Nova Scotia, Canada (Piotrowski and Napiorkowski, 2011). The results are generally in agreement with the opinions of Mandischer (2002), Ilonen et al. (2003) and Socha and Blum (2007). Only one of the EC algorithms, namely Differential Evolution with Global and Local neighbors (DEGL, Das et al., 2009), showed performance similar to the LM method, albeit with much slower convergence.
It is worth noting that DEGL was also found to be among the best EC-based algorithms, along with Grouped Differential Evolution (GDE) (Piotrowski and Napiorkowski, 2010) and Self-Adaptive Differential Evolution (SADE) (Qin et al., 2009), in training MLPs applied to estimation of the longitudinal dispersion coefficient in rivers (Piotrowski et al., 2012b). In that study the number of data points was very small (under 100 observations), imposing a very simple MLP architecture; the objective function was non-differentiable, and a kind of noise injection method was used to avoid overfitting. This suggests that DEGL may be well suited for MLP training in general. Based on the above findings, the LM and DEGL methods are used as training algorithms in the present paper.
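The DEGL mutation operator can be sketched in Python as follows. This is a schematic illustration of the donor-vector construction described by Das et al. (2009), blending a local (ring-neighborhood) and a global mutation component; the parameter values (neighborhood radius k, scale factors alpha and beta, weight w) are illustrative defaults, not the settings used in the paper.

```python
import random

def degl_mutation(pop, fitness, i, k=2, alpha=0.8, beta=0.8, w=0.5):
    """Schematic DEGL donor vector for individual i: a weighted blend of a
    local component (built from a ring neighborhood of radius k) and a
    global component (built from the whole population)."""
    n, dim = len(pop), len(pop[i])
    # Ring neighborhood of radius k around individual i (indices wrap around)
    hood = [(i + j) % n for j in range(-k, k + 1)]
    nbest = min(hood, key=lambda j: fitness[j])       # best in neighborhood
    gbest = min(range(n), key=lambda j: fitness[j])   # best in population
    p, q = random.sample([j for j in hood if j != i], 2)
    r, s = random.sample([j for j in range(n) if j != i], 2)
    donor = []
    for d in range(dim):
        local = pop[i][d] + alpha * (pop[nbest][d] - pop[i][d]) \
                          + beta * (pop[p][d] - pop[q][d])
        globl = pop[i][d] + alpha * (pop[gbest][d] - pop[i][d]) \
                          + beta * (pop[r][d] - pop[s][d])
        donor.append(w * globl + (1 - w) * local)
    return donor
```

In a full DE loop the donor would then be crossed over with the target vector and accepted only if it improves the objective function (here, the training-set MSE of the MLP).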
The architecture of an MLP defines the number of parameters to be optimized. This architecture should always be adapted to the problem (Zhang et al., 1998, Mahmoud and Ben-Nahki, 2003, Siou et al., 2011, De et al., 2011), as it depends on the number of input and output variables, the number and quality of available data, the presence of noise in the data, etc. Over-parameterization may have a significant negative impact on the performance of neural networks, also in the case of rainfall–runoff modelling (Gaume and Gosset, 2003). A smaller architecture usually yields better generalization properties (Haykin, 1999) and is easier to train, especially by means of non-gradient-based methods. Some Evolutionary Computation algorithms were proposed to determine an optimal ANN architecture (Castillo et al., 2000, Huang et al., 2009) and were applied to hydrological problems (Chen and Chang, 2009); however, they are applicable rather to problems where no expert knowledge is available and no physically based choice of input variables and model complexity is possible. Although a number of other methods for developing ANN architecture exist (Sietsma and Dow, 1991; Wang et al., 1994, Islam et al., 2009, Ssegane et al., 2012, Nourani and Sayyah Frad, 2012), they usually rely on heuristic or subjective decisions and none is widely applied (Zhang et al., 1998).
The impact of different methods used to avoid overfitting on ANN performance, which is the main focus of the present paper, has rarely been studied in the literature, and the existing papers usually dealt with classification problems (Holmstrom and Koistinen, 1992, Hua et al., 2006, Zur et al., 2009) or used artificial functions for comparison (Holmstrom and Koistinen, 1992, Reed et al., 1995). Only Giustolisi and Laucelli (2005) studied the impact of a number of methods to avoid ANN overfitting for hydrological data, namely in the case of rainfall–runoff modelling for two very small catchments (up to 5 km²) in Italy. However, EC-based optimization methods were not used and the popular noise injection method based on maximization of the cross-validated likelihood (Holmstrom and Koistinen, 1992) was not compared. The early stopping technique led to poor results in Giustolisi and Laucelli (2005), which may be surprising, as it is a very popular and usually successful approach to avoid overfitting. Moreover, recently a novel methodology called the optimized approximation algorithm (Liu et al., 2008) was proposed and gained much interest. The present paper tries to fill the gaps left by Giustolisi and Laucelli (2005), Hua et al. (2006) and Zur et al. (2009) and presents a comparison of catchment runoff modelling results obtained when the three mentioned techniques designed to avoid overfitting are coupled with MLPs of different architectures and gradient-based or EC-based optimization algorithms. Neural networks are applied to runoff forecasting for the Annapolis River, which is located in a moderate climate zone.
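The noise injection idea can be illustrated with a minimal sketch: at each training epoch the network is presented with inputs jittered by Gaussian noise of spread h, rather than with the raw training set. The function name and data layout are illustrative; in the method of Holmstrom and Koistinen (1992) the spread h itself is estimated by maximizing the cross-validated likelihood.

```python
import random

def noisy_batch(samples, h):
    """Noise injection (cf. Holmstrom and Koistinen, 1992): return a copy of
    the training samples whose inputs are jittered with Gaussian noise of
    standard deviation h; the targets are left unchanged.  A fresh noisy
    batch would be drawn at every training epoch."""
    return [([xi + random.gauss(0.0, h) for xi in x], y) for x, y in samples]
```

Because each epoch sees a different perturbed data set, the objective function effectively becomes stochastic, which is why (as noted above) some optimization algorithms that perform poorly in uncertain environments may struggle when coupled with this method.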
Section snippets
Study area and hydro-meteorological data
The present paper is a continuation of the Piotrowski and Napiorkowski (2011) study for the same catchment, namely the upper part of the Annapolis River (Nova Scotia, Canada) up to Wilmot settlement, with an area of 546 km². Hydrological and meteorological data are available from the Water Survey of Canada and Canada’s National Climate Data and Information Archive for the gauge station situated in Wilmot settlement (44°56′57″N, 65°01′45″W) and the meteorological station at Greenwood Airfield (44°58′40″N, 64°55′33″W),
Multi-layer perceptron artificial neural networks and optimization algorithms
An MLP neural network (Haykin, 1999) is a nonlinear data-based model that approximates the values of output variables (y) dependent on a set of input variables (x). An MLP is formed by several nodes arranged in groups called layers (see Fig. 3). Usually three layers, an input layer, a hidden layer, and an output layer, are sufficient in practice (Haykin, 1999; see also real-world data applications in De et al., 2011 and Siou et al., 2011). The number of nodes in input and output layers is determined
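The three-layer mapping described above can be sketched in Python; this is a generic illustration (tanh hidden layer, linear output node), not necessarily the exact transfer functions used in the paper.

```python
import math

def mlp_forward(x, W1, b1, W2, b2):
    """Three-layer MLP: input vector x -> tanh hidden layer -> linear output.
    W1 has one weight row per hidden node, W2 one row per output node."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(W2, b2)]
```

Training the network then amounts to optimizing the parameters (W1, b1, W2, b2) so that the output matches the observed runoff, which is exactly the task handed to the LM or DEGL algorithm.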
Methods to avoid neural network overfitting
To be successfully applied in practice, an ANN should be able to generalize the input–output mapping. In other words, the model should be able to correctly approximate observations not included in the training set (Geman et al., 1992). In the case of catchment runoff modelling this means the ability to make good runoff predictions for future hydro-meteorological conditions. To achieve proper generalization capabilities one must avoid ANN overfitting of the training data, i.e. the model should be fitted
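Early stopping, the most popular of the three methods compared here, can be illustrated with a minimal sketch: training is interrupted once the error on a separate validation set has stopped improving for a number of epochs, and the epoch with the best validation error is retained. The patience parameter and the function names are illustrative, not the exact settings of the paper.

```python
def train_with_early_stopping(train_step, val_error, max_epochs=1000, patience=20):
    """Run train_step() each epoch; stop once val_error() has not improved
    for `patience` consecutive epochs.  Returns the epoch with the lowest
    validation error and that error (in practice the network weights from
    that epoch would be restored)."""
    best_err, best_epoch = float("inf"), 0
    for epoch in range(max_epochs):
        train_step()
        err = val_error()
        if err < best_err:
            best_err, best_epoch = err, epoch
        elif epoch - best_epoch >= patience:
            break
    return best_epoch, best_err
```
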
Selection of input variables and MLP architecture
To predict the one-lead-day runoff Q(t + 1), different variants of input variables are considered (see Table 2). The best MLP architecture is chosen according to the MSE criterion, Eq. (3). In the simplest combination of inputs, only the most recent measurements of meteorological variables are used, namely UT(t), LT(t), RF(t), SF(t), SC(t), together with the two last runoff measurements Q(t), Q(t − 1), which gives seven input variables in total. Each of them is physically important for runoff forecasting – UT
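The simplest input combination described above can be assembled as follows; the sketch assumes the five meteorological series and the runoff series are available as equal-length lists with one value per day.

```python
def make_inputs(Q, UT, LT, RF, SF, SC):
    """Build input/target pairs for one-lead-day forecasting: each input is
    [UT(t), LT(t), RF(t), SF(t), SC(t), Q(t), Q(t-1)] (seven variables),
    and the target is Q(t+1)."""
    samples = []
    for t in range(1, len(Q) - 1):   # need Q(t-1) and Q(t+1) to exist
        x = [UT[t], LT[t], RF[t], SF[t], SC[t], Q[t], Q[t - 1]]
        samples.append((x, Q[t + 1]))
    return samples
```

Richer variants from Table 2 would simply extend the inner list with additional lagged values of the same variables.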
Results and discussion
Three criteria are used in the present paper to compare the results obtained by means of different methods to avoid ANN overfitting, namely mean square error (MSE), mean absolute error (MAE), and Nash–Sutcliffe coefficient (NSC).
During the optimization MSE is used as the objective function (Eq. (3)). MAE is defined as MAE = (1/n) Σ |Qobs(t) − Qpred(t)|, where n is the number of observations and Qobs(t), Qpred(t) are the observed and predicted runoff values.
Very popular in river runoff forecasting, NSC is computed according to the following equation: NSC = 1 − Σ (Qobs(t) − Qpred(t))² / Σ (Qobs(t) − Q̄obs)², where Q̄obs denotes the mean of the observed runoff. The maximum value of NSC, achieved for a perfect fit, equals 1.
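The three criteria can be written compactly as follows; this is a straightforward transcription of the standard definitions of MSE, MAE and the Nash–Sutcliffe coefficient.

```python
def mse(obs, pred):
    """Mean square error: the objective function used during training."""
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)

def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def nsc(obs, pred):
    """Nash-Sutcliffe coefficient: 1 for a perfect fit; values <= 0 mean the
    model predicts no better than the mean of the observations."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - p) ** 2 for o, p in zip(obs, pred))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - residual / variance
```
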
Conclusions
The present paper aims at comparing a number of techniques to avoid ANN overfitting in the case of catchment runoff modelling in an area located in a moderately cold climate zone. Three methods were considered, namely noise injection with the spread factor h estimated by means of maximizing the cross-validation likelihood function (Holmstrom and Koistinen, 1992), the optimized approximation algorithm proposed by Liu et al. (2008) and the most popular early stopping (Prechelt, 1998,
Acknowledgments
This work has been supported by the Inner Grant of the Institute of Geophysics, Polish Academy of Sciences Nr. 1b/IGF PAN/2012/MŁ.
References (88)
- G-prop: global optimization of multilayer perceptrons using GAs. Neurocomputing (2000).
- Particle swarm optimization training algorithm for ANNs in stage prediction of Shing Mun River. J. Hydrol. (2006).
- Evolutionary artificial neural networks for hydrological systems forecasting. J. Hydrol. (2009).
- Combining a fuzzy optimal model with genetic algorithm to solve multiobjective rainfall–runoff model calibration. J. Hydrol. (2002).
- Symbolic adaptive neuro-evolution applied to rainfall–runoff modelling in northern England. Neural Networks (2006).
- Evaluating the process of a genetic algorithm to improve the back-propagation network: a Monte Carlo study. Expert Syst. Appl. (2009).
- Integrated neural networks for monthly river flow estimation in arid inland basin of Northwest China. J. Hydrol. (2012).
- The relationship between snowpack dynamics and NAO/AO indices in SW Spitsbergen. Phys. Chem. Earth (2011).
- Architecture and performance of neural networks for efficient A/C control in buildings. Energy Convers. Manage. (2003).
- Neural networks for the prediction and forecasting of water resources variables: a review of modelling issues and applications. Environ. Modell. Softw. (2000).
- Methods used for the development of neural networks for the prediction of water resource variables in river systems: current status and future directions. Environ. Modell. Softw.
- A comparison of evolution strategies and backpropagation for neural network training. Neurocomputing.
- Evolutionary product unit based neural networks for regression. Neural Networks.
- Sensitivity analysis of the artificial neural network outputs in simulation of the evaporation process at different climatological regimes. Adv. Eng. Softw.
- Optimizing neural networks for river flow forecasting – evolutionary computation methods versus the Levenberg–Marquardt approach. J. Hydrol.
- Comparative evaluation of genetic algorithm and backpropagation for training neural networks. Inf. Sci.
- Creating artificial neural networks that generalize. Neural Networks.
- Complexity selection of a neural network model for karst flood forecasting: the case of the Lez Basin (southern France). J. Hydrol.
- Advances in variable selection methods I: causal selection methods versus stepwise regression and principal component analysis on data of known and unknown functional relationships. J. Hydrol.
- Forest canopy effect on snow accumulation and ablation: an integrative review of empirical results. J. Hydrol.
- A comparison of performance of several artificial intelligence methods for forecasting monthly discharge time series. J. Hydrol.
- A procedure for determining the topology of multilayer feedforward neural networks. Neural Networks.
- A review of Hopfield neural networks for solving mathematical programming problems. Eur. J. Oper. Res.
- Methods to improve neural network performance in daily flow prediction. J. Hydrol.
- Rainfall–runoff modeling using artificial neural network coupled with singular spectrum analysis. J. Hydrol.
- Forecasting with artificial neural networks: the state of the art. Int. J. Forecast.
- A hybrid particle swarm optimization – back-propagation algorithm for feedforward neural network training. Appl. Math. Comput.
- A neurocomputing approach to predict monsoon rainfall in monthly scale using SST anomaly as a predictor. Acta Geophys.
- Comparison of multivariate regression and artificial neural networks for peak urban water-demand forecasting: evaluation of different ANN learning algorithms. J. Hydrol. Eng.
- Interpretation of spontaneous potential anomalies from some simple geometrically shaped bodies using neural network inversion. Acta Geophys.
- Asymptotic statistical theory of overfitting and cross-validation. IEEE Trans. Neural Networks.
- The effect of adding noise during backpropagation training on a generalization performance. Neural Comput.
- On a constructive proof of the Kolmogorov’s superposition theorem. Constr. Approx.
- Use of noise to augment training data: a neural network method of mineral–potential mapping in regions of limited known deposit examples. Nat. Resour. Res.
- Optimizing hydropower reservoir operation using hybrid genetic algorithm and chaos. Water Resour. Manage.
- Computational intelligence in earth sciences and environmental applications: issues and challenges. Neural Networks.
- Differential evolution using a neighborhood-based mutation operator. IEEE Trans. Evol. Comput.
- Differential evolution – a survey of the state-of-the-art. IEEE Trans. Evol. Comput.
- Identification of the best architecture of a multilayer perceptron in modelling daily total ozone concentration in Kolkata, India. Acta Geophys.
- Improving classical and decentralized differential evolution with new mutation operator and population topologies. IEEE Trans. Evol. Comput.
- Experimental investigation of the predictive capabilities of data driven modeling techniques in hydrology – part 2: application. Hydrol. Earth Syst. Sci.
- Enhancing differential evolution utilizing proximity-based mutation operators. IEEE Trans. Evol. Comput.
- Over-parameterisation, a major obstacle to the use of artificial neural networks in hydrology? Hydrol. Earth Syst. Sci.