Forecasting daily potential evapotranspiration using machine learning and limited climatic data

https://doi.org/10.1016/j.agwat.2010.10.012Get rights and content

Abstract

Anticipating, or forecasting near-term irrigation demands is a requirement for improved management of conveyance and delivery systems. The most important component of a forecasting regime for irrigation is a simple, yet reliable, approach for forecasting crop water demands, which in this paper is represented by the reference or potential evapotranspiration (ETo). In most cases, weather data in the area is limited to a reduced number of variables measured, therefore current or future ETo estimation is restricted. This paper summarizes the results of testing of two proposed forecasting ETo schemes under the mentioned conditions. The first or “direct” approach involved forecasting ETo using historically computed ETo values. The second or “indirect” approach involved forecasting the required weather parameters for the ETo calculation based on historical data and then computing ETo. An statistical machine learning algorithm, the Multivariate Relevance Vector Machine (MVRVM) is applied to both of the forecastings schemes. The general ETo model used is the 1985 Hargreaves Equation which requires only minimum and maximum daily air temperatures and is thus well suited to regions lacking more comprehensive climatic data. The utility and practicality of the forecasting methodology is demonstrated with an application to an irrigation project in Central Utah. To determine the advantage and suitability of the applied algorithm, another learning machine, the Multilayer Perceptron (MLP), is used for comparison purposes. The robustness and stability of the proposed schemes are tested by the application of the bootstrap analysis.

Research highlights

▶ Forecasting ETo models for water managing purposes (1985 Hargreaves ETo). ▶ 2 ETo forecasting approaches using air temperature and Multivariate Relevance Vector Machine. ▶ Ind. Approach provides better and larger forecast lags than Dir. Approach. ▶ Better ETo forecast when using forecasted weather variables rather than ETo values. ▶ In study case, ETo is forecasted of up to 4 days using the Ind. Approach.

Introduction

Population increases over the next decades will place a substantial emphasis on achieving higher irrigation efficiencies and greater production per unit of water. A key component of any strategy to improve irrigation water management will be the improvement of water delivery strategies and efficiencies within the irrigation delivery networks. As Kumar et al. (2002) note, evapotranspiration (ET) is one of the most important components of the hydrologic cycle and its accurate estimation is of vital importance for such diverse areas as hydrologic water balance, irrigation system design and management, crop yield simulation and water resources planning and management. Likewise, achieving higher irrigation system performance will depend on reliable forecasts of cropland ET and will require that such forecasts be far enough in the future to compensate for the time travel of the water supply. ET estimation is an important input to water management and irrigation scheduling because crop demands are generally the largest component of water diversions.

A number of computational methods have been developed to estimate potential evapotranspiration (ETo) from climatic data. These methods vary in complexity from models that require only basic information, such as maximum and minimum air temperature (Hargreaves, 1974), to complex models that estimate ETo through energy balance models, such as the Penman-Montieth method (Allen et al., 1998). The advantage of simple models is their use in regions of minimal available weather data. Their major disadvantage is that these may not reflect the effects of localized climatic and geographic variations such as narrow valleys, presence of high ground elevations, extreme latitudes or strong winds. Also, simple methods are usually best suited to weekly or monthly ETo estimates and are less suited to daily estimates.

In recent years there have been several attempts to estimate and forecast ETo with a higher degree of accuracy and over extended futures. Some of them involve numerical and statistical approaches that attempt to accurately simulate the random nature of the meteorological variables (Yamashita and Walker, 1994). The occurrence of difficulties related with these attempts forced researchers to look for other techniques using data-driven tools or statistical learning machines, such as Artificial Neural Networks (Kumar et al., 2002, Smith et al., 2006, Lai et al., 2004), Simple Bayes Classifier and k-Nearest Neighbors (Verdes et al., 2000), Support Vector Machines and Relevance Vector Machines (Gill et al., 2006). These newer approaches have been used primarily to forecast hourly ETo values up to 24 h in advance. Forecasting of daily ETo for 1 to K days ahead in time using data-driven tools have not been reported, even though these methods are known for having excellent modeling accuracy, particularly in representing complex nonlinear behavior (Lai et al., 2004).

The objective of this paper is to demonstrate the adequacy of two different approaches for forecasting daily ETo using statistical learning machines and limited climatic information. The learning machine algorithm used is the Multivariate Relevance Vector Machine (MVRVM). The potential crop evapotranspiration (ETo) is estimated by the 1985 Hargreaves Equation. These results are then compared with crop ET values of the area under study to determine the accuracy of the forecasted estimates obtained. Also, for comparison and benchmarking analysis, another learning machine was tested, the Multi-layer Perceptron, to determine the suitability of the proposed forecasting schemes.

The potential or reference evapotranspiration (ETo) expresses the evaporating power of the atmosphere at a specific location and time of the year and does not consider the crop characteristics and soil factors. As it is mentioned by Allen et al. (1998), the only factors affecting ETo are climatic parameters. Consequently, ETo is a climatic parameter and can be computed from weather data. Among the several methods to estimate ETo, the FAO Penman–Monteith method is recommended as the sole method for determining ETo. This method has been selected because it closely approximates grass ETo at the location evaluated, is physically based, and explicitly incorporates both physiological and aerodynamic parameters.

Situations might occur where data for some weather variables are missing. The use of an alternative ETo calculation procedure, requiring only limited meteorological parameters, should generally be avoided. It is recommended that one calculate ETo using the standard FAO Penman–Monteith method after resolving the specific problem of the missing data (Allen et al., 1998). Despite of this, with limited climatic data as Net Radiation not available, Allen et al. (1998) suggest the Hargreaves ETo equation for its use, specially for water management purposes.

Hargreaves and Allen (2003) note that the current Hargreaves equation was developed in 1975 in an attempt to improve the ETo equation developed by Christiansen (1968). Using eight years of daily cool season grass in precision weighting lysimeters and weather data, Hargreaves performed regressions among measured ETo and temperature data using several ETo methods. Several posterior attempts to improve the resulting ETo equation led to the 1985 Hargreaves ETo equation:ETo=0.0023Ra(TC+17.8)TR0.5where

  • ETo: potential evapotranspiration (mm/day) of a reference crop (grass),

  • Tmax, Tmin: maximum and minimum daily air temperature (°C),

  • TC: 0.5 (Tmax + Tmin),

  • TR: Tmax  Tmin and;

  • Ra: extraterrestrial solar radiation (mm/day).

The following empirical simplifications allows to estimate Ra using the latitude and the day of the year, as mentioned by Allen et al. (1998):Ra=37.6×dr(ωssin(ϕl)sin(δ)+cos(ϕl)sin(ωs))λδ=0.4093×sin2π(284+J)365dr=1+0.033×cos2πJ365ωs=cos(tan(ϕl)tan(δ))where

  • dr: relative distance from the earth to the sun,

  • J: julian day,

  • ωs: sunset hour angle (rad),

  • φl: latitude (rad),

  • δ: declination of the sun (rad) and

  • λ: latent heat of vaporization, λ  2.54.

Hargreaves and Allen (2003) stated that the best use of Eq. (1) would be for ETo estimation in regional planning and reservoir operation studies. The attractiveness of the Hargreaves ETo model is its simplicity, reliability, minimal data requirement, and ease of computation. The viability of using Hargreaves instead of Penman-Montieth ETo equation is demonstrated by the study performed by Trajkovic and Kolakovic (2009) where the difference between Hargreaves and Penmman-Montieth ETo equations is in the range of −4.7% to 6.9% for all the weather stations used in their study.

Once determined the reference or potential evapotranspiration value, the actual evapotranspiration from a cropped area, ET, is determined by adjusting ETo for crop variety and stage of grow as follows:ET=Kc×EToin which Kc is the crop growth stage factor. Tables and information related with the crop stage factor have been developed for most of the agricultural crops around the world. Average values of Kc for various crops can be found in Allen et al. (1998), while location-specific Kc can be found in research publications (Wright, 1982).

Artificial Neural Networks (ANN) have been used extensively for simulation and forecasting in such diverse areas as finances, power generation, water resources and environmental science (Maier and Dandy, 2000). These algorithms can be considered as benchmark for comparison purposes with newer or existing learning machine algorithms for given tasks or analysis. Among the vast implementation of Artificial Neural Network models, the Multi-layer Perceptron (MLP) is one of the most widely used models of ANNs (Nabney, 2002) and they are attractive because of their ability to approximate any smooth function, provided there is enough data to estimate the MLP parameters. Also, despite the successful application of ANN-based models, one of the critical issues often mentioned is the absence of uncertainty measurements associated with their predicted output (Khan and Coulibaly, 2005). To overcome this limitation, the implementation of the Bayesian Inference Method (MacKay, 1992) to calibrate the MLP has made possible measures of the uncertainty related with the predicted outputs.

The MLP architecture can be described as mentioned by Pierce et al. (2008):yk=m=11Mwkm(II)tanh×d=1Dwmd(I)xd+bm+bkwhere

  • yk: kth component of the MLP output vector y(n); yk  1  k  K,

  • xd: dth component of the input vector x(n); xd  1  d  D,

  • wmd(I), wkm(II): matrix weights for the first and second layer respectively, 1  m  M,

  • K: number of outputs or forecasted values,

  • D: number of inputs or values in the past required by the MLP,

  • M: number of hidden neurons,

  • bm, bk: bias vector for the first and second layer respectively.

Using a dataset Λ = [x(n), t(n)] with n = 1…N, where t(n) is the vector of the target or actual values t(n) = (t1, … tk, … tK), and N is the number of training examples provided to the learning machine, the training of the MLP is performed by optimizing the network parameters W=(w,b) in order to minimize the Overall Error Function E (Bishop, 1995):E=β2n=1N(t(n)y(n))2+α2i=1Wwi2=β×EΛ+α×EWwhere

  • EΛ: data error function,

  • EW: penalization term,

  • W = number of weights and biases in the neural network, and

  • α and β: Bayesian hyperparameters.

In Bayesian terms, the goal is to estimate the probability of the weights and bias of the MLP model, given the dataset Λ (Ungar et al., 1996):pW|t(n)=pt(n)|W*pWpt(n)where as explained by MacKay (1992),

  • p(W|t(n)): the posterior probability of the weights,

  • p(t(n)|W): the likelihood function,

  • p(W): the prior probability of the weights, and

  • p(t(n)): the evidence for the dataset.

Assuming a Gaussian distribution for the error term ξ(n) = t(n)  y(n) and the weights W, the likelihood and the prior probabilities can be expressed as mentioned by Petrovic (2007):p(t(n)|W,β)=(2πβ1)N/2exp(βEΛ)p(t(n)|W,α)=(2πα1)N/2exp(αEW)EΛ models the uncertainty (or error) of the target variables as Gaussian zero-mean noise and variance σ2 6-point triple bond β−1. EW defines the conditional probability of W with variance σW2α1. Then Eq. (9) can be expressed as:p(W|t(n),α,β)=p(t(n)|W,β)p(W|α)p(t(n)|α,β)p(W|t(n),α,β)=exp(E(WMP)(1/2)ΔWTHΔW)exp(E(WMP)(2π)W/2|H|1/2)in which,

  • E(WMP): expected most probable values for the weights and bias matrices,

  • H = Hessian matrix H=βEΛMP+αI, and

  • ΔW = W  WMP.

Once the distribution of W has been estimated by maximizing the likelihood for α and β, the prediction y(n) and the variance of the predictions σy2 can be estimated by integrating (marginalizing) over W and the regularization parameters α and β (Bishop, 1995):p(yk|x(n),t(n))=p(t(n)|x(n),W)p(W|t(n))dWwhich can be approximated by:p(yk|x(n),t(n))(2πσy2)1/2exp12σy(yMP(n)t(n))2where σy2 is the output or estimate variance vector σy2=(σ12,,σk2,,σK2).

Then the output variance can be expressed as:σy2=βMP1+gTH1gwhere g denotes the gradient of y(n) with respect to the weights; g 6-point triple bond Wy(n)|WMP. The output variance has then two sources; the first arises from the intrinsic noise in the target data; and the second from the posterior distribution of the ANN weights (Pierce et al., 2008). The output standard deviation vector σy of the predictive distribution for y(n) can be interpreted as an error bar for confidence interval estimation (Bishop, 1995).

The Multivariate Relevance Vector Machine (MVRVM), developed by Thayananthan et al. (2008), is a general Bayesian framework for obtaining sparse solutions to regression and classification tasks based on the Relevance Vector Machines developed by Tipping (2001), and Tipping and Faul (2003). These learning machines are also being used in many of the same areas as the ANNs and are particularly useful in hydrology and water resources because of the generalization properties and the probabilistic estimation, useful to estimate prediction uncertainty (Tripathi and Govindaraju, 2007). The mathematical formulation of the MVRVM is:tk=r=1Rwr,kφr(x(n))+ξkwhere

  • x(n) and t(n): the input and output values that belong to the dataset Λ, as defined for the MLP,

  • wr,k: rth and kth component of the weight matrix W=(w1,1,,wr,k,,wR,K),

  • R: number of relevance vectors or cases selcted by the MVRVM, 1  r  R,

  • φr(x(n)): rth component of the design matrix or basis function Φ that can be related with a kernel function φr = κ(x(n),x(r)).

The kernel function is a weighting function for the inputs used in non-parametric estimation techniques. It provides an adjustment ratio to the x(n) input vector based on x(R) selected or ‘relevant’ input vectors, which are selected automatically among the N training input vectors (R  N). For the calibration of the MVRVM, this learning machine uses a variation of the Overall Error function (Eq. (8)) and, by using the Bayesian Inference Method, estimates the distribution of the weights of the model (Eq. (9)), similar to the MLP. Also, the error term or noise ξ(n) is assumed to be probabilistic independent zero-mean Gaussian, with variance σ2. The Overall Error function to be minimized is:Lk=n=1NlogNt^k(n)|Wkϕ^kx(n),Bkwith:t^k(n)=ckt(n)ϕ^k(x(n))=ckϕk(x(n))ck=p(t(n)|x(n),Wk,Bk)k=1Kp(y(n)|x(n),Wk,,Bk)where

  • k is the kth component of the K different regression functions (outputs) to learn by the MVRVM,

  • t^k(n) is the adjusted target values,

  • ϕ^k(x(n)) is the kth component of the adjusted design matrix,

  • ck is the adjustment factor.

Assuming a Gaussian prior probability distribution for the weights (Tipping, 2001), and representing Adiag(α12,,αN2),andBk=diag(β11,,βR1), where each element αn is a hyperparameter that determines the relevance of the associated basis function and βr represents the noise variance in the target data (Thayananthan et al., 2008), the prior is represented by:pWk|Ak=r=1RNwr,k|0,Awhere wrk is the element at (r,k) of the weighting matrix WRK. Dropping the index k for clarity in the following equations, the likelihood of the target t(n) can be written as:pt(n)|W,B=r=1RNτr|wrΦ^,βr1τr is the vector with the rth component of the target data. The posterior probability of W can be written as the product of separate Gaussians of the weights vectors of each output dimension:pW|t(n),B,At(n)|W,BpW|ApW|t(n),B,Ar=1RNwr|μr,Σrwhere μr = βrΣrΦTτr and Σr = (βrΦTΦ + A)−1 are the mean and the variance of the weights. Marginalizing the data likelihood over the weights:pt(n)|A,B=pt(n)|W,BpW|Adwpt(n)|A,B=r=1R|Hr|1/2exp12τrTHr1τrwhere Hr is the Hessian matrix for the rth relevance vector, Hr=βr1I+Φ^A1Φ^T. A most probable set of hyperparameters AMP and noise parameters BMP is obtained by maximizing the marginal likelihood as described by Tipping and Faul (2003). The final hyperparameter values are:AMP=diagα1MP,...,αNMPΣrMP=βrMPΦ^TΦ+AMP1andμrMP=βrMPΣrMPΦTτrWMP=μ1MP,...,μRMPwith output and variance:y(n)=WMPΦx(n)σy2=BMP1+ϕx(n)TΣMPϕx(n)

Section snippets

Area of study

The water resources of the Sevier River Basin in Central Utah (Fig. 1) are among the most heavily utilized in the Western US. Substantial efforts to increase efficiency via canal lining and on-farm improvement such as conversion to sprinkler irrigation and laser land leveling were made during 1960–1990 period. From 1990 to the present, all reservoirs and stream offtakes have been equipped with SCADA technology and web-based data summaries (SRWUA, 2009). Canal automation was introduced in 1994

Results

Using the training and testing datasets described earlier in Section 2.3, the calibration of the machine learning algorithms was made using both the Direct and Indirect Approaches. To determine the performance and accuracy of the learning machines for forecasting, a 7-day forecast horizon was solicited for the ETo analyses. This value (K = 7) is in fact larger than the required 4-day ahead forecast for the area under study, nevertheless, additional information could be useful for extended daily

Conclusions

The present study demonstrates the adequacy of forecasting near term daily ETo information necessary for water management purposes based on the 1985 Hargreaves ETo equation. Two approaches were tested using the Multivariate Relevance Vector Machine algorithm. The first approach, Direct Approach, involves the estimation of future ETo times series from historical ETo data. The second approach, Indirect Approach, considers to forecast the required climatic data for the 1985 Hargreaves ETo

Acknowledgments

The authors would like to thank the Utah Water Research Laboratory, the Utah Center for Water Resources Research, and the Utah State University Research Foundation for their support of this research. The authors would also like to thank the US Bureau of Reclamation and the Lower Basin Commissioner of the Sevier River Water Users Association for their continued assistance and support in this research.

References (31)

  • A.F. Khalil et al.

    Multiobjective analysis of chaotic dynamic systems with sparse learning machines

    Advances in Water Resources

    (2006)
  • S.G. Pierce et al.

    Uncertainty analysis of a neural network used for fatigue lifetime prediction

    Mechanical Systems and Signal Processing

    (2008)
  • A. Thayananthan et al.

    Pose estimation and tracking using multivariate regression

    Pattern Recognition Letters

    (2008)
  • R.G. Allen et al.

    Crop evapotranspiration guidelines for computing crop water requirements

    Irrigation and Drainage, FAO

    (1998)
  • C.M. Bishop

    Neural Networks for Pattern Recognition

    (1995)
  • J.E. Christiansen

    Pan evaporation and evapotranspiration from climatic data

    Proc. Am. Soc. Civil Eng., Journal of Irrigation and Drainage Engineering, ASCE

    (1968)
  • M.K. Gill et al.

    Soil moisture prediction using support vector machines

    American Water Resources Association

    (2006)
  • G.H. Hargreaves

    Moisture availability and crop production

    Transactions, ASAE

    (1974)
  • G.H. Hargreaves et al.

    History and evaluation of hargreaves evapotranspiration equation

    Journal of Irrigation and Drainage Engineering, ASCE

    (2003)
  • M. Khan et al.

    Streamflow forecasting with uncertainty estimate using Bayesian learning For ANN

  • M. Kumar et al.

    Estimating evapotranspiration using Artificial Neural Network

    Journal of Irrigation and Drainage Engineering, ASCE

    (2002)
  • L.L. Lai et al.

    Intelligent weather forecast

  • LandSat Imagery Program website, www.landsat.gsfc.nasa.gov, accessed November,...
  • D. MacKay

    A practical Bayesian framework for backpropagation networks

    Neural Computation

    (1992)
  • R.M Maier et al.

    Neural Networks for the prediction and forecasting of water resources variables: a review of modeling issues and applications

    Environmental Modeling and Software

    (2000)
  • Cited by (143)

    View all citing articles on Scopus
    View full text