Elsevier

Journal of Hydrology

Volume 529, Part 1, October 2015, Pages 146-158
Using an ensemble smoother to evaluate parameter uncertainty of an integrated hydrological model of Yanqi basin

https://doi.org/10.1016/j.jhydrol.2015.07.024

Highlights

  • Multiple Data Assimilation is used to assess uncertainty of a distributed model.

  • The posterior parameter distribution is obtained by incorporating field observations.

  • The posterior forecasts have lower uncertainty than the prior forecasts.

  • Uncertainty in the hydraulic conductivity scaling factor is reduced considerably.

  • Uncertainty in van Genuchten α is not improved when measurements are included.

Summary

Model uncertainty needs to be quantified to provide objective assessments of the reliability of model predictions and of the risk associated with management decisions that rely on these predictions. This is particularly true in water resource studies that depend on model-based assessments of alternative management strategies. In recent decades, Bayesian data assimilation methods have been widely used in hydrology to assess uncertain model parameters and predictions. In this case study, a particular data assimilation algorithm, the Ensemble Smoother with Multiple Data Assimilation (ESMDA) (Emerick and Reynolds, 2012), is used to derive posterior samples of uncertain model parameters and forecasts for a distributed hydrological model of Yanqi basin, China. This model is constructed using MIKESHE/MIKE11 software, which provides for coupling between surface and subsurface processes (DHI, 2011a, DHI, 2011b, DHI, 2011c, DHI, 2011d). The random samples in the posterior parameter ensemble are obtained by using measurements to update 50 prior parameter samples generated with a Latin Hypercube Sampling (LHS) procedure. The posterior forecast samples are obtained from model runs that use the corresponding posterior parameter samples. Two iterative sample update methods are considered: one based on a perturbed observation Kalman filter update and one based on a square root Kalman filter update. These alternatives give nearly the same results and converge in only two iterations. The uncertain parameters considered include hydraulic conductivities, drainage and river leakage factors, van Genuchten soil property parameters, and dispersion coefficients. The results show that the uncertainty in many of the parameters is reduced during the smoother updating process, reflecting information obtained from the observations. Some of the parameters are insensitive and do not benefit from measurement information. The correlation coefficients among certain parameters increase in each iteration, although they generally stay below 0.50.

Introduction

Hydrological models can provide detailed information that improves our understanding of complex natural systems. They can also be used to forecast the effects of different water resource management scenarios (e.g., Cartwright et al., 2006, Li et al., 2010, Matial and Johnes, 2012). The reliability of a model forecast depends primarily on uncertainties in model structure, in model inputs (initial conditions and forcing terms), and in model parameters. In this article we describe two efficient methods for quantifying the effects of uncertainty, illustrating concepts with a field study that involves complex interactions among surface water, groundwater, and salinity. We consider both input and parameter uncertainties and account for information provided by imperfect measurements of some of the forecast variables. Our approach illustrates how data assimilation techniques can be used to combine prior information, model predictions, and measurements to provide an integrated picture of uncertainty.

Bayesian methods provide a convenient framework for both data assimilation and uncertainty analysis. In practical applications that involve spatially distributed nonlinear models with non-Gaussian uncertainties the Bayesian approach is usually only feasible when implemented in an ensemble, or Monte Carlo, form. Ensemble uncertainty analysis works with a set (or ensemble) of parameter samples that describe the likely range of the unknown parameter values. The algorithm provides interval estimates (or probabilistic ranges) that explicitly provide information on uncertainty. This is in contrast to methods that provide a single point (or deterministic) parameter estimate without an accompanying assessment of uncertainty. In the Bayesian application considered here the end goal is to produce informative posterior probability distributions rather than “best estimates” of model parameters and forecast variables. In this context, point values such as posterior means and modes help to characterize the posterior density but are not ends in themselves.
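As a minimal sketch of the difference between a point value and an interval estimate, the following Python fragment summarizes a one-parameter ensemble by its mean, standard deviation, and 5% to 95% range. The function name `summarize` and the sample values are illustrative assumptions and are not taken from this study.

```python
import statistics

def summarize(samples):
    """Summarize a one-parameter ensemble with a point value and an interval."""
    # quantiles(n=20) returns 19 cut points at 5%, 10%, ..., 95%.
    cuts = statistics.quantiles(samples, n=20)
    return {
        "mean": statistics.fmean(samples),
        "stdev": statistics.stdev(samples),
        "p05": cuts[0],
        "p95": cuts[-1],
    }

# Hypothetical ensemble of 99 parameter samples (placeholder values).
ensemble = [float(i) for i in range(1, 100)]
summary = summarize(ensemble)
print(summary)
```

The mean alone would hide the spread; the interval from `p05` to `p95` is the kind of probabilistic range an ensemble analysis is meant to report.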

Ensemble algorithms start with user-generated prior parameter samples that represent the best available information about the parameter prior distribution before considering field observations of forecast variables. The prior samples are modified by the Bayesian algorithm to give posterior (or updated) parameter samples that incorporate the field observations. The posterior statistics computed by an ensemble algorithm can be viewed as approximations to the corresponding distributional properties of the exact Bayesian posterior probability distribution, which is generally not possible to derive in closed form.
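The abstract notes that the 50 prior samples in this study were generated with Latin Hypercube Sampling. A minimal sketch of that sampling idea is given below. It assumes independent uniform priors, which is an assumption made here for illustration; the function name `latin_hypercube` and the parameter bounds are hypothetical.

```python
import random

def latin_hypercube(n_samples, bounds, seed=0):
    """Draw n_samples members; every parameter range is split into n_samples
    equal-probability strata and each stratum is used exactly once."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One uniform draw inside each stratum of this parameter.
        column = [low + (high - low) * (k + rng.random()) / n_samples
                  for k in range(n_samples)]
        # Shuffle so strata are paired at random across parameters.
        rng.shuffle(column)
        columns.append(column)
    # Transpose: each row is one ensemble member (one value per parameter).
    return [list(member) for member in zip(*columns)]

# Hypothetical bounds for two parameters (not values from the paper).
bounds = [(0.0, 1.0), (0.5, 5.0)]
prior = latin_hypercube(50, bounds)
print(len(prior), prior[0])
```

Compared with plain random sampling, the stratification guarantees that a small ensemble still covers the full range of every parameter.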

In the past few decades, ensemble methods have been widely used in uncertainty assessment and data assimilation applications, reflecting dramatic advances in computational hardware and methods (Kuczera and Parent, 1998, Helton and Davis, 2003, Ramirez et al., 2008). Examples include Markov Chain Monte Carlo (MCMC) and Importance Sampling (IS) methods, which implement the Bayesian approach with minimal simplifications and assumptions (Vrugt et al., 2009, Jin et al., 2010). In spatially distributed hydrologic applications such as the one considered here the number of uncertain model parameters and states can become very large. MCMC and other exact ensemble methods can require very large sample sizes to give accurate results for large problems, making them computationally expensive.

The computational limitations of these methods have prompted the development of alternative ensemble approaches that reduce computational effort by making application-specific approximations and simplifications. It should be kept in mind that some ensemble updating procedures used to derive posterior distributions from sampled prior distributions may be suboptimal for nonlinear problems. Nevertheless, these approximate methods are typically much more computationally feasible than other sampling methods (such as MCMC) that make fewer approximations (but are still suboptimal due to finite sample sizes and slow convergence). This is especially true when the model of interest is computationally demanding.

Examples of efficient but approximate alternatives to MCMC include the ensemble Kalman filter and related ensemble smoothing algorithms (Evensen, 1992, Evensen, 1994). These algorithms have been used for data assimilation in oceanic models (Evensen and Van Leeuwen, 1996, Houtekamer and Mitchell, 1998), meteorological forecasting (Houtekamer and Mitchell, 2001), soil moisture investigations (Reichle et al., 2002), snow data analysis (Slater and Clark, 2004), and parameter updating in hydrological models (Moradkhani et al., 2005, Chen and Zhang, 2006).

The ensemble Kalman filter (EnKF) is a sequential algorithm that derives an ensemble of updated parameter values at the current time from a linear combination of the current measurement and an ensemble of model forecasts initialized at the previous update time. The weighting between the measurement and forecast ensemble depends on covariances between these quantities. In the EnKF the forecast ensemble implicitly defines a Bayesian prior distribution while the updated ensemble implicitly defines a Bayesian posterior distribution. The original form of the EnKF requires the current measurement to be perturbed with a random term that has the statistical properties assumed to apply to the actual (but unknown) measurement error. Square Root Filters (SRF) were introduced, in part, to avoid the need for such perturbed observations (Whitaker and Hamill, 2002, Tippett et al., 2003, Evensen, 2004, Sakov and Oke, 2008). The SRF method updates the posterior mean and square root of posterior covariance of the parameters separately. Livings et al. (2008) provides a generic set of necessary and sufficient conditions for the SRF to yield an unbiased state estimate.
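The perturbed observation update described above can be sketched for the simplest case of one uncertain parameter and one scalar measurement. This is an illustrative reduction, not the implementation used in the study; the names `enkf_update`, `prior`, and `predictions`, and the linear toy model, are assumptions.

```python
import random
import statistics

def enkf_update(params, preds, obs, obs_var, rng):
    """Perturbed-observation Kalman update for one parameter and one datum."""
    n = len(params)
    mean_m = statistics.fmean(params)
    mean_d = statistics.fmean(preds)
    # Sample covariance between the parameter and the model prediction.
    cov_md = sum((m - mean_m) * (d - mean_d)
                 for m, d in zip(params, preds)) / (n - 1)
    var_d = statistics.variance(preds)
    # The gain weights the measurement against the forecast ensemble.
    gain = cov_md / (var_d + obs_var)
    noise_sd = obs_var ** 0.5
    # Each member sees the measurement plus its own random perturbation.
    return [m + gain * (obs + rng.gauss(0.0, noise_sd) - d)
            for m, d in zip(params, preds)]

rng = random.Random(1)
prior = [rng.gauss(0.0, 1.0) for _ in range(200)]
predictions = [2.0 * m for m in prior]  # hypothetical linear forward model
posterior = enkf_update(prior, predictions, obs=4.0, obs_var=0.01, rng=rng)
print(statistics.fmean(posterior), statistics.stdev(posterior))
```

In a spatially distributed model the scalar covariance and variance become matrices, but the structure of the update is the same.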

When the EnKF or SRF is used to assess the uncertainty of time-invariant parameters, such as aquifer hydraulic conductivities, the updated parameter and state ensembles change at each measurement update. The complete set of measurements is incorporated only at the final time of the historical period. In this case, the samples of the states obtained by inserting the posterior parameter samples into the model are different from the corresponding samples obtained from the Kalman filter. This is because the Kalman filter posterior state samples computed at a given time depend only on measurements obtained at or before this time while the smoother samples depend on all the measurements. Also, the sequential filter update requires that the forecasting model be stopped, reinitialized, and restarted at each measurement update. These difficulties can be resolved by using a version of the ensemble smoother (ES) proposed by van Leeuwen and Evensen (1996). The smoother updates the parameters and states simultaneously using all observations, without stopping and restarting the numerical model. The ES typically runs faster than the EnKF but can give less accurate posterior distributions when the states are nonlinear functions of the parameters (Emerick and Reynolds, 2012, van Leeuwen and Evensen, 1996). The difference in computational time between the two methods arises mainly because the EnKF requires additional time to restart model simulations. Examples of computation time differences observed for the two methods can be found in Emerick and Reynolds (2012). Bailey et al. (2012) applied ES to a more complicated agricultural aquifer-canal-stream example. However, in hydrology the ES has been applied primarily to synthetic cases or relatively simple examples (Dunne and Entekhabi, 2005, Bailey and Bau, 2010, Bailey and Bau, 2012, Emerick and Reynolds, 2012). Generally speaking, the EnKF has been more popular for real-world applications, despite the conceptual limitations and inconsistencies that arise when it is used to assess uncertainty in time-invariant parameters.

Emerick and Reynolds (2012) developed an improved version of the ES, which they call an Ensemble Smoother with Multiple Data Assimilation (ESMDA). The ESMDA is an iterative smoothing algorithm that takes an approach that differs from either the ensemble Kalman filter or the traditional ensemble smoother. On each iteration of the algorithm the parameter ensemble is updated with all measurements at once, rather than sequentially. The state prediction ensemble for the current iteration is derived by running the forecast model for the entire data period with the current parameter values. This gives a predicted state ensemble that is consistent with the parameter ensemble, overcoming the EnKF inconsistency mentioned above. In this respect ESMDA is similar to the ES. However, in the ESMDA the entire smoothing process is repeated for several iterations until the parameter samples and associated state forecasts converge. The measurement noise covariance is adjusted between iterations to compensate for the reuse of the same data. Emerick and Reynolds (2012) show that this adjustment process is optimal for linear smoothing problems and a good approximation for some nonlinear problems.
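The iteration described above can be sketched for one parameter and one scalar measurement. The sketch assumes the inflation rule of Emerick and Reynolds (2012), in which the measurement noise variance is multiplied by a factor alpha on each iteration and the factors satisfy sum(1/alpha) = 1. The toy forward model, the function names, and all numerical values are hypothetical stand-ins and do not come from this study.

```python
import random
import statistics

def kalman_step(params, preds, obs, obs_var, alpha, rng):
    """One perturbed-observation update with noise variance inflated by alpha."""
    n = len(params)
    mean_m = statistics.fmean(params)
    mean_d = statistics.fmean(preds)
    cov_md = sum((m - mean_m) * (d - mean_d)
                 for m, d in zip(params, preds)) / (n - 1)
    var_d = statistics.variance(preds)
    gain = cov_md / (var_d + alpha * obs_var)
    noise_sd = (alpha * obs_var) ** 0.5
    return [m + gain * (obs + rng.gauss(0.0, noise_sd) - d)
            for m, d in zip(params, preds)]

def esmda(prior, model, obs, obs_var, alphas, seed=0):
    """Repeat the smoother update, rerunning the model before each update."""
    # Assumed condition: the inflation factors compensate for data reuse.
    assert abs(sum(1.0 / a for a in alphas) - 1.0) < 1e-9
    rng = random.Random(seed)
    params = list(prior)
    for alpha in alphas:
        # Full forward run with the current parameters keeps the predicted
        # states consistent with the parameter ensemble.
        preds = [model(m) for m in params]
        params = kalman_step(params, preds, obs, obs_var, alpha, rng)
    return params

def toy_model(m):
    # Hypothetical, mildly nonlinear stand-in for an expensive simulator.
    return 2.0 * m + 0.1 * m ** 3

rng = random.Random(2)
prior = [rng.gauss(0.0, 1.0) for _ in range(200)]
observation = toy_model(1.0)  # synthetic measurement for a "true" value of 1.0
posterior = esmda(prior, toy_model, observation, obs_var=0.01, alphas=[2.0, 2.0])
print(statistics.fmean(posterior), statistics.stdev(posterior))
```

With `alphas=[1.0]` the loop reduces to a single ensemble smoother update, which makes the relationship between the ES and the ESMDA explicit.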

Since the ESMDA is a relatively new method it has not yet been applied to complex hydrological models. This paper investigates the application of ESMDA in an uncertainty assessment of a computationally intensive distributed flow and transport model of the Yanqi Basin of Northwest China. We consider both the original version of ESMDA, which uses perturbed observations similar to those used in the traditional EnKF, and a new version of ESMDA that uses a square root update similar to the SRF (Livings et al., 2008, Evensen, 2004). The square root updating procedure presented in this study does not require observation perturbations during the updating process and also produces unbiased parameter ensembles. These advantages could be attractive to some researchers. We therefore compare the two approaches to show that the new combination of ESMDA with a square root filter also works well in practice.
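To illustrate how a square root update avoids perturbed observations, the sketch below uses the scalar-observation form given by Whitaker and Hamill (2002), in which the ensemble mean is updated with the full Kalman gain and the deviations from the mean are updated with a reduced gain. This is one well-known square root variant and is not necessarily the formulation adopted in this study; the function name and the linear toy model are assumptions.

```python
import random
import statistics

def sqrt_update(params, preds, obs, obs_var):
    """Deterministic square root update for one parameter and one datum."""
    n = len(params)
    mean_m = statistics.fmean(params)
    mean_d = statistics.fmean(preds)
    cov_md = sum((m - mean_m) * (d - mean_d)
                 for m, d in zip(params, preds)) / (n - 1)
    var_d = statistics.variance(preds)
    gain = cov_md / (var_d + obs_var)
    # Reduced gain applied to the deviations (Whitaker and Hamill, 2002).
    reduced_gain = gain / (1.0 + (obs_var / (var_d + obs_var)) ** 0.5)
    # The mean is updated with the unperturbed measurement.
    new_mean = mean_m + gain * (obs - mean_d)
    # The deviations are rescaled without adding any random noise.
    return [new_mean + (m - mean_m) - reduced_gain * (d - mean_d)
            for m, d in zip(params, preds)]

rng = random.Random(0)
prior = [rng.gauss(0.0, 1.0) for _ in range(200)]
predictions = [2.0 * m for m in prior]  # hypothetical linear forward model
posterior = sqrt_update(prior, predictions, obs=4.0, obs_var=0.01)
print(statistics.fmean(posterior), statistics.variance(posterior))
```

Inside an iterative smoother, `obs_var` would be replaced by the inflated measurement variance for the current iteration. For a linear model the posterior sample variance matches the Kalman value exactly, with no sampling noise from observation perturbations.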

The paper is organized as follows. First, basic information on the Yanqi Basin is provided. Then the numerical model of the basin is introduced and the ESMDA method is applied. After that the posterior distributions of parameters and outputs are discussed and conclusions are presented.

Section snippets

Study area

The Yanqi basin, the area of interest, is located in Xinjiang province, in the northwest of China (Fig. 1). The topography is high in the north and low in the south, sloping from northwest to southeast with the elevation decreasing from 1200 m to 1048 m around Bosten Lake. The basin is hot in summer with high evaporation and scarce precipitation and cold in winter with little evaporation. The evaporation from April to September amounts to 81% of the total annual evaporation. The precipitation and

MIKESHE/MIKE11

MIKESHE (DHI, 2011a, DHI, 2011b) is a physically based and spatially distributed numerical hydrological modelling system that is a version of the Système Hydrologique Européen (SHE) (Abbott et al., 1986). MIKE11 (DHI, 2011c, DHI, 2011d) is an associated modelling system for rivers and channels. The modular structure and the process-based framework of MIKESHE/MIKE11 provide the flexibility to simulate different hydrologic processes at their relevant spatial and temporal scales. The MIKESHE model

Posterior time-invariant parameters

In an ensemble analysis it is possible to describe parameter uncertainty in a number of ways, including histograms or probability values, ensemble means and standard deviations, and correlations. All of these statistical indicators provide distinctive insights that can support the design of data collection programs, improvements in models, and decision-making. Fig. 3 shows, as an example, how the histogram of the CondScaleFactor parameter ensemble changes over different iteration steps for the

Conclusions

This paper demonstrates how a Bayesian model uncertainty analysis can be carried out for a field study that considers complex interactions among surface water, groundwater, and salinity. The uncertainty analysis relies on an efficient iterative ensemble smoother, the ESMDA algorithm of Emerick and Reynolds (2012). This Bayesian algorithm distinguishes prior statistics, which do not depend on field observations, from posterior (or conditional) statistics that rely on observations. The ensemble

Acknowledgements

The research could not be accomplished without the time series of flow and transport inputs and observations provided by Dr. Yang Pengnian and Dr. Guo Yuchuan (Xinjiang Agricultural University, Urumqi, China). The authors thank Dr. Binghuai Lin (DCEE, MIT) for his helpful discussion about the methods. Many thanks are due for financial support provided by the Sino-Swiss Science and Technology Cooperation project ‘Aquifer storage and utilization in arid areas of northwest China’. Finally we

References (55)

  • H.L. Liu et al., 2007. Investigation of groundwater response to overland flow and topography using a coupled MIKESHE/MIKE11 modelling system for an arid watershed. J. Hydrol.
  • B. Minasny et al., 2006. A conditioned Latin hypercube method for sampling in the presence of ancillary information. Comput. Geosci.
  • H. Moradkhani et al., 2005. Dual state-parameter estimation of hydrologic models using ensemble Kalman filter. Adv. Water Resour.
  • A. Ramirez et al., 2008. Monte Carlo analysis of uncertainties in the Netherlands greenhouse gas emission inventory for 1990–2004. Atmos. Environ.
  • J.R. Thompson et al., 2004. Application of the coupled MIKESHE/MIKE11 modelling system to a lowland wet grassland in southeast England. J. Hydrol.
  • S. Arnold et al., 2009. Uncertainty in parameterisation and model structure affect simulation results in coupled ecohydrological models. Hydrol. Earth Syst. Sci.
  • R.T. Bailey et al., 2010. Ensemble smoother assimilation of hydraulic head and return flow data to estimate hydraulic conductivity distribution. Water Resour. Res.
  • R.T. Bailey et al., 2012. Estimating geostatistical parameters and spatially-variable hydraulic conductivity within a catchment system using an ensemble smoother. Hydrol. Earth Syst. Sci.
  • Brunner P., 2005. Water and salt management in the Yanqi Basin, China. Ph.D. thesis, Institute of Environmental...
  • K. Christiaens et al., 2002. Use of sensitivity and uncertainty measures in distributed hydrological modeling with an application to the MIKE SHE model. Water Resour. Res.
  • C. Demetriou et al., 1999. Evaluating sustainable groundwater management options using the MIKE SHE integrated hydrogeological modelling package. Environ. Modell. Softw.
  • DHI, 2011. MIKE SHE User Manual, Volume 1: User...
  • DHI, 2011. MIKE SHE User Manual, Volume 2: Reference...
  • DHI, 2011. MIKE 11 A modelling system for Rivers and Channels User...
  • DHI, 2011. MIKE 11 A Modelling Systems for Rivers and Channels Reference...
  • Y. Ding et al., 2004. Identification of Manning’s roughness coefficients in shallow water flows. J. Hydraul. Eng-ASCE
  • S. Dunne et al., 2005. An ensemble-based reanalysis approach to land data assimilation. Water Resour. Res.