Abstract
This study focuses on the potential improvement of environmental variables modelling by using linear state-space models, as an improvement of the linear regression model, and by incorporating a constructed hydro-meteorological covariate. The Kalman filter predictors allow to obtain accurate predictions of calibration factors for both seasonal and hydro-meteorological components. This methodology can be used to analyze the water quality behaviour by minimizing the effect of the hydrological conditions. This idea is illustrated based on a rather extended data set relative to the River Ave basin (Portugal) that consists mainly of monthly measurements of dissolved oxygen concentration in a network of water quality monitoring sites. The hydro-meteorological factor is constructed for each monitoring site based on monthly precipitation estimates obtained by means of a rain gauge network associated with stochastic interpolation (kriging). A linear state-space model is fitted for each homogeneous group (obtained by clustering techniques) of water monitoring sites. The adjustment of linear state-space models is performed by using distribution-free estimators developed in a separate section.
Similar content being viewed by others
References
Alpuim T, El-Shaarawi A (2009) Modeling monthly temperature data in Lisbon and Prague. Environmetrics 20:835–852
Anagnostou EN, Krajewski WF, Seo DJ, Johnson ER (1998) Mean-field rainfall bias studies for WSR-88D. J Hydrol Eng 28:27–39
Ato AF, Samuel O, Oscar YD, Moi PA (2010) Mining and heavy metal pollution: assessment of aquatic environments in Tarkwa (Ghana) using multivariate statistical analysis. J Environ Stat 1:1–13
Bengtsson T, Cavanaugh JE (2008) State-space discrimination and clustering of atmospheric time series data based on Kullback information measures. Environmetrics 19:103–121
Ciach GJ, Krajewski WF (2006) Analysis and modeling of spatial correlation structure of small-scale rainfall in Central Oklahoma. Adv Water Resour 29:1450–1463
Charles SP, Bates BC, Smith IN, Hughes JP (2004) Statistical downscaling of daily precipitation from observed and modelled atmospheric fields. Hydrol Process 18:1373–1394
Chokmani K, Ouarda TBMJ (2004) Physiographical space-based kriging for regional flood frequency estimation at ungauged sites. Water Resour Res 40:1–13
Cressie NAC (1989) The many faces of spatial prediction. In Armstrong M (ed) Geostatistics vol 1. Kluwer, Dordrecht, pp 163–176
Costa M, Alpuim T (2010) Parameter estimation of state space models for univariate observations. J Stat Plan Inference 140:1889–1902
Costa M, Alpuim T (2011) Adjustment of state space models in view of area rainfall estimation. Environmetrics 22:530–540
Costa M, Gonçalves AM (2011) Clustering and forecasting of dissolved oxygen concentration on a river basin. Stoch Environ Res Risk Assess 25:151–163
De Marsily G (1986) Quantitative hydrogeology. Academic Press, London, pp 440
Dirks KN, Hay JE, Stow CD, Harris D (1998) High-resolution studies of rainfall on Norfolk Island Part II: interpolation of rainfall data. J Hydrol 208:187–193
Elhatip H, Hinis MA, G lbahar N (2008) Evaluation of the water quality at Tahtali dam watershed in Izmir-Turkey by means of statistical methodology. Stoch Environ Res Risk Assess 22:391–400
Gonçalves AM, Alpuim T (2011) Water quality monitoring using cluster analysis and linear models. Environmetrics 22:933–945
Goodrich DC, Faures J, Woolhiser DA, Lane LJ, Sorooshian S (1995) Measurement and analysis of small-scale convective storm rainfall variability. J Hydrol 173:283–308
Goovaerts P (2000) Geostatistical approaches for incorporating elevation into the spatial interpolation of rainfall. J Hydrol 228:113–129
Greene AM, Robertson AW, Kirshner S (2008) Analysis of Indian monsoon daily rainfall on subseasonal to multidecadal time scales using a hidden Markov model. Q J R Meteorol Soc 134:875–887
Harvey AC (1996) Forecasting structural time series models and the Kalman filter. Cambridge University Press, Cambridge
Helena B, Pardo R, Vega M, Barrado E, Fernandez JM, Fernandez L (2000) Temporal evolution of groundwater composition in an alluvial aquifer (Pisuerga river, Spain) by principal component analysis. Wat Res 34:807–816
Isaaks EH, Srivastava RM (1989) Applied geostatistics. Oxford University Press, New York, pp 572
Journel AG, Huijbregts, ChJ (1978) Mining geostatistics. Academic Press, London, pp 600
Kokic P, Crimp S, Howden M (2011) Forecasting climate variables using a mixed-effect state-space model. Environmentrics 22:409–419
Kyriakidis PC, Journel AG (1999) Geostatistical space-time models: a review. Math Geol 31(6):651–684
Leybourne SJ (2006) Estimation and testing of time-varying coefficient regression models in the presence of linear restrictions. J Forecast 12(1):49–62
Lischeid J (2009) Non-linear visualization and analysis of large water quality data sets: a model-free basis for efficient monitoring and risk assessment. Stoch Environ Res Risk Assess 23:977–990
Liu CW, Lin KH, Kuo YM (2003) Application of factor analysis in the assessment of ground-water quality in a blackfoot disease area in Taiwan. Sci Total Environ 313:77–89
Machado A, Silva M, Valentim H (2010) A contribute for the evaluation of water bodies status in Northern Region. Revista Recursos Hídricos 31(1):57–63
Matheron G (1963) Principles of geostatistics. Econ Geol 58:1246–1266
Mc Kenna JE (2003) An enhanced cluster analysis program with bootstrap signficance testing for ecological community analysis. Environ Model Softw 18:205–220
Mirás-Avalos JM, Paz-González A, Vidal-Vázquez E, Sande-Fouz P (2007) Mapping monthly rainfall data in Galicia (NW Spain) using inverse distances and geostatistical methods. Adv Geosci 10:51–57
Nicolau R, Rodrigues R (2000) Comparação de técnicas de interpolação espacial para mapeamento da precipitação máxima diária anual (krigagem utilizando a altitude com deriva externa). Documento Interno do INAG 17:1261–1272
Oliveira RES, Lima MMCL, Vieira JMP (2005) An indicator system for surface water quality in river basins. In The fourth inter-celtic colloquium on hydrology and management of water resources, Universidade do Minho, Guimarães, Portugal
Pagan A (1980) Some identification and estimation results for regression models with stochastically varying coefficients. J Econom 13:341–363
Rathbun SL (1998) Spatial modelling in irregularly shaped regions: kriging estuaries. Environmetrics 9:109–129
Renwich JA, Mullan AB, Porteous A (2009) Statistical downscaling of New Zealand climate. Weather Clim 29:24–44
Rossi RE, Mulla DJ, Journel AG, Franz EH (1992) Geostatistical tools for modelling and interpreting ecological spatial dependence. Ecol Monogr 62:277–314
Severino E, Alpuim T (2005) Spatiotemporal models in the estimation of area precipitation. Environmetrics 16:773–802
Simeonov V, Stratis JA, Samara C, Zachariadis G, Voutsa D, Anthemidis A, Sofoniou M, Kouimtzis TH (2003) Assessment of the surface water quality in northern Greece. Water Res 37:4119–4124
Shrestha S, Kazama F (2007) Assessment of surface water quality using multivariate techniques: a case study of the Fuji river basin, Japan. Environ Modell Softw 22:464–475
Shumway R, Stoffer D (1982) An approach to time series smoothing and forecasting using EM algorithm. J Time Ser Anal 3:253–264
Varol M, Sen B (2009) Assessment of surface water quality using multivariate statistical techniques: a case study of Behrimaz Stream, Turkey. Environ Monit Assess 159:543–553
Vega M, Pardo RE, Barrado E, Debán L (1998) Assessment of seasonal and polluting effects on the quality of river water by exploratory data analysis. Water Res 32(12):3581–3592
Wurderlin DA, Diaz MP, Ame MV, Pesce SF, Hued AC, Bistoni MA (2001) Pattern recognition techniques for the evaluation of spatial and temporal variations in water quality. A case study: Suquia river basin (Cordoba-Argentina). Wat Res 35:2881–2894
Acknowledgements
The authors would like to thank to Eng. Pimenta Machado from the Portuguese Regional Directory for the Northern Environment and Natural Resources, and to Eng. Cláudia Brandão from the Portuguese Institute of Water, for sharing their skills and experiences and for supplying the monitored data. A. Manuela Gonçalves acknowledges the financial support provided by the Research Centre of Mathematics of the University of Minho through the FCT Pluriannual Funding Program.
Author information
Authors and Affiliations
Corresponding author
Appendix
Appendix
1.1 Distribution-free estimators for the mean and for the transition matrix
In the parameters estimation of state-space models were performed distribution-free estimators developed from the original work by Costa and Alpuim (2010). However, in that work it was proposed a distribution-free estimator for state-space models with univariate observations. Thus, a straightforward generalization of these estimators is presented in order to allow their application to a class of multivariate state-space models that largely covers the present work’s needs.
To estimate the unknown parameters in the model
it is assumed a set of observations \({{\mathcal Y}_n=(\mathbf{Y}_1, {\mathbf Y}_2, \ldots, {\mathbf Y}_n),}\) and regular matrices of known constants \({\mathbf H}_1, {\mathbf H}_2, \ldots, {\mathbf H}_n\) are available. The mean vector \({\mathbf \mu}\) can be easily estimated by the method of moments, i.e., \(\widehat{\mathbf \mu}=n^{-1}\sum_{t=1}^n {\mathbf H}_t^{-1}{\mathbf Y}_t.\)
As variables Y t are not stationary, we are not under the usual conditions of the consistency of generalized method of moments. Thus, it is necessary to establish additional conditions to guarantee this consistency. By construction, the estimator \(\widehat{\mathbf \mu}\) of the mean vector is unbiased, so we can guarantee its consistency by proving that \(var(\widehat{\mathbf \mu})\rightarrow {\mathbf 0}\) when \(n\rightarrow+\infty,\) and thus establishing sufficient conditions. Covariance matrix of \(\widehat{\mathbf \mu}\) is given by
Applying the Kronecker product ⊗ and the operator vec, we get
Under the stationarity conditions of process {β t } the first parcel is an O p , seeing that \(\sum_{k=-\infty}^{+\infty}\mathbf{\Upgamma}(k)<\infty, \) (e.g., Hamilton 1994, p. 279). To guarantee that
it is sufficient to admit the additional condition |h −1 t,(i,j) | < c for all \(t=1,2,.., i,j=1,2,\ldots,m\) and for some positive constant c, where h −1 t,(i,j) represents the (i, j) element of H −1 t matrix.
The autoregressive matrix \(\mathbf{\Upphi}\) is estimated by means of covariance structure of process \({\mathbf H}_t^{-1}{\mathbf Y}_t.\) We see that
In a VAR(1) process, the relation \(\mathbf{\Upgamma}_k=\mathbf{\Upphi \Upgamma}_{k-1}\) is valid, for k = 1, 2, .... Thus, we proposed the autoregressive matrix estimator \(\widehat{\Upphi}\) based on the least squares method of these equations by taking \(k=1,2,\ldots,\ell_\mathbf{\Upphi}.\) Thus, we have
where \(\widehat{\mathbf{\Upgamma}}_k=\frac{1}{n}\sum_{t=1}^{n-k}\left[\left(\mathbf {H}_{t+k}^{-1}\mathbf{Y}_{t+k}-\widehat{\mathbf{\mu}}\right)\left(\mathbf {H}_{t}^{-1}\bf{Y}_{t}-\widehat{\mathbf{\mu}}\right)^{\prime}\right]. \)
By construction, the autoregressive matrix estimator is consistent, since \(\widehat{\mathbf{\Upgamma}}_k\) is a consistent estimator of \(\mathbf{\Upgamma}_k.\) Whereas we have proposed a consistent estimator to \(\mathbf{\mu},\) we consider that the mean vector \(\mathbf{\mu}\) is known. To analyse the consistency of \(\widehat{\mathbf{\Upgamma}}_k\) we have
Under the previously established condition, the last three parcels converge in probability to a null matrix. Indeed, by defining the second parcel as \({\mathbf A}=[A_{ij}]_{i,j=1,2,\ldots,m}\) and, with some algebraic manipulation, we have
and considering σ e,(r,s) = cov(e t,(r), e t,(r)), the variance is given by
If the additional condition |h −1 t,(i,j) | < c is valid, this parcel tends to 0 when \(n\rightarrow+\infty. \) In a similar way, we defined the third parcel by \(\mathbf{B}=[B_{ij}]_{i,j=1,2,\ldots,m}\) with elements given by
with variance
Again, we guarantee that B ij = O p through the same condition |h −1 t,(i,j) | < c. As we shall see, this condition is a sufficient condition, as the last parcel also tends to a null matrix. Indeed, if we denote the last parcel as \(\mathbf{C}=[C_{ij}]_{i,j=1,2,\ldots,m}, \) we have
with variance given by
These results allow us to conclude that if |h −1 t,(i,j) | < c, the estimator \(\widehat{\mathbf{\Upgamma}}_k\) is consistent to \(\mathbf{\Upgamma},\) when we replace the mean vector \(\mathbf{\mu}\) by a consistent estimator.
1.2 Distribution-free estimators to noise variances
The estimation of covariance matrices of errors terms e t and ε t is an important and difficult step at the same time. At times, the recursive procedures applied to the obtained Gaussian likelihood estimates diverge or produce non-positive semidefined matrices. Sometimes, these problems occur when the initial solution is not as close to estimates as necessary. We propose an estimator to \(\mathbf{\Upsigma}_\mathbf{\varepsilon}\) based on covariance structure of a VAR(1) stationary process.
We know that the relation \(\mathbf{\Upsigma}_{\mathbf{\beta}}=\mathbf{\Upphi\Upsigma\Upphi}^{\prime}+ \mathbf{\Upsigma}_{\mathbf{\varepsilon}}\) is valid in a VAR(1) stationary process, or by applying the Kronecker product ⊗ and the operator vec
By applying the vec operator to the equation \(\mathbf{\Upgamma}_k=\mathbf{\Upphi\Upgamma}_{k-1},\) with \(k=1,2,\ldots,\) we have:
or
Note that the matrix \({\mathbf \Upsigma}_{\mathbf \varepsilon}\) is symmetric, that is, \(vec\left({\mathbf \Upsigma}_{\mathbf \varepsilon}\right)^{\prime}_{1,(j-1) m+i-1}=vec \left({\mathbf \Upsigma}_{\mathbf \varepsilon}\right)^{\prime}_{1,im+j}\) with 1 ≤ i ≤ m − 1 and 1 ≤ j ≤ i. Thus, we constructed a line matrix \(vec\left({\mathbf \Upsigma}_{\mathbf \varepsilon}\right)^{\prime*},\) with m + m(m − 1)/2 columns, that we got from \(vec\left({\mathbf \Upsigma}_{\mathbf \varepsilon}\right)^{\prime}\) by removing the elements \(vec\left({\mathbf \Upsigma}_{\mathbf \varepsilon}\right)^{\prime}_{1,im+j},\) with 1 ≤ i ≤ m − 1 and 1 ≤ j ≤ i.
By applying the same methodology to the matrix \({\mathbf \Updelta}_k\) defined as
we summed the columns (two by two) with the index im + j and (j − 1)m + i − 1, with 1 ≤ i ≤ m − 1 and 1 ≤ j ≤ i, thus obtaining a new matrix \({\mathbf \Updelta}_k^*\) with m + m(m − 1)/2 columns.
The estimator for \({\mathbf \Upsigma}_{\mathbf \varepsilon}\) is constructed via the least squares method applied to equations
with \(k=1,2,\ldots,\ell_{\mathbf \varepsilon}.\) Thus, we obtained the estimator
The consistency of \(\widehat{\mathbf \Upsigma}_{\mathbf \varepsilon}\) is guaranteed under the same conditions of the consistency of \(\widehat{\mathbf \Updelta}_k.\) As we have seen, a sufficient condition for this is |h −1 t,(i,j) | < c.
In order to estimate the covariance matrix \({\mathbf \Upsigma}_{\mathbf e},\) we defined
Therefore, we had the expectation
By applying the vec operator, and with some algebraic manipulation, we got
As the matrix \({\mathbf \Upsigma}_{\mathbf e}\) is symmetric, it is necessary to adopt the same procedure as in the estimation of \({\mathbf \Upsigma}_{\mathbf \varepsilon}.\) Thus, we estimated the m + m(m − 1)/2 elements of the covariance matrix.
If we have a consistent estimator to \({\mathbf \Upsigma}_{\mathbf \beta},\) for example given by the proposed estimators to \({\mathbf \Upphi}\) and \({\mathbf \Upsigma}_{\mathbf \varepsilon},\) the consistency of \(\widehat{\mathbf \Upsigma}_{\mathbf e}\) boils down to the limit of variance of each element of \(vec({\mathbf \Upupsilon})= n vec(\widehat{\mathbf \Uppsi})^{\prime}[\sum_{t=1}^n({\mathbf H}_t^{-1}\otimes {\mathbf H}_t^{-1})^{\prime}]^{-1}.\) The variance of the (i, j) element of \({\mathbf \Upupsilon}\) is given by
where h −1 t,(i,j) represents the (i,j) element of the matrix \({\mathbf H}_t^{-1}\) and a ij the (i, j) element of the matrix \([\sum_{t=1}^n({\mathbf H}_t^{-1}\otimes {\mathbf H}_t^{-1})^{\prime}]^{-1}.\)
For simplicity, we adopt β t,i − μ i = β * t,i . If we take in account that the states \({\mathbf \beta}_t\) are uncorrelated to noise \({\mathbf e}_s\) for all t and s, the previous expression can be decomposed into four parcels. The first parcel has the form
The first parcel can be decomposed into
but we can write
In order for this parcel to be an O p , it is sufficient to admit the additional regularity conditions, such as cov(β t,i β t,j, β s,i β t,j ) for all t and s, that do not depend on time.
The cross terms have a similar structure. For example, the first term can be computed by,
So, if we admit that the elements of matrix \(\mathbf{H}^{-1}_t\) are limited as c 1 < |h −1 t,(i,j) | < c 2, where c 1 and c 2 are positive constants, it follows that this term is an O p . In addition to these conditions on h −1 t,(i,j) , if we ensure that the vector of error e t is stationary of fourth-order, then we conclude that the last parcel of variance of the (i, j) element of \({\mathbf \Upupsilon}\) is an O p , too.
Thus, under the additional stationarity conditions of fourth-order on the vector of disturbances and the above restrictions on the elements of the matrices \({\mathbf H}_t^{-1}, \) the proposed distribution-free estimator to \({\mathbf \Upsigma}_{\mathbf e}\) is consistent.
Rights and permissions
About this article
Cite this article
Gonçalves, A.M., Costa, M. Predicting seasonal and hydro-meteorological impact in environmental variables modelling via Kalman filtering. Stoch Environ Res Risk Assess 27, 1021–1038 (2013). https://doi.org/10.1007/s00477-012-0640-7
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00477-012-0640-7