
The good, the bad and the outliers: automated detection of errors and outliers from groundwater hydrographs

French title: The good ones, the bad ones and the outliers: automated detection of errors and outlying data in groundwater hydrographs

Spanish title: The good, the bad and the strange: automated detection of errors and atypical values in groundwater hydrographs

Chinese title: Ideal values, non-ideal values and anomalous values: automatic detection of errors and anomalous values in groundwater-level hydrographs

Portuguese title: The good, the bad and the discrepant data: automatic detection of errors and discrepant data from groundwater hydrographs

  • Paper
Hydrogeology Journal

Abstract

Suspicious groundwater-level observations are common and can arise for many reasons, ranging from an unforeseen biophysical process to bore failure and data management errors. Unforeseen observations may provide valuable insights that challenge existing expectations and can be deemed outliers, whereas monitoring and data-handling failures can be deemed errors and, if ignored, may compromise trend analysis and groundwater model calibration. Ideally, outliers and errors should be identified, but to date this has been a subjective process that is neither reproducible nor efficient. This paper presents an approach to objectively and efficiently identify multiple types of errors and outliers. The approach requires only the observed groundwater hydrograph, needs no particular consideration of the hydrogeology, the drivers (e.g. pumping) or the monitoring frequency, and is freely available in the HydroSight toolbox. Herein, the algorithms and time-series model are detailed and applied to four observation bores with varying dynamics. Outlier detection was most reliable when the observations were acquired quarterly or more frequently. Detection was more challenging where the groundwater-level variance is nonstationary or the absolute trend increases rapidly: the former is likely to result in an underestimation of the number of outliers, and the latter in an overestimation.
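The abstract does not spell out the detection algorithm. As a rough illustration of the general idea (fit a time-series model to the hydrograph, then flag observations whose forecast residual is implausibly large), the Python sketch below swaps in Holt's double exponential smoothing with a trend expressed per unit time, so irregular sampling intervals are allowed, plus a robust median-based residual scale and a k-sigma threshold. It is a sketch under stated assumptions, not HydroSight's implementation: the function names, the smoothing constants `alpha` and `beta`, the multiplier `k` and the floor `min_scale` are hypothetical, and the first two observations are assumed error-free because they initialise the model. A simple timestamp check stands in for the data-management error checks.

```python
from statistics import median


def find_time_errors(times):
    """Return indices whose timestamp does not strictly increase.

    Duplicate or out-of-order dates are one simple example of a
    data-management error (hypothetical check, for illustration only).
    """
    return [i for i in range(1, len(times)) if times[i] <= times[i - 1]]


def _holt_pass(times, heads, alpha, beta, threshold=None):
    """Run Holt's double exponential smoothing in error-correction form.

    The trend is a rate per unit time, so the one-step forecast is
    level + trend * dt and irregular sampling intervals are allowed.
    The first two observations initialise the level and trend.
    If `threshold` is given, observations whose one-step residual exceeds
    it are flagged and skipped (treated as missing, no state update).
    """
    level = heads[1]
    trend = (heads[1] - heads[0]) / (times[1] - times[0])
    last_t = times[1]
    residuals, flagged = [], []
    for i in range(2, len(heads)):
        dt = times[i] - last_t
        forecast = level + trend * dt
        r = heads[i] - forecast
        residuals.append(r)
        if threshold is not None and abs(r) > threshold:
            flagged.append(i)
            continue
        level = forecast + alpha * r        # level correction
        trend += alpha * beta * r / dt      # trend (rate) correction
        last_t = times[i]
    return residuals, flagged


def detect_outliers(times, heads, alpha=0.3, beta=0.1, k=4.0, min_scale=1e-3):
    """Return indices of observations flagged as outliers (hypothetical API)."""
    if len(times) != len(heads) or len(heads) < 4:
        raise ValueError("need >= 4 paired observations")
    if find_time_errors(times):
        raise ValueError("timestamps must be strictly increasing")
    # Pass 1: unfiltered residuals give a robust scale estimate,
    # 1.4826 * median |residual| (residuals assumed centred near zero),
    # floored at an assumed measurement resolution so it is never zero.
    residuals, _ = _holt_pass(times, heads, alpha, beta)
    scale = max(1.4826 * median(abs(r) for r in residuals), min_scale)
    # Pass 2: flag residuals larger than k robust standard deviations.
    _, flagged = _holt_pass(times, heads, alpha, beta, threshold=k * scale)
    return flagged


if __name__ == "__main__":
    # Synthetic, irregularly sampled linear hydrograph (arbitrary units)
    # with one injected spike at index 15.
    times = [0]
    for gap in [4, 8, 12] * 7:
        times.append(times[-1] + gap)
    times = times[:20]
    heads = [10 + 0.25 * t for t in times]
    heads[15] += 5.0
    print(detect_outliers(times, heads))  # [15]
```

Flagged observations are skipped rather than used to update the model, so a single spike does not contaminate the forecasts that follow it. Because this sketch applies one scale to the whole record, it would tend to miss outliers in the quiet periods of a record whose variance changes over time; conversely, if the level genuinely changes faster than the smoothed trend can follow, skipped updates can cascade into many flags.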

Abstract (French)

Suspicious groundwater-level observations are frequent and can arise for many reasons, ranging from an unforeseen biophysical process to borehole defects and data management errors. Unforeseen observations can provide valuable information that calls existing predictions into question and can be considered outliers, whereas monitoring and data-handling failures can be considered errors and, if ignored, can compromise trend analysis and the calibration of hydrogeological models. Ideally, outliers and errors should be identified, but to date this has been a subjective process that is neither reproducible nor efficient. This article presents an approach for objectively and efficiently identifying multiple types of errors and outliers. The approach requires only the hydrograph of observed groundwater levels, requires no particular attention to the hydrogeology, the influencing factors (for example pumping) or the monitoring frequency, and is freely available in the HydroSight toolbox. Here, the algorithms and time-series models are detailed and applied to four piezometers with varied dynamics. Outlier detection was most reliable when the observation data were acquired quarterly or more frequently. Outlier detection where the groundwater-level variance is nonstationary or the absolute trend increases rapidly was more difficult, the former possibly leading to an underestimation of the number of outliers and the latter to an overestimation.

Abstract (Spanish)

Suspicious groundwater-level observations are common and can arise for many reasons, ranging from an unforeseen biophysical process to errors caused by bore failures or by data handling. Unforeseen observations can provide valuable insights that challenge existing expectations and can be considered outliers, while failures in monitoring and data handling can be considered errors and, if ignored, can compromise trend analysis and groundwater model calibration. Ideally, outliers and errors should be identified, but to date this has been a subjective process that is neither reproducible nor efficient. This article presents an approach for objectively and efficiently identifying multiple types of errors and outliers. The approach requires only the observed groundwater hydrograph, requires no special consideration of the hydrogeology, the drivers (for example, pumping) or the monitoring frequency, and is freely available in the HydroSight toolbox. Here, the algorithms and the time-series model are detailed and applied to four observation wells with varying dynamics. Outlier detection was most reliable when the observation data were acquired quarterly or more frequently. Outlier detection where the groundwater-level variance is nonstationary or the absolute trend increases rapidly was more difficult, since the former would probably lead to an underestimation of the number of outliers and the latter to an overestimation.

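Abstract (Chinese)

Suspicious groundwater-level observations are very common and can have many causes, including unexpected biophysical processes, borehole failures and data management errors. Unexpected observations can provide valuable insights that challenge existing expectations and can be regarded as outliers, whereas monitoring and data-processing failures can be regarded as errors; if ignored, these outliers and errors can compromise trend analysis and groundwater model calibration. Ideally, outliers and errors should be identified, but to date this has been an experience-based process that is neither reproducible nor efficient. This paper presents a method for objectively and efficiently identifying multiple types of errors and outliers. The method requires only the observed groundwater hydrograph, requires no special consideration of the hydrogeological conditions, the drivers (e.g. pumping) or the monitoring frequency, and is freely available in the HydroSight toolbox. Here, the algorithms and time-series model are described in detail and applied to four observation bores with different dynamics. Outlier detection was most reliable when observation data were acquired quarterly or more frequently. Where groundwater-level variation is nonstationary or the absolute trend increases rapidly, outlier detection is more challenging: the former may lead to an underestimation of the number of outliers and the latter to an overestimation.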

Abstract (Portuguese)

Suspicious groundwater-level observations are common and can arise for various reasons, ranging from an unforeseen biophysical process to bore failure and data management errors. Unforeseen observations can provide valuable information that challenges existing expectations and can be considered outliers, while monitoring and data-handling failures can be considered errors and, if ignored, can compromise trend analysis and groundwater model calibration. Ideally, outliers and errors should be identified, but until now this has been a subjective process that is neither reproducible nor efficient. This article presents an objective and efficient approach to identifying multiple types of errors and outliers. The approach requires only the observed groundwater hydrograph, requires no particular consideration of the hydrogeology, the forcings (e.g. pumping) or the monitoring frequency, and is freely available in the HydroSight toolbox. Here, the algorithms and time-series models are detailed and applied to four observation bores with different dynamics. Outlier detection was most reliable when the observation data were acquired quarterly or more frequently. Outlier detection where the groundwater-level variance is nonstationary or the absolute trend increases rapidly was more challenging, with the former probably resulting in an underestimation of the number of outliers and the latter in an overestimation.

Figs. 1–5 (figure images not included)



Acknowledgements

This research was funded by the Australian Research Council Linkage Project LP130100958 and funding partners: Bureau of Meteorology (Australia); Department of Environment, Land, Water and Planning (Vic., Australia); Department of Economic Development, Jobs, Transport and Resources (Vic., Australia); and Power and Water Corporation (N.T., Australia). The authors are grateful to Dr. Elisabetta Carrara (Bureau of Meteorology) for her constructive input during the development of the algorithms.

Author information


Correspondence to Tim J. Peterson.


About this article


Cite this article

Peterson, T.J., Western, A.W. & Cheng, X. The good, the bad and the outliers: automated detection of errors and outliers from groundwater hydrographs. Hydrogeol J 26, 371–380 (2018). https://doi.org/10.1007/s10040-017-1660-7


  • DOI: https://doi.org/10.1007/s10040-017-1660-7
