Comparison of four learning-based methods for predicting groundwater redox status

doi:10.1016/j.jhydrol.2019.124200

Journal of Hydrology

Volume 580, January 2020, 124200

https://doi.org/10.1016/j.jhydrol.2019.124200

Highlights

• Supervised learning methods (LDA, BRT, RF) did not generalize well to independent data.
• Unsupervised learning method (MSOM) generalized to independent data.
• MSOM redox and depth predictions used to make 3D maps of anoxic probability.
• Regional redox sequence (oxic-mixed-anoxic) indicates predominantly vertical recharge.
• Local redox sequences diverse, indicating heterogeneity of flow path orientation and electron donors.

Abstract

Knowing the location where groundwater denitrification occurs, or by proxy the groundwater redox status (oxic, mixed, and anoxic), is valuable information for assessing and managing potential agricultural land-use impacts on freshwater quality. We compare the efficacy of supervised (Linear Discriminant Analysis, LDA; Boosted Regression Trees, BRT; and Random Forest, RF) and unsupervised (Modified Self-Organizing Map, MSOM) learning-based methods to predict groundwater redox status in the agriculturally dominated Tasman, Waikato, and Wellington regions of New Zealand. Thresholds applied to regional groundwater-quality samples provide redox status variables, and learn heuristics constrained by these variables and applied to spatial factors (climate, elevation, geology, hydrology, soils, and well depth) identify optimal sets of regional predictor variables. A split-sample approach is used to train and test the learning methods' ability to predict redox status using the optimal predictor variables. Overall, the supervised methods demonstrate a prediction bias toward oxic conditions and an inability to perform statistically well when using independent regional data; for example, consider kappa statistics for BRT (Tasman: 0.42, Waikato: 0.38, Wellington: 0.17), RF (Tasman: 0.42, Waikato: 0.47, Wellington: 0.17), and LDA (Tasman: 0.46, Waikato: 0.32, Wellington: 0.17). By contrast, the unsupervised method performs statistically well when predicting oxic, mixed, and anoxic conditions and corresponding depths when using independent regional data; for example, consider MSOM kappa statistics for Tasman: 0.78, Waikato: 0.80, Wellington: 0.76. The unsupervised learning method provides the added benefits of being (1) able to combine predictions into 3D regional anoxic probability plots for interpreting the spatial influence of paleosols and groundwater flowpaths on redox status, and (2) readily extended to map 3D redox status across New Zealand and other countries despite data bias and sparsity.
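
The kappa statistic quoted above (Cohen, 1960) measures agreement between observed and predicted classes after correcting for the agreement expected by chance. The following minimal Python sketch shows why it is a more demanding score than plain accuracy when most samples are oxic. The function name `cohen_kappa` and the ten labels are hypothetical and chosen only for illustration; they are not data or code from the study.

```python
from collections import Counter

def cohen_kappa(observed, predicted):
    """Cohen's kappa: agreement between two label sequences, corrected for chance."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    # Observed agreement: fraction of samples where the two labels match.
    p_o = sum(o == p for o, p in zip(observed, predicted)) / n
    # Chance agreement: product of the marginal class frequencies, summed over classes.
    obs_counts = Counter(observed)
    pred_counts = Counter(predicted)
    p_e = sum(obs_counts[c] * pred_counts[c] for c in obs_counts) / (n * n)
    if p_e == 1.0:
        return 1.0  # both sequences contain one identical class only
    return (p_o - p_e) / (1.0 - p_e)

# Hypothetical test labels dominated by oxic samples (illustrative, not study data).
observed = ["oxic"] * 8 + ["mixed"] + ["anoxic"]
always_oxic = ["oxic"] * 10  # a classifier that is fully biased toward oxic

accuracy = sum(o == p for o, p in zip(observed, always_oxic)) / len(observed)
kappa = cohen_kappa(observed, always_oxic)
print(f"accuracy={accuracy:.2f}, kappa={kappa:.2f}")
```

In this made-up case the always-oxic classifier is correct for 80% of the samples, yet its kappa is zero because that level of agreement is exactly what the class frequencies alone would produce.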

Introduction

The sustainability of New Zealand’s freshwater resource is facing increasing pressure from agricultural nitrate leaching (Ministry for the Environment, 2007), irrigation demand, and current and future climate change effects (Robertson et al., 2016). The integrity of groundwater quality is of increasing concern because about 40% of the population depends on groundwater for drinking water supply (Ministry for the Environment, 2007), and nutrient-rich baseflow contributions are impacting the health of lowland streams (Ministry for the Environment, 2007). To address the deterioration of freshwater quality, regulators are required to establish nitrate leaching and water-quality limits by 2025 (Ministry for the Environment, 2007). To sustain agricultural production, these limits need to account for attenuation that occurs along groundwater flow paths. Knowing the location where groundwater denitrification occurs, or by proxy the groundwater redox status, thus forms an important link between agricultural leaching sources and water-quality objectives.

The best tools currently available for mapping the redox status for groundwater denitrification are predictive models (Koch et al., 2019). As the complexity of real-world groundwater systems increases from catchment to regional or national scales, it becomes difficult and often impractical to make spatial predictions based on process-based models. Learning-based modeling is an alternative approach that predicts the distribution of groundwater redox status based solely on the analysis of available measurements. This approach is possible because learning-based models build relationships between state variables (input, internal, and output variables) using a limited number of assumptions about the physical behaviour of the system (Solomatine et al., 2009). That said, developing a learning-based model is often challenging because failure can occur at any one of several model-building steps: choice of response variables, choice of predictor variables, choice of model architecture, choice of model structure and complexity, choice of model parameters, model training, testing and validation, prediction, and uncertainty quantification. Ultimately, the performance of a model built with this approach is limited by the quality of the available data.

Learning-based groundwater-quality models are grouped based on the type of problem being solved, such as regression or classification (Solomatine et al., 2009). Regression problems typically involve predicting a single response variable, such as nitrate concentration, based on learning a function that maps inputs to outputs. Classification is a special form of learning-based modeling in which the problem involves identifying the sub-population to which a new observation belongs, such as redox status (oxic, mixed, or anoxic). Early groundwater studies mainly used linear models, such as logistic regression, to generate probabilistic maps of diffuse nitrate contamination (Nolan et al., 2002, Gurdak and Qi, 2012) and depths to the oxic-suboxic interface (Tesoriero et al., 2015). Spatial predictions of groundwater redox status using Linear Discriminant Analysis (LDA) have previously been made by Lee et al. (2008), Close et al. (2016), and Wilson et al. (2018). These methods tacitly assume that redox status can be modelled as a linear combination of characteristics whose class samples are continuous (not missing) and normally distributed (Martinez and Kak, 2001). These assumptions pose limitations when attempting to build models for predicting 3D regional redox conditions with data that are biased (type, frequency, and spatial sampling), disparate (different physics), and sparse (missing samples). Although linear learning-based modeling is used to study various aspects of groundwater systems, the linkages and interactions among climate, hydrological, and biogeochemical cycles across spatiotemporal scales more appropriately favour nonlinear learning-based modeling.

Nonlinear learning-based modeling includes supervised, unsupervised, and hybrid machine-learning (ML) algorithms (Green et al., 2016). The methods associated with these algorithms are known to fit nonlinear relationships while accommodating missing data and interactions among the different predictor variables. Supervised ML algorithms analyze training data and produce an inferred function that can be used for mapping new examples. Unsupervised ML algorithms build models by deducing structures present in the input data; this process may be used to extract general rules, reduce redundancy, or organize data by similarity.

Recent applications of supervised ML algorithms involve testing the efficacy of the random forest regression method (Koch et al., 2019) to model depth of the redox interface across Denmark, and the Artificial Neural Network (ANN), Bayesian Network (BN), and Boosted Regression Trees (BRT) methods to predict nitrate concentrations in groundwater of the Central Valley, California (Nolan et al., 2015). In the former and latter studies, the cross-validation results gave respective R² values for independent validation tests of 0.48 and 0.25. The cross-validation results for these methods suggest that the models did not generalize well to independent data, possibly due to overfitting given the relatively large number of predictor variables. Also noteworthy in the earlier study is that generalizing the models to holdout (independent) data resulted in a bias, with the model overpredicting low concentrations and underpredicting high concentrations. This phenomenon reveals one of the potential challenges when using learning-based models in the presence of sample frequency bias.

Another potential reason for the poor predictive performance of learning-based models is a high degree of correlation among pairs of predictor variables (Guyon and Elisseeff, 2003). In the case of perfectly correlated predictor variables, increases in one variable offset corresponding decreases in the second variable, with no effect on the training response but with an increase in the associated prediction uncertainty (Low et al., 2013). These findings underscore the need to identify an optimal set of predictor variables through some quantitative feature selection process (Singh et al., 2014). Feature selection algorithms fall into two categories: the filter model and the wrapper model (Das, 2001, Kohavi and John, 1997).

The filter model relies on general characteristics of the training data to select some features without involving any learning algorithm. For example, Povak et al. (2014) selectively removed one variable from each pair of linearly correlated predictor variables characterized by strong Pearson coefficients (ρ > 0.77). By reducing the number of collinear predictor variables (from 50 to 33), the cross-validation statistics for both BRT and Random Forest (RF) models increased (R² > 0.85), suggesting the subsequent models generalized well to independent data. Other investigators applied a form of backward stepwise regression to BRT and RF models that begins with a full model and at each step eliminates variables to find a reduced model that best explains the data (Ransom et al., 2017, Nolan et al., 2018). The removal of unimportant variables is necessary for building robust models that generalize to independent data, but this stepwise process is not likely to be reliable when it depends on the relative importance reported by the BRT and RF methods. The reason is that the relative importance is reported as normalized values. In models characterized by strong correlations among predictor variables, the correlated variables split importance between them, which reduces their apparent influence in the prediction process and gives a false impression of their true ranked importance.
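
A correlation-based filter of the kind described above can be sketched in a few lines of Python. The sketch assumes that, within each strongly correlated pair, the variable encountered first is the one retained; the source does not say which member of a pair was removed, so that rule, the function names, and the predictor values are all hypothetical. Only the 0.77 threshold comes from the text.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant column: correlation is undefined, treated as 0 here
    return sxy / math.sqrt(sxx * syy)

def correlation_filter(columns, threshold=0.77):
    """Keep predictors in order, dropping any whose |r| with a kept one exceeds threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical predictor table (illustrative values only, not study data).
predictors = {
    "elevation_m": [10, 50, 120, 300, 450, 700],
    "rainfall_mm": [900, 1000, 1150, 1600, 2000, 2600],  # rises with elevation
    "well_depth_m": [30, 5, 60, 12, 45, 8],
}
kept = correlation_filter(predictors)
print(kept)
```

With these made-up values, rainfall is almost perfectly correlated with elevation and is dropped, while well depth is retained. No learning algorithm is involved at any point, which is what distinguishes a filter from a wrapper.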

The wrapper model requires one predetermined learning algorithm for feature selection and uses its performance to evaluate and determine which features are selected (Yu and Liu, 2003, Calvet et al., 2017). The benefits of nonlinear feature selection in groundwater-quality modeling were demonstrated by Friedel and Buscema (2016) using Learn Heuristics (Metaheuristics in Machine Learning) and by Rodriguez-Galiano et al. (2018) in an evaluation of filter, embedded, and wrapper methods. These studies demonstrated that overfitting can be reduced by identifying an optimal number of predictor variables, in accordance with the principle of Occam’s razor: unnecessarily complex models should not be preferred to simpler ones (Encyclopaedia Britannica, 2010).
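
To make the wrapper idea concrete, the sketch below scores candidate feature subsets with a learning algorithm and keeps a feature only if it improves that score. It is deliberately simple and is not the learn-heuristics procedure used in the cited studies. It substitutes a greedy forward search for the metaheuristic search, and it assumes a 1-nearest-neighbour classifier scored by leave-one-out accuracy. All names and data values are hypothetical.

```python
def loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on the chosen feature indices."""
    correct = 0
    for i, row in enumerate(rows):
        best_j, best_d = None, float("inf")
        for j, other in enumerate(rows):
            if j == i:
                continue  # hold out the sample being classified
            d = sum((row[f] - other[f]) ** 2 for f in features)
            if d < best_d:
                best_j, best_d = j, d
        correct += labels[best_j] == labels[i]
    return correct / len(rows)

def forward_select(rows, labels, n_features):
    """Greedy wrapper: add the feature that most improves the score; stop when none does."""
    selected, best_score = [], 0.0
    while True:
        candidates = [f for f in range(n_features) if f not in selected]
        scored = [(loo_accuracy(rows, labels, selected + [f]), f) for f in candidates]
        if not scored:
            break
        score, feature = max(scored, key=lambda t: t[0])
        if score <= best_score:
            break  # no candidate improves the wrapped model
        selected.append(feature)
        best_score = score
    return selected, best_score

# Hypothetical samples: column 0 separates the classes, columns 1 and 2 do not.
rows = [
    [5.0, 0.3, 7.1], [8.0, 0.9, 2.4], [12.0, 0.1, 5.5],
    [40.0, 0.8, 6.9], [55.0, 0.2, 2.0], [70.0, 0.7, 5.8],
]
labels = ["oxic", "oxic", "oxic", "anoxic", "anoxic", "anoxic"]

selected, score = forward_select(rows, labels, n_features=3)
print(selected, score)
```

Because the search stops as soon as an extra feature fails to improve the score, the two uninformative columns are never added. This is the Occam's razor behaviour described above, at the cost of refitting the wrapped model for every candidate subset.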

The aim of this study is to predict groundwater redox status across selected agricultural regions of New Zealand. We hypothesize that the redox status classified from groundwater chemistry sampled across regional monitoring networks can be combined with well depths and nationally available climate, geology, hydrology, soils, and topography coverages to provide mutual information (a measure of entropy describing mutual dependence among random variables) suitable for learning-based model building and redox class prediction. To test this hypothesis, we evaluate supervised (LDA, BRT, and RF) and unsupervised (Modified Self-Organizing Map, MSOM) learning-based methods. The objectives are to compare the performance of these methods for predicting the probability of groundwater redox status (oxic, mixed, anoxic) and associated depths in groundwater systems of the Tasman, Waikato, and Wellington regions. This study extends the work of Close et al. (2016) and Wilson et al. (2018), who, because LDA is restricted to a single dependent response variable, developed separate models for classifying redox status over shallow and deep zones. In addition to evaluating supervised learning-based methods, we develop an innovative unsupervised learning-based workflow to simultaneously predict four response functions: oxic, mixed, anoxic, and depth. In this approach, the relative benefit of mutual information content in the predictor variables can be evaluated by comparing the probability of predicted redox conditions. In information theory, this concept reflects the mutual pairwise dependence among random variables.
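
Mutual information, as the term is used above, can be illustrated for the simplest case of two discrete variables. The sketch below estimates it in bits from label counts. It assumes categorical inputs, whereas the spatial factors in the study are largely continuous, and the function name and example labels are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length sequences of discrete labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi

# Hypothetical labels (illustrative only, not study data).
redox = ["oxic", "oxic", "anoxic", "anoxic"]
depth_class = ["shallow", "shallow", "deep", "deep"]  # fully determines redox here
soil_class = ["a", "b", "a", "b"]                     # independent of redox here

mi_depth = mutual_information(depth_class, redox)
mi_soil = mutual_information(soil_class, redox)
print(mi_depth, mi_soil)
```

In this toy case the depth class carries one full bit of information about redox status and the soil class carries none, so only the former would be useful as a predictor.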

Section snippets

Sources of water-quality data

The Tasman, Waikato, and Wellington regions of New Zealand were selected to test and compare learning-based methods for predicting groundwater redox status (Fig. 1). Groundwater resources in these regions are considered nationally significant, and the regional councils collect water-quality data from many monitoring wells. Tasman District is in the northern part of the South Island and covers an area of 9650 km². The main groundwater resources in the district lie within the alluvial terraces and coastal

Response variables

Regional groundwater sampling data are inherently biased for both the response variable (redox status) and associated predictor variables (spatial attributes). For example, most samples in the groundwater dataset are oxic (Table 2). This bias in redox status is attributed to the increased frequency of sampling in areas where greater water demand is concomitant with oxic conditions, and/or the increased frequency of shallow sampling across the New Zealand landscape dominated by oxic groundwater (

Conclusions

Results from the unsupervised learning-based method (MSOM) are statistically superior to the three commonly used supervised learning-based methods (LDA, BRT, and RF). The MSOM can simultaneously predict oxic, mixed, and anoxic redox status and their associated depths (four response functions) at unstructured grid locations associated with the predictor variables. The statistical robustness of redox predictions is attributed to using learn heuristics for identifying optimal sets of regional

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

We thank staff at the Waikato Regional Council, Greater Wellington Regional Council, and Tasman District Council for providing sample data, and the Institute of Geological and Nuclear Sciences Limited (Rogier Westerhoff), National Institute of Water and Atmospheric Research (Roddy Henderson), and Landcare Research (James Barringer) for providing spatial attribute data. Funding for this project was from the New Zealand Ministry of Business, Innovation and Employment as a component of the National Science

References (59)

M.E. Close et al., Predicting groundwater redox status on a regional scale using linear discriminant analysis, J. Contam. Hydrol. (2016)

A.M. Kalteh et al., Review of the self-organizing map (SOM) approach in water resources: analysis, modeling and application, Environ. Model. Softw. (2008)

R. Kohavi et al., Wrappers for feature subset selection, Artif. Intell. (1997)

L. Lilburne et al., Soil and informatics science combine to develop S-map: a new generation soil information system for New Zealand, Geoderma (2012)

F. Low et al., Impact of feature selection on the accuracy and spatial uncertainty of per-field crop classification using support vector machines, ISPRS J. Photogramm. Remote Sens. (2013)

B.T. Nolan et al., A statistical learning framework for groundwater nitrate models, J. Hydrol. (2015)

B.T. Nolan et al., Metamodeling and mapping of nitrate flux in the unsaturated zone and groundwater, Wisconsin, USA, J. Hydrol. (2018)

R. Rallo et al., Neural virtual sensor for the inferential prediction of product quality from process variables, Comput. Chem. Eng. (2002)

K.M. Ransom et al., A hybrid machine learning model to predict and visualize nitrate concentration throughout the Central Valley aquifer, California, USA, Sci. Total Environ. (2017)

V.F. Rodriguez-Galiano et al., Feature selection approaches for predictive modelling of groundwater nitrate pollution: an evaluation of filters, embedded and wrapper methods, Sci. Total Environ. (2018)

S. Wilson et al., Applying linear discriminant analysis to predict groundwater redox conditions conducive to denitrification, J. Hydrol. (2018)

D.J. Booker, Spatial and temporal patterns in the frequency of events exceeding three times the median flow (FRE3) across New Zealand, J. Hydrol. (NZ) (2013)

Booker, D.J, 2015. Hydrological Indices for National Environmental Reporting. NIWA report prepared for Ministry for the...

L. Breiman, Random forests, Mach. Learn. (2001)

M. Buscema, Genetic doping algorithm (GenD): theory and applications, Expert Syst. (2004)

M. Buscema et al., Training with input selection and testing (TWIST) algorithm: a significant advance in pattern recognition performance of machine learning, J. Intell. Learn. Syst. (2013)

L. Calvet et al., Learnheuristics: hybridizing metaheuristics with machine learning for optimization with dynamic inputs, Open Mathemat. (2017)

J. Cohen, A coefficient of agreement for nominal scales, Educ. Psychol. Meas. (1960)

S. Das, Filters, wrappers and a boosting-based hybrid for feature selection

G. De'ath, Boosted trees for ecological modeling and prediction, Ecology (2007)

Dietterich, T.G., 2000. Ensemble Methods in Machine Learning, Proceedings of the First International Workshop on...

J.R. Dymond et al., Nitrate and phosphorous leaching in New Zealand: a national perspective, N. Z. J. Agric. Res. (2013)

Efron B., Tibshirani, R.J., 1993. An introduction to the bootstrap. In: Monographs on statistics and applied...

J. Elith et al., A working guide to boosted regression trees, J. Anim. Ecol. (2008)

Encyclopædia Britannica. Encyclopædia Britannica Online. 2010. “Ockham's razor”. Archived from the original on 23...

Friedel, M.J., Buscema, M., 2016. Aquatic ecosystem modeling under natural and anthropogenic stresses: using an...

M.J. Friedel et al., Mapping fractional soils and vegetation components from Hyperion satellite imagery using an unsupervised machine-learning workflow, Int. J. Digital Earth (2017)

Geographx 2012. NZ 8m DEM. Available from...

C.S. Green et al., Big data bioinformatics, J. Cell Physiol. (2016)

¹: Pacific Northwest National Laboratory, P.O. Box 999, Richland, WA 99352, United States.

Published by Elsevier B.V.
