Profiting from pilot studies: Analysing mortality using Bayesian models with informative priors

doi:10.1016/j.baae.2012.11.003

Basic and Applied Ecology

Volume 14, Issue 1, February 2013, Pages 81-89

https://doi.org/10.1016/j.baae.2012.11.003

Abstract

Pilot studies are often used to help design ecological studies. Ideally the pilot data are incorporated into the full-scale study data, but if the pilot study's results indicate a need for major changes to the experimental design, then pooling pilot and full-scale study data is difficult. The default position is to disregard the preliminary data. But ignoring pilot study data after a more comprehensive study has been completed either forgoes statistical power or incurs the cost of sampling additional data equivalent to the pilot study's sample size. With Bayesian methods, pilot study data can be used as an informative prior for a model built from the full-scale study dataset. We demonstrate a Bayesian method for recovering information from otherwise unusable pilot study data with a case study on eucalypt seedling mortality. A pilot study of eucalypt tree seedling mortality was conducted in southeastern Australia in 2005, and a larger study with a modified design was conducted the following year. The two datasets differed substantially, so they could not easily be combined. Posterior estimates of the pilot-data model parameters were used to inform a model for the second, larger dataset. Model checking indicated that incorporating prior information maintained the predictive capacity of the model with respect to the training data. Importantly, adding prior information improved model accuracy in predicting a validation dataset. It also increased the precision and the effective sample size for estimating the average mortality rate. We recommend that practitioners move away from the default position of discarding pilot study data when they are incompatible with the form of their full-scale studies. More generally, we recommend that ecologists use informative priors more frequently to reap the benefits of the additional data.

Summary

Pilot studies are often used to determine the design of ecological studies. Ideally, the pilot study data are incorporated into the main study's dataset, but if the pilot study indicates a need for major changes to the experimental layout, combining data from the pilot and main studies is difficult. The usual decision is then to disregard the preliminary data. But ignoring the pilot study's results after the main study has been completed means forgoing statistical power, or increasing effort by collecting additional data to make up for the pilot study's sample size. With Bayesian methods, pilot study data can be used as an informative prior distribution for a model built from the main study's dataset. We demonstrate a Bayesian method for recovering information from otherwise unusable pilot study data with a case study on the mortality of eucalypt seedlings. A pilot study of eucalypt seedling mortality was conducted in southeastern Australia in 2005. A larger study with a modified design was conducted the following year. The two datasets differed considerably, so they could not readily be combined. Posterior estimates of the pilot study's model parameters were used as the basis for a model of the second, larger dataset. Model checking showed that adding an informative prior maintained the model's predictive power with respect to the training data. Adding an informative prior improved the model's accuracy in predicting a validation dataset and increased the precision and effective sample size for estimating the average mortality rate.
We recommend that practitioners move away from the standard practice of discarding pilot study data when they are incompatible with their main study. More generally, we recommend that ecologists use informative priors more often to exploit the benefits of additional data.

Introduction

The ability of Bayesian analyses to formally incorporate prior information has been little exploited in ecological research, despite its distinct appeal in a world where data and research resources are limited and the impetus for rapid learning is great. Many textbooks on Bayesian methods for ecologists introduce informative priors in their first few pages (e.g., Kéry 2010; McCarthy 2007), yet researchers typically use very vague priors. In effect, these researchers assert that they have no prior knowledge of the model parameters. Researchers do use prior information, informally and formally, to decide on questions, sampling regimes, and model structures, and to interpret results. Formal use of informative priors, however, is rare, perhaps because it is hard to express prior knowledge as a probability distribution (Clyde 1999) or because informative priors are perceived as overly subjective (Dennis 1996), although priors need not be subjective (Hobbs & Hilborn 2006). Another reason informative priors are avoided is a fear that they could reduce model accuracy: a prior affects not only the precision of estimates but also the location of the posterior, and therefore potentially the predictive accuracy.
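To make the last point concrete, the Python sketch below uses a conjugate beta-binomial model for a single mortality probability. This is an illustrative simplification, not the model fitted in this study, and the counts (12 deaths among 40 seedlings) and the Beta(5, 20) prior are hypothetical. Compared with a uniform Beta(1, 1) prior, the informative prior both shifts the posterior mean and narrows the posterior.

```python
import math


def beta_summary(a, b):
    """Return the mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)


def beta_binomial_update(prior_a, prior_b, deaths, n):
    """Conjugate update: a Beta prior plus binomial data gives a Beta posterior."""
    return prior_a + deaths, prior_b + (n - deaths)


# Hypothetical data: 12 deaths among 40 seedlings (illustrative numbers only).
deaths, n = 12, 40

posteriors = {
    "vague": beta_binomial_update(1, 1, deaths, n),         # uniform Beta(1, 1) prior
    "informative": beta_binomial_update(5, 20, deaths, n),  # hypothetical prior, mean 0.2
}

for label, (a, b) in posteriors.items():
    mean, sd = beta_summary(a, b)
    print(f"{label:<12} posterior mean {mean:.3f}, sd {sd:.3f}")
```

With these numbers the posterior mean moves from about 0.31 to about 0.26, towards the prior mean of 0.2, and the posterior standard deviation falls from about 0.07 to about 0.05. Whether that shift helps or harms predictive accuracy depends on whether the prior reflects the process that generates new data.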

Here we extend the use of informative priors in ecological modelling (see Choy, O’Leary, & Mengersen 2009; Dupuis & Joachim 2006; McCarthy & Masters 2005; McCarthy, Citroen, & McCall 2008) to pilot study data. The primary goal of a pilot study is to inform the design of the subsequent full-scale study. Pilot studies are small studies intended to reduce important uncertainties and to reveal the sample size needed to detect particular effects. They can also reveal major drivers of system variation, help identify the spatial and temporal scales at which that variation is propagated, or help refine a set of predictors. However, the data generated by a pilot study need not only inform the design of the full-scale study; they may also help address the fundamental research question. Many ecologists’ default position is to disregard the preliminary data, a stance recommended by texts on data collection and analysis (e.g. Green 1979). Treating the results of a pilot study as an informative prior in a Bayesian analysis instead provides a formal and transparent way to combine two otherwise incompatible data sources, improving the cost-effectiveness of the research.
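When the pilot and full-scale designs differ, the two datasets cannot simply be pooled, but a parameter they share can still carry information across. One simple way to do this, sketched below under a normal approximation, is to summarise the pilot posterior for that parameter by its mean and standard deviation and use a normal prior with those moments in the full-scale analysis. The shared parameter (a log-odds mortality intercept), the numbers and the helper names are hypothetical, and the closed-form normal-normal update assumes the full-scale likelihood is itself approximately normal; in practice the prior would be written directly into the full-scale model and fitted, for example, by MCMC.

```python
import math
from dataclasses import dataclass


@dataclass
class Normal:
    mean: float
    sd: float


def update_normal(prior: Normal, likelihood: Normal) -> Normal:
    """Combine a normal prior with an approximately normal likelihood.

    Precisions (1 / variance) add, and the posterior mean is the
    precision-weighted average of the prior mean and the data estimate.
    """
    w_prior = 1.0 / prior.sd ** 2
    w_data = 1.0 / likelihood.sd ** 2
    mean = (w_prior * prior.mean + w_data * likelihood.mean) / (w_prior + w_data)
    return Normal(mean=mean, sd=math.sqrt(1.0 / (w_prior + w_data)))


# Hypothetical values on the logit scale of monthly mortality.
pilot_posterior = Normal(mean=-1.0, sd=0.5)  # summary of the pilot-study posterior
full_scale = Normal(mean=-0.5, sd=0.3)       # full-scale estimate under a vague prior

posterior = update_normal(pilot_posterior, full_scale)
print(posterior)
```

The combined standard deviation (about 0.26) is smaller than either input, and the mean (about -0.63) lies between the two, closer to the more precise full-scale estimate.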

We illustrate the use of Bayesian informative priors to recover the inferential and predictive power of otherwise unusable pilot study data with a case study on eucalypt tree seedling mortality. Mortality rate is a key demographic parameter, and understanding how and why it varies is central to describing and learning about the population dynamics of all species (Zens & Peart 2003). However, the mortality events from which rates are calculated are often rare in absolute terms, and in many systems they are also episodic. Together, these issues make mortality rates difficult to characterise, so large datasets that include many individuals and span wide spatial and temporal ranges are often required.
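A generic binomial calculation, not taken from the paper, shows why rare mortality events demand large samples. For a proportion p estimated from n independent individuals, the relative standard error is sqrt((1 - p) / (n p)), so the sample size needed for a given relative precision grows roughly as 1/p. The sketch assumes independent fates, which ignores the site-level and temporal clustering (episodic mortality) that in reality makes the problem worse.

```python
import math


def relative_se(p, n):
    """Relative standard error (SE / p) of a binomial proportion p from n individuals."""
    return math.sqrt(p * (1 - p) / n) / p


def n_for_relative_se(p, target):
    """Smallest n whose relative standard error is at most `target` (independent fates assumed)."""
    return math.ceil((1 - p) / (p * target ** 2))


for p in (0.30, 0.10, 0.02):
    n = n_for_relative_se(p, 0.20)
    print(f"mortality {p:.2f}: about {n} individuals for a 20% relative SE")
```

A mortality probability of 0.30 needs fewer than 60 individuals for a 20% relative standard error, whereas 0.02 needs more than a thousand.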

Including prior information in a Bayesian model will increase the precision of relevant parameter estimates and posterior predictive distributions (McCarthy 2007). The effect on model accuracy is harder to determine, though no less important, and can be assessed only through model validation; it has received limited attention in ecology. If a prior disagrees with the likelihood, then the model will be less accurate with respect to the training data than it would be without the prior information. But this loss of accuracy on the training data may be outweighed by the increased generality of the model, because it draws on information from a wider range of sources than the training data alone. In this paper, we demonstrate how to treat the knowledge gained during a pilot study on eucalypt seedling mortality as an informative Bayesian prior. We quantify the value of this prior knowledge as the increase in the full-scale study budget that would be needed to recover the information lost by not including the prior. Although our demonstration is specific, it highlights the general benefit of informative priors.
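The budget framing in this paragraph can be approximated with an effective-sample-size argument. The sketch below is an illustration under stated assumptions, not the calculation reported in the paper: it treats a parameter's full-scale standard error as scaling with 1/sqrt(n) and treats the prior as normal. A prior with standard deviation sd_prior is then worth n * se_data**2 / sd_prior**2 extra observations, and if cost is proportional to sample size, the budget without the prior would have to grow by the factor 1 + se_data**2 / sd_prior**2 to match the posterior precision obtained with it. The function names and all numbers are hypothetical.

```python
def prior_effective_sample_size(n, se_data, sd_prior):
    """Extra full-scale observations that a normal prior is approximately worth.

    Assumes the standard error from n observations is sigma / sqrt(n), so one
    observation carries information 1 / sigma**2 with sigma**2 = n * se_data**2,
    while the prior carries information 1 / sd_prior**2.
    """
    sigma2 = n * se_data ** 2
    return sigma2 / sd_prior ** 2


def budget_multiplier(n, se_data, sd_prior):
    """Factor by which n (and a budget proportional to n) must grow to replace the prior."""
    return 1.0 + prior_effective_sample_size(n, se_data, sd_prior) / n


# Hypothetical: 400 seedlings, full-scale SE 0.3, pilot-derived prior SD 0.5 (same scale).
n, se_data, sd_prior = 400, 0.3, 0.5
print(f"prior worth about {prior_effective_sample_size(n, se_data, sd_prior):.0f} seedlings")
print(f"sample (and budget) multiplier: {budget_multiplier(n, se_data, sd_prior):.2f}")
```

Here the prior is worth about 144 observations, a 36% increase. The multiplier depends only on the ratio of the two variances, not on n. With clustered data such as seedlings within sites, "observations" should be read as effective independent units rather than individual seedlings.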

Section snippets

Seedling survival experiment design and analyses

During 2005–2009, a pilot and a full-scale transplant survival experiment were undertaken at 21 sites on 14 farming properties in the Goulburn–Broken Catchment, Victoria, southeastern Australia. The pilot study began in October 2005, when 54 Grey Box eucalypt (Eucalyptus microcarpa) seedlings were planted in each of four grazing exclosures in a split-plot design with two treatments. For the first treatment, plants were watered at fortnightly intervals during the first six months, while the second
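Transplant survival data of this kind are often analysed in discrete time (Allison 1982, in the reference list, describes such methods), with one record per seedling per census interval and a binary indicator of death in that interval. This section is truncated and does not show the authors' exact data layout, so the Python sketch below is an assumption; the field names, the census structure and the example seedlings are hypothetical.

```python
from typing import NamedTuple


class Seedling(NamedTuple):
    seedling_id: str
    site: str
    intervals_at_risk: int  # census intervals the seedling was observed at risk
    died: bool              # True if death was recorded in the final interval


def to_interval_records(seedlings):
    """Expand seedling histories into one (id, site, interval, died) record per interval.

    A seedling contributes 0 for every interval it survived and, if it died,
    1 for the interval in which death was recorded; censored seedlings end on 0.
    """
    rows = []
    for s in seedlings:
        for t in range(1, s.intervals_at_risk + 1):
            died_now = s.died and t == s.intervals_at_risk
            rows.append((s.seedling_id, s.site, t, int(died_now)))
    return rows


# Hypothetical example: two seedlings at one site.
records = to_interval_records([
    Seedling("A1", "site1", 3, True),   # died during interval 3
    Seedling("A2", "site1", 4, False),  # still alive at the last census (censored)
])
for row in records:
    print(row)
```

Each row can then be treated as a Bernoulli trial whose log-odds of death depend on covariates such as site, treatment and the average maximum temperature during that interval.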

Results

At an assumed average maximum temperature of 28 °C, the average mortality rate was lower for seedlings planted during the pilot study (13% per month) than for those planted for the full-scale study (39% per month). In both cases, however, there was much variation between sites. In the full-scale study, topographically wetter sites tended to have lower mortality rates. In both experiments, periods and places with higher average maximum temperatures
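To put the two monthly rates in perspective, the sketch below compounds them into survival over several months. It assumes, purely for illustration, that the 28 °C rate applies unchanged every month; in the study, mortality varied with temperature and between sites, so these figures are not predictions from the fitted models.

```python
def survival_after(monthly_mortality, months):
    """Probability of surviving `months` months at a constant monthly mortality rate."""
    return (1.0 - monthly_mortality) ** months


# Monthly mortality rates at an average maximum temperature of 28 °C (from the text).
rates = {"pilot": 0.13, "full-scale": 0.39}

for label, m in rates.items():
    for months in (1, 3, 6):
        print(f"{label:<10} {months} month(s): survival {survival_after(m, months):.2f}")
```

Compounded over six months, 13% and 39% monthly mortality imply roughly 43% and 5% survival respectively, which shows how large the difference between the two studies is.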

Discussion

The effect of priors on model precision is well known and documented (e.g. McCarthy & Masters 2005). However, the effect on accuracy is rarely, if ever, examined; we know of no such examples in the ecological literature. Here, the small improvement in accuracy we observed may be due to the increase in model generality and scope achieved by including the prior derived from the pilot study. This represents a general benefit of including prior information, and in particular from a pilot study, as it
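Predictive accuracy on a validation dataset can be summarised in several ways; the area under the ROC curve (Hanley et al. 1982, in the reference list) is one common choice for binary outcomes such as death versus survival. This excerpt does not state which metric the authors used, so the sketch below is only an illustration: it computes the AUC through its Mann-Whitney interpretation for two hypothetical models scored on the same hypothetical validation data.

```python
def auc(predicted, observed):
    """Area under the ROC curve via the Mann-Whitney formulation.

    Probability that a randomly chosen death (observed == 1) received a higher
    predicted mortality probability than a randomly chosen survivor; ties count 1/2.
    """
    pos = [p for p, y in zip(predicted, observed) if y == 1]
    neg = [p for p, y in zip(predicted, observed) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    score = 0.0
    for p in pos:
        for q in neg:
            score += 1.0 if p > q else 0.5 if p == q else 0.0
    return score / (len(pos) * len(neg))


# Hypothetical validation data: observed fate (1 = died) and two models' predictions.
observed = [1, 0, 0, 1, 0, 1, 0, 0]
model_a = [0.60, 0.20, 0.30, 0.55, 0.10, 0.40, 0.35, 0.25]
model_b = [0.50, 0.30, 0.45, 0.40, 0.20, 0.35, 0.42, 0.25]
print(auc(model_a, observed), auc(model_b, observed))
```

Here model_a ranks every observed death above every survivor (AUC = 1.0), while model_b does so for 11 of the 15 death-survivor pairs (AUC about 0.73). AUC measures discrimination only; calibration of the predicted rates needs a separate check.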

Conclusion

We have demonstrated how preliminary data can be used as an informative Bayesian prior. A key finding of this study is that including the prior information increased the precision of some parameters while improving, or at least not compromising, the model's predictive accuracy. As well as changing the precision, including prior information also changed the location of some posterior distributions. Changing the location could be beneficial or undesirable depending on what is driving
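Whether a shift in posterior location is welcome depends partly on whether the prior and the new data are compatible. A simple screening heuristic, which is our illustration and not a procedure described in this excerpt, compares a normal prior with a normal approximation to the full-scale likelihood: if both describe the same parameter, their difference has variance equal to the sum of their variances, so a large standardised difference flags possible prior-data conflict. The function name and all numbers are hypothetical.

```python
import math


def prior_data_conflict(prior_mean, prior_sd, data_mean, data_se):
    """Standardised prior-data difference and its two-sided normal tail probability.

    Assumes the prior N(prior_mean, prior_sd**2) and the data estimate
    N(data_mean, data_se**2) refer to the same parameter on the same scale.
    """
    z = (prior_mean - data_mean) / math.sqrt(prior_sd ** 2 + data_se ** 2)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_two_sided


# Hypothetical logit-scale values: a compatible prior and a conflicting one.
for prior_mean in (-1.0, -2.5):
    z, p = prior_data_conflict(prior_mean, 0.5, data_mean=-0.5, data_se=0.3)
    print(f"prior mean {prior_mean:+.1f}: z = {z:+.2f}, two-sided p = {p:.3f}")
```

The first prior differs from the data estimate by less than one combined standard deviation, while the second differs by more than three, which would warrant a closer look at why pilot and full-scale conditions diverged before relying on the prior.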

Acknowledgements

We thank Libby Rumpff, Megan Watson, James Camac, Chris Jones, Rhiannon Apted, Warwick McCallum and Alex Thompson for help in the field. We also acknowledge the assistance of Carla Miles (Goulburn Broken Catchment Management Authority), Kate Hill (Department of Sustainability and Environment) and the land owners who allowed us to undertake this study on their properties. We also thank Bob O’Hara, Rod Fensham, and anonymous reviewers for helpful comments on earlier versions of this manuscript.

References (26)

M. Zens et al.
Dealing with death data: Individual hazards, mortality and bias
Trends in Ecology & Evolution
(2003)
P. Allison
Discrete-time methods for the analysis of event histories
Sociological Methodology
(1982)
S. Choy et al.
Elicitation by design in ecology: Using expert opinion to inform priors for Bayesian statistical models
Ecology
(2009)
M. Clyde
Bayesian model averaging: A tutorial: Comment
Statistical Science
(1999)
B. Dennis
Discussion: Should ecologists become Bayesians?
Ecological Applications
(1996)
J.A. Dupuis et al.
Bayesian estimation of species richness from quadrat sampling data in the presence of prior information
Biometrics
(2006)
G.E. Garrard et al.
A predictive model of avian natal dispersal distance provides prior information for investigating response to landscape change
Journal of Animal Ecology
(2012)
A. Gelman
Prior distributions for variance parameters in hierarchical models
Bayesian Analysis
(2006)
A. Gelman
Scaling regression inputs by dividing by two standard deviations
Statistics in Medicine
(2008)
A. Gelman et al.
Bayesian data analysis. Texts in Statistical Science
(2004)
R.H. Green
Sampling design and statistical methods for environmental biologists
(1979)
J.A. Hanley et al.
The meaning and use of the area under a receiver operating characteristic (ROC) curve
Radiology
(1982)
N.T. Hobbs et al.
Alternatives to statistical hypothesis testing in ecology: A guide to self teaching
Ecological Applications
(2006)

Cited by (17)

Modelling invasive alien plant distribution: A literature review of concepts and bibliometric analysis
2021, Environmental Modelling and Software
Citation Excerpt :
Modelling using Bayesian inference can increase the precision of model parameter estimates once it allies previous knowledge (a prior) with newly collected data (the likelihood) to produce a posterior distribution (Morris et al., 2015). Increased accuracy of estimates from the use of informative priors is well established, and improvement has been proved in ecological contexts (e.g., McCarthy and Masters 2005; McCarthy et al., 2008; Morris et al., 2013; 2015; Marcot et al., 2019). Indeed, the increase in precision is an inherent feature of using informative priors (Morris et al., 2015).
In the last decades, the number of publications dedicated to the application of species distribution models (SDMs) to invasive alien plants (IAPs) has constantly increased. Although recent reviews have addressed very relevant issues in the application of SDMs, the modelling approaches (i.e., algorithms) applied to IAPs have not been systematized. Therefore, we undertook a bibliographic review of articles devoted to SDMs and IAPs, from 1996 to 2019. Our results indicate that maximum entropy, generalized linear models, boosted regression trees and random forest were the four most frequent types of modelling approaches. It was clear that there was a variety of different approaches, regarding the type of algorithm to be used. We discuss the characteristics of the most cited algorithms, providing examples of their application in SDMs dedicated to IAPs. We advocate the use of a combination of different algorithms, an intensive evaluation of predictors, a thorough validation process, and a critical analysis of model predictions.
Modeling the impact of temperature on the population abundance of the ambrosia beetle Xyleborus affinis (Curculionidae: Scolytinae) under laboratory-reared conditions
2021, Journal of Thermal Biology
Modeling the impact of temperature on each life stage of a beetle population represents a continuing challenge. This study evaluates the effects of five temperature treatments (20, 23, 26, 29 and 32 °C) on population abundance and timing of a colony of ambrosia beetles Xyleborus affinis reared under laboratory conditions and use this data to develop demographic and phenological models. Abundances at each life stage (eggs, larvae, pupae and adult) were examined through periodic destructive sampling; given that it was not possible to track individuals. To assess the effects of temperature on oviposition, development and survival rates we developed a novel estimation strategy based on cohorts, which does not require individual developmental data. Since oviposition was entirely unwitnessed, we assessed competing empirical ovipositional models. Rates of development were computed using a modal rate curve for each life stage, and rates were projected to cohorts in life stages assuming log-normal developmental variance. Temperature-driven survival rates were assumed to be logistic with a quadratic exponent to capture modal temperature dependence. Parameters were estimated simultaneously using minimum negative log posterior likelihood, assuming Poisson distribution of observations and using priors to inform unobserved developmental rates and enforce mechanistic constraints on oviposition models. A parabolic function best described oviposition rate. Optimal developmental temperatures were 30.5 °C, 29 °C and 27.5 °C, with maximum developmental rates of 0.26/day, 0.12/day and 0.23/day for eggs, larvae and pupae, respectively. The survival rates in the range 20-29 °C were equal to 1 in the eggs-to-larvae transition, from 0.72 to 0.35 in larvae-to-pupae transition, and from 0.2 to 0.89 in pupae-to-adults transition. This procedure effectively characterized the direct thermal effects on development and survival of each life stage in the X. affinis under laboratory conditions and would be suitable for estimating temperature dependence for other species in which individual observations are not possible.
Quantifying uncertainty about forest recovery 32-years after selective logging in Suriname
2017, Forest Ecology and Management
Citation Excerpt :
If the prior information specified does not reduce the model fit, the model DIC value (which is analogous to the AIC in a likelihood framework) will improve (Spiegelhalter et al., 2002). Improvements indicate that the prior information specified is consistent with the data and that the data had an overwhelmingly large influence on the posterior distribution (Morris et al., 2013). Bayes’ principle is especially suited to our needs given that we aim to quantify and communicate uncertainty around post-logging recovery in a probabilistic manner, and overcome data limitations with a quantitatively rigorous approach to better inform forest managers (Hobbs and Hooten, 2015; McCarthy and Masters, 2005).
The inclusion of managed tropical forests in climate change mitigation has made it important to find the sustainable sweet-spot for timber production, carbon retention, and the quick recovery of both. Here we focus on recovery of aboveground carbon and timber stocks over the first 32 years after selective logging with the CELOS Harvest System in Suriname. Our data are from twelve 1-ha permanent sample plots in which growth, survival, and recruitment of trees ≥15 cm diameter were monitored between 1978 and 2012. We evaluate plot-level changes in basal area, stem density, aboveground carbon, and timber stock in response to average timber harvests of 15, 23, and 46 m³ ha⁻¹. We use a linear mixed-effects model in a Bayesian framework to quantify recovery time for aboveground carbon and timber stock, as well as annualized increments for both. Our statistical models accounted for the uncertainty associated with the height and biomass allometries used to estimate aboveground carbon and increased precision of annualized aboveground carbon increments by including data from forty-one plots located elsewhere on the Guiana Shield. The probabilities of aboveground carbon recovery to pre-logging levels 32 years after harvests of 15, 23 and 46 m³ ha⁻¹ were 45%, 40%, and 24%, respectively. Net aboveground carbon increment for logged forests across all harvest intensities was 0.64 Mg C ha⁻¹ yr⁻¹, more than twice the rate observed in unlogged forests (0.26 Mg C ha⁻¹ yr⁻¹). The probabilities of timber stock recovery at the end of the 32-year period were highest after harvest intensities of 15 and 23 m³ ha⁻¹ (with 80% probability) and lowest after the harvest of 46 m³ ha⁻¹ (with 70% probability). Timber stock recovery across all harvest intensities was driven primarily by residual tree growth. Application of the legal cutting limit of 25 m³ ha⁻¹ will require more than 70 and 40 years to recover aboveground carbon and timber stocks, respectively, with 90% probability. 
Based on the low recruitment rates of the twelve species harvested, the 25 year cutting cycle currently implemented in Suriname is too short for long-term timber stock sustainability. We highlight the value of propagating uncertainty from individual tree measurements to statistical predictions of carbon stock recovery. Ultimately, our study reveals the trade-offs that must be made between timber and carbon services as well as the opportunity to use carbon payments to enable longer cutting rotations to capture carbon from forest regrowth.
Exogenous and Endogenous Sources of Uncertainty Inform Global Performance Monitoring
2024, SSRN
Male position in a sexual network reflects mating role and body size
2023, Journal of Vertebrate Biology
Beyond mean fitness: Demographic stochasticity and resilience matter at tree species climatic edges
2023, Global Ecology and Biogeography
