Profiting from pilot studies: Analysing mortality using Bayesian models with informative priors
Introduction
The ability of Bayesian analyses to formally incorporate prior information has been little exploited in ecological research, despite its distinct appeal in a world where data and resources for research are limited and the impetus for rapid learning is great. Many textbooks on Bayesian methods for ecologists introduce the concept of informative priors in the first few pages (e.g., Kéry 2010; McCarthy 2007), yet researchers typically use very vague priors. In effect, these researchers assert that they have no prior knowledge of model parameters. Informally and formally, however, researchers use prior information to determine questions, sampling regimes, and model structures, and to interpret results. Use of informative priors nevertheless remains rare, perhaps because it is hard to express prior knowledge as a probability distribution (Clyde 1999) or because informative priors are perceived as overly subjective (Dennis 1996), although priors need not be subjective (Hobbs & Hilborn 2006). Another reason informative priors are avoided is a fear that they could reduce model accuracy: a prior affects not only the precision of estimates but also the location of the posterior and, therefore, potentially the predictive accuracy.
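A minimal worked case makes the last point concrete. Assume, for illustration only, a single mortality probability p with a conjugate beta prior and y deaths among n individuals (a simplification, not necessarily the model used in this study). The posterior mean is then a weighted average of the prior mean and the observed proportion, and the posterior variance shrinks as a + b grows:

```latex
p \sim \mathrm{Beta}(a, b), \qquad y \mid p \sim \mathrm{Binomial}(n, p)
\;\Rightarrow\; p \mid y \sim \mathrm{Beta}(a + y,\; b + n - y)

\mathrm{E}[p \mid y] = \frac{a + y}{a + b + n}
  = w \, \frac{a}{a + b} + (1 - w) \, \frac{y}{n},
  \qquad w = \frac{a + b}{a + b + n}

\mathrm{Var}[p \mid y] = \frac{\mathrm{E}[p \mid y]\,\bigl(1 - \mathrm{E}[p \mid y]\bigr)}{a + b + n + 1}
```

A vague Beta(1, 1) prior has a + b = 2, so w is close to zero for moderate n. An informative prior with large a + b both narrows the posterior and pulls its location toward a/(a + b), which helps predictive accuracy only if that prior mean is close to the truth.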
Here we extend the use of informative priors in ecological modelling (see Choy, O’Leary, & Mengersen 2009; Dupuis & Joachim 2006; McCarthy & Masters 2005; McCarthy, Citroen, & McCall 2008) to pilot study data. The primary goal of a pilot study is to inform the design of the subsequent full-scale study. Pilot studies are small studies that aim to reduce important uncertainties and to reveal the sample size needed to detect particular effects. They can also reveal major drivers of system variation, help identify the spatial and temporal scales at which that variation is propagated, or help refine a set of predictors. However, the data generated by a pilot study need not only inform the design of the full-scale study; they may also help address the fundamental research question. Many ecologists’ default position is to disregard the preliminary data, a stance recommended by texts on data collection and analysis (e.g. Green 1979). Treating the results of a pilot study as an informative prior using Bayesian methods instead provides a formal and transparent way to combine two otherwise incompatible data sources, improving the cost-effectiveness of the research.
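A minimal sketch of treating pilot data as a prior, assuming (for illustration only) a single mortality probability, a conjugate beta-binomial model and hypothetical seedling counts; the study's own model is not reproduced here. The pilot is first analysed with a vague Beta(1, 1) prior, and its posterior then serves as the prior for the full-scale data:

```python
from math import sqrt


def beta_summary(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    sd = sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd


def update(a, b, deaths, n):
    """Conjugate beta-binomial update for `deaths` out of `n` seedlings."""
    return a + deaths, b + (n - deaths)


# Hypothetical counts for illustration only (not the study's data).
pilot_deaths, pilot_n = 30, 200
full_deaths, full_n = 45, 150

# Step 1: analyse the pilot with a vague Beta(1, 1) prior.
pilot_a, pilot_b = update(1, 1, pilot_deaths, pilot_n)

# Step 2: use the pilot posterior as the informative prior for the full study.
inf_a, inf_b = update(pilot_a, pilot_b, full_deaths, full_n)

# Comparison: the full study analysed alone with the vague prior.
vague_a, vague_b = update(1, 1, full_deaths, full_n)

for label, (a, b) in [("vague prior", (vague_a, vague_b)),
                      ("pilot prior", (inf_a, inf_b))]:
    mean, sd = beta_summary(a, b)
    print(f"{label}: posterior mean={mean:.3f}, sd={sd:.3f}")
```

With the pilot prior the posterior is narrower and its mean moves toward the pilot's lower mortality proportion. In this conjugate case, sequential updating gives exactly the same posterior as pooling all counts; that equivalence assumes both studies share one mortality probability, which a pilot run under different conditions may not satisfy.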
We illustrate the use of Bayesian informative priors to recover the inferential and predictive power of otherwise unusable pilot study data with a case study on eucalypt tree seedling mortality. Mortality rate is a key demographic parameter: understanding how and why it varies is central to describing and learning about the population dynamics of all species (Zens & Peart 2003). However, the mortality events from which rates are calculated are often rare in absolute terms, and in many systems they may also be episodic. Together, these issues make mortality rates difficult to characterise, so large datasets that include many individuals and span large spatial and temporal ranges are often required.
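Why rare deaths demand large datasets can be sketched with a simple discrete-time survival model (cf. the event-history reference in the list). The sketch assumes, for illustration only, a constant monthly mortality probability m shared by all seedlings, so that a seedling observed for t months dies with probability 1 − (1 − m)^t; the census records are hypothetical, and this is not necessarily the model used in the study. The grid search returns the maximum-likelihood m and an approximate 95% likelihood interval:

```python
from math import log


def interval_death_prob(m, months):
    """P(death within `months`) under a constant monthly mortality probability m."""
    return 1.0 - (1.0 - m) ** months


def log_likelihood(m, records):
    """Bernoulli log-likelihood for (months_observed, died) census records."""
    total = 0.0
    for months, died in records:
        if died:
            total += log(interval_death_prob(m, months))
        else:
            total += months * log(1.0 - m)  # log P(survive), computed directly
    return total


def likelihood_interval(records, step=0.001):
    """Grid-search MLE of m and an approximate 95% likelihood interval
    (all grid values within 1.92 log-likelihood units of the maximum)."""
    grid = [i * step for i in range(1, int(round(1 / step)))]
    lls = [log_likelihood(m, records) for m in grid]
    best = max(lls)
    m_hat = grid[lls.index(best)]
    inside = [m for m, ll in zip(grid, lls) if ll >= best - 1.92]
    return m_hat, min(inside), max(inside)


# Hypothetical census records (months observed, died?), not study data:
# 20 seedlings with 2 deaths, censused after either 3 or 6 months.
small = [(6, True), (3, True)] + [(6, False)] * 10 + [(3, False)] * 8
large = small * 10  # same mortality pattern, ten times as many seedlings

for name, records in [("20 seedlings", small), ("200 seedlings", large)]:
    m_hat, lower, upper = likelihood_interval(records)
    print(f"{name}: m_hat={m_hat:.3f}, interval=({lower:.3f}, {upper:.3f})")
```

Both datasets give essentially the same point estimate, but with only two deaths the interval is several times wider, which is why mortality studies tend to need many individuals.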
Including prior information in a Bayesian model will increase the precision of relevant parameter estimates and posterior predictive distributions (McCarthy 2007). The effect on model accuracy is harder to define, though no less important, and can only be assessed via model validation. Effects of priors on accuracy have received limited attention in ecology. If a prior disagrees with the likelihood, then the model will be less accurate with respect to the training data than it would be without the prior information. But this cost to predictive accuracy on the training data may be outweighed by the increased generality of the model, because it includes information from a wider range of sources than the training data alone. In this paper, we demonstrate how to treat the knowledge learned during a pilot study on eucalypt seedling mortality as an informative Bayesian prior. We express the value of this prior knowledge as the degree to which the full-scale study budget would need to increase to recover the information lost by not including the prior. The particular demonstration we provide here highlights the general benefit of informative priors.
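The budget measure can be sketched under strong, hypothetical simplifications: one shared mortality probability, a conjugate beta-binomial model, cost proportional to the number of seedlings, and extra full-study seedlings dying at the full study's observed proportion. The function name and counts below are illustrative assumptions, not the authors' procedure. It searches for the full-study sample size that, with a vague prior alone, matches the posterior standard deviation obtained with the pilot prior:

```python
from math import sqrt


def beta_sd(a, b):
    """Standard deviation of a Beta(a, b) distribution."""
    return sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


def equivalent_budget_increase(pilot_deaths, pilot_n, full_deaths, full_n):
    """Fractional increase in full-study sample size that, analysed with a
    vague Beta(1, 1) prior, matches the posterior sd obtained when the pilot
    posterior is used as the prior (hypothetical simplified measure)."""
    target = beta_sd(1 + pilot_deaths + full_deaths,
                     1 + (pilot_n - pilot_deaths) + (full_n - full_deaths))
    rate = full_deaths / full_n
    n = full_n
    while True:
        deaths = rate * n  # expected deaths, may be fractional
        if beta_sd(1 + deaths, 1 + n - deaths) <= target:
            return (n - full_n) / full_n
        n += 1  # sd shrinks as n grows, so the loop terminates


# Hypothetical counts (not study data): pilot 30/200 deaths, full study 45/150.
increase = equivalent_budget_increase(30, 200, 45, 150)
print(f"Budget increase needed without the pilot prior: {increase:.0%}")
```

In this toy case the answer (roughly 190%) exceeds the pilot's share of seedlings (200 relative to 150) because the pilot's lower mortality proportion carries less binomial variance. In a hierarchical model with site effects, the equivalent budget would instead have to be estimated by refitting the model to rescaled data.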
Section snippets
Seedling survival experiment design and analyses
During 2005–2009, a pilot and a full-scale transplant survival experiment were undertaken at 21 sites on 14 farming properties in the Goulburn–Broken Catchment, Victoria, south-eastern Australia. The pilot study began in October 2005, when 54 Grey Box eucalypt (Eucalyptus microcarpa) seedlings were planted in each of four grazing exclosures in a split-plot design with two treatments. For the first treatment, plants were watered at fortnightly intervals during the first six months, while the second
Results
The average mortality rate was lower for seedlings planted during the pilot study (13% per month, assuming an average maximum temperature of 28 °C) than for those planted for the full-scale study (39% per month at the same temperature). In both cases, however, mortality varied considerably between sites. In the full-scale study, topographically wetter sites tended to have lower rates of mortality. In both experiments, periods and places with higher average maximum temperatures
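The gap between the two monthly rates reported above is easier to grasp as cumulative survival. This sketch assumes, as an interpretation for illustration, that each rate is a constant monthly probability of death at the stated 28 °C average maximum temperature (if the model defines rates as continuous-time hazards, the conversion differs); the six-month horizon is arbitrary:

```python
def survival(monthly_mortality, months):
    """Fraction surviving after `months` at a constant monthly mortality probability."""
    return (1.0 - monthly_mortality) ** months


for label, rate in [("pilot", 0.13), ("full-scale", 0.39)]:
    print(f"{label}: 6-month survival = {survival(rate, 6):.1%}")
# pilot: 6-month survival = 43.4%
# full-scale: 6-month survival = 5.2%
```

Under this reading, a threefold difference in monthly mortality compounds into roughly an eightfold difference in six-month survival.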
Discussion
The effect of priors on model precision is well known and documented (e.g. McCarthy & Masters 2005). However, their effect on accuracy is rarely, if ever, examined; we know of no such examples in the ecological literature. Here, the small observed improvement in accuracy may be due to the increase in model generality and scope achieved by including the prior derived from the pilot study. This represents a general benefit of including prior information, and in particular from a pilot study, as it
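Accuracy for a binary outcome such as seedling death is often summarised by the area under the ROC curve (AUC), the subject of one of the listed references. The minimal sketch below uses AUC's rank (Mann-Whitney) interpretation; it does not imply this is exactly how accuracy was assessed in the study, and the validation data are hypothetical:

```python
def auc(scores, outcomes):
    """Area under the ROC curve via its rank interpretation: the probability
    that a randomly chosen seedling that died received a higher predicted
    mortality than a randomly chosen survivor (ties count one half)."""
    dead = [s for s, y in zip(scores, outcomes) if y]
    alive = [s for s, y in zip(scores, outcomes) if not y]
    if not dead or not alive:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for d in dead:
        for a in alive:
            if d > a:
                wins += 1.0
            elif d == a:
                wins += 0.5
    return wins / (len(dead) * len(alive))


# Hypothetical validation data: predicted mortality probabilities and outcomes.
predicted = [0.9, 0.8, 0.4, 0.35, 0.2, 0.1]
died = [True, False, True, False, False, False]
print(auc(predicted, died))  # 0.875
```

Computing the same statistic for models fitted with and without the pilot prior, on data withheld from fitting, is one way to check whether a prior that shifts posterior locations helps or harms prediction.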
Conclusion
We have demonstrated how preliminary data can be used as an informative Bayesian prior. A key finding of this study is that including the prior information increased the precision of some parameters at the same time as improving or at least not compromising the model's predictive accuracy. As well as changing the precision, including prior information also changed the location of some posterior distributions. Changing the location could be beneficial or undesirable depending on what is driving
Acknowledgements
We thank Libby Rumpff, Megan Watson, James Camac, Chris Jones, Rhiannon Apted, Warwick McCallum and Alex Thompson for help in the field. We also acknowledge the assistance of Carla Miles (Goulburn Broken Catchment Management Authority), Kate Hill (Department of Sustainability and Environment) and the land owners who allowed us to undertake this study on their properties. We also thank Bob O’Hara, Rod Fensham, and anonymous reviewers for helpful comments on earlier versions of this manuscript.
References (26)
- et al. Dealing with death data: Individual hazards, mortality and bias. Trends in Ecology & Evolution (2003).
- Discrete-time methods for the analysis of event histories. Sociological Methodology (1982).
- et al. Elicitation by design in ecology: Using expert opinion to inform priors for Bayesian statistical models. Ecology (2009).
- Bayesian model averaging: A tutorial: Comment. Statistical Science (1999).
- Discussion: Should ecologists become Bayesians? Ecological Applications (1996).
- et al. Bayesian estimation of species richness from quadrat sampling data in the presence of prior information. Biometrics (2006).
- et al. A predictive model of avian natal dispersal distance provides prior information for investigating response to landscape change. Journal of Animal Ecology (2012).
- Prior distributions for variance parameters in hierarchical models. Bayesian Analysis (2006).
- Scaling regression inputs by dividing by two standard deviations. Statistics in Medicine (2008).
- et al. Bayesian data analysis. Texts in Statistical Science (2004).
- Sampling design and statistical methods for environmental biologists.
- The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology.
- Alternatives to statistical hypothesis testing in ecology: A guide to self teaching. Ecological Applications.
Cited by (17)
Modelling invasive alien plant distribution: A literature review of concepts and bibliometric analysis
2021, Environmental Modelling and Software
Citation Excerpt: Modelling using Bayesian inference can increase the precision of model parameter estimates once it allies previous knowledge (a prior) with newly collected data (the likelihood) to produce a posterior distribution (Morris et al., 2015). Increased accuracy of estimates from the use of informative priors is well established, and improvement has been proved in ecological contexts (e.g., McCarthy and Masters 2005; McCarthy et al., 2008; Morris et al., 2013; 2015; Marcot et al., 2019). Indeed, the increase in precision is an inherent feature of using informative priors (Morris et al., 2015).
Quantifying uncertainty about forest recovery 32-years after selective logging in Suriname
2017, Forest Ecology and Management
Citation Excerpt: If the prior information specified does not reduce the model fit, the model DIC value (which is analogous to the AIC in a likelihood framework) will improve (Spiegelhalter et al., 2002). Improvements indicate that the prior information specified is consistent with the data and that the data had an overwhelmingly large influence on the posterior distribution (Morris et al., 2013). Bayes’ principle is especially suited to our needs given that we aim to quantify and communicate uncertainty around post-logging recovery in a probabilistic manner, and overcome data limitations with a quantitatively rigorous approach to better inform forest managers (Hobbs and Hooten, 2015; McCarthy and Masters, 2005).
Male position in a sexual network reflects mating role and body size
2023, Journal of Vertebrate Biology
Beyond mean fitness: Demographic stochasticity and resilience matter at tree species climatic edges
2023, Global Ecology and Biogeography