A robust P-spline approach to closed population capture–recapture models with time dependence and heterogeneity

https://doi.org/10.1016/j.csda.2011.08.004

Abstract

We extend the conditional likelihood approach to the analysis of capture–recapture experiments for closed populations by nonparametrically modeling the relationship between capture probabilities and individual covariates using P-splines. The model allows nonparametric functions of multivariate continuous covariates as well as categorical covariates and time effects, greatly enhancing the techniques available to an analyst. To implement this approach in practice, we found it necessary to develop a robust modification of the Horvitz–Thompson estimator. The method is illustrated on several data sets and a small simulation study is conducted.

Introduction

Capture–recapture experiments are a sampling scheme for a population where the primary aim is to estimate the size of the population. They are widely used in ecological statistics in particular and present many challenges to a statistician if the capture probabilities are heterogeneous between individuals. The data from these experiments consist of individual capture histories that record whether an individual was captured on each sampling occasion, along with individual and environmental covariates, but no information is available on the uncaptured individuals. One of the frustrations in the analysis of capture–recapture experiments is that an analyst is well aware of the flexible models available for modeling probabilities in a logistic regression setting, but few of these have been readily available for modeling individual capture probabilities. Here, we partially remedy this by implementing the P-spline approach to nonparametric modeling for the generalized linear model (GLM) (McCullagh and Nelder, 1989) arising from the conditional likelihood approach to capture–recapture experiments (Huggins, 1989).

In general, populations may be open or closed; here we consider closed populations, with neither immigration nor emigration over the period of the study. Modern population size estimators for closed populations arose from the seminal works of Darroch (1958), Jolly (1965) and Seber (1965), where capture probabilities are assumed to be the same for each individual but allowed to vary across capture occasions. Later model developments allowed capture probabilities to vary with time (t), behavioral response (b) or heterogeneity between individuals (h) (Pollock, 1974, Otis et al., 1978). Covariates can be used to explain heterogeneity in the capture probabilities. For example, in closed population models Pollock et al. (1984) and Huggins (1989) related covariates collected on individuals (such as body weight or sex) to capture probabilities and used maximum likelihood or conditional likelihood to estimate the model parameters. Furthermore, time effects, modeled as factors or as functions of covariates such as the temperature recorded on the capture occasion, can also be considered.
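
A parametric model of this kind links each capture probability to an individual covariate and an occasion-specific time effect through the logit link. The short Python sketch below only evaluates such probabilities; the linear predictor, the coefficient values and the function names are illustrative assumptions, not a model fitted in this paper.

```python
import math


def inv_logit(eta):
    """Inverse of the logit link: 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def capture_probs(x, beta0, beta1, alpha):
    """Return p[i][j], the capture probability of individual i on occasion j.

    Assumed (illustrative) linear-logistic form:
        logit p_ij = beta0 + beta1 * x_i + alpha_j
    x     : one covariate value per individual (e.g. body weight)
    alpha : one time effect per occasion, with alpha[0] = 0 as the baseline
    """
    return [[inv_logit(beta0 + beta1 * xi + aj) for aj in alpha] for xi in x]


# Hypothetical covariates and coefficients: 2 individuals, tau = 3 occasions.
p = capture_probs(x=[10.0, 15.0], beta0=-2.0, beta1=0.1, alpha=[0.0, 0.5, -0.5])
for row in p:
    print([round(v, 3) for v in row])
```

Replacing the linear term beta1 * x_i by a smooth function of x_i gives the additive (GAM) form considered later.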

We are concerned with model Mth, where capture probabilities may depend on individual covariates and time. A wide variety of approaches have been considered to fit model Mth, including martingale methods (Lloyd and Yip, 1991), sample coverage models (Chao et al., 1992), log-linear and latent class models (Agresti, 1994), finite mixture models (Pledger, 2000) and, as mentioned above, the use of individual covariates in GLMs (Huggins, 1989). In practice, the relationship between capture probabilities and covariates can be quite nonlinear, so assuming linearity may lead to model misspecification (Huggins and Hwang, 2007). To overcome this, semiparametric and nonparametric techniques have been introduced to capture–recapture models (Chen and Lloyd, 2000, Zwane and van der Heijden, 2004, Huggins and Hwang, 2007, Hwang and Huggins, 2007). Nonparametric models based on P-splines (Eilers and Marx, 1996) have become popular in applied statistics due to their flexibility and simple setup (Wood, 2006). P-splines are low-rank smoothers that use a B-spline basis (de Boor, 2001) with a difference penalty applied to the coefficients of adjacent B-splines. We consider generalized additive models (GAMs) (Hastie and Tibshirani, 1990, Marx and Eilers, 1998, Wood, 2006, Wang et al., 2011), which model the capture probabilities as a known function of a sum of smooth functions of the covariates. The approach of Huggins (1989) uses a conditional likelihood to estimate the parameters in the model for the capture probabilities, which yields a GLM, and then a Horvitz–Thompson (H–T) estimator (Horvitz and Thompson, 1952) to estimate the population size. Here, we extend this model using P-splines rather than the local polynomials of Huggins and Hwang (2007) and Hwang and Huggins (2007).
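
The two ingredients of a P-spline named above, a B-spline basis on equally spaced knots and a difference penalty on adjacent coefficients, can be sketched in plain Python. The number of segments, the cubic degree and the second-order penalty are common defaults assumed here, not settings reported in this paper; in a fit, a multiple of this penalty is subtracted from the (conditional) log-likelihood, with lam controlling smoothness.

```python
def bspline_basis(x, xmin, xmax, nseg, degree=3):
    """Values at x of all B-splines on equally spaced knots (Cox-de Boor recursion).

    The knot grid extends `degree` segments beyond [xmin, xmax], so there are
    nseg + degree basis functions and they sum to 1 anywhere in [xmin, xmax].
    """
    dx = (xmax - xmin) / nseg
    knots = [xmin + (k - degree) * dx for k in range(nseg + 2 * degree + 1)]
    # Degree 0: indicator of the half-open knot interval that contains x.
    B = [1.0 if knots[k] <= x < knots[k + 1] else 0.0 for k in range(len(knots) - 1)]
    # Raise the degree one step at a time; each step drops one basis function.
    for d in range(1, degree + 1):
        B = [
            (x - knots[k]) / (knots[k + d] - knots[k]) * B[k]
            + (knots[k + d + 1] - x) / (knots[k + d + 1] - knots[k + 1]) * B[k + 1]
            for k in range(len(B) - 1)
        ]
    return B


def difference_matrix(n, order=2):
    """Rows of the order-th difference operator acting on n coefficients."""
    D = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(order):
        D = [[D[i + 1][j] - D[i][j] for j in range(n)] for i in range(len(D) - 1)]
    return D


def difference_penalty(a, lam, order=2):
    """lam * sum_k (Delta^order a)_k^2, the P-spline roughness penalty."""
    D = difference_matrix(len(a), order)
    return lam * sum(sum(dij * aj for dij, aj in zip(row, a)) ** 2 for row in D)


B = bspline_basis(0.37, xmin=0.0, xmax=1.0, nseg=10)
print(len(B), round(sum(B), 12))
print(difference_penalty([1.0, 2.0, 3.0, 4.0], lam=5.0))
```

A second-order penalty leaves linearly increasing coefficient sequences unpenalized, so as lam grows the fitted smooth shrinks toward a straight line on the logit scale rather than toward a constant.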

We saw in simulations and examples that using the H–T estimator as in Huggins (1989) can result in unrealistically large estimates of the population size. This problem can occur in both parametric and nonparametric estimation when extreme covariate values are not smoothed because they are far from the remainder of the data. This motivates the development of a robust estimator of the population size. There are two possible approaches to robustifying the population size estimator: the first is to robustify the GLM or GAM estimating equations, and the second is to modify the H–T estimator; of course, one could also combine both procedures. We consider the second approach, as the first is an extension of known results for GLMs (Cantoni and Ronchetti, 2001). Moreover, even with robust estimators of the model parameters, the estimated capture probabilities corresponding to outlying covariate values may still be extremely small, resulting in inflated estimates of the population size. That is, the model fitted to the bulk of the data may not be appropriate at some extreme covariate values. Because of the form of the H–T estimator, robustifying it is less straightforward than usual, and care must be taken not to introduce excessive bias. Previous robust versions of the H–T estimator have tended to focus on the more common survey sampling applications, where the sampling probabilities are known and the outliers occur in the survey variable (Chambers, 1986, Ghosh, 2008). The situation here is different: the survey variable is a known constant for each individual and the outliers occur in the sampling probabilities. Thus, we downweight unusual sampling probabilities rather than large residuals in the survey variable.
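
A small numerical sketch of the problem described above: one individual with an extreme covariate value and a tiny estimated probability of ever being caught dominates the H–T sum. Capping each weight 1/π_i at a fixed value is shown only as a naive, hypothetical contrast; it is not the robust estimator developed in Section 4, and the probabilities and the cap below are made-up values.

```python
def ht_estimate(pis):
    """Horvitz-Thompson estimate: sum of 1 / pi_i over the D captured animals."""
    return sum(1.0 / pi for pi in pis)


def capped_ht_estimate(pis, max_weight):
    """Naive weight-trimmed variant: each weight 1 / pi_i is capped at max_weight.

    Hypothetical illustration only; NOT the robust estimator of Section 4.
    """
    return sum(min(1.0 / pi, max_weight) for pi in pis)


# Hypothetical estimated probabilities of being caught at least once (D = 5).
# The last animal has an extreme covariate value, so its fitted pi is tiny
# and it alone contributes about 500 to the H-T sum.
pis = [0.80, 0.75, 0.90, 0.85, 0.002]
print(round(ht_estimate(pis), 1))
print(round(capped_ht_estimate(pis, max_weight=10.0), 1))
```

The cap trades variance for bias: a genuinely hard-to-catch animal is truncated along with a spurious one, which is why any modification of the H–T weights needs care.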

In Section 2.1 we introduce our notation, and in Section 2.2 we review the conditional likelihood approach of Huggins (1989). In Section 2.3 we apply the GAM approach to model Mth and the submodels Mh and Mt. We first apply the model to two examples in Sections 3.1 (Harvest mouse data) and 3.2 (Mountain Pygmy Possum data), and the problems encountered there motivate the development of our robust population size estimator in Section 4. We revisit the second example in Section 4.3, and to validate the new approaches we conduct some simulations in Section 5. Some discussion is given in Section 6, and some technical results are in Appendices A (fitting GAMs), B (variances of the robust estimators) and D (figures).

Section snippets

Notation

Consider a population of unknown size $N$ and a capture–recapture experiment conducted over capture occasions labeled $j = 1, \ldots, \tau$. The population is assumed closed over the course of the experiment, and the individuals in the population are assumed to behave independently of each other. Let $Y_{ij}$ be the indicator that the $i$th individual has been caught on the $j$th occasion, and let $C_i$ take the value 1 if the $i$th individual has been captured at least once and 0 otherwise. Let $D = \sum_{i=1}^{N} C_i$ be the number of
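
The snippet breaks off while defining D, but these quantities, together with the conditional likelihood and H–T estimator of Huggins (1989) described in the Introduction, can be sketched directly from this notation. Write π_i = 1 − ∏_j (1 − p_ij) for the probability that individual i is caught at least once. In this Python sketch the capture histories and the p_ij are hypothetical and supplied directly; in practice p_ij would come from the fitted GLM or GAM.

```python
import math


def captured_indicators(Y):
    """C_i = 1 if row i of Y contains at least one capture, else 0."""
    return [1 if any(row) else 0 for row in Y]


def prob_ever_caught(p_i):
    """pi_i = 1 - prod_j (1 - p_ij), the probability of at least one capture."""
    return 1.0 - math.prod(1.0 - pij for pij in p_i)


def conditional_loglik(Y_obs, p_obs):
    """Conditional log-likelihood of the captured animals' histories.

    Each captured animal contributes its Bernoulli log-likelihood over the
    tau occasions minus log(pi_i), i.e. its history is conditioned on C_i = 1.
    """
    ll = 0.0
    for y_i, p_i in zip(Y_obs, p_obs):
        ll += sum(math.log(pij) if yij else math.log(1.0 - pij)
                  for yij, pij in zip(y_i, p_i))
        ll -= math.log(prob_ever_caught(p_i))
    return ll


def horvitz_thompson(p_obs):
    """N_hat = sum over the D captured animals of 1 / pi_i."""
    return sum(1.0 / prob_ever_caught(p_i) for p_i in p_obs)


# Hypothetical full histories (tau = 3); the all-zero row would be unobservable.
Y = [[1, 0, 1], [0, 0, 0], [0, 1, 0]]
p = [[0.5, 0.5, 0.5] for _ in Y]          # hypothetical p_ij
C = captured_indicators(Y)
D = sum(C)
Y_obs = [y for y, c in zip(Y, C) if c]
p_obs = [row for row, c in zip(p, C) if c]
print(D, conditional_loglik(Y_obs, p_obs), horvitz_thompson(p_obs))
```

Only the D captured rows enter either function, which is what makes the approach usable when nothing is known about the uncaptured animals.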

Example 1: Harvest mouse data

We first illustrate the use of GAMs under model Mth using capture–recapture data collected on the Harvest mouse (Micromys minutus). The experiment was conducted at Wulin Recreation Area in Shei-Pa National Park, Taiwan in the summer of 2008. The mice were trapped using Sherman traps in 5×20 m grids where traps were set 10 meters apart. Over the τ=14 weekly capture occasions, D=142 individuals were captured at least once. Individuals were weighed (g), sexed and hindfoot measurements (mm) were

Robust population size estimates

To develop robust estimators we need to model the contamination. Note that our interest is in the situation where the observed or estimated value of the capture probability is contaminated. That is, captures occur according to a model with the true capture probability, but instead of the true value (or a close surrogate of it) we observe some possibly unrelated value. This may be an artifact of poor modeling of the estimated capture probability as a function of the covariate, or may arise from errors in measuring the

GAMs

We first examined the performance of GAMs using univariate continuous and categorical covariates. We compared results with parametric linear and quadratic GLMs and with the semiparametric local polynomial (LP) models of Hwang and Huggins (2007). Since LP models for Mth have not yet been developed, we only considered the submodel Mh. We followed a simulation design similar to that of Hwang and Huggins (2007), where we considered a population size of N=100 with τ=7 capture occasions and generated the continuous
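
A design of this type can be mimicked in a few lines of Python. The snippet is cut off before the covariate distribution and capture function are stated, so the Uniform(−1, 1) covariate, the logistic coefficients and the function name `simulate_mh` below are assumptions for illustration, not the design of Hwang and Huggins (2007). The oracle H–T estimate uses the true p_i, which are unknown in a real analysis.

```python
import math
import random


def simulate_mh(N=100, tau=7, seed=1):
    """Simulate a closed-population experiment under model Mh.

    Hypothetical design: x_i ~ Uniform(-1, 1), logit p_i = -1 + 1.5 * x_i,
    with p_i constant over the tau occasions (no time effect).
    Returns covariates, true p_i and histories for animals caught at least once.
    """
    rng = random.Random(seed)
    x_obs, p_obs, Y_obs = [], [], []
    for _ in range(N):
        x = rng.uniform(-1.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-(-1.0 + 1.5 * x)))
        y = [1 if rng.random() < p else 0 for _ in range(tau)]
        if any(y):  # animals never caught leave no record
            x_obs.append(x)
            p_obs.append(p)
            Y_obs.append(y)
    return x_obs, p_obs, Y_obs


tau = 7
x_obs, p_obs, Y_obs = simulate_mh(N=100, tau=tau)
D = len(Y_obs)
# Oracle H-T estimate using the true p_i: pi_i = 1 - (1 - p_i) ** tau.
N_oracle = sum(1.0 / (1.0 - (1.0 - p) ** tau) for p in p_obs)
print("D =", D, " oracle N_hat =", round(N_oracle, 1))
```

Repeating this over many seeds, and replacing the oracle p_i with fitted values from a GLM, GAM or LP model, gives the kind of comparison described here.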

Discussion

We have demonstrated that GAMs may be relatively easily fitted to capture–recapture data when the capture probabilities are functions of multiple continuous/categorical covariates and time effects. In developing these methods we noted that the H–T estimator can result in unreasonable estimates of the population size in the presence of contamination that can arise from outliers, measurement error or inappropriate modeling. This can occur in parametric and nonparametric models. We developed a

Acknowledgments

The authors would like to thank Dr. Sheng-Hai Wu of the Institute of Statistics, National Chung Hsing University for the use of the Harvest mouse mark–recapture data. They would also like to thank the referees for comments that helped to clarify the exposition.

References (40)

  • B.D. Marx et al.

    Direct generalized additive modeling with penalized likelihood

    Computational Statistics & Data Analysis

    (1998)
  • V.M.R. Muggeo et al.

    Fitting generalized linear models with unspecified link function: a P-spline approach

    Computational Statistics & Data Analysis

    (2008)
  • E.N. Zwane et al.

    Semiparametric models for capture–recapture studies with covariates

    Computational Statistics & Data Analysis

    (2004)
  • A. Agresti

    Simple capture–recapture models permitting unequal catchability and variable sampling effort

    Biometrics

    (1994)
  • E. Cantoni et al.

    Robust inference for generalized linear models

    Journal of the American Statistical Association

    (2001)
  • R.J. Carroll et al.

    Generalized partially linear single-index models

    Journal of the American Statistical Association

    (1997)
  • R.L. Chambers

    Outlier robust finite population estimation

    Journal of the American Statistical Association

    (1986)
  • A. Chao et al.

    Estimating population size for capture–recapture data when capture probabilities vary by time and individual animal

    Biometrics

    (1992)
  • S.X. Chen et al.

    A non parametric approach to the analysis of two-stage mark-recapture experiments

    Biometrika

    (2000)
  • J.N. Darroch

    The multiple–recapture census: I. Estimation of a closed population

    Biometrika

    (1958)
  • C. de Boor

    A Practical Guide to Splines

    (2001)
  • P.H.C. Eilers et al.

    Flexible smoothing with B-splines and penalties

    Statistical Science

    (1996)
  • Ghosh, M., 2008. Robust estimation in finite population sampling. In: Balakrishnan, N., Peña, E.A., Silvapulle, M.J....
  • F.R. Hampel

    The influence curve and its role in robust estimation

    Journal of the American Statistical Association

    (1974)
  • F.R. Hampel et al.

    Robust Statistics: The Approach Based on Influence Functions

    (1986)
  • T. Hastie et al.

    Generalized Additive Models

    (1990)
  • D. Heinze et al.

    A review of the ecology and conservation of the mountain pygmy-possum Burramys parvus

  • R.V. Hogg

    Adaptive robust procedures: a partial review and some suggestions for future applications and theory

    Journal of the American Statistical Association

    (1974)
  • D. Horvitz et al.

    A generalization of sampling without replacement from a finite universe

    Journal of the American Statistical Association

    (1952)
  • P.J. Huber

    Robust Statistics

    (1981)
  • Cited by (17)

    • Estimating population size of heterogeneous populations with large data sets and a large number of parameters

      2019, Computational Statistics and Data Analysis
      Citation Excerpt :

      Zwane and van der Heijden (2004) developed log-linear models that used penalized splines to express dependence among continuous covariates. Gimenez et al. (2006), Hwang and Huggins (2011) and Stoklosa and Huggins (2012) proposed nonparametric and semiparametric regression methods for estimating capture probabilities in capture–recapture models. Another recent study that uses similar CRDA data and open population models that allows for covariates is Stoklosa et al. (2016).

    • Accounting for contamination and outliers in covariates for open population capture-recapture models

      2016, Journal of Statistical Planning and Inference
      Citation Excerpt :

      In this paper a robust MA-type open population size estimator is proposed. This can also be seen as an extension to the work of Stoklosa and Huggins (2012b) but in the open population setting. We found that the theory and implementation of robust statistics were easily transferable since the methods of Stoklosa and Huggins (2012b) used a similar Horvitz–Thompson type estimator.

    • Special issue on robust analysis of complex data

      2013, Computational Statistics and Data Analysis
    • Doubly Robust Capture-Recapture Methods for Estimating Population Size

      2023, Journal of the American Statistical Association