Elsevier

Mathematical Biosciences

Volume 255, September 2014, Pages 43-51

Semivarying coefficient models for capture–recapture data: Colony size estimation for the little penguin Eudyptula minor

https://doi.org/10.1016/j.mbs.2014.06.014

Highlights

  • We propose a new capture–recapture model to estimate wildlife population sizes.

  • Seasonality that changes yearly is accommodated by a time-varying coefficient model.

  • Simulations were conducted to examine the model performance.

  • The model is fitted to 25 years of real data collected on little penguins.

  • Our results revealed key structures relating to environmental and biological changes.

Abstract

To incorporate seasonal effects that change from year to year into models for the size of an open population, we consider a time-varying coefficient model. We fit this model to a capture–recapture data set collected on the little penguin Eudyptula minor in south-eastern Australia over a 25-year period, using Jolly–Seber type estimators and nonparametric P-spline techniques. The time-varying coefficient model identified strong changes in the seasonal pattern across the years, which we examined further using functional data analysis techniques. To evaluate the methodology, we also conducted several simulation studies that incorporate seasonal variation.

Introduction

Understanding the changes in abundance of wildlife populations across time is critical to developing appropriate conservation strategies. These changes may be due to environmental or biotic variables. The size of a population of a species may vary throughout the year due to factors such as individual recruitment, individuals moving in and out of the population, and individual mortality (which may occur naturally or be caused by humans). These factors may also be driven by seasonality [1]. Assumptions concerning closed populations are seldom met in nature [2], and immigration and emigration are obvious components of the life histories of many birds [3]. Little penguins Eudyptula minor spend more time ashore during spring and summer because of breeding and moulting; consequently, recapture probabilities are susceptible to seasonal effects, which a population model should take into account. It is known for the little penguin that the timing of the breeding season varies from year to year [4], so models should allow the seasonal effects to vary according to the year.

In classical parametric models, the model coefficients are usually considered fixed. To allow these coefficients to vary, say over time, Cleveland et al. [5] proposed the varying coefficient model (VCM). These models were further extended by Hastie and Tibshirani [6], and have become increasingly popular in various applications. Specifically, they are very flexible, easy to construct and readily interpretable. Indeed, VCMs are natural extensions of classical parametric models; however, as Fan and Zhang [7] mention, “the varying coefficient models are not stimulated by the desire of purely mathematical extension, rather they come from the need in practice”. Consequently, there is an extensive literature on these models, which have been successfully applied in many areas. For example, Cheng et al. [8] investigated infant mortality in China by considering generalized linear models and allowed the coefficients for covariates, such as mother’s age at the birth of the child and urban–rural residence, to depend on time; and Huang et al. [9] analysed longitudinal data collected on homosexual men who were infected by HIV by considering longitudinal models and allowing coefficients for some covariates to be dependent on years after infection.

A special case of VCMs is the semivarying coefficient model (SVCM) arising in Fan and Zhang [10] and Zhang [10], [11], which has some varying and some nonvarying coefficients. Xia et al. [12] applied SVCM to hospital admissions where the coefficients for covariates, such as birth rates, were allowed to vary but seasonal effects were kept fixed; and Huggins et al. [13] have applied SVCM to model numbers of suicides in Hong Kong. Estimation is commonly conducted using kernel smoothing [14] or smoothing splines [6], [15]. Here, we focus on the latter, extending the VCM approach of Marx [15], also called generalized linear additive smooth structures [16], to SVCM, and apply it to population models. We use varying coefficients to model the change in seasonal effects from year to year. For example, breeding and moulting, which are seasonal, may vary from year to year. Importantly, the SVCM reduces the effective number of parameters in the model because we model the seasonal effects for each month as functions of time, rather than estimating a separate effect for each year.

To make inferences on the size of a population over time, a capture–recapture (CR) experiment may be conducted on individuals from the population. In these experiments, individuals are uniquely marked on first capture, and it is recorded whether each individual is seen or not seen again over a fixed number of visits or capture occasions. There are many methods for estimating the population size from CR data, one of the most common being the Jolly–Seber model (JSM) [17], [18], which gives closed form estimators for the number of marked individuals and the population size within the study area. These models consider an open population, which permits the processes of immigration, emigration, birth and death to occur over time. Several key assumptions are also made: all individuals have equal catchabilities, all individuals have equal survival rates, all emigration from the population is permanent, marked individuals do not lose their marks/tags, and sampling periods are short [2].
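As an illustration of the closed form estimators mentioned above, the following is a minimal sketch of the textbook (uncorrected) Jolly–Seber estimators computed from 0/1 capture histories. It is not the implementation used in the article: the function name `jolly_seber` and the input layout are assumptions, every captured individual is assumed to be released again, and no bias correction or standard errors are included.

```python
def jolly_seber(histories):
    """Uncorrected Jolly-Seber estimates from capture histories.

    histories: list of equal-length 0/1 lists, one per marked individual
    (hypothetical input layout). Returns {occasion: (M_hat, N_hat)} for the
    interior occasions 2..tau-1 (1-based), or None where undefined.
    """
    tau = len(histories[0])
    estimates = {}
    for t in range(1, tau - 1):          # 0-based interior occasions
        n = m = r = z = 0
        for h in histories:
            before = any(h[:t])          # captured on an earlier occasion
            after = any(h[t + 1:])       # captured on a later occasion
            if h[t]:
                n += 1                   # captured at t
                m += before              # ...and already marked
                r += after               # ...and seen again after release
            elif before and after:
                z += 1                   # marked, missed at t, seen later
        if m == 0 or r == 0:
            estimates[t + 1] = None      # estimators undefined here
            continue
        # Assumption: all n captured individuals are released, so R_t = n_t.
        M_hat = m + n * z / r            # estimated marked individuals
        N_hat = n * M_hat / m            # estimated population size
        estimates[t + 1] = (M_hat, N_hat)
    return estimates


example = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]]
print(jolly_seber(example))
```

With only three occasions, estimates exist for occasion 2 alone, which shows why the first and last occasions of a study carry no estimate under this form of the model.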

A number of modifications and variants within the JSM framework have now been developed; these include Pollock’s robust design [19], [20], martingale estimators [21], [22], unequal catchabilities [23] and a more refined approach to handling the birth process [24]. More recent extensions of the JSM include nonparametric and semiparametric models, which explicitly model the population size as functions of time and monthly effects [25], [26], and as functions of covariates, such as sea-surface temperature, in addition to time [27].

We are directly motivated by observations on a colony of little penguins located at Summerland Beach on the western end of Phillip Island, Victoria, Australia. There are approximately 32,000 breeding penguins living on the island [28]. Little penguins are inshore feeders [29], [30], spending much of their time at sea searching for food and coming ashore for relatively short periods throughout the year except during moulting when they are ashore for up to three weeks [31]. When ashore they are vulnerable to introduced mammalian predators [32], and their numbers ashore are believed to be highest during breeding and moulting [33]. Pilchards Sardinops sagax were a major source of food until an extensive mortality of pilchards across southern Australia [34], [35] resulted in a switch in diet to other prey [36] with increased anchovy Engraulis australis presence in some years [35].

We consider a CR data set collected on adult little penguins, consisting of monthly captures from 1985 to 2009. The first record was taken on 09/01/1985 and the last on 29/12/2009, yielding a total of 9118 uniquely marked individuals on 300 capture occasions. Using these data, we plot the JSM estimates [37] for (a) the number of marked individuals and (b) the colony size, with 95% confidence bands across each occasion, in Fig. 1. Some seasonal peaks are evident here; however, a clear interpretation of the seasonality cannot be made using these models. The drop in colony size around 1996 is likely to be due to higher adult mortality following the mass mortality of pilchards in 1995 [34], and the sudden drop at the end of the study is likely to be an artefact: some birds that were missed in 2009 may have been picked up in 2010 or 2011 had the study continued.

The start of the breeding season (i.e., egg-laying) for little penguins is usually in spring [33], [38] and moulting occurs in late summer to mid autumn [31], but it has been observed that the timing of breeding and moulting seasons can be quite variable between years [4], [31]. Johannesen et al. [39] recorded seasonal variation in survival, weights, and population counts of little penguins in Otago, New Zealand. A consequence of this variability is that modelling seasonal effects on the population/colony size over a long time span becomes difficult; the breeding and moulting seasons may not be strictly related to calendar time. To model the seasonality and obtain meaningful estimates using open population models, we extend the semiparametric models of Huggins and Stoklosa [27] to SVCMs, as described above. Inference is conducted using flexible P-spline techniques [40] as in Marx [15]. For further details on P-splines see Eilers and Marx [41]. This SVCM framework has several advantages over the recent work of Huggins and Stoklosa [27]:

  • firstly, a flexible modelling approach is considered for the seasonal effects – i.e., we now allow seasonal effects to vary from year to year;

  • secondly, the varying seasonality effects can be easily interpreted using functional data analysis (FDA) techniques [42], [43]; and

  • thirdly, the proposed approach helps clarify the information given by the regression plots (which are a common means of presenting the nonparametric estimates) – i.e., we can (visually) identify key structures in the data.

In order to present our proposed methods, we first review the semiparametric JSM of Huggins and Stoklosa [27] in Section 2, and discuss a method of accounting for temporary emigration in Section 2.2.2. A preliminary data analysis on the little penguin data is conducted in Section 2.3.2. We then extend the semiparametric JSM to the proposed SVCM in Section 3. Simulations are conducted in Section 4 and we apply the SVCM models to the little penguin data in Section 5. Some discussion is given in Section 6. We used the R-software [44] to conduct our simulations and data analysis.

Notation

As our motivating data were collected at monthly intervals, we consider $t = 1, \ldots, \tau$ monthly capture occasions. Let $i = 1, \ldots, I$ denote the years and $k = 1, \ldots, 12$ denote the months, such that for each $t$ there is a corresponding $i$ and $k$. Although the little penguin data were collected from a single colony, the models given below can be used to estimate population sizes. Thus we will use population when describing our models in a general framework, and colony when considering the little penguin data.
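The correspondence between an occasion t and its year i and month k can be sketched as follows. This assumes, as a simplification not stated in the text, that occasion 1 falls in month 1 of year 1 and that no months are skipped; the function names are hypothetical.

```python
def occasion_to_year_month(t):
    """Map a 1-based monthly occasion t to 1-based (year i, month k)."""
    if t < 1:
        raise ValueError("t must be >= 1")
    i = (t - 1) // 12 + 1   # year index
    k = (t - 1) % 12 + 1    # month index within the year
    return i, k


def year_month_to_occasion(i, k):
    """Inverse mapping from (year i, month k) back to the occasion t."""
    return 12 * (i - 1) + k


# 300 monthly occasions correspond to 25 complete years.
print(occasion_to_year_month(300))
```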

Let pt be

Semivarying coefficient Jolly–Seber models

In the SVCM we extend the SPM and take:

$$N_{ik} = X_N(i)\gamma + \alpha_k(i),$$

where the $\alpha_k(i)$ are functions of the years $i = 1, \ldots, I$ and we set $\alpha_1(i) = 0$ for identifiability. That is, $\alpha_k(i)$ is the seasonal effect corresponding to month $k$ in year $i$, which is the varying coefficient component of the model. Similar to the models above, we model $\alpha_k(i)$ as a B-spline. For $k = 2, \ldots, 12$, write $\delta_k = (\delta_1^T, \ldots, \delta_q^T)^T$ and take $\alpha_k(i) = Y(i)\delta_k$, where $Y$ is a B-spline basis for $i = 1, \ldots, I$ with $i$th row $Y(i)$. We now write:

  • Semivarying Coefficient Model (SVCM):
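To make the varying coefficient component concrete, the sketch below evaluates a B-spline basis row Y(i) with the Cox–de Boor recursion and forms the seasonal effects as Y(i) times a coefficient vector for months 2 to 12, with month 1 fixed at zero. The equally spaced knots, cubic degree, five segments, function names and coefficient values are all assumptions made for illustration, and each coefficient vector is treated as a plain list of q numbers. It is not the fitting procedure of the article.

```python
def pspline_knots(lo, hi, n_seg, degree):
    """Equally spaced knots on [lo, hi], extended by `degree` steps each side."""
    dx = (hi - lo) / n_seg
    return [lo + (j - degree) * dx for j in range(n_seg + 2 * degree + 1)]


def bspline_basis(x, knots, degree):
    """Values of all B-spline basis functions at x (Cox-de Boor recursion)."""
    # Degree 0: indicators of the half-open knot intervals.
    B = [1.0 if knots[j] <= x < knots[j + 1] else 0.0
         for j in range(len(knots) - 1)]
    for d in range(1, degree + 1):
        B_new = []
        for j in range(len(knots) - d - 1):
            left_den = knots[j + d] - knots[j]
            right_den = knots[j + d + 1] - knots[j + 1]
            left = (x - knots[j]) / left_den * B[j] if left_den > 0 else 0.0
            right = ((knots[j + d + 1] - x) / right_den * B[j + 1]
                     if right_den > 0 else 0.0)
            B_new.append(left + right)
        B = B_new
    return B                              # length q = n_seg + degree


def seasonal_effects(years, delta, knots, degree):
    """alpha[k][i-1] = Y(i) . delta[k] for k = 2..12; alpha[1] is zero."""
    alpha = {1: [0.0 for _ in years]}     # identifiability constraint
    for k in range(2, 13):
        alpha[k] = [sum(b * c for b, c in zip(bspline_basis(i, knots, degree),
                                              delta[k]))
                    for i in years]
    return alpha


years = list(range(1, 26))                # I = 25 years
knots = pspline_knots(1, 25, 5, 3)        # hypothetical: 5 segments, cubic
delta = {k: [0.1 * k] * 8 for k in range(2, 13)}   # made-up coefficients
alpha = seasonal_effects(years, delta, knots, 3)
```

Because each month has one smooth curve over the years rather than a free parameter per year, the number of coefficients per month is q (here 8) regardless of how many years are observed.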

Simulations

To examine the performance of the SVCM estimators and the standard error formulae (as given in Web Appendix A2), we conducted two simulation studies similar to Huggins [26] and Huggins and Stoklosa [27]. We considered a 10 year experiment with monthly capture occasions so that $t = 1, \ldots, \tau = 120$. Throughout the entire experiment we set the birth/immigration rate equal to the death/permanent emigration rate, i.e., for each individual removed from the population, a new individual enters. To include temporary
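A simulation of this general kind can be sketched as follows. Only the 10 year monthly design and the one-for-one replacement of removed individuals come from the text; the population size, removal probability, cosine-shaped seasonal capture probability and function name are assumptions chosen for illustration, and temporary emigration is not modelled here.

```python
import math
import random


def simulate(n_pop=200, n_years=10, death_prob=0.02,
             base_p=0.2, amp=0.1, seed=1):
    """Simulate monthly capture histories with one-for-one replacement."""
    rng = random.Random(seed)
    tau = 12 * n_years
    alive = list(range(n_pop))       # ids currently in the population
    next_id = n_pop
    histories = {}                   # id -> 0/1 list, captured individuals only
    sizes = []
    for t in range(tau):
        k = t % 12                   # month 0..11
        # Assumed seasonal capture probability, between base_p - amp and base_p + amp.
        p = base_p + amp * math.cos(2 * math.pi * k / 12)
        for ind in alive:
            if rng.random() < p:
                histories.setdefault(ind, [0] * tau)[t] = 1
        sizes.append(len(alive))
        # Each removed individual is replaced by a new entrant.
        survivors = [ind for ind in alive if rng.random() >= death_prob]
        n_new = len(alive) - len(survivors)
        alive = survivors + list(range(next_id, next_id + n_new))
        next_id += n_new
    return histories, sizes


histories, sizes = simulate()
print(len(histories), "individuals captured at least once over", len(sizes), "occasions")
```

Holding the true population size constant in this way makes it easy to see whether seasonal structure in the estimates reflects capture probabilities rather than real changes in abundance.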

Case study: little penguin data set

We return to the analysis in Section 2.3.2 by now fitting the SVCM to the little penguin data set. We used the same number of knots and difference penalties as in Section 2.3.2 for marked and long-term trends, with the number of knots set to 20 and a difference penalty order set to d=4 for the varying coefficient component. The AIC values, reported in Table 1, indicated that the SVCM gave a better fit to the data compared with the SPM but a slightly poorer fit than the NPM. From Fig. 2, the
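The role of the difference penalty order d can be illustrated with a small sketch of a P-spline roughness penalty, in which d-th order differences of adjacent spline coefficients are squared, summed and scaled by a smoothing parameter. The function names and the smoothing parameter `lam` are assumptions, and this is a generic illustration of the penalty rather than the fitting code used for the penguin data.

```python
def difference_matrix(q, d):
    """Return the (q - d) x q matrix that takes d-th order differences."""
    D = [[1.0 if r == c else 0.0 for c in range(q)] for r in range(q)]
    for _ in range(d):
        # Difference consecutive rows; one row is lost per order.
        D = [[D[r + 1][c] - D[r][c] for c in range(q)]
             for r in range(len(D) - 1)]
    return D


def penalty(delta, d, lam):
    """lam * ||D_d delta||^2 for a coefficient vector delta."""
    D = difference_matrix(len(delta), d)
    return lam * sum(sum(row[c] * delta[c] for c in range(len(delta))) ** 2
                     for row in D)


# A fourth-order penalty leaves cubic coefficient sequences unpenalised
# and penalises departures from them.
cubic = [float(j ** 3) for j in range(10)]
quartic = [float(j ** 4) for j in range(10)]
print(penalty(cubic, 4, 1.0), penalty(quartic, 4, 1.0))
```

A higher order d therefore shrinks the fitted curve towards a higher-degree polynomial as the smoothing parameter grows, which is one reason the choice of d matters alongside the number of knots.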

Discussion

We focused on extending the approach of Huggins and Stoklosa [27] by considering a SVCM which gave the regressors a modified effect. Our approach differs from previous studies that also model seasonality; each having their advantages and disadvantages. Yang et al. [33] used a nonparametric time series decomposition model which allows for greater flexibility when considering a full time series structure, for example, lag effects can be incorporated in their model, etc.; and Johannesen et al. [39]

Acknowledgements

We would like to thank the handling editor and two anonymous referees for their helpful comments on an earlier version of this article. We would also like to thank the Penguin Study Group for collecting the data and Leanne Renwick, Marg Healey and Roz Jessop for maintaining the database. The penguins were handled under permits from the Department of Sustainability and Environment in Victoria. The Australian Bird and Bat Banding Schemes provided the banding permit and the flipper bands. The first

References (49)

  • W. Zhang

    Local polynomial fitting in semivarying coefficient model

    J. Multivariate Anal.

    (2002)
  • P.S.F. Yip

    Statistical inference procedure for a hypergeometric model for capture–recapture experiment

    Appl. Math. Comput.

    (1993)
  • P.A. Parsons

    The Evolutionary Biology of Colonizing Species

    (1983)
  • S.C. Amstrup et al.

    Handbook of Capture–Recapture Analysis

    (2005)
  • J.B. Hestbeck et al.

    Estimates of movement and site fidelity using mark-resight data of wintering Canada geese

    Ecology

    (1991)
  • P.N. Reilly et al.

    The Little Penguin Eudyptula minor in Victoria, II: Breeding

    Emu

    (1981)
  • W.S. Cleveland et al.

    Local regression models

  • T. Hastie et al.

    Varying-coefficient models

    J. Roy. Stat. Soc., Ser. B (Methodological)

    (1986)
  • J. Fan et al.

    Statistical methods with varying coefficient models

    Stat. Interface

    (2008)
  • M.-Y. Cheng et al.

    Statistical estimation in generalized multiparameter likelihood models

    J. Am. Stat. Assoc.

    (2009)
  • J.Z. Huang et al.

    Polynomial spline estimation and inference for varying coefficient models with longitudinal data

    Stat. Sinica

    (2004)
  • J. Fan et al.

    Simultaneous confidence bands and hypothesis testing in varying-coefficient models

    Scand. J. Stat.

    (2000)
  • Y. Xia et al.

    Efficient estimation for semivarying-coefficient models

    Biometrika

    (2004)
  • R.M. Huggins et al.

    Application of additive semivarying coefficient models: monthly suicide data from Hong Kong

    Biometrics

    (2007)
  • J. Fan et al.

    Statistical estimation in varying coefficient models

    Ann. Stat.

    (1999)
  • B.D. Marx

    P-spline varying coefficient models for complex data

  • P.H.C. Eilers et al.

    Generalized linear additive smooth structures

    J. Comput. Graph. Stat.

    (2002)
  • G.M. Jolly

    Explicit estimates from capture–recapture data with both death and immigration-stochastic model

    Biometrika

    (1965)
  • G.A.F. Seber

    The Estimation of Animal Abundance

    (1982)
  • K.H. Pollock

    A capture–recapture design robust to unequal probability of capture

    J. Wildlife Manage.

    (1982)
  • W.L. Kendall et al.

    A likelihood-based approach to capture–recapture estimation of demographic parameters under the robust design

    Biometrics

    (1995)
  • P.S.F. Yip

    A martingale estimating equation for a capture–recapture experiment in discrete time

    Biometrics

    (1991)
  • W.D. Hwang et al.

    Quantifying the effects of unequal catchabilities on Jolly–Seber estimators via sample coverage

    Biometrics

    (1995)
  • C.J. Schwarz et al.

    A general methodology for the analysis of capture–recapture experiments in open populations

    Biometrics

    (1996)