Some properties of a nonparametric estimator of the size of an open population

https://doi.org/10.1016/j.jspi.2006.12.004

Abstract

It has been demonstrated in the literature that local polynomial models may be used to estimate the size of an open population using capture–recapture data. However, very little is known about their properties. Here we develop a setting in which the properties of nonparametric estimators of the size of an open population using capture–recapture data can be examined and establish conditions under which expressions for the bias and variance may be determined.

Introduction

Local polynomial models are used extensively in nonparametric regression (Fan and Gijbels, 1996). Their use in estimating the size of an open population from capture–recapture data follows from Huggins and Yip (1999), whose approach was based on martingale estimating equations and extended the well-known Jolly–Seber estimators (Seber, 1982) by giving smooth estimates of both the number of marked individuals in the population and the population size. These estimators arose from applying kernel smoothing methods to the closed population martingale estimators of Yip (1993).

For closed populations, there is a population of size $N$ and capture occasions $j=1,\ldots,J$ on which individuals in the population can be captured. On each occasion the captured individuals that have not been previously captured are marked and the marks of the recaptured individuals are noted. Thus, for each individual that was captured at least once, the raw data consist of the occasions on which it was captured. Let $n_j$ denote the number of individuals captured on occasion $j$, $m_j$ the number of these that had been previously marked and $M_j$ the known number of marked individuals in the population just prior to occasion $j$. Under the assumption that the capture probabilities are homogeneous across the population on each occasion, given $n_j$, $M_j$ and $N$, $m_j$ has a hypergeometric distribution, so that $E(m_j \mid N, n_j, M_j) = M_j n_j / N$. This gives rise to the martingale estimating equation $\sum_{j=1}^{J} (m_j N - M_j n_j) = 0$ and a simple closed form estimator for $N$.

In an open population, with population size $N_j$ on occasion $j$, Huggins and Yip (1999) developed estimating equations for $N_j$ of the form $\sum_{k=1}^{J} w_j(k)(m_k N_j - M_k n_k) = 0$, where $N_j$ is supposed to be locally constant, i.e. a polynomial of degree zero, and the $w_j(k)$ are kernel weight functions. Subsequently, Huggins et al. (2003) extended the approach to sample coverage estimators to relax the equal catchability assumptions of Huggins and Yip (1999), and Yang and Huggins (2003) and Yang et al. (2003) further extended the models to local polynomial models, but relied on bootstrap procedures to estimate standard errors. More recently, Huggins (2006) gave expressions for the standard errors in the semi-parametric case and verified them in simulations, but only outlined their derivation. These previous articles have demonstrated the utility of the method and motivate a deeper examination of the properties of the estimators. In this note we begin the formal derivation of those properties. We return to the simpler nonparametric estimators of Huggins and Yip (1999) and further simplify the setting by supposing the number of marked animals is observable.
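Both estimating equations in the Introduction are linear in the unknown population size, so each has an explicit root. The following is a minimal Python sketch of that algebra only. The function names and the three-occasion counts are hypothetical and chosen for illustration; the kernel weights are passed in as plain numbers rather than computed from a particular kernel.

```python
from typing import Sequence


def closed_population_size(m: Sequence[float], M: Sequence[float],
                           n: Sequence[float]) -> float:
    """Solve sum_j (m_j * N - M_j * n_j) = 0 for N."""
    recaptures = sum(m)
    if recaptures == 0:
        raise ValueError("no recaptures, so the equation has no finite root")
    return sum(Mj * nj for Mj, nj in zip(M, n)) / recaptures


def locally_constant_size(weights: Sequence[float], m: Sequence[float],
                          M: Sequence[float], n: Sequence[float]) -> float:
    """Solve sum_k w_j(k) * (m_k * N_j - M_k * n_k) = 0 for N_j."""
    weighted_recaptures = sum(w * mk for w, mk in zip(weights, m))
    if weighted_recaptures == 0:
        raise ValueError("no weighted recaptures near this occasion")
    return sum(w * Mk * nk for w, Mk, nk in zip(weights, M, n)) / weighted_recaptures


# Hypothetical counts for J = 3 occasions (illustration only, not real data).
n = [20, 25, 30]  # individuals captured on each occasion
m = [0, 5, 10]    # of those, the number already marked
M = [0, 20, 40]   # marked individuals in the population just before each occasion

print(closed_population_size(m, M, n))
print(locally_constant_size([0.2, 0.6, 0.2], m, M, n))
```

With equal weights the locally constant estimator reduces to the closed-population one, which is one way to see the open-population equation as a kernel-smoothed version of the closed-population equation.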

Here we develop a setting for capture–recapture experiments where we can determine conditions under which the large sample properties of the local polynomial estimators of population size may be derived and the rates of convergence can be examined. Local polynomial models involve the order $p$ of the polynomial and a bandwidth $h$, and the bias and variance depend on both these quantities. When local polynomials are applied to regression models the bias terms are $O(h^{p+1})$ for odd $p$ and $O(h^{p+2})$ for even $p$, and the variance is $O(h^{-1})$ (Fan and Gijbels, 1996), giving the familiar trade-off between the bias and variance. These results require that $h \to 0$, which is only of interest if the design points become dense. In nonparametric regression of $Y$ on $x$ this can be achieved by assuming we observe independent pairs $(x_i, Y_i)$, $i=1,\ldots,n$, where the $x_i$ are independently and identically distributed with some common density $f(\cdot)$. Thus, to justify the use of local polynomial models in capture–recapture experiments, it is necessary to develop an asymptotic setting where the capture occasions become dense over the time period in which the experiment is conducted. The results obtained here are perhaps of less practical importance but are important in understanding the procedures and how standard errors may be derived in more complex situations.
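The regression trade-off mentioned above can be made concrete with a short heuristic calculation. It is a sketch for odd $p$ only: $a$ and $b$ are unspecified positive constants introduced here for illustration, and the variance is written in its usual regression form $b/(nh)$, which makes the dependence on the sample size $n$ explicit.

```latex
\begin{align*}
\mathrm{MSE}(h) &\approx a^2 h^{2(p+1)} + \frac{b}{nh}, \\
\frac{d}{dh}\,\mathrm{MSE}(h) = 0 &\iff 2(p+1)\,a^2 h^{2p+1} = \frac{b}{n h^2}, \\
h_{\mathrm{opt}} &= \left\{ \frac{b}{2(p+1)\,a^2\, n} \right\}^{1/(2p+3)}, \qquad
\mathrm{MSE}(h_{\mathrm{opt}}) = O\!\left( n^{-2(p+1)/(2p+3)} \right).
\end{align*}
```

For the local linear case $p=1$ this gives the familiar rates $h_{\mathrm{opt}} \propto n^{-1/5}$ and a mean squared error of order $n^{-4/5}$. The bandwidth must shrink as $n$ grows, which is why the design points need to become dense.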

In developing the asymptotic properties of a population size estimator from capture–recapture data collected at discrete capture occasions there are three factors to consider: the population size $N$, the time between capture occasions $\delta$, and the length of time over which the experiment is conducted, $\tau$. In traditional experiments for closed populations with $\tau$ and $\delta$ fixed, it is implicit in the development of the asymptotic properties and the associated approximating distributions that $N \to \infty$. Sometimes this is made explicit (Darroch, 1958; Huggins, 1989). Keeping $N$ and $\tau$ fixed and letting $\delta \to 0$ approximates the continuous time counting process considered by Becker (1984). To the author's knowledge, keeping $N$ and $\delta$ fixed and letting $\tau$ increase has not been explicitly studied, although clearly for closed populations the number of animals sighted in $(0,\tau]$ will tend to $N$ as $\tau \to \infty$. In open populations with immigration this situation may be more interesting.

We obtain our asymptotic results by considering a sequence of populations of increasing size ($N \to \infty$) and allowing the capture occasions to become closer together ($\delta \to 0$) for fixed $\tau$. We also require that the probability of capture is proportional to $\delta$. This is natural if the traps are always deployed but inspected at regular intervals. This situation could occur if trapping were conducted over a larger and larger area with increasingly frequent inspections of the traps.
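One way to see why a capture probability proportional to $\delta$ is a sensible scaling is to check what it implies over the whole experiment. The sketch below is an illustration under assumptions that go beyond the text: each individual is caught independently on each occasion with probability `rate * delta`, where `rate` is a hypothetical constant of proportionality, and the values of `rate` and `tau` are arbitrary.

```python
import math


def prob_caught_at_least_once(rate: float, tau: float, delta: float) -> float:
    """P(an individual is caught on at least one of the tau/delta occasions)."""
    occasions = round(tau / delta)
    per_occasion = rate * delta  # capture probability proportional to delta
    if not 0.0 <= per_occasion <= 1.0:
        raise ValueError("rate * delta must be a probability")
    return 1.0 - (1.0 - per_occasion) ** occasions


rate, tau = 0.5, 4.0  # hypothetical values, for illustration only
for delta in (1.0, 0.1, 0.01, 0.001):
    print(delta, prob_caught_at_least_once(rate, tau, delta))
print("limit", 1.0 - math.exp(-rate * tau))
```

Under these assumptions the probability of being caught at least once in $[0,\tau]$ settles at $1-e^{-\text{rate}\,\tau}$ as $\delta \to 0$, strictly between 0 and 1. Inspecting the traps more often spreads the same overall capture effort over more occasions rather than eventually catching every individual.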

In Section 2 we give the model and assumptions. In Section 3 we give the estimators and derive expressions for their bias and variance. Section 4 contains some discussion of our results.

Model and assumptions

Consider a sequence of capture–recapture experiments over the time interval $[0,\tau]$. Let $N_r(t)$ denote the size of the population at time $t$ in the $r$th experiment and suppose $N_r(t) \to \infty$ as $r \to \infty$. For a given $r$, we take the point of view that the population size is fixed and that the sample space consists of the outcomes of capture–recapture experiments on this population. We suppose the capture experiment on the $r$th population consists of captures at the equally spaced capture occasions $0 \le t_r(1) < \cdots < t_r(J(r))$ ...

The estimators and their properties

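Let $W(t)$ be a symmetric kernel function with support $[-1,1]$, $w_t(r,j) = h(r)^{-1} W\{(t_r(j)-t)/h(r)\}$, $C_1(z) = (1, z, \ldots, z^p)^T$, $C_2(z)$ be a $(p+1)\times(p+1)$ matrix with $(i,j)$ element $z^{i+j-2}$, $S = \sum_{\ell=-K}^{K} W(\ell/K)\, C_2(\ell/K)$ and $S_{p+k} = \sum_{\ell=-K}^{K} W(\ell/K)\, C_1(\ell/K)\, (\ell/K)^{p+k}$. For a given $t$ and bandwidth $h(r)$, weighted estimating equations for $\gamma_r(t)$ are

$$\sum_{j=1}^{J(r)} w_t(r,j)\, G_{p,h}(t_r(j)-t) \left\{ G_{p,h}(t_r(j)-t)^T \gamma_r(t)\, m_r(j) - M_r(j)\, n_r(j) \right\} = 0.$$

This yields the estimators

$$\hat{\gamma}_r(t) = A_1(r,t)^{-1} A_2(r,t) \quad \text{and} \quad \hat{N}_r(t) = e^T \hat{\gamma}_r(t),$$

where

$$A_1(r,t) = \sum_{j=1}^{J(r)} w_t(r,j)\, C_2\{(t_r(j)-t)/h\}\, m_r(j)$$

and $A_2(r,t)$ ...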
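The estimator above can be sketched in a few lines of Python. This is a hedged illustration and not the author's implementation, and it rests on several assumptions that the snippet does not confirm: $W$ is taken to be the Epanechnikov kernel, $G_{p,h}(x)$ is taken to be $C_1(x/h)$ (which is consistent with the form of $A_1$), $e = (1,0,\ldots,0)^T$, and $A_2(r,t) = \sum_j w_t(r,j)\, G_{p,h}(t_r(j)-t)\, M_r(j)\, n_r(j)$ is read off from the estimating equation. All function names are hypothetical.

```python
from typing import List, Sequence


def epanechnikov(u: float) -> float:
    """Symmetric kernel with support [-1, 1] (an assumed choice of W)."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("A1 is singular: too few recaptures in the window")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, size):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, size + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * size
    for row in range(size - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, size))
        x[row] = (aug[row][size] - tail) / aug[row][row]
    return x


def local_poly_population_size(t: float, times: Sequence[float],
                               m: Sequence[float], M: Sequence[float],
                               n: Sequence[float], h: float, p: int = 1) -> float:
    """Return e^T A1^{-1} A2 at time t, under the assumptions stated above."""
    a1 = [[0.0] * (p + 1) for _ in range(p + 1)]
    a2 = [0.0] * (p + 1)
    for tj, mj, Mj, nj in zip(times, m, M, n):
        z = (tj - t) / h
        w = epanechnikov(z) / h          # w_t(r, j)
        if w == 0.0:
            continue                     # occasion lies outside the kernel window
        c1 = [z ** k for k in range(p + 1)]  # C_1(z) = (1, z, ..., z^p)
        for i in range(p + 1):
            a2[i] += w * c1[i] * Mj * nj     # assumed form of A_2
            for k in range(p + 1):
                a1[i][k] += w * c1[i] * c1[k] * mj  # C_2(z) has entries z^(i+k)
    gamma_hat = solve_linear(a1, a2)
    return gamma_hat[0]                  # e^T gamma_hat with e = (1, 0, ..., 0)
```

With `p=0` the linear system is one-dimensional and the function reduces to a kernel-weighted ratio of $\sum M n$ to $\sum m$, the locally constant estimator. A call such as `local_poly_population_size(0.5, times, m, M, n, h=0.3, p=1)` gives the local linear estimate at $t=0.5$ for capture data supplied by the user.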

Discussion

We have given conditions under which the local polynomial estimator of population size, in a simple model that assumes homogeneous capture probabilities, has analogues of the classical bias and variance expressions when the number of marked individuals is known: the bias in estimating $N_r(t)$ by $\hat{N}_r(t) = \hat{\gamma}_{t0}$ is $O(N_r(t)\, h(r)^2)$ for $p=0$, $O(N_r(t)\, h(r)^{p+1})$ for odd $p$ and $O(N_r(t)\, h(r)^{p+2})$ for even $p$, and its variance is $O(N_r(t)\, h^{-1})$. To obtain these results we have supposed that the ...

Acknowledgement

The author is grateful to two referees for comments that helped clarify the exposition.
