Some properties of a nonparametric estimator of the size of an open population

doi:10.1016/j.jspi.2006.12.004

Journal of Statistical Planning and Inference

Volume 138, Issue 3, 1 March 2008, Pages 618-623

https://doi.org/10.1016/j.jspi.2006.12.004

Abstract

It has been demonstrated in the literature that local polynomial models may be used to estimate the size of an open population using capture–recapture data. However, very little is known about their properties. Here we develop a setting in which the properties of nonparametric estimators of the size of an open population using capture–recapture data can be examined and establish conditions under which expressions for the bias and variance may be determined.

Introduction

Local polynomial models are used extensively in nonparametric regression (Fan and Gijbels, 1996). Their use in estimating the size of an open population using capture–recapture data follows from Huggins and Yip (1999), whose approach was based on martingale estimating equations and extended the well-known Jolly–Seber estimators (Seber, 1982) by giving smooth estimates of both the number of marked individuals in the population and the population size. These estimators arose from applying kernel smoothing methods to the closed population martingale estimators of Yip (1993).

For closed populations, there is a population of size N and capture occasions $j = 1, \dots, J$ on which individuals in the population can be captured. On each occasion the captured individuals that have not been previously captured are marked and the marks of the recaptured individuals are noted. Thus, for each individual that was captured at least once, the raw data consist of the occasions on which it was captured. Let $n_{j}$ denote the number of individuals captured on occasion j, $m_{j}$ the number of these that had been previously marked and $M_{j}$ the known number of marked individuals in the population just prior to occasion j. Under the assumption that the capture probabilities are homogeneous across the population on each occasion, given $n_{j}$, $M_{j}$ and N, $m_{j}$ has a hypergeometric distribution, so that $E (m_{j} \mid N, n_{j}, M_{j}) = M_{j} n_{j} / N$. This gives rise to the martingale estimating equation $\sum_{j = 1}^{J} (m_{j} N - M_{j} n_{j}) = 0$ and a simple closed form estimator for N, namely $\hat{N} = \sum_{j = 1}^{J} M_{j} n_{j} / \sum_{j = 1}^{J} m_{j}$.

In an open population, with population size $N_{j}$ on occasion j, Huggins and Yip (1999) developed estimating equations for $N_{j}$ of the form $\sum_{k = 1}^{J} w_{j} (k) (m_{k} N_{j} - M_{k} n_{k}) = 0$, where $N_{j}$ was supposed to be locally constant, i.e. a polynomial of degree zero, and the $w_{j} (k)$ are kernel weight functions. Subsequently, Huggins et al. (2003) extended the approach to sample coverage estimators to relax the equal catchability assumptions of Huggins and Yip (1999), and Yang and Huggins (2003) and Yang et al. (2003) further extended the models to local polynomial models, but relied on bootstrap procedures to estimate standard errors. More recently, Huggins (2006) gave expressions for the standard errors in the semi-parametric case and verified them in simulations but only outlined their derivation. These previous articles have demonstrated the utility of the method and motivate a deeper examination of the properties of the estimators. In this note we begin the formal derivation of these properties. We return to the simpler nonparametric estimators of Huggins and Yip (1999) and further simplify the setting by supposing the number of marked animals is observable.
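The two estimating equations described above can both be solved directly as ratios of weighted sums. The following is a minimal sketch, not the authors' code: the function names, the Epanechnikov kernel, the 0-based occasion index and the toy counts are all assumptions made for illustration.

```python
import numpy as np

def closed_population_estimate(n, m, M):
    """Solve sum_j (m_j N - M_j n_j) = 0 for N (closed population)."""
    n, m, M = (np.asarray(a, dtype=float) for a in (n, m, M))
    return np.sum(M * n) / np.sum(m)

def epanechnikov(u):
    """Symmetric kernel with support [-1, 1] (an assumed choice of kernel)."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def local_constant_estimate(j, n, m, M, bandwidth):
    """Solve sum_k w_j(k) (m_k N_j - M_k n_k) = 0 for N_j.

    j is a 0-based occasion index; the weights w_j(k) are taken to be
    kernel values at (k - j) / bandwidth, which is an assumption.
    """
    n, m, M = (np.asarray(a, dtype=float) for a in (n, m, M))
    k = np.arange(len(n))
    w = epanechnikov((k - j) / bandwidth)
    return np.sum(w * M * n) / np.sum(w * m)

# Hypothetical counts for four capture occasions (illustrative only).
n_obs = [20, 25, 22, 24]   # captured on each occasion
m_obs = [0, 5, 8, 12]      # of those, previously marked
M_obs = [0, 20, 40, 54]    # marked individuals just before each occasion

N_closed = closed_population_estimate(n_obs, m_obs, M_obs)
N_local = local_constant_estimate(2, n_obs, m_obs, M_obs, bandwidth=1.5)
```

With a very large bandwidth the kernel weights are nearly equal, so the local constant estimate approaches the closed population estimate; with a bandwidth below one occasion only the occasion itself contributes.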

Here we develop a setting for capture–recapture experiments in which we can determine conditions under which the large sample properties of the local polynomial estimators of population size may be derived and the rates of convergence can be examined. Local polynomial models involve the order p of the polynomial and a bandwidth h, and the bias and variance depend on both of these quantities. When local polynomials are applied to regression models the bias terms are $O (h^{p + 1})$ for odd p and $O (h^{p + 2})$ for even p, and the variance is $O (h^{- 1})$ (Fan and Gijbels, 1996), giving the familiar trade-off between bias and variance. These results require that $h \to 0$, which is only of interest if the design points become dense. In nonparametric regression of Y on x this can be achieved by assuming we observe independent pairs $(x_{i}, Y_{i})$, $i = 1, \dots, n$, where the $x_{i}$ are independently and identically distributed with some common density $f (\cdot)$. Thus, to justify the use of local polynomial models in capture–recapture experiments, it is necessary to develop an asymptotic setting in which the capture occasions become dense over the time period in which the experiment is conducted. The results obtained here are perhaps of less direct practical importance, but they are important in understanding the procedures and how standard errors may be derived in more complex situations.
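To make the roles of p and h concrete, here is a minimal sketch of local polynomial regression at a single point, fitted by kernel-weighted least squares. It is an illustration of the regression setting only, not of the capture–recapture estimator; the function name, the Epanechnikov kernel and the scaling of the design by h are assumptions.

```python
import numpy as np

def local_polynomial_fit(x, y, x0, p, h):
    """Local polynomial estimate of the regression function at x0.

    Fits a degree-p polynomial in z = (x - x0) / h by weighted least
    squares with Epanechnikov weights (an assumed kernel) and returns
    the fitted intercept, which is the estimate at x0.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = (x - x0) / h
    w = np.where(np.abs(z) <= 1.0, 0.75 * (1.0 - z**2), 0.0)
    X = np.vander(z, p + 1, increasing=True)  # columns 1, z, ..., z^p
    XtW = X.T * w                             # weight each observation
    beta = np.linalg.solve(XtW @ X, XtW @ y)
    return beta[0]

# Noise-free illustration on an equally spaced design.
x = np.linspace(0.0, 1.0, 101)
linear_fit = local_polynomial_fit(x, 2.0 + 3.0 * x, x0=0.5, p=1, h=0.2)
bias_wide = local_polynomial_fit(x, x**2, x0=0.5, p=0, h=0.2) - 0.25
bias_narrow = local_polynomial_fit(x, x**2, x0=0.5, p=0, h=0.1) - 0.25
```

A local linear fit reproduces a linear function exactly, while the local constant fit to a quadratic has a bias that shrinks as the bandwidth shrinks, in line with the bias orders quoted above.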

In developing the asymptotic properties of a population size estimator from capture–recapture data collected at discrete capture occasions there are three factors to consider: the population size N, the time between capture occasions, $δ$, and the length of time over which the experiment is conducted, $τ$. In traditional experiments for closed populations with $τ$ and $δ$ fixed, it is implicit in the development of the asymptotic properties and the associated approximating distributions that $N \to \infty$. Sometimes this is made explicit (Darroch, 1958; Huggins, 1989). Keeping N and $τ$ fixed and letting $δ \to 0$ approximates the continuous time counting process considered by Becker (1984). To the author's knowledge, keeping N and $δ$ fixed and letting $τ$ increase has not been explicitly studied, although clearly for closed populations the number of animals sighted in $(0, τ]$ will tend to N as $τ \to \infty$. In open populations with immigration this situation may be more interesting.

We obtain our asymptotic results by considering a sequence of populations of increasing size ($N \to \infty$) and allowing the capture occasions to become closer together ($δ \to 0$) for fixed $τ$. We also require that the probability of capture is proportional to $δ$. This is natural if the traps are always deployed but inspected at regular intervals. Such a situation could occur if trapping were conducted over a larger and larger area with increasingly frequent inspections of the traps.
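The sampling scheme described here can be mimicked in a small simulation. The sketch below is only illustrative and makes several simplifying assumptions that are not taken from the paper: the population is closed, every individual is caught independently on each occasion with probability `rate * delta`, and the function name and parameter values are hypothetical.

```python
import numpy as np

def simulate_captures(N, tau, delta, rate, seed=0):
    """Simulate counts (n_j, m_j, M_j) on occasions spaced delta apart.

    Assumes a closed population of size N on [0, tau] and a per-occasion
    capture probability rate * delta, i.e. proportional to delta.
    """
    rng = np.random.default_rng(seed)
    J = int(round(tau / delta))          # number of capture occasions
    p = rate * delta
    if not 0.0 < p <= 1.0:
        raise ValueError("rate * delta must lie in (0, 1]")
    marked = np.zeros(N, dtype=bool)
    n = np.zeros(J, dtype=int)
    m = np.zeros(J, dtype=int)
    M = np.zeros(J, dtype=int)
    for j in range(J):
        caught = rng.random(N) < p
        M[j] = marked.sum()              # marked just before occasion j
        n[j] = caught.sum()              # captured on occasion j
        m[j] = (caught & marked).sum()   # recaptures on occasion j
        marked |= caught                 # newly caught animals are marked
    return n, m, M

n, m, M = simulate_captures(N=500, tau=1.0, delta=0.05, rate=2.0)
N_hat = (M * n).sum() / m.sum()          # closed population ratio estimate
```

Halving `delta` doubles the number of occasions and halves the per-occasion capture probability, so the expected capture effort over the fixed period stays the same while the occasions become dense.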

In Section 2 we give the model and assumptions. In Section 3 we give the estimators and derive expressions for their bias and variance. Section 4 contains some discussion of our results.

Section snippets

Model and assumptions

Consider a sequence of capture–recapture experiments over the time interval $[0, τ]$ . Let $N_{r} (t)$ denote the size of the population at time t in the rth experiment and suppose $N_{r} (t) \to \infty$ as $r \to \infty$ . For a given r, we take the point of view that the population size is fixed and that the sample space consists of the outcomes of capture–recapture experiments on this population. We suppose the capture experiment on the rth population consists of captures at the equally spaced capture occasions $0 ⩽ t_{r} (1) < \dots < t_{r} (J (r)$

The estimators and their properties

Let $W (t)$ be a symmetric kernel function with support $[- 1, 1]$, $w_{t} (r, j) = h (r)^{- 1} W \{ (t_{r} (j) - t) / h (r) \}$, $C_{1} (z) = (1, z, \dots, z^{p})^{T}$, $C_{2} (z)$ be a $(p + 1) \times (p + 1)$ matrix with $(i, j)$ element $z^{i + j - 2}$, $S = \sum_{ℓ = - K}^{K} W (ℓ / K) C_{2} (ℓ / K)$ and $S_{p + k} = \sum_{ℓ = - K}^{K} W (ℓ / K) C_{1} (ℓ / K) (ℓ / K)^{p + k}$. For a given t and bandwidth $h (r)$, weighted estimating equations for $γ_{r} (t)$ are $\sum_{j = 1}^{J (r)} w_{t} (r, j) G_{p, h} (t_{r} (j) - t) \{ G_{p, h} (t_{r} (j) - t)^{T} γ_{r} (t) m_{r} (j) - M_{r} (j) n_{r} (j) \} = 0 .$ This yields the estimators ${\hat{γ}}_{r} (t) = A_{1} (r, t)^{- 1} A_{2} (r, t)$ and ${\hat{N}}_{r} (t) = e^{T} {\hat{γ}}_{r} (t)$, where $A_{1} (r, t) = \sum_{j = 1}^{J (r)} w_{t} (r, j) C_{2} \{ (t_{r} (j) - t) / h \} m_{r} (j)$ and $A_{2} (r, t$
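The estimator ${\hat{γ}}_{r} (t) = A_{1} (r, t)^{- 1} A_{2} (r, t)$ can be sketched numerically as follows. Because the definition of $A_{2}$ is cut off in this excerpt, the code rests on assumptions that should be checked against the full text: that $G_{p, h} (x) = C_{1} (x / h)$, that $A_{2}$ is the matching sum $\sum_{j} w_{t} (r, j) C_{1} \{ (t_{r} (j) - t) / h \} M_{r} (j) n_{r} (j)$ implied by the estimating equation, and that $e = (1, 0, \dots, 0)^{T}$. The Epanechnikov kernel and all function names are also hypothetical.

```python
import numpy as np

def kernel(u):
    """Epanechnikov kernel: symmetric with support [-1, 1] (assumed choice)."""
    return 0.75 * (1.0 - u**2) if abs(u) <= 1.0 else 0.0

def C1(z, p):
    """Vector (1, z, ..., z^p)."""
    return np.power(float(z), np.arange(p + 1))

def C2(z, p):
    """(p+1) x (p+1) matrix with (i, j) element z^(i+j-2), i.e. C1 C1^T."""
    c = C1(z, p)
    return np.outer(c, c)

def local_polynomial_population_size(t, times, n, m, M, p, h):
    """Return (N_hat, gamma_hat) at time t.

    A2 is an ASSUMED form (its definition is truncated in the excerpt):
    A2 = sum_j w_t(j) C1((t_j - t) / h) M_j n_j.
    """
    A1 = np.zeros((p + 1, p + 1))
    A2 = np.zeros(p + 1)
    for t_j, n_j, m_j, M_j in zip(times, n, m, M):
        z = (t_j - t) / h
        w = kernel(z) / h                 # w_t(r, j) = h^{-1} W{(t_j - t)/h}
        A1 += w * C2(z, p) * m_j
        A2 += w * C1(z, p) * M_j * n_j
    gamma_hat = np.linalg.solve(A1, A2)
    return gamma_hat[0], gamma_hat        # N_hat = e^T gamma_hat, e assumed (1,0,...,0)

# Algebraic check on idealised, non-integer inputs (illustrative only):
# if M_j n_j = m_j N(t_j) with N linear in t, a local linear fit recovers N(t).
times = np.linspace(0.0, 1.0, 21)
m = 5.0 + (np.arange(21) % 3)
n = np.full(21, 10.0)
M = m * (100.0 + 20.0 * times) / n
N_hat, gamma_hat = local_polynomial_population_size(0.5, times, n, m, M, p=1, h=0.3)
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse of $A_{1}$ explicitly, which is a numerical convenience rather than a change to the estimator.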

Discussion

We have given conditions under which the local polynomial estimator of population size, for a simple model that assumes homogeneous capture probabilities, has analogues of the classical bias and variance expressions when the number of marked individuals is known. The bias in estimating $N_{r} (t)$ by ${\hat{N}}_{r} (t) = {\hat{γ}}_{t 0}$ is $O (N_{r} (t) h (r)^{2})$ for $p = 0$, $O (N_{r} (t) h (r)^{p + 1})$ for odd p and $O (N_{r} (t) h (r)^{p + 2})$ for even p, and the variance is $O (N_{r} (t) h (r)^{- 1})$. To obtain these results we have supposed that the

Acknowledgement

The author is grateful to two referees for comments that helped clarify the exposition.

References (12)

R.M. Huggins et al. (2003). Population size estimation using local sample coverage for open populations. J. Statist. Plan. Inference.

P. Yip (1993). Statistical inference procedure for a hypergeometric model for capture–recapture experiment. Appl. Math. Comput.

N.G. Becker (1984). Estimating population size from capture–recapture experiments in continuous time. Austral. J. Statist.

J.N. Darroch (1958). The multiple-recapture census: I. Estimation of a closed population. Biometrika.

J. Fan et al. (1996). Local Polynomial Modelling and its Applications.

R.M. Huggins (1989). On the statistical analysis of capture experiments. Biometrika.

There are more references available in the full text version of this article.
