On the performance of some non-parametric estimators of the conditional survival function with interval-censored data

https://doi.org/10.1016/j.csda.2011.06.027

Abstract

Simple nonparametric estimators of the conditional distribution of a response variable given a continuous covariate are often useful in survival analysis. Since a few nonparametric estimation options are available, a comparison of the performance of these options may be of value to determine which approach to use in a given application. In this note, we compare various nonparametric estimators of the conditional survival function when the response is subject to interval- and right-censoring. The estimators considered are a generalization of Turnbull’s estimator proposed by Dehghan and Duchesne (2011) and two nonparametric estimators for complete or right-censored data used in conjunction with imputation methods, namely the Nadaraya–Watson and generalized Kaplan–Meier estimators. We study the finite sample integrated mean squared error properties of all these estimators by simulation and compare them to a semi-parametric estimator. We propose a rule-of-thumb based on simple sample summary statistics to choose the most appropriate among these estimators in practice.

Highlights

► Three nonparametric estimators of the conditional survival function are compared. ► All three estimators can handle an interval-censored response variable. ► A rule-of-thumb is provided to help decide which estimator to choose when. ► Generally the generalized Turnbull estimator exhibits better finite sample properties.

Introduction

Simple nonparametric estimators of the conditional survival function of a random variable (response) given the value of a covariate are often useful to investigate the relationship between these two variables, and especially so when the response is a time to event. Indeed, the analysis of failure time data often implies that at least one of the following standard questions will have to be addressed: estimation of survival functions, comparison of survival functions and assessment of the effect of covariates on the distribution of the time to failure. A model-free estimate of the conditional survival function could therefore be of value in a wide range of applications, as it can be used to validate or to suggest a specific (semi-) parametric form for a model that can then be used to carry out the required inferences with adequate power and efficiency.

When the value of the response variable is observed exactly, a vast array of nonparametric inference methods is available. Unfortunately, failure times are rarely observed completely. Let $T=(T_1,\ldots,T_n)$ be n values of the (positive) failure times of interest. In this paper we are mainly interested in cases where the failure time T is not always observed exactly, but may be right-censored or only known up to an interval, say $(L,R)$. More precisely, our goal is to compare the performance of a few algorithms that can be used to estimate the conditional survival function of T given the value of a continuous covariate Z, say $S(t\mid z)=P(T>t\mid Z=z)$, when T is subject to such censoring schemes. Recently, Dehghan and Duchesne (2011) proposed a simple, completely nonparametric estimator of the conditional survival function that generalizes the generalized Kaplan–Meier (Beran, 1981) and Turnbull (1976) estimators and can handle mixed case censoring. It is also simple to think of other estimators of $S(t\mid z)$, obtained by using imputation methods in conjunction with nonparametric estimators of the conditional survival function defined for complete or right-censored data.
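To make the imputation idea concrete, the following is a minimal sketch and not the authors' procedure. It assumes midpoint imputation, a common simple choice that the text above does not name, and the helper `midpoint_impute` and the toy data are hypothetical. Each finite interval (L, R) is replaced by its midpoint and treated as an exact failure time, while an interval with R = +∞ is kept as right-censored at L, so that an estimator defined for complete or right-censored data can then be applied.

```python
import math


def midpoint_impute(left, right):
    """Map censoring intervals (L_i, R_i) to (time, event) pairs.

    Assumption (illustrative only): finite intervals are replaced by their
    midpoint and flagged as exact (event=True); intervals with R_i = +inf
    are kept as right-censored at L_i (event=False).
    """
    times, events = [], []
    for l, r in zip(left, right):
        if not (0.0 <= l < r):
            raise ValueError(f"invalid interval ({l}, {r})")
        if math.isinf(r):
            times.append(l)        # right-censored at the last known time
            events.append(False)
        else:
            times.append(0.5 * (l + r))  # midpoint stands in for T_i
            events.append(True)
    return times, events


# Toy data (hypothetical): the second subject is right-censored.
left = [1.0, 2.0, 0.5]
right = [3.0, math.inf, 1.5]
times, events = midpoint_impute(left, right)
print(times, events)
```

Midpoint imputation ignores the uncertainty about where the failure time falls inside the interval, so it should be expected to behave best when the intervals are narrow.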

The purpose of this note is to compare these options by simulation to find out which one works best and when, and also to compare them to a “benchmark” semi-parametric estimator. Note that we only consider estimators of S(t|z) that are step functions in t. Such estimators are typically used in early data exploration stages or for (semi-)parametric model selection purposes, e.g., Sun (2006, Chap. 2, 5). Other authors, for instance Leconte et al. (2002) or Pan (2000), have studied estimators that are smooth in both t and z. The paper is organized as follows. In Section 2, we introduce the notation and assumptions and we describe the estimation methods that will be considered in the simulation study. The finite sample properties of the estimators are studied and compared by simulation in Section 3, and practical guidelines to select the best option are given. We use the estimators in an analysis of HIV data in Section 4 and Section 5 concludes with a discussion and ideas for future research.

Notation and assumptions

Let $\{(L_i,R_i),\ i=1,\ldots,n\}$ be the observed intervals that contain the values of the true failure times $\{T_i,\ i=1,\ldots,n\}$ and let $Z=(Z_1,\ldots,Z_n)$ be the n corresponding observed values of a continuous covariate. We assume that the $Z_i$ are observed exactly, but that the values of the $T_i$ are subject to censoring, i.e., we do not observe $T_i$ but know that its value lies in an interval $(L_i,R_i)$. We consider two different types of samples: samples subject to pure interval-censoring with $0<L_i<R_i<+\infty$ for all $i=1,\ldots,n$,

Simulation study

This section reports the results of the simulation study that we conducted to investigate the performance of the nonparametric estimators of the conditional survival function. Although any symmetric kernel function could have been used, in our simulations we only considered the Gaussian kernel, $K_h(u)=(2\pi h^2)^{-1/2}\exp\{-u^2/(2h^2)\}$, $u\in\mathbb{R}$. As for the value of the bandwidth parameter h, for each estimator and for each simulation scenario, pilot simulations were run to find the value of the bandwidth
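As a hedged illustration of how a Gaussian kernel enters the estimators for complete or right-censored data, the sketch below implements textbook forms of the Nadaraya–Watson and generalized Kaplan–Meier (Beran, 1981) estimators of S(t|z). The function names, the tie-handling convention (events before censorings at equal times) and the toy data are assumptions made for this example and are not taken from the paper. The bandwidth `h` is simply passed in, with no attempt to reproduce the pilot-simulation tuning.

```python
import math


def gaussian_kernel(u, h):
    """K_h(u) = (2*pi*h^2)^(-1/2) * exp(-u^2 / (2*h^2))."""
    return math.exp(-u * u / (2.0 * h * h)) / math.sqrt(2.0 * math.pi * h * h)


def kernel_weights(z, covariates, h):
    """Normalized weights w_i(z) = K_h(z - Z_i) / sum_j K_h(z - Z_j)."""
    raw = [gaussian_kernel(z - zi, h) for zi in covariates]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase h")
    return [r / total for r in raw]


def nw_survival(t, z, times, covariates, h):
    """Nadaraya-Watson estimate of S(t|z) for exactly observed times."""
    w = kernel_weights(z, covariates, h)
    return sum(wi for wi, ti in zip(w, times) if ti > t)


def gkm_survival(t, z, times, events, covariates, h):
    """Generalized Kaplan-Meier (Beran) estimate of S(t|z).

    `events[i]` is True for an observed failure, False for right-censoring.
    """
    w = kernel_weights(z, covariates, h)
    # Sort by time; at tied times, failures are processed before censorings.
    order = sorted(range(len(times)), key=lambda i: (times[i], not events[i]))
    surv = 1.0
    at_risk = 1.0  # total weight of subjects still at risk
    for i in order:
        if times[i] > t:
            break
        if events[i] and at_risk > 1e-12:
            surv *= max(0.0, 1.0 - w[i] / at_risk)
        at_risk -= w[i]
    return surv


# Toy data (hypothetical): four subjects, the second one right-censored.
times = [1.0, 2.0, 3.0, 4.0]
events = [True, False, True, True]
covariates = [0.1, 0.4, 0.5, 0.9]
print(gkm_survival(2.5, 0.5, times, events, covariates, h=0.3))
```

Both functions return step functions in t, and with no censoring the generalized Kaplan–Meier estimate coincides with the Nadaraya–Watson estimate.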

Example: HIV study

As an illustration of the rule-of-thumb (10), we use the estimators considered in this paper to analyze a dataset that had originally been considered by Saïd (1998), which is a subset of the data from the study of Laga et al. (1994). In this study, 731 female sex workers from Kinshasa (Democratic Republic of the Congo) were tested on an approximately quarterly basis for HIV. Initially, all women were HIV negative and 67 of them tested positive over the course of the study. Given the study

Discussion

Our main objective in this paper was to compare simple non-parametric estimators of the conditional survival function to identify which one performs best under various schemes of interval-censoring for the response variable. Under pure interval-censoring, our study yielded some pretty clear conclusions: the GT estimator proposed by Dehghan and Duchesne (2011) performs best when the censoring intervals are wide, the GKM or NW estimators with imputation perform best when the censoring intervals

Acknowledgments

We express our thanks to the Natural Sciences and Engineering Research Council of Canada and to the Ministry of Science of Iran for their financial support. Comments from an Associate Editor and three anonymous referees enabled us to significantly improve this manuscript.

References (14)

  • Laga, M., et al., 1994. Condom promotion, sexually transmitted diseases treatment, and declining incidence of HIV-1 infection in female Zairian sex workers. Lancet.
  • Beran, R., 1981. Nonparametric regression with randomly censored survival data. Technical Report, Department of...
  • Dabrowska, D.M., 1987. Non-parametric regression with censored survival-time data. Scand. J. Stat.
  • Dabrowska, D.M., 1989. Uniform consistency of the kernel conditional Kaplan–Meier estimate. Ann. Stat.
  • Dehghan, M.H., et al., 2011. A generalization of Turnbull’s estimator for nonparametric estimation of the conditional survival function with interval-censored data. Lifetime Data Anal.
  • Henschel, V., Heiss, C., Mansmann, U., 2009. Iterated convex minorant algorithm for interval censored event data. R...
  • Leconte, E., et al., 2002. Smooth conditional distribution function and quantiles under random censorship. Lifetime Data Anal.