Elsevier

Statistical Methodology

Volume 29, March 2016, Pages 18-31

Regression analysis of competing risks data with general missing pattern in failure types

https://doi.org/10.1016/j.stamet.2015.09.002

Abstract

In competing risks data, missing failure types (causes) are very common. In a general missing pattern, if a failure type is not observed, one observes, along with the failure time, a set of possible types containing the true type. Dewanji and Sengupta (2003) considered nonparametric estimation of the cause-specific hazard rates and suggested a Nelson–Aalen type estimator under such a general missing pattern. In this work, we deal with the regression problem, in which the cause-specific hazard rates may depend on some covariates, and consider estimation of the regression coefficients and the cause-specific baseline hazards under the general missing pattern using semi-parametric models. We consider two different proportional hazards type semi-parametric models for our analysis. Simulation studies under both models are carried out to investigate the finite sample properties of the estimators. We also consider an example from an animal experiment to illustrate our methodology.

Introduction

In survival studies, the failure (or death) may be attributed to one of several causes or types, known as competing risks. In such situations, for each individual, we observe a random vector (T,J), where T is the possibly censored survival time and J represents the cause of death (exactly one of m possible causes, say). However, due to inadequacy in the diagnostic mechanism, there is often uncertainty about the true failure type, and so the experimentalists are reluctant to report any specific value of J for some individuals. This is usually known as the problem of missing failure type in competing risks and has been addressed by many authors. For example, in carcinogenicity studies, besides deaths (failures) without tumor, there are deaths (with tumor present) due to either the tumor itself or some other cause. Often there is uncertainty in assigning this cause of death even if the presence of tumor can be ascertained [11], [19]. In extreme situations, one cannot even ascertain the presence or absence of tumor because the animal is totally cannibalized or autolyzed (see Section 6 for details).

Analysis of competing risks data with missing failure types was first considered by Dinse [12] with the assumption that failure type was either completely available (that is, observed as exactly one of m possible types) or not available at all (that is, unobserved failure type is any one of the m possible types). This problem was subsequently studied by different researchers (see [21], [23], [24], [27]). Goetghebeur and Ryan [16], [17] considered the regression problem for two failure types under the assumption that the cause-specific hazards for two failure types are proportional. The method of partial likelihood was employed for estimating the regression parameter. See also Dewanji [9] and Lu and Tsiatis [22] for similar work.

In the above-mentioned works, missingness meant that no information on failure type was available at all. However, in many contexts, one may be able to narrow down the possible causes of failure to fewer than m. In the present work, we consider a general missing pattern. Here, for each individual failure, we observe the survival time and a subset $g \subseteq \{1,\ldots,m\}$ of labels of possible failure types, exactly one of which is the true but unobserved cause of failure (see Section 6 for an example). When $g$ is a singleton set, the failure type is exactly observed, and when $g=\{1,\ldots,m\}$, the missingness is total. It is usually said that the true failure type is masked in the set $g$. Flehinger et al. [13] considered such a general pattern of missing failure types for the purpose of estimating survival due to different types, with the strong assumption of proportional cause-specific hazards. They also assumed that, for some of the observations with missing failure type, a second stage diagnosis can be performed to pinpoint the type. Flehinger et al. [14] considered the same problem using parametric modeling but without assuming proportional hazards. Craiu and Duchesne [7] suggested an estimation procedure using the EM algorithm based on piecewise constant cause-specific hazard rates. Under a missing-at-random type assumption and requiring a second stage diagnosis, they developed an EM algorithm to estimate the piecewise constant cause-specific hazard rates and the diagnostic probabilities of the actual cause of failure being $j$, given the set $g$ of observed possible causes. See also Craiu and Reiser [8]. Dewanji and Sengupta [10], in addition to suggesting a nonparametric estimator using the EM algorithm, developed a Nelson–Aalen type estimator of the cumulative cause-specific hazard rates (and also a smooth estimator of the cause-specific hazard rates) for the case in which certain information on the diagnostic probabilities is available from the experimentalists; in their approach, the missing pattern was allowed to be non-ignorable and no second stage diagnosis was required.

In this work, we deal with the regression problem, in which the cause-specific hazard rates may depend on some covariates, and consider estimation of the regression coefficients under some proportional hazards type semi-parametric models, when observation on the failure type exhibits the general missing pattern discussed above. Recently, Chatterjee et al. [6] have considered a similar problem in the context of partially observed disease classification data with a possibly large number of types. They have suggested a two-stage modeling approach in which the first stage involves reducing the number of parameters by imposing a natural structure on the underlying disease types and the second stage involves inference through a general extension of the partial likelihood based estimating equation (see [17]). However, like most of the work on this issue, their approach appears to require certain assumptions regarding the missing probabilities. Also, Sen et al. [28] have developed a semiparametric Bayesian approach, where the partial information about the cause of death is incorporated by means of latent variables, and proposed a simulation-based method using Markov Chain Monte Carlo (MCMC) techniques to implement the Bayesian methodology.

We also consider estimation of the cumulative baseline hazards in the spirit of Dewanji and Sengupta [10]. In Section 2, we describe the data and two semi-parametric models to study the effect of covariates. In Section 3, we consider estimation of the regression coefficients and the cumulative baseline cause-specific hazards for the first model. The same is done for the second model in Section 4. Simulation studies for both models are carried out in Section 5 to investigate the finite sample properties of the estimators. In Section 6, we illustrate our methodology by means of an example from a carcinogenicity study conducted by the British Industrial Biological Research Association [25]. Some concluding remarks are provided in Section 7.

Section snippets

Data and models

For the $m$ competing causes of failure, the corresponding cause-specific hazard rates, given the covariate value $Z=z$, are defined as $\lambda_j(t,z)=\lim_{\Delta t\to 0}\frac{1}{\Delta t}\Pr[T\in[t,t+\Delta t),\,J=j \mid T\ge t,\,z]$, for $j=1,\ldots,m$, where $T$ denotes the failure time, $J$ the failure type and $Z$ the covariate that may be a vector. The survival function $S(t,z)$, for an individual with covariate value $Z=z$, can be written in terms of the cause-specific hazard rates as $S(t,z)=\exp\left[-\int_0^t\sum_{j=1}^m\lambda_j(u,z)\,du\right]$.
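The relation between the survival function and the cause-specific hazard rates can be checked numerically. The following is a minimal sketch, not taken from the paper: the function name `survival`, the trapezoidal integration, and the two constant baseline hazards with a log-linear covariate effect are all illustrative assumptions.

```python
import math

def survival(t, z, hazards, steps=1000):
    """Approximate S(t, z) = exp(-int_0^t sum_j lambda_j(u, z) du)
    using the trapezoidal rule on an equally spaced grid."""
    if t <= 0:
        return 1.0
    h = t / steps

    def total(u):
        # Overall hazard: sum of the cause-specific hazards at time u
        return sum(lam(u, z) for lam in hazards)

    integral = 0.5 * (total(0.0) + total(t))
    for k in range(1, steps):
        integral += total(k * h)
    integral *= h
    return math.exp(-integral)

# Hypothetical example with m = 2 causes and a scalar covariate z
hazards = [
    lambda u, z: 0.10 * math.exp(0.5 * z),
    lambda u, z: 0.05 * math.exp(0.5 * z),
]
s = survival(2.0, 1.0, hazards)
```

For constant hazards the integral has a closed form, so `s` should agree with `math.exp(-0.15 * math.exp(0.5) * 2.0)` up to rounding.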

Suppose that the data consists of the covariate

Estimation under model 1

Under Model 1, using (3), the hazard rate for failure at time $t$ with $g$ observed as the set of possible failure types, given covariate value $z$, is given by $\lambda_g(t,z)=\lambda_{0g}(t)e^{z\beta}$, where $\lambda_{0g}(t)=\sum_{j\in g}p_{gj}(t)\lambda_{0j}(t)$, for $g\in G$, are unknown functions. Also, these cause-specific hazard rates for the ‘modified’ competing risks problem are of the same semi-parametric form as those for the original cause-specific hazard rates in (4). Hence, the following partial likelihood is the most appropriate for
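The hazard for an observed set g under Model 1 can be sketched as follows. This is an illustration only: it assumes that p_gj(t) is the probability of observing the set g given a failure of true type j at time t, and the probabilities, baseline hazards, scalar covariate, and all function names are hypothetical.

```python
import math

# Hypothetical masking probabilities p_{gj} for m = 2 (time-constant here);
# for each true type j they sum to 1 over the sets g containing j.
PROBS = {
    (frozenset([1]), 1): 0.7,
    (frozenset([1, 2]), 1): 0.3,
    (frozenset([2]), 2): 0.6,
    (frozenset([1, 2]), 2): 0.4,
}

def p(g, j, t):
    return PROBS[(g, j)]

# Hypothetical constant baseline cause-specific hazards lambda_{0j}(t)
baseline = {1: lambda t: 0.2, 2: lambda t: 0.1}

def masked_baseline_hazard(g, t, p, baseline):
    """lambda_{0g}(t) = sum over j in g of p_{gj}(t) * lambda_{0j}(t)."""
    return sum(p(g, j, t) * baseline[j](t) for j in g)

def masked_hazard_model1(g, t, z, beta, p, baseline):
    """lambda_g(t, z) = lambda_{0g}(t) * exp(z * beta), scalar z assumed."""
    return masked_baseline_hazard(g, t, p, baseline) * math.exp(z * beta)

sets = [frozenset([1]), frozenset([2]), frozenset([1, 2])]
total = sum(masked_baseline_hazard(g, 1.0, p, baseline) for g in sets)
```

Because the masking probabilities for each true type sum to one, `total` recovers the overall baseline hazard 0.2 + 0.1, which is one way to sanity-check a chosen set of probabilities.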

Estimation under model 2

A similar method can be developed under Model 2. Using (3), the cause-specific hazard rate for failure at time $t$ with $g$ observed as the set of possible failure types, given covariate value $z$, is given by $\lambda_g(t,z)=\lambda_0(t)f_g(z,t,\theta)$, where $f_g(z,t,\theta)=f_g(z,t,\gamma,\beta)=\sum_{j\in g}p_{gj}(t)e^{\gamma_j+z\beta_j}$, for all $g\in G$. These have a semi-parametric form similar to those for the original cause-specific hazard rates in (5), except that the parametric components $f_g(z,t,\theta)$, for different $g$’s, are not of the simple exponential
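A small numerical sketch can show why the parametric component for a non-singleton set behaves differently from a single exponential term. The parameter values, masking probabilities, scalar covariate, and the names `f_g` and `p2` below are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical time-constant masking probabilities p_{gj} for m = 2
PROBS2 = {
    (frozenset([1]), 1): 0.7,
    (frozenset([1, 2]), 1): 0.3,
    (frozenset([2]), 2): 0.6,
    (frozenset([1, 2]), 2): 0.4,
}

def p2(g, j, t):
    return PROBS2[(g, j)]

gamma = {1: 0.0, 2: -0.5}   # hypothetical type-specific intercepts
beta = {1: 0.3, 2: 0.8}     # hypothetical type-specific coefficients

def f_g(g, z, t, gamma, beta, p):
    """f_g(z, t, theta) = sum over j in g of p_{gj}(t) * exp(gamma_j + z * beta_j)."""
    return sum(p(g, j, t) * math.exp(gamma[j] + z * beta[j]) for j in g)

def hazard_ratio(g, z):
    """Ratio f_g(z + 1) / f_g(z): the effect of a unit increase in z."""
    return f_g(g, z + 1.0, 1.0, gamma, beta, p2) / f_g(g, z, 1.0, gamma, beta, p2)

single, mixed = frozenset([1]), frozenset([1, 2])
ratios_single = (hazard_ratio(single, 0.0), hazard_ratio(single, 1.0))
ratios_mixed = (hazard_ratio(mixed, 0.0), hazard_ratio(mixed, 1.0))
```

For the singleton set the ratio equals exp(beta_1) at every z, whereas for the mixed set it changes with z, so the hazard for a masked set is not proportional in the usual single-exponential sense.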

Simulation studies

In order to investigate the finite sample properties of the estimates obtained in Sections 3 and 4, we carry out some simulation studies as described in the following two subsections for Models 1 and 2, respectively. For this purpose, we consider $m=3$ and the sample sizes $n=50$, 200 and 500. As in the previous work [10], given the true cause $j$, the probability of missing failure type is $1-p_{\{j\}j}$, which is taken as a constant $\alpha$ ($0<\alpha<1$), for all $j$, with
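A data-generating step of this kind might be sketched as below. This is not the authors' simulation design: the constant baseline hazards, the binary covariate, the common coefficient `beta`, the absence of censoring, and the rule that a masked failure is reported as the full set {1, 2, 3} are all assumptions made for illustration, since the snippet does not show how the masked set is chosen.

```python
import math
import random

def simulate_masked(n, alpha, base_rates, beta, seed=0):
    """Generate n uncensored records (t, z, g, j) with m = len(base_rates) causes.

    With constant cause-specific hazards, the failure time is exponential
    with the total rate, and the cause is drawn with probability
    proportional to its rate. The true cause j is kept only for checking.
    """
    rng = random.Random(seed)
    m = len(base_rates)
    full_set = frozenset(range(1, m + 1))
    data = []
    for _ in range(n):
        z = rng.choice([0, 1])  # hypothetical binary covariate
        rates = [r * math.exp(beta * z) for r in base_rates]
        t = rng.expovariate(sum(rates))
        j = rng.choices(range(1, m + 1), weights=rates)[0]
        # Assumed masking rule: with probability alpha report the full set
        g = full_set if rng.random() < alpha else frozenset([j])
        data.append((t, z, g, j))
    return data

data = simulate_masked(n=500, alpha=0.3, base_rates=[0.2, 0.1, 0.05], beta=0.5)
masked_fraction = sum(len(g) > 1 for _, _, g, _ in data) / len(data)
```

The observed `masked_fraction` should be close to `alpha`, and every record satisfies the defining property that the true cause lies in the reported set.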

An example

A large animal experiment with a total of 5000 rodents was conducted by the British Industrial Biological Research Association [25] to investigate the carcinogenicity of different nitrosamines administered in drinking water. Gart et al. [15, pp. 58–66] reported details of the data set for the occurrence of pituitary tumors in male rats given N-nitrosodimethylamine (NDMA) in sixteen different concentrations (in ppm) including a control group with 0 ppm. The other fifteen treated groups were

Concluding remarks

In this article, we have considered a general pattern of missingness in failure types while dealing with competing risks data. Without making the usual missing at random assumption, we have discussed the regression problem in which the cause-specific hazard rates may depend on certain covariates. We have considered two proportional hazards type regression models. The estimation of the regression parameters has been carried out using the partial likelihood approach. To estimate the baseline

Acknowledgments

We would like to thank the two anonymous referees for their careful reading and helpful suggestions which have led to much improvement in this paper.

References (28)

  • S.H. Lo, Estimating a survival function with incomplete cause-of-death data, J. Multivariate Anal. (1991)
  • P.K. Andersen et al., Counting process models for life history data: a review, Scand. J. Stat. (1985)
  • P.K. Andersen et al., Statistical Models Based on Counting Processes (1993)
  • P.K. Andersen et al., Cox’s regression model for counting processes: A large sample study, Ann. Statist. (1982)
  • S. Basu, Inference about the masking probabilities in the competing risks model, Comm. Statist. Theory Methods (2009)
  • B. Bergman, B. Klefsjo, Recent applications of the TTT-plotting technique, in: A.P. Basu, S.K. Basu, S.P. Mukhopadhyay...
  • N. Chatterjee et al., Analysis of cohort studies with multivariate and partially observed disease classification data, Biometrika (2010)
  • R.V. Craiu et al., Inference based on the EM algorithm for the competing risks model with masked causes of failure, Biometrika (2004)
  • R.V. Craiu et al., Inference for the dependent competing risks model with masked causes of failure, Lifetime Data Anal. (2006)
  • A. Dewanji, A note on a test for competing risks with missing failure type, Biometrika (1992)
  • A. Dewanji et al., Estimation of competing risks with general missing pattern in failure types, Biometrics (2003)
  • G.E. Dinse, Nonparametric prevalence and mortality estimators for animal experiments with incomplete cause of death data, J. Amer. Statist. Assoc. (1986)
  • G.E. Dinse, Nonparametric estimation for partially-complete time and type of failure data, Biometrics (1982)
  • B.J. Flehinger et al., Survival with competing risks and masked causes of failures, Biometrika (1998)