Next Article in Journal
Comparing COSTATIS and Generalized Procrustes Analysis with Multi-Way Public Education Expenditure Data
Next Article in Special Issue
Incorporating a New Summary Statistic into the Min–Max Approach: A Min–Max–Median, Min–Max–IQR Combination of Biomarkers for Maximising the Youden Index
Previous Article in Journal
On Progressive Censored Competing Risks Data: Real Data Application and Simulation Study
Previous Article in Special Issue
Use of Bayesian Markov Chain Monte Carlo Methods to Model Kuwait Medical Genetic Center Data: An Application to Down Syndrome and Mental Retardation
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

An Alternative Promotion Time Cure Model with Overdispersed Number of Competing Causes: An Application to Melanoma Data

by
Diego I. Gallardo
1,
Mário de Castro
2 and
Héctor W. Gómez
3,*
1
Departamento de Matemática, Facultad de Ingeniería, Universidad de Atacama, Copiapó 1530000, Chile
2
Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, São Carlos 13560-095, Brazil
3
Departamento de Matemática, Facultad de Ciencias Básicas, Universidad de Antofagasta, Antofagasta 1240000, Chile
*
Author to whom correspondence should be addressed.
Mathematics 2021, 9(15), 1815; https://doi.org/10.3390/math9151815
Submission received: 3 July 2021 / Revised: 25 July 2021 / Accepted: 28 July 2021 / Published: 31 July 2021
(This article belongs to the Special Issue Applied Medical Statistics: Theory, Computation, Applicability)

Abstract

:
A cure rate model under the competing risks setup is proposed. For the number of competing causes related to the occurrence of the event of interest, we posit the one-parameter Bell distribution, which accommodates overdispersed counts. The model is parameterized in the cure rate, which is linked to covariates. Parameter estimation is based on the maximum likelihood method. Estimates are computed via the EM algorithm. In order to compare different models, a selection criterion for non-nested models is implemented. Results from simulation studies indicate that the estimation method and the model selection criterion have a good performance. A dataset on melanoma is analyzed using the proposed model as well as some models from the literature.

1. Introduction

The usual approach in models for time-to-event data is built on the assumption that all subjects will experience the event of interest if the follow-up time is long enough. In many studies in different areas, there are subjects insusceptible to the event of interest. We call cure rate the proportion of insusceptible subjects. Cure rate models, also known as long-term survival models, are capable to deal with this situation.
Cure rate models have had an intense research activity in last years. From the seminal contributions in [1,2] and the approach based on competing risks for cure rate models in [3], we find many papers in recent years on this research area. The competing risks approach is summarized as follows. Let M be an unobserved variable denoting the initial number of competing causes related to the occurrence of the event of interest. For instance, in cancer studies M represents the number of carcinogenic cells at the end of treatment that can produce a detectable cancer. For M following the Bernoulli or the Poisson distribution, we obtain the models in [2,3], respectively. The time for the j-th competing cause to produce the event of interest (that is, the promotion time) is denoted by Z j , j = 1 , , M . We also assume that, conditional on M, the latent times Z 1 , , Z M are independent and identically distributed with cumulative distribution function (cdf) F ( z ; λ ) and survival function S ( z ; λ ) = 1 F ( z ; λ ) , where λ is the parameter vector. Under a competing risks setup, the time elapsed until the event of interest is given by T = min ( Z 0 , Z 1 , , Z M ) , where Z 0 is a random variable degenerate at + . Our paper is based on the competing risks approach for cure rate models.
Many cure rate models proposed in last years are formulated from different distributions for the number of competing causes M. This research line includes the negative binomial [4], COM-Poisson [5], power series [6], Yule-Simon [7], polylogarithm [8], modified power series [9], zero inflated power series [10] and zero-modified geometric [11] distributions, among many others. In this vein, we adopt the one-parameter Bell distribution recently proposed in [12], with probability mass function (pmf)
P ( M = m ; θ ) = B m θ m e e θ + 1 m ! , m = 0 , 1 , 2 , ,
for θ > 0 , where B m is the Bell number defined as B m = e 1 k = 0 k m / k ! . The first few Bell numbers are B 0 = B 1 = 1 , B 2 = 2 , B 3 = 5 , B 4 = 15 , B 5 = 52 , B 6 = 203 ,   B 7 = 877 , B 8 = 4140 , B 9 = 21147 and B 10 = 115975 . We denote M Bell ( θ ) to say that M follows a distribution with pmf in (1). Denoting the mean and the variance of M by E ( M ) and Var ( M ) , respectively, we have that E ( M ) = θ e θ and Var ( M ) = θ ( 1 + θ ) e θ . Since Var ( M ) > E ( W ) , the Bell distribution accommodates overdispersed counts. For cure rate models, this distribution is interesting due to some characteristics:
  • The distribution has only one parameter, being an alternative to traditional discrete distributions such as Poisson and Geometric.
  • The probability generating function (pgf) of the distribution has a simple expression. In fact, if M Bell ( θ ) , then its pgf is given by G ( s ; θ ) = exp ( e θ s e θ ) , for | s | < 1 . See Proposition 1 in [12]. This fact is relevant from the point of view of cure rate models because the population survival function depends on this function.
  • P ( M = 0 ; θ ) = exp ( e θ + 1 ) has a simple form. This fact is important because this probability is the cure rate of the model. The simplicity of this term allows, among other things, to reparameterize the model in terms of the cure rate.
  • The distribution belongs to the power series family of distributions [13] with pmf P ( M = m ; θ ) = a m θ m / A ( θ ) , where a m = B m / m ! , m = 0 , 1 , 2 , , and A ( θ ) = m = 0 a m θ m = exp ( e θ 1 ) . Recently, ref. [14] proposed an EM-type algorithm for a class of cure rate models based on the power series family. The maximization (M) step of the EM algorithm is decomposed in two steps involving the distribution of the number of competing causes M and the distribution of the latent times Z.
  • To the best of our knowledge, the Bell distribution has not yet been proposed to model the number of competing causes in a cure rate models context.
The remaining of the paper is organized as follows. In Section 2, we present the cure rate rate model. Inference methods are presented in Section 3. Section 4 is dedicated to a simulation study covering properties of the estimators and model selection. A dataset on melanoma is analyzed in Section 5. Closing remarks are given in Section 6.

2. Model

Let M be an unobserved variable denoting the initial number of competing causes related to the occurrence of an event of interest. We assume that M Bell ( θ ) . Under the competing risks setup in Section 1, from Theorem 2 in [4], see also [15], we have that the (improper) population survival function of the Bell cure rate (Bellcr) model is given by
S p o p ( t ; θ , λ ) = P ( T > t ; θ , λ ) = G S ( t ; λ ) ; θ = exp [ e θ S ( t ; λ ) e θ ] .
It is immediate that the cure rate of the model is given by p = lim t S p o p ( t ; ψ ) = exp ( 1 e θ ) = P ( M = 0 ; θ ) . Henceforth, we adopt the parameterization p = exp ( 1 e θ ) , i.e., θ = log [ 1 log ( p ) ] . In this way, covariates can be directly linked to the cure rate p, allowing to compare regression coefficients among different models parameterized in terms of the cure rate.
An interesting family of cure rate models parameterized in the cure rate is studied in [16]. Such family includes the Bernoulli cure rate (Berncr) model, also known as mixture cure model [1,2], the Poisson cure rate (Pocr) model, also known as the promotion time cure model [3,17], the logarithmic cure rate (Locr) model and the negative binomial cure rate (NBcr) model, noticing that the NBcr model has an additional parameter q ( q > 0 ) and q = 1 corresponds to the geometric cure rate (Geocr) model.
Table 1 and Figure 1 show the variance of the number of competing causes M in terms of the cure rate for some models. Note that the curve for the Bellcr model lies between the Geocr and Pocr models. Also, the curves for the NBcr model with q = 0.3, 5 and 10 are close to the Locr, Bellcr and Pocr models, respectively. In fact, by applying the L’Hopital’s rule, we have lim q q ( 1 p 1 / q ) p 2 / q = log ( p ) so that the proximity between the NBcr ( q = 10 ) and Pocr curves is justified.
With the parameterization in the cure rate p, the population survival function for the Bellcr model in (2) is recast, leading to
S p o p ( t ; p , λ ) = p exp { 1 log ( p ) S ( t ; λ ) 1 } ,
so that the population density and hazard functions are given by
f p o p ( t ; p , λ ) = S p o p ( t ; p , λ ) [ 1 log ( p ) ] S ( t ; λ ) log [ 1 log ( p ) ] f ( t ; λ )
and h p o p ( t ; p , λ ) = [ 1 log ( p ) ] S ( t ; λ ) log [ 1 log ( p ) ] f ( t ; λ ) .
In a sample of size n, for the i-th subject the covariates are represented by x i = ( 1 , x 1 i , , x r i ) , i = 1 , , n , where the symbol “⊤” denotes the transpose operator. The cure rate p i is linked to x i through the logistic function, that is,
log p i 1 p i = x i β ,
where β = ( β 0 , β 1 , , β r ) is the vector of regression coefficients of dimension r + 1 . Therefore, interpretations about the regression coefficients can be obtained in terms of the odds ratio for the cure probability.

3. Inference

In this section, we present some details about the estimation procedure for the parameters of the model in Section 2. We consider a framework in which the lifetimes are subject to right censoring. Let Y i and C i be the failure and censoring time variables for the i-th subject, respectively. In a sample of size n, the observed variables are T i = min ( Y i , C i ) and δ i = I ( Y i C i ) , where δ i = 1 and δ i = 0 denote either if a failure or a censored time was observed for the i-th subject, respectively, i = 1 , , n . Under the usual assumption of non-informative censoring and with p i as in (5), the log-likelihood function of ψ = ( β , λ ) is given by
( ψ ) = i = 1 n ( log ( p i ) + log { [ 1 log ( p i ) t ] S ( t ; λ ) 1 } + δ i { S ( t i ; λ ) log [ 1 log ( p i ) ] + log { ( log [ 1 log ( p i ) ] } + log [ f ( t i ; λ ) ] } ) .
The maximum likelihood (ML) estimator of ψ , ψ ^ say, is obtained by maximizing (6) with respect to ψ . However, direct maximization is not computationally simple because some terms in (6) involve both β and λ . For this reason and taking advantage of the EM algorithm developed for the power series family of cure rate models in [14], ML estimates in the Bellcr model are computed using this algorithm. In short, the k-th iteration of the EM algorithm goes as follows:
  • E step: For i = 1 , , n , compute M ˜ i ( k ) = μ i ( k 1 ) exp ( μ i ( k 1 ) ) + δ i ( 1 + μ i ( k 1 ) ) , where μ i ( k 1 ) = θ i ( k 1 ) S ( t i ; λ ( k 1 ) ) with θ i ( k 1 ) = log [ 1 log ( p i ( k 1 ) ) ] and p i ( k 1 ) comes from (5).
  • M step 1: Given M ˜ ( k ) = ( M ˜ 1 ( k ) , , M ˜ n ( k ) ) , find β ( k ) that maximizes Q 1 ( β | ψ ( k ) ) with respect to β , where
    Q 1 ( β | ψ ( k ) ) = i = 1 n M ˜ i ( k ) log { log [ 1 log ( p i ) ] } log ( p i ) .
  • M step 2: Given M ˜ ( k ) , find λ ( k ) that maximizes Q 2 ( λ | ψ ( k ) ) with respect to λ , where
    Q 2 ( λ | ψ ( k ) ) = i = 1 n { M ˜ i ( k ) log [ S ( t i ; λ ) ] + δ i log [ h ( t i ; λ ) ] } .
The E and M steps are cycled until a suitable convergence criterion is attained, for instance, | | ψ ( k ) ψ ( k 1 ) | | < ϵ , where “ | | · | | ” denotes the Euclidean norm and ϵ is a specified tolerance. Maximization of the functions in (7) and (8) can be performed using the optim function in R language [19].
We stress that the proposed EM algorithm procedure is general in the sense that it can accommodate different distributions for the latent time Z in Section 1. In this work, we assume the Weibull distribution, denoted by Wei ( α , ν ) , because it is a suitable model in a cure rate model context. With λ = ( α , ν ) , the survival function for this distribution is given by S ( t ; λ ) = exp e α t ν , for t > 0 , ν > 0 and α R .
On the other hand, the normal theory asymptotic covariance matrix of the maximum likelihood estimator ψ ^ can be estimated from minus the Hessian matrix of the log-likelihood function in (6) evaluated at ψ = ψ ^ . The numDeriv R package [20] provides a numerical approximation to this matrix. Interval estimates of the parameters are computed from the asymptotic standard errors. Computational codes are available from the authors upon request.
Finally, inference usually is made on the cure rate. However, we might also be interested in the cure rate for subjects who survived up to a certain time t 0 , denoted by p t 0 . First, for t > t 0 , from (3) we have P ( T > t T > t 0 ) = P ( T > t ) / P ( T > t 0 ) = S p o p ( t ; p , λ ) / S p o p ( t 0 ; p , λ ) = exp { [ 1 log ( p ) ] S ( t ; λ ) [ 1 log ( p ) ] S ( t 0 ; λ ) } . Then, omitting the dependence on the covariates and passing to the limit for t we obtain the conditional cure rate
p t 0 = lim t P ( T > t ) / P ( T > t 0 ) = exp { 1 [ 1 log ( p ) ] S ( t 0 ; λ ) } .
Of course, if t 0 = 0 in (9), p t 0 reduces to the cure rate p in (5). For comparatives purposes, Remark 1 gives expressions for p t 0 under the Pocr, Locr, NBcr and Bincr models.
Remark 1.
For the Pocr, Locr, NBcr and Bincr models, routine calculation shows that p t 0 is given by
p t 0 = p S ( t 0 ; λ ) , f o r   t h e   P o c r   m o d e l , p t 0 = 1 ( 1 p 1 / q ) S ( t 0 ; λ ) q , f o r   t h e   N B c r   m o d e l , p t 0 = 1 + ( 1 / p 1 ) S ( t 0 ; λ ) q , f o r   t h e   B i n c r   m o d e l   a n d p t 0 = θ ( p ) S ( t 0 ; λ ) log 1 θ ( p ) S ( t 0 ; λ ) , f o r   t h e   L o c r   m o d e l ,
where θ ( p ) = 1 + p W [ exp ( 1 / p ) / p ] .

4. Simulation Studies

In this section, we present two simulation studies. The main goal of the first study is to assess the performance of the ML estimates for the parameters of the Bellcr model computed with the EM algorithm in Section 3. The second study is devoted to model selection based on Vuong’s test statistic in [21] (see Appendix A) when the true cure rate model is the Bellcr model or may be wrongly specified. The null hypothesis states that the two compared models are undistinguishable. If the null hypothesis is rejected, the model with the highest value of the likelihood function is preferable. We pick a model with three covariates x 1 , x 2 and x 3 . Figure 2 shows the scheme to draw the covariates.
The true value of β = ( β 0 , β 1 , β 2 , β 3 ) is computed from four combinations of values of ( x 1 , x 2 , x 3 ) and cure rates p c 1 , , p c 4 . For ( x 1 , x 2 , x 3 ) , we choose ( 0 , 0 , 0 ) , ( 1 , 0 , 0 ) , ( 0 , 5 , 2 ) and ( 1 , 10 , 5 ) . Solving the equations in (5), we get β 0 = logit ( p c 1 ) , β 1 = logit ( p c 1 ) + logit ( p c 2 ) , β 2 = logit ( p c 1 ) + ( 2 / 5 ) logit ( p c 2 ) + logit ( p c 3 ) ( 2 / 5 ) logit ( p c 4 ) and β 3 = 2   logit ( p c 1 ) logit ( p c 2 ) 2 logit ( p c 3 ) + logit ( p c 4 ) , where logit ( p ) = log ( p ) log ( 1 p ) , for p ( 0 , 1 ) . In the studies presented here, cure rates ( p c 1 , , p c 4 ) are ( 0.9 , 0.8 , 0.65 , 0.5 ) , ( 0.8 , 0.7 , 0.55 , 0.4 ) and ( 0.7 , 0.6 , 0.45 , 0.3 ) , labeled as Cure 1, Cure 2 and Cure 3, respectively. On the other hand, to set the vector λ = ( α , ν ) we choose values for the expected value E ( Z ) and the variance Var ( Z ) of the latent time Z in Section 1. For the Wei ( α , ν ) distribution in Section 3, these conditions imply that α = ν { log [ Γ ( 1 + 1 / ν ) ] log [ E ( Z ) ] } and ν is the solution to the equation Γ ( 1 + 2 / ν ) / Γ 2 ( 1 + 1 / ν ) = 1 + Var ( Z ) / [ E ( Z ) ] 2 . Table 2 showcases the true values of the parameters for the simulations.
The simulations comprise two sample sizes, n = 200 and n = 400 , and two cases for the latent time Z in Section 1, namely, (i) E ( Z ) = 7 and Var ( Z ) = 4 and (ii) E ( Z ) = 5 and Var ( Z ) = 3 . For each combination of sample size, cure rate ( p c 1 , , p c 4 ) and E ( Z ) , Var ( Z ) , we compute θ i = log [ 1 log ( p i ) ] , with p i as in (5). Next, M i is drawn from the Bell ( θ i ) distribution. If M i = 0 , the failure time is T i = + . If M i 1 , we draw Z 1 , , Z M i from the Wei ( α , ν ) distribution and the failure time is T i = min ( Z 1 , , Z M i ) . The censoring time C i is sampled from the U ( 0 , 15 ) distribution. A simulated dataset is formed by x i = ( 1 , x 1 i , x 2 i , x 3 i ) , Y i = min ( T i , C i ) and δ i = I ( T i C i ) , i = 1 , , n . For the Pocr, Locr, NBcr, Geocr and Berncr models in Section 4.2, the data generation process is similar.
Each scenario was replicated 1000 times. The average proportion of censored times ranges from 56% to 77%. The vectors of covariates x i , i = 1 , , n , are kept fixed throughout the replications.

4.1. Estimation

Having in mind the goal of assessing the behavior of the ML estimates of the parameters in the Bellcr model, Table 3 reports the simulated bias for each parameter (Bias), the average of the asymptotic standard errors (SE) based on the covariance matrix in Section 3, the root of the simulated mean squared error (RMSE) and the coverage probability of the normal theory 95% asymptotic confidence intervals (CP). In general, the estimators show a good behavior in all scenarios of our study. We see that Bias is low so that SE and RMSE are close and get closer when the sample size is 400. It is noteworthy that CP is close to the nominal value for both sample sizes, ranging from 0.939 to 0.963. Overall, we see that within the scope of our study, the estimators and the estimation algorithm in Section 3 have a good performance.

4.2. Model Comparison

In this section, we present a simulation study aimed to test the Bellcr model against the Pocr, Locr, NBcr and Geocr models. In our comparisons, the models are compared with the Bellcr model whichever the model generating the data (True model in Table 4 and Table 5). The models are tested against the Bellcr model using the Vuong’s test statistic. The nominal significance level is 5%.
In Table 4, when the data generating model is the Bellcr model, rejection rates of the Bellcr model are very low regardless of the model being compared, as expected. When the true model is the Pocr model, the highest rejection rates of the Bellcr model, between 11.4% and 24.4%, correspond to the Pocr and NBcr models. We recall the Pocr model is a limiting case of the NBcr model. In Table 4 and Table 5, we see null rejection rates (up to one decimal place) of the Bellcr model against the Berncr model. This is not surprising because the Berncr model (mixture cure model) is too simple to cope with the structure of the data generated in our simulation study.
When the data are drawn from the NBcr ( q = 3 ) model, rejection rates in Table 4 are low, as expected in light of Figure 1. The largest rejection rates of the Bellcr model are achieved when the true model is the Locr model in Table 4 (rates between 75.8% and 80.9%) and the Geocr model in Table 5 (rates between 15.8% and 96.2%). In all scenarios in Table 4 and Table 5, rejection rates of the Bellcr model are consistent with the patterns in Figure 1.

5. Data Analysis

In this section, we conduct an analysis of a dataset on melanoma available at the timereg R package [22]. The study includes 205 patients, with survival times observed after an operation for removal of a malignant melanoma. The observed times vary from 10 to 5565 days (from 0.03 to 15.24 years), with mean and median 5.89 and 5.49 years, respectively, and standard deviation 3.07 years. The dataset comprises 148 censored observations (72.2%), corresponding to patients who died from other causes or were still alive at the end of the study. The dataset includes the covariates ulceration status ( x 1 ) (absent, n = 115 ; present, n = 90 ) and tumor thickness ( x 2 ) (in mm, mean = 2.91, median = 1.94 and standard deviation = 2.96).
We consider the Bell cure rate model for analyzing this dataset. For comparison purposes, we also consider the Pocr, Locr, NBcr, Geocr and Bincr models. We assume the Weibull distribution for the time-to-event for the concurrent causes. First we assess the goodness of fit of the models. For such purpose, we use the Cox-Snell residual, see, e.g., [23] and the normalized quantile residual [24]. In Figure 3, we see that the Bellcr, Pocr, Locr and Geocr models yield a reasonable fit. However, the deviation from the identity line in both residuals plot is evident for the NBcr model. We omit the results for the Bincr model because the plots are similar to the ones for the NBcr model. For this reason, we disregard the NBcr and Bincr models. The Cox-Snell residuals highlight observations 200–205, which correspond to the largest censored times. On the other hand, the normalized quantile residuals do not suggest the presence of possible poorly fitted observations. Table 6 displays parameter estimates and odds ratio for the Bellcr, Locr, Pocr and Geocr models. At a 5% significance level, the coefficients of ulceration status ( β 1 ) and tumor thickness ( β 2 ) are significant and the negative sign is as expected. Moreover, the estimates of ν indicate that the exponential distribution for the latent times in Section 1 is not adequate.
Note that for all the models, exp ( β ^ 1 ) 0.23 . Therefore, patients with ulceration have approximately their probability of being cured reduced by a quarter in relation with patients without ulceration.
Table 7 shows the Vuong’s statistic for the fitted models. At a 5% significance level, we conclude that the Bellcr model is preferred to the Pocr model. Moreover, there is no significant difference among the Bellcr and the Locr and Geocr models.
Figure 4 shows the estimates of the mean of the conditional distribution of M i t i , δ i ; ψ ^ , say M ˜ i , for i = 1 , , n . As previously discussed, values closer to 0 suggest which individuals are cured. Note that the Bellcr and Locr models provide very similar values. For instance, the observation 6 corresponds to a patient who died after 204 days (0.56 years), with ulceration and a tumor thickness of 4.84 cm., providing M ˜ i equal to 2.988, 2.985 and 2.156 for Bellcr, Locr and Geocr models, respectively. Therefore, the three models suggest that this observation has a greater number of carcinogenic cells (a susceptible individual). Similarly, the observation 149 corresponds to a patient who died after 2782 days (7.62 years), without ulceration and a tumor thickness of 1.94 cm., providing M ˜ i equal to 1.076, 1.076 and 1.453 for Bellcr, Locr and Geocr models. This explains why the patient 6 fails in a time considerably lower than patient 149. In a similar way, the observation 203 has a censored time after 4688 days (12.84 years), without ulceration and a tumor thickness of 0.48 cm., resulting in M ˜ i equal to 0.002, 0.002 and 0.015 for the three models. In this case, the models agree that this patients is a potentially cured individual.
Finally, Figure 5 presents the width of the 95% confidence intervals for the conditional cure rate in (9) and Figure 6 displays estimates of cure rate under the Bellcr, Locr and Geocr models. In all models, the confidence intervals are computed using the delta method [25] (Theorem 3.4.5) using expressions of the covariance matrix of the estimators in the power series cure rate model presented in Gallardo et al. [14]. For the Locr and Geocr models, the conditional cure rate is computed as in Remark 1. Note that the Bellcr model provides more accurate confidence intervals for some settings in Figure 5. Finally, Figure 6 shows the conditional cure rate in Equation (9) for different in terms of the tumor thickness for t 0 = 0 and t 0 = 5 years and patients with and without ulceration.

6. Conclusions

In this paper, a cure rate model is proposed under the competing risks setup. For the number of competing causes of the event of interest, we posit the Bell distribution. This is a one-parameter distribution that accommodates overdispersed counts. [26] emphasized that the Poisson distribution represents a strong assumption when modelling the number of competing causes. Therefore, compared to the two-parameter negative binomial distribution, the proposed model is more parsimonious. The cure rate is a parameter linked to covariates, facilitating the comparison with other models. Parameter estimates are computed through iterative steps of the EM algorithm. In order to compare different models, the test statistic proposed in [21] is implemented. In our simulation studies, the estimation method and the test statistic have a good performance.
A dataset on melanoma is analyzed using the proposed model as well as some models from the literature. Extensions of the proposed model to accommodate other types of censoring (e.g., interval censoring) might be a theme for future work.

Author Contributions

Conceptualization, D.I.G. and M.d.C.; Formal analysis, D.I.G. and M.d.C.; Investigation, D.I.G., M.d.C. and H.W.G.; Methodology, D.I.G. and M.d.C.; Software, D.I.G. and M.d.C.; Supervision, D.I.G., M.d.C. and H.W.G.; Validation, D.I.G., M.d.C. and H.W.G.; All of the authors contributed significantly to this research article. All authors have read and agreed to the published version of the manuscript.

Funding

The research of H.W. Gómez was supported by SEMILLERO UA-2021 project, Chile. The work of M.d.C. was partially funded by CNPq, Brazil.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used in Section 5 is duly referenced.

Acknowledgments

We thank the editors and the anonymous reviewers for their constructive comments, which helped us to improve the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Model Selection

Now we present the Vuong’s statistic [21] to test the hypothesis H 0 : The data are generated from the Bellcr model against H 1 : The data are generated from an alternative cure rate model. Let ψ ^ 0 and ψ ^ 1 denote the ML estimators of the model parameter ψ under H 0 and H 1 , respectively. The test statistic is given by V = n 1 / 2 ξ / ω , where ξ = i = 1 n log [ f i ( ψ ^ 0 ) / g i ( ψ ^ 1 ) ] and ω = { n 1 i = 1 n log 2 [ f i ( ψ ^ 0 ) / g i ( ψ ^ 1 ) ] ξ 2 } 1 / 2 , with
f i ( ψ 0 ) = f p o p ( t i ; p , λ ) δ i S p o p ( t i ; p , λ ) 1 δ i ,
where S p o p and f p o p are given in (3) and (4), respectively and g i ( ψ 1 ) denotes the contribution of the i-th individual in the likelihood function under H 1 (Pocr, Locr, NBcr, Geocr or Bincr model).

References

  1. Boag, J.W. Maximum likelihood estimates of the proportion of patients cured by cancer therapy. J. R. Stat. Soc. B 1949, 35, 15–53. [Google Scholar] [CrossRef]
  2. Berkson, J.; Gage, R.P. Survival curve for cancer patients following treatment. J. Am. Stat. Assoc. 1952, 47, 501–515. [Google Scholar] [CrossRef]
  3. Yakovlev, A.Y.; Tsodikov, A.D. Stochastic Models of Tumor Latency and Their Biostatistical Applications; World Scientific: Singapore, 1996. [Google Scholar]
  4. Rodrigues, J.; Cancho, V.G.; de Castro, M.; Louzada-Neto, F. On the unification of the long-term survival models. Stat. Probab. Lett. 2009, 79, 753–759. [Google Scholar] [CrossRef]
  5. Rodrigues, J.; de Castro, M.A.; Cancho, V.G.; Balakrishnan, N. COM-Poisson cure rate survival model and an application to a cutaneous melanoma data. J. Stat. Plan. Inference 2009, 139, 3605–3611. [Google Scholar] [CrossRef]
  6. Cancho, V.G.; Louzada, F.; Ortega, E.M. The power series cure rate model: An application to a cutaneous melanoma data. Commun. Stat. Simul. Comput. 2013, 42, 586–602. [Google Scholar] [CrossRef]
  7. Gallardo, D.I.; Gómez, H.W.; Bolfarine, H. A new cure rate model based on the Yule-Simon distribution with application to a melanoma data set. J. Appl. Stat. 2017, 44, 1153–1164. [Google Scholar] [CrossRef]
  8. Gallardo, D.I.; Gómez, Y.M.; De Castro, M. A flexible cure rate model based on the polylogarithm distribution. J. Stat. Comput. Simul. 2018, 88, 2137–2149. [Google Scholar] [CrossRef]
  9. Gallardo, D.I.; Gómez, Y.M.; Gómez, H.W.; De Castro, M. On the use of the modified power series family of distributions in a cure rate model context. Stat. Methods Med. Res. 2020, 29, 1831–1845. [Google Scholar] [CrossRef] [PubMed]
  10. Cancho, V.G.; Macera, M.A.C.; Suzuki, A.K.; Louzada, F.; Zavaleta, K.E.C. A new long-term survival model with dispersion induced by discrete frailty. Lifetime Data Anal. 2020, 26, 221–244. [Google Scholar] [CrossRef]
  11. Leao, J.; Bourguignon, M.; Gallardo, D.I.; Rocha, R.; Tomazella, V. A new cure rate model with flexible competing causes with applications to melanoma and transplantation data. Stat. Med. 2020, 24, 3272–3284. [Google Scholar] [CrossRef] [PubMed]
  12. Castellares, F.; Lemonte, A.J.; Santos, M.A.C. On the Nielsen distribution. Braz. J. Probab. Stat. 2020, 1, 90–111. [Google Scholar] [CrossRef]
  13. Noack, A. A class of random variables with discrete distributions. Ann. Math. Stat. 1950, 21, 127–132. [Google Scholar] [CrossRef]
  14. Gallardo, D.I.; Romeo, J.S.; Meyer, R. A simplified estimation procedure based on the EM algorithm for the power series cure rate model. Commun. Stat. Simul. Comput. 2017, 46, 6342–6359. [Google Scholar] [CrossRef]
  15. Tsodikov, A.D.; Ibrahim, J.G.; Yakovlev, A.Y. Estimating cure rates from survival data: An alternative to two-component mixture models. J. Am. Stat. Assoc. 2003, 98, 1063–1078. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  16. Cancho, V.G.; De Castro, M.; Dey, D.K. Long-term survival models with latent activation under a flexible family of distributions. Braz. J. Probab. Stat. 2013, 27, 585–600. [Google Scholar] [CrossRef]
  17. Chen, M.-H.; Ibrahim, J.G.; Sinha, D. A new Bayesian model for survival data with a surviving fraction. J. Am. Stat. Assoc. 1999, 94, 909–919. [Google Scholar] [CrossRef]
  18. Corless, R.M.; Gonnet, G.H.; Hare, D.E.G.; Jeffrey, D.J.; Knuth, D.E. On the Lambert W function. Adv. Comput. Math. 1996, 5, 329–359. [Google Scholar] [CrossRef]
  19. R Development Core Team. A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2021. [Google Scholar]
  20. Gilbert, P.; Varadhan, R. numDeriv: Accurate Numerical Derivatives, R Package Version; Available online: http://optimizer.r-forge.r-project.org/ (accessed on 3 July 2021).
  21. Vuong, Q.H. Likelihood Ratio Tests for Model Selection and non-nested Hypotheses. Econometrica 1989, 57, 307–333. [Google Scholar] [CrossRef] [Green Version]
  22. Scheike, T.H.; Zhang, M.J. Analyzing competing risk data using the R timereg package. J. Stat. Softw. 2011, 38, 1–15. [Google Scholar] [CrossRef] [Green Version]
  23. Conlon, A.S.C.; Taylor, J.M.G.; Sargent, D.J. Multi-state models for colon cancer recurrence and death with a cure fraction. Stat. Med. 2014, 33, 1750–1766. [Google Scholar] [CrossRef] [Green Version]
  24. Dunn, P.K.; Smyth, G.K. Randomized quantile residuals. J. Comput. Graph. Stat. 1996, 5, 236–244. [Google Scholar]
  25. Sen, P.K.; Singer, J.M. Large Sample Methods in Statistics: An Introduction with Applications; Chapman & Hall: New York, NY, USA, 1993. [Google Scholar]
  26. Tournoud, M.; Ecochard, R. Promotion time models with time-changing exposure and heterogeneity: Application to infectious diseases. Biom. J. 2008, 50, 395–407. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Variance of the number of competing causes as a function of the cure rate for some models.
Figure 1. Variance of the number of competing causes as a function of the cure rate for some models.
Mathematics 09 01815 g001
Figure 2. Scheme to draw the covariates in the simulation study. Bern (0.5) denotes the Bernoulli distribution with probability 0.5 and U ( a , b ) denotes the uniform distribution on ( a , b ).
Figure 2. Scheme to draw the covariates in the simulation study. Bern (0.5) denotes the Bernoulli distribution with probability 0.5 and U ( a , b ) denotes the uniform distribution on ( a , b ).
Mathematics 09 01815 g002
Figure 3. Cox-Snell residuals (upper panels) and normalized quantile residuals plots (lower panels): (a) Bellcr, (b) Pocr, (c) Locr, (d) NBcr and (e) Geocr models.
Figure 3. Cox-Snell residuals (upper panels) and normalized quantile residuals plots (lower panels): (a) Bellcr, (b) Pocr, (c) Locr, (d) NBcr and (e) Geocr models.
Mathematics 09 01815 g003
Figure 4. Estimate of the mean of the conditional distribution of M i t i , δ i ; ψ ^ in melanoma data set using (a) Bellcr, (b) Locr and (c) Geocr models.
Figure 4. Estimate of the mean of the conditional distribution of M i t i , δ i ; ψ ^ in melanoma data set using (a) Bellcr, (b) Locr and (c) Geocr models.
Mathematics 09 01815 g004
Figure 5. Width of the 95% confidence interval for the conditional cure rate: (a) without ulceration and t 0 = 0 year, (b) without ulceration and t 0 = 5 years, (c) with ulceration and t 0 = 5 years and (d) with ulceration and t 0 = 10 years.
Figure 5. Width of the 95% confidence interval for the conditional cure rate: (a) without ulceration and t 0 = 0 year, (b) without ulceration and t 0 = 5 years, (c) with ulceration and t 0 = 5 years and (d) with ulceration and t 0 = 10 years.
Mathematics 09 01815 g005
Figure 6. Estimates of the conditional cure rate ( p t 0 in Equation (9)) for t 0 = 5 years: (a) without ulceration and (b) with ulceration. Estimated survival function for a tumor thickness of 10 cms: (c) without ulceration and (d) with ulceration.
Figure 6. Estimates of the conditional cure rate ( p t 0 in Equation (9)) for t 0 = 5 years: (a) without ulceration and (b) with ulceration. Estimated survival function for a tumor thickness of 10 cms: (c) without ulceration and (d) with ulceration.
Mathematics 09 01815 g006
Table 1. Variance of the number of competing causes as a function of the cure rate for some models parameterized in the cure rate (p).
Table 1. Variance of the number of competing causes as a function of the cure rate for some models parameterized in the cure rate (p).
ModelVar ( M ) ModelVar ( M )
Berncr p ( 1 p ) NBcr q ( 1 p 1 / q ) p 2 / q
Pocr log ( p ) Locr * θ [ θ + log ( 1 θ ) ] / [ ( 1 θ ) log ( 1 θ ) ] 2
Geocr ( 1 p ) p 2 Bellcr [ 1 log ( p ) ] log [ 1 log ( p ) ] { 1 + log [ 1 log ( p ) ] }
θ = θ ( p ) = 1 + p W [ exp ( 1 / p ) / p ] , where W ( · ) denotes the Lambert function  [18].
Table 2. True values of the parameters in the simulation studies.
Table 2. True values of the parameters in the simulation studies.
Cure Rate β 0 β 1 β 2 β 3 ( E ( Z ) , Var ( Z ) ) α ν
Cure 12.197−0.811−1.0241.770 (7 , 4)−8.0183.920
Cure 21.386−0.539−0.6841.118 (5 , 3)−5.4443.165
Cure 30.847−0.442−0.5470.843 (2 , 1)−1.7122.101
Table 3. Simulated bias (Bias), average of the asymptotic standard errors (SE), root of the simulated mean squared error (RMSE) and coverage probability of the 95% asymptotic confidence intervals (CP) for the Bellcr model.
Table 3. Simulated bias (Bias), average of the asymptotic standard errors (SE), root of the simulated mean squared error (RMSE) and coverage probability of the 95% asymptotic confidence intervals (CP) for the Bellcr model.
Cure Rate ( E ( Z ) , Var ( Z ) ) Parameter n = 200 n = 400
BiasSERMSECPBiasSERMSECP
Cure 1(7, 4) β 0  0.1090.5940.6590.945 0.0460.4030.4220.952
β 1 −0.0030.6920.7370.949−0.0030.4710.4740.954
β 2 −0.0670.2100.2570.963−0.0270.1380.1430.950
β 3  0.1120.3960.4670.957 0.0490.2600.2690.955
α −0.3350.8230.8980.953−0.2210.5720.6470.951
ν  0.1640.4460.4870.951 0.1070.3070.3390.951
(5, 3) β 0  0.1180.5320.5970.944 0.0500.3630.3830.948
β 1  0.0060.6210.6410.958 0.0050.4270.4510.949
β 2 −0.0570.1830.2080.950−0.0210.1230.1320.950
β 3  0.0820.3440.3660.955 0.0230.2320.2470.949
α −0.2060.5180.5630.943−0.1240.3610.3970.946
ν  0.1110.3210.3480.943 0.0730.2230.2370.945
Cure 2(7, 4) β 0  0.0430.4710.5010.945 0.0210.3230.3230.953
β 1  0.0050.5900.6000.955−0.0030.4080.4340.953
β 2 −0.0350.1390.1470.950−0.0150.0950.1000.950
β 3  0.0580.2680.2880.941 0.0280.1830.1910.945
α −0.2760.8200.8890.940−0.1320.5690.5900.949
ν  0.1370.4310.4580.947 0.0690.2980.3090.952
(5, 3) β 0  0.0510.4210.4330.948 0.0200.2910.2880.952
β 1 −0.0190.5340.5520.956−0.0010.3690.3900.951
β 2 −0.0260.1240.1270.964−0.0130.0860.0870.950
β 3  0.0450.2400.2480.951 0.0220.1650.1710.950
α −0.1460.5140.5340.944−0.0780.3580.3600.953
ν  0.0900.3110.3240.948 0.0510.2160.2200.952
Cure 3(7, 4) β 0  0.0050.4200.4140.957 0.0040.2900.3000.949
β 1  0.0000.5420.5410.957−0.0030.3750.3870.950
β 2 −0.0240.1160.1190.956−0.0110.0790.0830.953
β 3  0.0500.2270.2340.951 0.0130.1550.1600.950
α −0.1920.7650.8140.957−0.1320.5340.5670.953
ν  0.0960.4010.4210.956 0.0710.2790.2970.948
(5, 3) β 0  0.0130.3760.3870.954 0.0130.2610.2750.948
β 1 −0.0100.4900.5000.950−0.0050.3410.3360.950
β 2 −0.0180.1040.1100.955−0.0120.0720.0740.948
β 3  0.0290.2030.2110.946 0.0170.1410.1450.950
α −0.1290.4790.5150.939−0.0610.3340.3390.953
ν  0.0810.2890.3070.951 0.0350.2010.2000.950
Table 4. Rejection rate (in %) of the Bellcr model when compared with different models using the Vuong’s test statistic—Part 1.
Table 4. Rejection rate (in %) of the Bellcr model when compared with different models using the Vuong’s test statistic—Part 1.
True Model( E ( Z ) , Var ( Z ) )Model Compared with Bellcr n = 200 n = 400
Cure RateCure Rate
Cure 1Cure 2Cure 3Cure 1Cure 2Cure 3
Bellcr(7, 4)Pocr2.72.42.91.82.11.6
Locr0.00.00.00.00.00.0
NBcr4.23.33.73.83.63.1
Geocr0.40.30.20.00.00.5
Berncr0.00.00.00.00.00.0
(5, 3)Pocr1.72.12.81.01.01.8
Locr0.00.00.00.00.00.0
NBcr3.62.93.52.32.22.8
Geocr0.20.60.50.00.00.3
Berncr0.00.00.00.00.00.0
(2, 1)Pocr1.31.52.80.70.90.7
Locr0.00.00.00.00.00.0
NBcr2.72.73.72.32.01.6
Geocr0.50.70.80.00.00.1
Berncr0.00.00.00.00.00.0
Pocr(7, 4)Pocr13.611.411.816.613.614.2
Locr0.00.00.10.00.00.0
NBcr16.112.412.919.615.516.0
Geocr0.20.10.00.00.00.0
Berncr0.00.00.00.00.00.0
(5, 3)Pocr12.914.511.721.717.215.2
Locr0.00.00.00.00.00.0
NBcr14.915.312.224.418.617.2
Geocr0.00.20.10.00.00.0
Berncr0.00.00.00.00.00.0
(2, 1)Pocr15.615.013.920.918.315.1
Locr0.00.00.00.00.00.0
NBcr17.216.414.622.820.116.8
Geocr0.10.00.10.00.00.0
Berncr0.00.00.00.00.00.0
Locr(7, 4)Pocr6.21.60.49.50.00.3
Locr79.979.779.680.980.880.8
NBcr5.95.06.09.52.36.3
Geocr0.53.79.70.04.511.3
Berncr0.00.00.00.00.00.0
(5, 3)Pocr6.80.80.76.70.60.2
Locr76.175.975.880.580.480.3
NBcr7.43.44.86.73.97.9
Geocr0.34.39.40.14.414.7
Berncr0.00.00.00.00.00.0
(2, 1)Pocr5.70.90.47.20.30.1
Locr80.079.979.778.678.578.6
NBcr6.14.76.77.24.110.9
Geocr0.56.710.70.16.318.2
Berncr0.00.00.00.00.00.0
Table 5. Rejection rate (in %) of the Bellcr model when compared with different models using the Vuong’s test statistic—Part 2.
Table 5. Rejection rate (in %) of the Bellcr model when compared with different models using the Vuong’s test statistic—Part 2.
True Model( E ( Z ) , Var ( Z ) )Model Compared with Bellcr n = 200 n = 400
Cure RateCure Rate
Case 1Case 2Case 3Case 1Case 2Case 3
NBcr ( q = 3 )(7, 4)Pocr0.92.33.40.41.41.6
Locr0.00.00.00.00.00.0
NBcr4.83.55.36.54.63.9
Geocr0.81.40.60.30.50.5
Berncr0.00.00.00.00.00.0
(5, 3)Pocr0.82.53.00.40.41.5
Locr0.00.00.00.00.00.0
NBcr7.24.84.211.23.44.1
Geocr1.71.00.70.50.30.7
Berncr0.00.00.00.00.00.0
(2, 1)Pocr0.71.42.80.11.01.7
Locr0.00.00.00.00.00.0
NBcr5.04.23.99.84.64.7
Geocr1.00.80.50.60.20.5
Berncr0.00.00.00.00.00.0
Geocr(7, 4)Pocr0.00.00.00.00.00.0
Locr0.40.80.50.10.10.4
NBcr64.719.413.392.449.834.3
Geocr58.725.718.883.549.138.8
Berncr0.00.00.00.00.00.0
(5, 3)Pocr0.00.00.00.00.00.0
Locr0.00.10.30.00.00.1
NBcr69.825.815.894.855.636.8
Geocr62.530.822.187.452.039.5
Berncr0.00.00.00.00.00.0
(2, 1)Pocr0.00.00.00.00.00.0
Locr0.00.00.00.00.00.0
NBcr72.229.417.396.260.440.7
Geocr63.933.424.289.656.742.7
Berncr0.00.00.00.00.00.0
Table 6. Parameter estimates, standard errors (SE) and maximum value of the log-likelihood function ( ψ ^ ) for different models.
Table 6. Parameter estimates, standard errors (SE) and maximum value of the log-likelihood function ( ψ ^ ) for different models.
ParameterModel
BellcrPocrLocrGeocr
EstimateSEEstimateSEEstimateSEEstimateSE
β 0  1.87590.2449 1.90390.3431 1.67710.3566 1.81270.3501
β 1 −1.45360.2843−1.48140.3901−1.48160.3278−1.48070.3569
β 2 −0.19080.0416−0.19600.0705−0.14120.0356−0.17850.0537
α −3.24610.3124−2.94350.3411−4.18630.5154−3.49070.3916
ν  1.82870.1808 1.73680.2178 2.21100.2725 1.91320.2357
exp ( β 1 ) 0.23370.06640.22730.08870.22730.07450.22750.0812
exp ( β 2 ) 0.82630.03440.82200.05800.86830.03090.83650.0449
( ψ ^ ) −206.3−207.5−203.7−205.4
Table 7. Vuong’s statistic and p-value when testing the Bell cure rate model against some alternatives.
Table 7. Vuong’s statistic and p-value when testing the Bell cure rate model against some alternatives.
ModelPocrLocrGeocr
Vuong’s statistic−2.4731.2381.146
p-Value 0.0130.2160.252
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Gallardo, D.I.; de Castro, M.; Gómez, H.W. An Alternative Promotion Time Cure Model with Overdispersed Number of Competing Causes: An Application to Melanoma Data. Mathematics 2021, 9, 1815. https://doi.org/10.3390/math9151815

AMA Style

Gallardo DI, de Castro M, Gómez HW. An Alternative Promotion Time Cure Model with Overdispersed Number of Competing Causes: An Application to Melanoma Data. Mathematics. 2021; 9(15):1815. https://doi.org/10.3390/math9151815

Chicago/Turabian Style

Gallardo, Diego I., Mário de Castro, and Héctor W. Gómez. 2021. "An Alternative Promotion Time Cure Model with Overdispersed Number of Competing Causes: An Application to Melanoma Data" Mathematics 9, no. 15: 1815. https://doi.org/10.3390/math9151815

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop