A Hard EM algorithm for prediction of the cured fraction in survival data

  • Original paper
Computational Statistics

Abstract

In clinical studies, survival analysis is a well-known technique for analyzing time-to-event data under the assumption that every subject in the study will eventually experience the event of interest. With recent advances in drug development, however, a fraction of subjects may never experience the event and are regarded as immune or cured. Because the study period is finite, it is usually not known with certainty which subjects are immune, so this information can be treated as missing data. We develop a novel semi-parametric algorithm that addresses this problem by minimizing a suitable loss function, which incorporates the missing data and generates cure indicators for the censored individuals. We prove the existence of a global minimizer of the loss function and establish some asymptotic properties. We demonstrate via numerical experiments that, under appropriate circumstances, our approach performs better than simpler alternatives, and we use the algorithm to estimate lifetime parameters and the overall survivor function.
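To make the idea of generating cure indicators concrete, the following is a minimal sketch of a hard EM loop. It is not the semi-parametric method of the paper. It assumes a deliberately simplified fully parametric mixture cure model with exponential lifetimes, a single constant cure probability, and no covariates. The function names `hard_em_cure` and `overall_survival`, the parameters `p` and `lam`, and the simulated data are all illustrative assumptions.

```python
import numpy as np

def hard_em_cure(times, events, max_iter=100, eps=1e-9):
    """Hard EM for a toy mixture cure model (illustrative assumption,
    not the paper's semi-parametric algorithm).

    times  : observed follow-up times
    events : True if the event was observed, False if censored
    Returns (p, lam, cured), where p is the estimated cure probability,
    lam the exponential rate of susceptible subjects, and cured a boolean
    array of hard cure indicators (always False for observed events).
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)

    def m_step(z):
        # Closed-form complete-data ML estimates given hard labels z
        # (z[i] is True when subject i is labelled susceptible).
        p = np.clip(1.0 - z.mean(), eps, 1.0 - eps)
        lam = events.sum() / max(times[z].sum(), eps)
        return p, lam

    # Start by labelling every censored subject as cured.
    z = events.copy()
    for _ in range(max_iter):
        p, lam = m_step(z)
        # Hard E-step: a censored subject is labelled susceptible when
        # log(1 - p) - lam * t exceeds log(p); events stay susceptible.
        z_new = events | (np.log1p(-p) - lam * times > np.log(p))
        if np.array_equal(z_new, z):
            break
        z = z_new
    p, lam = m_step(z)
    return p, lam, ~z

def overall_survival(t, p, lam):
    # Population survivor function of the toy model: cured subjects
    # never fail, susceptible subjects fail at exponential rate lam.
    return p + (1.0 - p) * np.exp(-lam * np.asarray(t, dtype=float))

# Simulated example: the true cure status is hidden from the algorithm.
rng = np.random.default_rng(0)
n = 500
is_cured = rng.random(n) < 0.3
lifetimes = rng.exponential(1.0, n)
censoring = rng.uniform(0.0, 5.0, n)
events = (~is_cured) & (lifetimes <= censoring)
times = np.where(events, lifetimes, censoring)

p_hat, lam_hat, cured_hat = hard_em_cure(times, events)
print(p_hat, lam_hat, cured_hat.sum())
```

In this toy setting the hard assignment reduces to a threshold on the censoring time: subjects censored late are labelled cured and subjects censored early are labelled susceptible. A model with covariates would replace the constant `p` with a fitted classifier and would generally need a numerical optimizer in place of the closed-form updates.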

Abbreviations

PU: Positive and unlabeled

EM: Expectation-maximization

ML: Maximum likelihood

SCAR: Selected completely at random

AUC: Area under the curve

H-score: Null hypothesis (3.1) for the given score

H-logloss: Null hypothesis (3.1) for the logloss score

H-Accuracy: Null hypothesis (3.1) for the accuracy score

H-AUC: Null hypothesis (3.1) for the AUC score

SLSQP: Sequential least squares programming

Author information

Corresponding author

Correspondence to Sandip Barui.

About this article

Cite this article

Kosovalić, N., Barui, S. A Hard EM algorithm for prediction of the cured fraction in survival data. Comput Stat 37, 817–835 (2022). https://doi.org/10.1007/s00180-021-01140-0

  • DOI: https://doi.org/10.1007/s00180-021-01140-0
