Applying Data Science methods and tools to unveil healthcare use of lung cancer patients in a teaching hospital in Spain

  • Research Article
Clinical and Translational Oncology

Abstract

Purpose

Our primary goal was to study the use of outpatient care by lung cancer patients at Hospital Universitario Puerta de Hierro Majadahonda (HUPHM), Spain, by leveraging our Electronic Patient Record (EPR) and a structured clinical registry of lung cancer cases. We also aimed to assess current Data Science methods and tools.

Methods/patients

We applied the Cross-Industry Standard Process for Data Mining (CRISP-DM) to integrate and analyze activity data extracted from the EPR (9.3 million records) and clinical data of lung cancer patients from a previous registry, which was curated into a new, structured database based on REDCap. We described and quantified the factors that influence outpatient care use through univariate and multivariate analyses (Poisson and negative binomial regression).
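The choice between Poisson and negative binomial regression for visit counts usually depends on overdispersion, that is, on whether the variance of the counts exceeds their mean. The following is a minimal sketch of that check, not the authors' analysis: the visit counts are made-up numbers, the function name `dispersion_summary` is hypothetical, and the 1.5 cut-off is an arbitrary illustrative threshold.

```python
from statistics import mean, variance

# Hypothetical outpatient visit counts per patient (made-up, not study data).
visits = [12, 3, 45, 7, 0, 88, 15, 22, 5, 130, 9, 31]


def dispersion_summary(counts):
    """Summarize how far a set of counts departs from the Poisson assumption.

    A Poisson model assumes Var(Y) = mu. A negative binomial model allows
    Var(Y) = mu + alpha * mu**2, so alpha can be estimated by the method of
    moments as (variance - mean) / mean**2.
    """
    mu = mean(counts)
    var = variance(counts)  # sample variance (n - 1 denominator)
    ratio = var / mu        # close to 1 when the Poisson assumption holds
    # Negative estimates mean no overdispersion; clamp them to 0.
    alpha = max((var - mu) / mu**2, 0.0)
    return {"mean": mu, "variance": var, "ratio": ratio, "alpha": alpha}


summary = dispersion_summary(visits)
# Assumed rule of thumb: a ratio well above 1 favours the negative binomial.
model = "negative binomial" if summary["ratio"] > 1.5 else "Poisson"
print(summary, "->", model)
```

In practice the same decision is normally made by fitting both models with a statistics package and comparing their fit; the ratio above is only a quick screening step.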

Results

Three cycles of CRISP-DM were performed, resulting in a curated database of 522 lung cancer patients described by 133 variables. These patients generated 43,197 outpatient visits and tests, 1538 ER visits and 753 inpatient admissions. Stage and ECOG-PS at diagnosis and Charlson Comorbidity Index were major contributors to healthcare use. We also found that the patients’ pattern of healthcare use (even before diagnosis), a history of cancer in first-degree relatives, smoking habits, or even age at diagnosis could play a relevant role.
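To give a sense of scale, the totals above can be turned into crude per-patient averages. The short sketch below uses only the figures reported in this abstract; the averages ignore differences in follow-up time between patients, so they are rough orientation values rather than rates reported by the study.

```python
# Totals reported in the abstract.
patients = 522
totals = {
    "outpatient visits and tests": 43_197,
    "ER visits": 1_538,
    "inpatient admissions": 753,
}

# Crude average per patient: total events divided by the number of patients.
per_patient = {kind: n / patients for kind, n in totals.items()}

for kind, average in per_patient.items():
    print(f"{kind}: {average:.1f} per patient")
```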

Conclusions

Integrating activity data from the EPR with structured clinical data from lung cancer patients and applying CRISP-DM has allowed us to describe healthcare use in connection with clinical variables that could be used to plan resources and improve quality of care.
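The integration step described here amounts to linking activity records to registry entries by patient and counting events per patient. The following is a minimal sketch of that idea under stated assumptions: the patient identifiers, field names (`stage`, `ecog_ps`, `charlson`), activity labels, and the `integrate` function are all hypothetical, and the actual EPR and REDCap schemas are not described in this abstract.

```python
from collections import Counter

# Hypothetical clinical registry keyed by patient id (made-up values).
registry = {
    "P001": {"stage": "IV", "ecog_ps": 1, "charlson": 3},
    "P002": {"stage": "II", "ecog_ps": 0, "charlson": 1},
}

# Hypothetical EPR activity records as (patient_id, activity_type) pairs.
activity = [
    ("P001", "outpatient"),
    ("P001", "outpatient"),
    ("P001", "er"),
    ("P002", "outpatient"),
    ("P003", "outpatient"),  # not in the registry, so it is ignored
]


def integrate(registry, activity):
    """Attach per-patient activity counts to each registry entry."""
    counts = Counter(
        (pid, kind) for pid, kind in activity if pid in registry
    )
    merged = {}
    for pid, clinical in registry.items():
        row = dict(clinical)  # copy so the registry itself is not modified
        for kind in ("outpatient", "er", "inpatient"):
            row[f"n_{kind}"] = counts[(pid, kind)]  # 0 when no records exist
        merged[pid] = row
    return merged


merged = integrate(registry, activity)
print(merged["P001"])
```

Each merged row then holds the clinical variables next to the count outcomes, which is the shape a count regression needs.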

Author information

Corresponding authors

Correspondence to J. L. Cruz-Bermúdez, E. Menasalvas-Ruiz or M. Provencio.

Ethics declarations

Conflict of interest

We declare no conflicts of interest during the development of this research.

Ethical approval

Our study has been approved by the Ethics Committee of the Hospital and has, therefore, been performed in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki.

Informed consent

Consent is not required for this type of study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Cruz-Bermúdez, J.L., Parejo, C., Martínez-Ruíz, F. et al. Applying Data Science methods and tools to unveil healthcare use of lung cancer patients in a teaching hospital in Spain. Clin Transl Oncol 21, 1472–1481 (2019). https://doi.org/10.1007/s12094-019-02074-2

  • DOI: https://doi.org/10.1007/s12094-019-02074-2
