Summary
In this chapter, we present a retrospective clinical study in which computational intelligence approaches for extracting knowledge from gene expression data improved the oncological clinical analysis. The study focuses on a survival analysis of estrogen receptor (ER)-positive breast cancer patients treated with tamoxifen. The chapter describes each step of the gene expression data analysis procedure, from quality control of the data through normalization, feature transformation, feature selection, and model building to the final validation. Each section proposes a set of guidelines and motivates the specific choices made for this particular study. The main guidelines that emerged from this study are to use simple and effective techniques rather than complex non-linear models, to use interpretable methods, and to use scalable computational solutions able to deal with multiplatform and multisource data.
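To make the guideline about simple, interpretable techniques more concrete, the following is a minimal sketch of what such a survival-analysis pipeline could look like. It is not the procedure used in the chapter: the gene names, the toy values, the function names, and the choice of a concordance index for univariate ranking and evaluation are all assumptions made for illustration.

```python
from itertools import combinations
from statistics import mean, pstdev


def concordance_index(risk, time, event):
    """Simplified concordance index: fraction of usable patient pairs in
    which the patient with the higher risk score has the shorter time."""
    concordant, usable = 0.0, 0
    for i, j in combinations(range(len(time)), 2):
        if time[i] == time[j]:
            continue  # tied times are skipped in this simplified version
        first, second = (i, j) if time[i] < time[j] else (j, i)
        if not event[first]:
            continue  # earlier time is censored, so the ordering is unknown
        usable += 1
        if risk[first] > risk[second]:
            concordant += 1.0
        elif risk[first] == risk[second]:
            concordant += 0.5
    return concordant / usable if usable else 0.5


def standardize(values):
    """Normalization step: center and scale one gene's expression values."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd else 0.0 for v in values]


def select_genes(expr, time, event, k):
    """Feature selection: rank genes by |C - 0.5|, keep the top k with sign."""
    scored = []
    for gene, values in expr.items():
        c = concordance_index(values, time, event)
        scored.append((abs(c - 0.5), gene, 1 if c >= 0.5 else -1))
    scored.sort(reverse=True)
    return [(gene, sign) for _, gene, sign in scored[:k]]


def signature_score(expr, signature):
    """Model building: signed average of the selected genes (linear score)."""
    n_patients = len(next(iter(expr.values())))
    return [
        mean(sign * expr[gene][p] for gene, sign in signature)
        for p in range(n_patients)
    ]


# Hypothetical toy data: six patients, three genes (not from the study).
time = [2, 4, 6, 8, 10, 12]   # follow-up time
event = [1, 1, 0, 1, 1, 0]    # 1 = relapse observed, 0 = censored
raw_expr = {
    "gene_a": [3.0, 2.5, 2.0, 1.5, 1.0, 0.5],  # high expression, early relapse
    "gene_b": [0.5, 1.0, 1.5, 2.0, 2.5, 3.0],  # high expression, late relapse
    "gene_c": [1.0, 3.0, 2.0, 3.0, 1.0, 2.0],  # unrelated to outcome
}

expr = {gene: standardize(values) for gene, values in raw_expr.items()}
signature = select_genes(expr, time, event, k=2)
score = signature_score(expr, signature)

# Evaluated on the same patients used for selection, so this is optimistic;
# a real validation would score patients not used to build the signature.
c_index = concordance_index(score, time, event)
print(signature, c_index)
```

The resulting score is a signed average of a few genes, so each gene's contribution can be read directly from its sign. In practice the selection step would have to be repeated inside every cross-validation fold, or assessed on an independent dataset, to avoid an optimistic performance estimate.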
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Haibe-Kains, B., Desmedt, C., Loi, S., Delorenzi, M., Sotiriou, C., Bontempi, G. (2008). Computational Intelligence in Clinical Oncology: Lessons Learned from an Analysis of a Clinical Study. In: Smolinski, T.G., Milanova, M.G., Hassanien, AE. (eds) Computational Intelligence in Biomedicine and Bioinformatics. Studies in Computational Intelligence, vol 151. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70778-3_10
DOI: https://doi.org/10.1007/978-3-540-70778-3_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70776-9
Online ISBN: 978-3-540-70778-3
eBook Packages: Engineering, Engineering (R0)