  • Resource

A community effort to create standards for evaluating tumor subclonal reconstruction

Abstract

Tumor DNA sequencing data can be interpreted by computational methods that analyze genomic heterogeneity to infer evolutionary dynamics. A growing number of studies have used these approaches to link cancer evolution with clinical progression and response to therapy. Although the inference of tumor phylogenies is rapidly becoming standard practice in cancer genome analyses, standards for evaluating them are lacking. To address this need, we systematically assess methods for reconstructing tumor subclonality. First, we elucidate the main algorithmic problems in subclonal reconstruction and develop quantitative metrics for evaluating solutions to each. Then we simulate realistic tumor genomes that harbor all known clonal and subclonal mutation types and processes. Finally, we benchmark 580 tumor reconstructions, varying tumor read depth, tumor type and somatic variant detection. Our analysis provides a baseline for the establishment of gold-standard methods to analyze tumor heterogeneity.

Fig. 1: Features of tumor subclonal reconstruction.
Fig. 2: Quantifying performance of subclonal reconstruction algorithms.
Fig. 3: Simulating subclonal CNAs in tumor BAM files and spiking somatic mutations.
Fig. 4: Simulated realistic tumor genomes.
Fig. 5: Error profiles of subclonal reconstruction algorithms.
Fig. 6: Impact of CNA error profiles on subclonal reconstruction.

Data availability

Sequence files are available from the European Genome-phenome Archive (EGA) under accession no. EGAD00001003971.

Code availability

BAMSurgeon is available at https://github.com/adamewing/bamsurgeon. The framework for subclonal mutation simulation is available at http://search.cpan.org/~boutroslb/NGS-Tools-BAMSurgeon-v1.0.0/. The PhaseTools BAM phasing toolkit is available at https://github.com/mateidavid/phase-tools. Scripts providing the complete scoring harness are available at https://github.com/asalcedo31/SMC-Het_Scoring/smc_het_eval.

Acknowledgements

We thank the members of our laboratories for support, and Sage Bionetworks and the DREAM Challenge organization for their ongoing support of the SMC-Het Challenge. In particular, we thank T. Norman, J.C. Bare, S. Friend and G. Stolovitzky for their patience, technical support and scientific insight. We also thank R. Sun and C. Curtis for kindly sharing code for calculating the intra-tumor heterogeneity metrics and building the support vector machine predictor in multi-region sequencing simulations. This study was conducted with the support of the Ontario Institute for Cancer Research (OICR) to P.C.B. and J.T.S. through funding provided by the Government of Ontario. This work was supported by Prostate Cancer Canada and is proudly funded by the Movember Foundation (grant no. RS2014-01 to P.C.B.). This study was conducted with the support of Movember funds through Prostate Cancer Canada and with the additional support of the Ontario Institute for Cancer Research, funded by the Government of Ontario. This project was supported by Genome Canada through a Large-Scale Applied Project contract to P.C.B., S.P. Shah and R.D. Morin. This work was supported by the Discovery Frontiers: Advancing Big Data Science in Genomics Research program, which is jointly funded by the Natural Sciences and Engineering Research Council of Canada, the Canadian Institutes of Health Research (CIHR), Genome Canada and the Canada Foundation for Innovation (CFI). Q.M. is a Canada CIFAR AI Chair and is supported by an Associate Investigator award from OICR. This research is part of the University of Toronto’s Medicine by Design initiative, which receives funding from the Canada First Research Excellence Fund (CFREF). J.A.W. was partially supported by an Ontario Graduate Scholarship. This work was supported by The Francis Crick Institute, which receives its core funding from Cancer Research UK (grant no. FC001202), the UK Medical Research Council (grant no. FC001202) and the Wellcome Trust (grant no. FC001202). M.T. is a postdoctoral fellow supported by the European Union’s Horizon 2020 research and innovation program (Marie Skłodowska-Curie grant agreement no. 747852-SIOMICS). P.V.L. is a Winton Group Leader in recognition of the Winton Charitable Foundation’s support toward the establishment of The Francis Crick Institute. This project was enabled through access to the MRC eMedLab Medical Bioinformatics infrastructure, supported by the UK Medical Research Council (grant no. MR/L016311/1 to M.T. and P.V.L.). A.S. was partly supported by a CIHR CGS-doctoral award. P.C.B. was supported by a Terry Fox Research Institute New Investigator Award and a CIHR New Investigator Award. D.C.W. is supported by the Li Ka Shing Foundation. The Galaxy portions of the evaluation system were supported by National Institutes of Health (NIH) grant nos. U41 HG006620 and R01 AI134384-01, as well as National Science Foundation (NSF) grant no. 1661497. The following NIH grants supported this work: no. R01-CA180778 (to J.M.S.), no. U24-CA143858 (to J.M.S.) and no. P30-CA008748 (to Thompson, subgrant to Q.M.). We thank Google Inc. (in particular N. Deflaux) for their ongoing support of the ICGC-TCGA DREAM Somatic Mutation Calling Challenge. This work was supported by the NIH/NCI under award no. P30CA016042.

Author information

Contributions

All authors edited and approved the final manuscript. A.S. wrote the first draft of the paper, designed experiments, performed statistical analyses, performed bioinformatics analyses and performed data visualization. M.T. wrote the first draft of the paper, designed experiments, generated tools and reagents, performed statistical analyses, performed bioinformatics analyses and performed data visualization. S.M.G.E. wrote the first draft of the paper, generated tools and reagents, performed bioinformatics analyses and performed data visualization. A.G.D. wrote the first draft of the paper, designed experiments, generated tools and reagents and performed bioinformatics analyses. M.D., S.D., L.Y.L., S.S., H.Z., J.M.C., A.B., C.M.L., I.U. and B.L. generated tools and reagents. K.Z. and T.-H.O.Y. generated tools and reagents and performed bioinformatics analyses. A.D.E. generated tools and reagents and supervised research. N.M.W. performed bioinformatics analyses and performed data visualization. J.A.W., M.K., H.Z. and C.V.A. performed bioinformatics analyses. C.P. performed data visualization. J.T.S., J.M.S., D.A. and Y.G. supervised research. K.E. wrote the first draft of the paper and supervised research. D.C.W. designed experiments and supervised research. Q.M. wrote the first draft of the paper, designed experiments, generated tools and reagents and supervised research. P.V.L. wrote the first draft of the paper, designed experiments and supervised research. P.C.B. wrote the first draft of the paper, designed experiments and supervised research.

Corresponding author

Correspondence to Paul C. Boutros.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Materials

Supplementary Figs. 1–6, Tables 2–5 and Notes 1–3.

Reporting Summary

Supplementary Table 1

Benchmark scores. Unnormalized benchmark scores for all tumors and all subchallenges, varying depth, mutation caller, CNA input and subclonal reconstruction algorithm. The number of SNVs detected (SNVs) and the false positive (FP), false negative (FN), true positive (TP) and true negative (TN) rates for SNV detection are included, as well as the estimated cF.

About this article

Cite this article

Salcedo, A., Tarabichi, M., Espiritu, S.M.G. et al. A community effort to create standards for evaluating tumor subclonal reconstruction. Nat Biotechnol 38, 97–107 (2020). https://doi.org/10.1038/s41587-019-0364-z

  • DOI: https://doi.org/10.1038/s41587-019-0364-z
