Extensive variation within the pan-genome of cultivated and wild sorghum

Abstract

Sorghum is a drought-tolerant staple crop for half a billion people in Africa and Asia, an important source of animal feed throughout the world and a biofuel feedstock of growing importance. Cultivated sorghum and its inter-fertile wild relatives constitute the primary gene pool for sorghum. Understanding and characterizing the diversity within this valuable resource is fundamental for its effective utilization in crop improvement. Here, we report analysis of a sorghum pan-genome to explore genetic diversity within the sorghum primary gene pool. We assembled 13 genomes representing cultivated sorghum and its wild relatives, and integrated them with 3 other published genomes to generate a pan-genome of 44,079 gene families with 222.6 Mb of new sequence identified. The pan-genome displays substantial gene-content variation, with 64% of gene families showing presence/absence variation among genomes. Comparisons between core genes and dispensable genes suggest that dispensable genes are important for sorghum adaptation. Extensive genetic variation was uncovered within the pan-genome, and the distribution of these variations was influenced by variation of recombination rate and transposable element content across the genome. We identified presence/absence variants that were under selection during sorghum domestication and improvement, and demonstrated that such variation had important phenotypic outcomes that could contribute to crop improvement. The constructed sorghum pan-genome represents an important resource for sorghum improvement and gene discovery.

Fig. 1: Sorghum pan-genome.
Fig. 2: Phylogenetic relationships and distribution of genetic variation across the sorghum genome.
Fig. 3: Presence/absence variation underlying grain-colour variation in sorghum.

Data availability

The datasets generated and/or analysed during the current study have been deposited in the China National GeneBank database (https://db.cngb.org) under project CNP0001440, and in the Genome Sequence Archive (ref. 64) at the National Genomics Data Center (ref. 65), Beijing Institute of Genomics (China National Center for Bioinformation), Chinese Academy of Sciences, under accession number CRA003806, which is publicly accessible at https://bigd.big.ac.cn/gsa.

Code availability

The code used in this manuscript is available at the GitHub repository https://github.com/xujiabao507/Sorghum_pangenome.

References

  1. Sakschewski, B., Von Bloh, W., Huber, V., Müller, C. & Bondeau, A. Feeding 10 billion people under climate change: how large is the production gap of current agricultural systems? Ecol. Modell. 288, 103–111 (2014).

  2. Clark, J. D. & Stemler, A. Early domesticated sorghum from Central Sudan. Nature 254, 588–591 (1975).

  3. Wendorf, F. et al. Saharan exploitation of plants 8,000 years bp. Nature 359, 721–724 (1992).

  4. Mace, E. S. et al. Whole-genome sequencing reveals untapped genetic potential in Africa’s indigenous cereal crop sorghum. Nat. Commun. 4, 2320 (2013).

  5. Morris, G. P. et al. Population genomic and genome-wide association studies of agroclimatic traits in sorghum. Proc. Natl Acad. Sci. USA 110, 453–458 (2013).

  6. Zheng, L. Y. et al. Genome-wide patterns of genetic variation in sweet and grain sorghum (Sorghum bicolor). Genome Biol. 12, R114 (2011).

  7. Smith, O. et al. A domestication history of dynamic adaptation and genomic deterioration in Sorghum. Nat. Plants 5, 369–379 (2019).

  8. de Wet, J. M. J. Systematics and evolution of Sorghum-sect Sorghum (Gramineae). Am. J. Bot. 65, 477–484 (1978).

  9. Wiersema, J. H. & Dahlberg, J. The nomenclature of Sorghum bicolor (L.) Moench (Gramineae). Taxon 56, 941–946 (2007).

  10. Waterhouse, R. M. et al. BUSCO applications from quality assessments to gene prediction and phylogenomics. Mol. Biol. Evol. 35, 543–548 (2018).

  11. Paterson, A. H. et al. The Sorghum bicolor genome and the diversification of grasses. Nature 457, 551–556 (2009).

  12. McCormick, R. F. et al. The Sorghum bicolor reference genome: improved assembly, gene annotations, a transcriptome atlas, and signatures of genome organization. Plant J. 93, 338–354 (2018).

  13. Deschamps, S. et al. A chromosome-scale assembly of the sorghum genome using nanopore sequencing and optical mapping. Nat. Commun. 9, 4844 (2018).

  14. Cooper, E. A. et al. A new reference genome for Sorghum bicolor reveals high levels of sequence similarity between sweet and grain genotypes: implications for the genetics of sugar metabolism. BMC Genom. 20, 420 (2019).

  15. Li, H., Feng, X. & Chu, C. The design and construction of reference pangenome graphs with minigraph. Genome Biol. 21, 265 (2020).

  16. Li, Y. H. et al. De novo assembly of soybean wild relatives for pan-genome analysis of diversity and agronomic traits. Nat. Biotechnol. 32, 1045–1052 (2014).

  17. Gordon, S. P. et al. Extensive gene content variation in the Brachypodium distachyon pan-genome correlates with population structure. Nat. Commun. 8, 2184 (2017).

  18. Wang, W. et al. Genomic variation in 3,010 diverse accessions of Asian cultivated rice. Nature 557, 43–49 (2018).

  19. Zhao, Q. et al. Pan-genome analysis highlights the extent of genomic variation in cultivated and wild rice. Nat. Genet. 50, 279–284 (2018).

  20. Gao, L. et al. The tomato pan-genome uncovers new genes and a rare allele regulating fruit flavor. Nat. Genet. 51, 1044–1051 (2019).

  21. Deu, M., Rattunde, F. & Chantereau, J. A global view of genetic diversity in cultivated sorghums using a core collection. Genome 49, 168–180 (2006).

  22. Gobena, D. et al. Mutation in sorghum LOW GERMINATION STIMULANT 1 alters strigolactones and causes Striga resistance. Proc. Natl Acad. Sci. USA 114, 4471–4476 (2017).

  23. Zhang, L.-M. et al. Sweet sorghum originated through selection of Dry, a plant-specific NAC transcription factor gene. Plant Cell 30, 2286–2307 (2018).

  24. Zhang, Z. et al. Genome-wide mapping of structural variations reveals a copy number variant that determines reproductive morphology in cucumber. Plant Cell 27, 1595–1604 (2015).

  25. Jovelin, R. & Cutter, A. D. Fine-scale signatures of molecular evolution reconcile models of indel-associated mutation. Genome Biol. Evol. 5, 978–986 (2013).

  26. Swanson-Wagner, R. A. et al. Pervasive gene content variation and copy number variation in maize and its undomesticated progenitor. Genome Res. 20, 1689–1699 (2010).

  27. Lin, Z. et al. Parallel domestication of the Shattering1 genes in cereals. Nat. Genet. 44, 720–724 (2012).

  28. Liu, S. et al. Overexpression of a CPYC-type glutaredoxin, OsGrxC2.2, causes abnormal embryos and an increased grain weight in rice. Front. Plant Sci. 10, 848 (2019).

  29. Tao, Y. F. et al. Novel grain weight loci revealed in a cross between cultivated and wild sorghum. Plant Genome 11, 170089 (2018).

  30. Yu, Y. C. et al. Independent losses of function in a polyphenol oxidase in rice: differentiation in grain discoloration between subspecies and the role of positive selection under domestication. Plant Cell 20, 2946–2959 (2008).

  31. Tao, Y. et al. Large-scale GWAS in sorghum reveals common genetic control of grain size among cereals. Plant Biotechnol. J. 18, 1093–1105 (2020).

  32. Chopra, S., Brendel, V., Zhang, J., Axtell, J. D. & Peterson, T. Molecular characterization of a mutable pigmentation phenotype and isolation of the first active transposable element from Sorghum bicolor. Proc. Natl Acad. Sci. USA 96, 15330–15335 (1999).

  33. Sweeney, M. T., Thomson, M. J., Pfeil, B. E. & McCouch, S. Caught red-handed: Rc encodes a basic helix–loop–helix protein conditioning red pericarp in rice. Plant Cell 18, 283–294 (2006).

  34. Hufford, M. B. et al. Comparative population genomics of maize domestication and improvement. Nat. Genet. 44, 808–811 (2012).

  35. Xu, X. et al. Resequencing 50 accessions of cultivated and wild rice yields markers for identifying agronomically important genes. Nat. Biotechnol. 30, 105–111 (2012).

  36. Golicz, A. A. et al. The pangenome of an agronomically important crop plant Brassica oleracea. Nat. Commun. 7, 13390 (2016).

  37. Tao, Y., Zhao, X., Mace, E., Henry, R. & Jordan, D. Exploring and exploiting pan-genomics for crop improvement. Mol. Plant 12, 156–169 (2019).

  38. Montenegro, J. D. et al. The pangenome of hexaploid bread wheat. Plant J. 90, 1007–1013 (2017).

  39. Jensen, S. E. et al. A sorghum practical haplotype graph facilitates genome-wide imputation and cost-effective genomic prediction. Plant Genome 13, e20009 (2020).

  40. Wang, B. et al. Pan-genome analysis in sorghum highlights the extent of genomic variation and sugarcane aphid resistance genes. Preprint at bioRxiv https://doi.org/10.1101/2021.01.03.424980 (2021).

  41. Hackl, T., Hedrich, R., Schultz, J. & Forster, F. proovread: large-scale high-accuracy PacBio correction through iterative short read consensus. Bioinformatics 30, 3004–3011 (2014).

  42. Chen, Y. et al. SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data. GigaScience 7, gix120 (2018).

  43. Marcais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011).

  44. Koren, S. et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 27, 722–736 (2017).

  45. Ruan, J. & Li, H. Fast and accurate long-read assembly with wtdbg2. Nat. Methods 17, 155–158 (2020).

  46. Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS ONE 9, e112963 (2014).

  47. Daccord, N. et al. High-quality de novo assembly of the apple genome and methylome dynamics of early fruit development. Nat. Genet. 49, 1099–1106 (2017).

  48. Kajitani, R. et al. Efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads. Genome Res. 24, 1384–1395 (2014).

  49. Ye, C., Hill, C. M., Wu, S., Ruan, J. & Ma, Z. S. DBG2OLC: efficient assembly of large genomes using long erroneous reads of the third generation sequencing technologies. Sci. Rep. 6, 31900 (2016).

  50. Boetzer, M., Henkel, C. V., Jansen, H. J., Butler, D. & Pirovano, W. Scaffolding pre-assembled contigs using SSPACE. Bioinformatics 27, 578–579 (2011).

  51. Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 259 (2015).

  52. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

  53. Burton, J. N. et al. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat. Biotechnol. 31, 1119–1125 (2013).

  54. Stanke, M. & Waack, S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics 19, ii215–ii225 (2003).

  55. Korf, I. Gene finding in novel genomes. BMC Bioinform. 5, 59 (2004).

  56. Cantarel, B. L. et al. MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 18, 188–196 (2008).

  57. Li, L., Stoeckert, C. J. Jr. & Roos, D. S. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13, 2178–2189 (2003).

  58. Hufford, M. B. et al. De novo assembly, annotation, and comparative analysis of 26 diverse maize genomes. Preprint at bioRxiv https://doi.org/10.1101/2021.01.14.426684 (2021).

  59. Wang, Y. P. et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 40, e49 (2012).

  60. Kurtz, S. et al. Versatile and open software for comparing large genomes. Genome Biol. 5, R12 (2004).

  61. Sun, S. et al. Extensive intraspecific gene order and gene structural variations between Mo17 and other maize genomes. Nat. Genet. 50, 1289–1295 (2018).

  62. Butler, D. G., Cullis, B. R., Gilmour, A. R. & Gogel, B. J. Technical Report: ASReml-R Reference Manual (Queensland Department of Primary Industries, 2009); http://www.vsni.co.uk/software/asreml/

  63. Liu, X., Huang, M., Fan, B., Buckler, E. S. & Zhang, Z. Iterative usage of fixed and random effect models for powerful and efficient genome-wide association studies. PLoS Genet. 12, e1005767 (2016).

  64. Wang, Y. Q. et al. GSA: Genome Sequence Archive. Genom. Proteom. Bioinformatics 15, 14–18 (2017).

  65. Zhang, Z. et al. Database resources of the National Genomics Data Center in 2020. Nucleic Acids Res. 48, D24–D33 (2020).

Acknowledgements

This work was undertaken as part of the initiative ‘Adapting Agriculture to Climate Change: Collecting, Protecting and Preparing Crop Wild Relatives’, which is supported by the Government of Norway. The project is managed by the Global Crop Diversity Trust with the Millennium Seed Bank of the Royal Botanic Gardens, Kew and implemented in partnership with national and international gene banks and plant breeding institutes around the world. For further information, see the project website: http://www.cwrdiversity.org/. This work was also supported by funding from the Australian Research Council through the Centre of Excellence for Translational Photosynthesis (CE1401000015), National Key R&D Program of China (2019YFD1002701 and 2018YFD1000701) and Strategic Priority Research Program of Chinese Academy of Sciences (XDA26050101).

Author information

Contributions

E.M., H.J., D.J. and Y.T. designed this study and coordinated the project. Y.T., X.Z., A.C. and A.H. selected samples and conducted field work. T.S., Y.L. and X.W. collected samples. J.X. and F.T. carried out the genome assembly and annotation. Y.T., H.L. and F.T. performed pan-genome analysis. Y.T. and J.X. conducted variation detection, phylogenetic analysis and selection analysis. Y.T., X.Z. and A.H. carried out GWAS analysis. Y.T. wrote the manuscript; E.M. and D.J. edited it. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to David Jordan, Haichun Jing or Emma Mace.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information: Nature Plants thanks Zhangjun Fei and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 A snapshot of graph-based sorghum pan-genome.

This pan-genome graph shows variation within the LGS1 region on chromosome 5. The graph was visualised using Bandage. Yellow highlights the sequence segment containing LGS1. Grey indicates sequence segments from the reference genome BTx623. Green indicates sequence segments from genomes other than BTx623.

Extended Data Fig. 2 Comparison between core, shell and cloud genes.

A, CDS length. Core genes are significantly longer than shell and cloud genes (p-value < 2.2e-16, Wilcoxon signed rank, two-sided). B, Number of exons. Core genes have significantly more exons than shell and cloud genes (p-value < 2.2e-16, Wilcoxon signed rank, two-sided). Sample size: core, n = 15,867; shell, n = 28,026; cloud, n = 186. In the box plots, center lines represent the median, the bottom and top of the boxes represent the first and third quartiles, and whiskers extend to the data that lie within 1.5 times the interquartile range of the first and third quartiles.
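
The sample sizes in this caption can be tied back to the abstract with a little arithmetic. The sketch below is illustrative only: it assumes that a "core" family is present in every genome, a "cloud" family in exactly one genome, and a "shell" family in anything in between. These thresholds are an assumption for illustration and are not stated in this caption. The family IDs and the four-genome toy matrix are hypothetical; the three category counts are the ones quoted above.

```python
def classify_family(n_present: int, n_genomes: int) -> str:
    """Classify a gene family by how many genomes contain it.

    Assumed thresholds (not taken from the caption): core = all genomes,
    cloud = exactly one genome, shell = everything in between.
    """
    if not 1 <= n_present <= n_genomes:
        raise ValueError("n_present must be between 1 and n_genomes")
    if n_present == n_genomes:
        return "core"
    if n_present == 1:
        return "cloud"
    return "shell"


def dispensable_fraction(counts: dict) -> float:
    """Share of families that are not core, i.e. show presence/absence variation."""
    total = sum(counts.values())
    return (total - counts["core"]) / total


# Toy presence/absence matrix: hypothetical family IDs across 4 hypothetical genomes.
toy_pav = {
    "fam1": [1, 1, 1, 1],
    "fam2": [1, 0, 1, 1],
    "fam3": [0, 0, 1, 0],
}
toy_classes = {
    fam: classify_family(sum(row), len(row)) for fam, row in toy_pav.items()
}

# Category counts as quoted in the caption.
caption_counts = {"core": 15867, "shell": 28026, "cloud": 186}
total_families = sum(caption_counts.values())
pav_share = dispensable_fraction(caption_counts)

print(toy_classes)
print(total_families, f"{100 * pav_share:.1f}%")
```

Under these assumptions the three categories sum to the 44,079 gene families reported in the abstract, and the shell plus cloud share comes to about 64%, matching the proportion of families reported as showing presence/absence variation.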

Extended Data Fig. 3 Comparison of expression level between core, shell and cloud genes.

Expression levels (FPKM, fragments per kilobase of transcript per million mapped reads) of core, shell and cloud genes were measured in six samples. Core genes consistently showed higher expression levels than shell and cloud genes across the six genomes (p-value < 2.2e-16, Wilcoxon signed rank, two-sided). Sample sizes in the six genomes: 353 (core, n = 22,522; shell, n = 13,873; cloud, n = 78); IS3614-3 (core, n = 20,786; shell, n = 12,648; cloud, n = 12); IS8525 (core, n = 21,223; shell, n = 13,365; cloud, n = 12); IS929 (core, n = 21,702; shell, n = 12,860; cloud, n = 35); Ji2731 (core, n = 22,251; shell, n = 13,874; cloud, n = 57); PI525695 (core, n = 20,445; shell, n = 11,372; cloud, n = 35). In the box plots, center lines represent the median, the bottom and top of the boxes represent the first and third quartiles, and whiskers extend to the data that lie within 1.5 times the interquartile range of the first and third quartiles.
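
The box-plot convention described in these captions (median, quartile box, whiskers limited to 1.5 times the interquartile range) can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `boxplot_summary` and the toy values are hypothetical, and the "inclusive" quartile method is an assumption, since plotting packages differ in how they interpolate quartiles.

```python
import statistics


def boxplot_summary(values):
    """Return the five numbers a conventional box plot draws.

    Whiskers reach the most extreme data points that still lie within
    1.5 x IQR of the box edges; points beyond that are outliers.
    """
    # Quartile interpolation method is an assumption; other tools may differ.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
    }


# Toy values (hypothetical, not data from the study); 100 lies beyond the upper fence.
summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(summary)
```

In this toy example the box spans 3 to 7 with the median at 5, the whiskers stop at 1 and 8, and the value 100 would be drawn as an outlier rather than stretching the upper whisker.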

Supplementary information

Supplementary Information

Supplementary notes, Figs. 1–22 and Tables 1–16, 18–20 and 24–27.

Reporting Summary

Supplementary Tables

Supplementary Tables 17, 21, 22 and 23.

About this article

Cite this article

Tao, Y., Luo, H., Xu, J. et al. Extensive variation within the pan-genome of cultivated and wild sorghum. Nat. Plants 7, 766–773 (2021). https://doi.org/10.1038/s41477-021-00925-x

  • DOI: https://doi.org/10.1038/s41477-021-00925-x
