Building a sequence map of the pig pan-genome from multiple de novo assemblies and Hi-C data

  • Research Paper
Science China Life Sciences

Abstract

Pigs were domesticated independently in the Near East and China, suggesting that a single reference genome from one individual cannot represent the full spectrum of divergent sequences in pigs worldwide. Therefore, 12 de novo pig assemblies from Eurasia were compared in this study to identify sequences missing from the reference genome. As a result, 72.5 Mb of non-redundant sequences (∼3% of the genome) were found to be absent from the reference genome (Sscrofa11.1) and were defined as pan-sequences. Of the pan-sequences, 9.0 Mb were dominant in Chinese pigs, in contrast with their low frequency in European pigs. One sequence dominant in Chinese pigs contained the complete genic region of tazarotene-induced gene 3 (TIG3), which is involved in fatty acid metabolism. Using flanking sequences and Hi-C-based methods, 27.7% of the sequences could be anchored to the reference genome. Adding these sequences could contribute to the accurate interpretation of the 3D chromatin structure. A web-based pan-genome database was further provided to serve as a primary resource for exploring genetic diversity and to promote pig breeding and biomedical research.

Data availability

The Hi-C sequencing reads of each sequencing library have been deposited at NCBI (Project ID: PRJNA482496). The assembly of the pig pan-genome and subsequent analysis results are available from our PIGPAN website (http://animal.nwsuaf.edu.cn/code/index.php/pan-Pig). All other data supporting the findings of this study are available in the article and its supplementary information files, or from the corresponding author on request.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (31822052 and 31572381) to Y.J. and the Science & Technology Support Program of Sichuan (2016NYZ0042 and 2017NZDZX0002) to M.Z.L. We thank the High Performance Computing platform of Northwest A&F University for its assistance with computing.

Author information

Corresponding authors

Correspondence to Mingzhou Li or Yu Jiang.

Ethics declarations

Compliance and ethics: The author(s) declare that they have no conflict of interest.

Cite this article

Tian, X., Li, R., Fu, W. et al. Building a sequence map of the pig pan-genome from multiple de novo assemblies and Hi-C data. Sci. China Life Sci. 63, 750–763 (2020). https://doi.org/10.1007/s11427-019-9551-7

  • DOI: https://doi.org/10.1007/s11427-019-9551-7
