Machine learning approaches outperform distance- and tree-based methods for DNA barcoding of Pterocarpus wood

  • Original Article
Planta

Abstract

Main conclusion

Machine-learning approaches (MLAs) for DNA barcoding outperform distance- and tree-based methods in both identification accuracy and cost-effectiveness for species-level identification of wood.

DNA barcoding is a promising tool to combat illegal logging and the associated trade, and reliable, efficient analytical methods are essential for its wider application in the wood trade and in the forensics of natural materials more broadly. In this study, 120 DNA sequences of four barcodes (ITS2, matK, ndhF-rpl32, and rbcL) generated in our previous study and 85 sequences downloaded from the National Center for Biotechnology Information (NCBI) were assembled into a reference data set for six commercial Pterocarpus woods. MLAs (BLOG, BP-neural network, SMO, and J48) were compared with distance-based (TaxonDNA) and tree-based (NJ tree) methods in terms of identification accuracy and cost-effectiveness across these six species, and were also applied to discriminate the CITES-listed species Pterocarpus santalinus from the anatomically similar species P. tinctorius for forensic identification. MLAs provided higher identification accuracy (30.8–100%) than distance-based (15.1–97.4%) and tree-based methods (11.1–87.5%), with SMO performing best among the machine learning classifiers. With the SMO classifier, the two-locus combination ITS2 + matK gave the highest resolution (100%) using the fewest barcodes for discriminating the six Pterocarpus species. The CITES-listed species P. santalinus was successfully discriminated from P. tinctorius using MLAs with a single barcode, ndhF-rpl32. This study shows that, for forensic application in DNA barcoding of Pterocarpus wood, MLAs provide higher identification accuracy and cost-effectiveness than the other analytical methods tested.
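To make the distance-based baseline concrete, the following is a minimal sketch of a "best match" style assignment: a query sequence is given the species label of the reference sequence at the smallest pairwise distance, and is left ambiguous when equally close references belong to different species. It is an illustration only, not TaxonDNA's implementation and not the procedure used in this study. The function names `p_distance` and `best_match`, the uncorrected p-distance, the gap handling, and the toy sequences and species labels are all assumptions introduced here.

```python
from typing import List, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Assumption: sequences are pre-aligned to equal length and sites
    with a gap ('-') in either sequence are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 1.0  # no comparable sites: treat as maximally distant
    return sum(x != y for x, y in compared) / len(compared)


def best_match(query: str, references: List[Tuple[str, str]]) -> Tuple[str, float]:
    """Assign the query to the species of its closest reference sequence.

    Returns (label, distance). The label is 'ambiguous' when the closest
    references belong to more than one species.
    """
    scored = [(p_distance(query, seq), species) for species, seq in references]
    best = min(dist for dist, _ in scored)
    closest_species = {species for dist, species in scored if dist == best}
    label = closest_species.pop() if len(closest_species) == 1 else "ambiguous"
    return label, best


# Hypothetical toy data: invented sequences and placeholder species labels.
references = [
    ("species_A", "ACGTACGTAC"),
    ("species_A", "ACGTACGTAA"),
    ("species_B", "ACGTTCGTCC"),
    ("species_B", "ACGATCGTCC"),
]
label, distance = best_match("ACGTACGTAT", references)
print(label, distance)
```

A classifier-based approach differs in that it learns a decision rule from all labelled reference sequences at once rather than relying on the single nearest reference, which is one plausible reason the two families of methods can disagree on closely related species.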

Abbreviations

BLOG: Barcoding with logic

CITES: Convention on International Trade in Endangered Species of Wild Fauna and Flora

MLAs: Machine learning approaches

NCBI: National Center for Biotechnology Information

NJ: Neighbor Joining

SMO: Sequential Minimal Optimization

References

  • Bertolazzi P, Felici G, Weitschek E (2009) Learning to classify species with barcodes. BMC Bioinform 10(14):S7

  • Brancalion PHS, Almeida DRA, Vidal E, Molin PG, Sontag VE, Souza SEXF, Schulze M (2018) Fake legal logging in the Brazilian Amazon. Sci Adv 4(8):aat1192

  • CBOL Plant Working Group (2009) A DNA barcode for land plants. Proc Natl Acad Sci USA 106(31):12794–12797

  • Chen S, Yao H, Han J, Liu C, Song J, Shi L, Zhu Y, Ma X, Gao T, Pang X, Luo K, Li Y, Li X, Jia X, Lin Y, Leon C (2010) Validation of the ITS2 region as a novel DNA barcode for identifying medicinal plant species. PLoS One 5:e8613

  • Collins RA, Cruickshank RH (2012) The seven deadly sins of DNA barcoding. Mol Ecol Resour 13(6):969–975

  • Collins RA, Boykin LM, Cruickshank RH, Armstrong KF (2012) Barcoding’s next top model: an evaluation of nucleotide substitution models for specimen identification. Methods Ecol Evol 3(3):457–465

  • Damm S, Schierwater B, Hadrys H (2010) An integrative approach to species discovery in odonates: from character-based DNA barcoding to ecology. Mol Ecol 19(18):3881–3893

  • Delgado-Serrano L, Restrepo S, Bustos JR, Zambrano MM, Anzola JM (2016) Mycofier: a new machine learning-based classifier for fungal ITS sequences. BMC Res Notes 9(1):402

  • Dormontt EE, Boner M, Braun B, Breulmann G, Degen B, Espinoza E, Gardner S, Guillery P, Hermanson JC, Koch G, Lee SL, Kanashiro M, Rimbawanto A, Thomas D, Wiedenhoeft AC, Yin Y, Zahnen J, Lowe AJ (2015) Forensic timber identification: it’s time to integrate disciplines to combat illegal logging. Biol Conserv 191:790–798

  • Ekrem T, Willassen E, Stur E (2007) A comprehensive DNA sequence library is essential for identification with DNA barcodes. Mol Phylogenet Evol 43(2):530–542

  • Gasson P (2011) How precise can wood identification be? Wood anatomy’s role in support of the legal timber trade, especially CITES. IAWA J 32(2):137–154

  • Goldberg DE, Holland JH (1988) Genetic algorithms and machine learning. Mach Learn 3(2):95–99

  • Hajibabaei M, Smith MA, Janzen DH, Rodriguez JJ, Whitfield JB, Hebert PDN (2006) A minimalist barcode can identify a specimen whose DNA is degraded. Mol Ecol Resour 6(4):959–964

  • Han J, Zhu Y, Chen X, Liao B, Yao H, Song J, Chen S, Meng F (2013) The short ITS2 sequence serves as an efficient taxonomic sequence tag in comparison with the full-length ITS. BioMed Res Intl 2013:741476

  • Han Y, Duan D, Ma X, Jia Y, Liu Z, Zhao G, Li Z (2016) Efficient identification of the forest tree species in Aceraceae using DNA barcodes. Front Plant Sci 7:1707

  • Hartvig I, Czako M, Kjaer ED, Nielsen LR, Theilade I (2015) The use of DNA barcoding in identification and conservation of rosewood (Dalbergia spp.). PLoS One 10:e0138231

  • Hassold S, Lowry PP II, Bauert MR, Razafintsalama A, Ramamonjisoa L, Widmer A (2016) DNA barcoding of Malagasy rosewoods: towards a molecular identification of CITES-listed Dalbergia species. PLoS One 11:e0157881

  • He T, Jiao L, Yu M, Guo J, Jiang X, Yin Y (2018) DNA barcoding authentication for the wood of eight endangered Dalbergia timber species using machine learning approaches. Holzforschung. https://doi.org/10.1515/hf-2018-0076

  • Hebert PDN, Cywinska A, Ball SL, Dewaard JR (2003) Biological identifications through DNA barcodes. Proc R Soc B Biol Sci 270(1512):313–321

  • Hendrich L, Morinière J, Haszprunar G, Hebert PDN, Hausmann A, Köhler F, Balke M (2015) A comprehensive DNA barcode database for Central European beetles with a focus on Germany: adding more than 3500 identified species to BOLD. Mol Ecol Resour 15(4):795–818

  • IUCN Red List of Threatened Species (2017) http://www.iucnredlist.org/. Accessed 5 Feb 2018

  • Jiao L, Yin Y, Cheng Y, Jiang X (2014) DNA barcoding for identification of the endangered species Aquilaria sinensis: comparison of data from heated or aged samples. Holzforschung 68(4):487–494

  • Jiao L, Liu X, Jiang X, Yin Y (2015) Extraction and amplification of DNA from aged and archaeological Populus euphratica wood for species identification. Holzforschung 69(8):925–931

  • Jiao L, Yu M, Wiedenhoeft AC, He T, Li J, Liu B, Jiang X, Yin Y (2018) DNA barcode authentication and library development for the wood of six commercial Pterocarpus species: the critical role of xylarium specimens. Sci Rep 8(1):1945

  • Jordan MI, Mitchell TM (2015) Machine learning: trends, perspectives, and prospects. Science 349(6245):255–260

  • Kress WJ, Wurdack KJ, Zimmer EA, Weigt LA, Janzen DH (2005) Use of DNA barcodes to identify flowering plants. Proc Natl Acad Sci USA 102(23):8369–8374

  • Lewis SL, Edwards DP, Galbraith D (2015) Increasing human dominance of tropical forests. Science 349(6250):827–832

  • Li J, Cui Y, Jiang J, Yu J, Niu L, Deng J, Shen F, Zhang L, Yue B, Li J (2017) Applying DNA barcoding to conservation practice: a case study of endangered birds and large mammals in China. Biodivers Conserv 26(3):653–668

  • Libbrecht MW, Noble WS (2015) Machine learning applications in genetics and genomics. Nat Rev Genet 16(6):321–332

  • Little DP (2014) A DNA mini-barcode for land plants. Mol Ecol Resour 14(3):437–446

  • Little DP, Stevenson DW (2007) A comparison of algorithms for the identification of specimens using DNA barcodes: examples from gymnosperms. Cladistics 3(1):1–21

  • Lowe AJ, Dormontt EE, Bowie MJ, Degen B, Gardner S, Thomas D, Clarke C, Rimbawanto A, Wiedenhoeft AC, Yin Y, Sasaki N (2016) Opportunities for improved transparency in the timber trade through scientific verification. Bioscience 66(11):990–998

  • Lowenstein JH, Amato G, Kolokotronis SO (2009) The real maccoyii: identifying tuna sushi with DNA barcodes – contrasting characteristic attributes and genetic distances. PLoS One 4:e7866

  • MacLeod N, Benfield M, Culverhouse P (2010) Time to automate identification. Nature 467(7312):154–155

  • McArdle BH, Anderson MJ (2001) Fitting multivariate models to community data: a comment on distance-based redundancy analysis. Ecology 82(1):290–297

  • Meier R, Shiyang K, Vaidya G, Peter KLN (2006) DNA Barcoding and taxonomy in Diptera: a tale of high intraspecific variability and low identification success. Syst Biol 55(5):715–728

  • More RP, Mane RC, Purohit HJ (2016) MatK-QR classifier: a patterns based approach for plant species identification. BioData Min 9(1):39

  • Nalepa J, Kawulok M (2018) Selecting training sets for support vector machine: a review. Artif Intell Rev 6:1–44

  • NCBI Resource Coordinators (2016) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 44:7–19

  • Ng KKS, Lee SL, Tnah LH, Nurul-Farhanah Z, Ng CH, Lee CT, Tani N, Diway B, Lai PS, Khoo E (2016) Forensic timber identification: a case study of a CITES listed species, Gonystylus bancanus (Thymelaeaceae). Forensic Sci Int Genet 23:197–209

  • Pang X, Song J, Zhu Y, Xu H, Huang L, Chen S (2010) Applying plant DNA barcodes for Rosaceae species identification. Cladistics 27(2):165–170

  • Parveen I, Singh HK, Malik S, Raghuvanshi S, Babbar SB (2017) Evaluating five different loci (rbcL, rpoB, rpoC1, matK, and ITS) for DNA barcoding of Indian orchids. Genome 60(8):665–671

  • Patel N, Upadhyay S (2012) Study of various decision tree pruning methods with their empirical comparison in WEKA. Intl J Comput Appl 60(12):20–25

  • Rach J, DeSalle R, Sarkar IN, Schierwater B, Hadrys H (2008) Character-based DNA barcoding allows discrimination of genera, species and populations in Odonata. Proc R Soc B 275(1632):237–247

  • Robinson JE, Sinovas P (2018) Challenges of analyzing the global trade in CITES-listed wildlife. Conserv Biol 32(5):1203–1206

  • Ross HA, Murugan S, Li WL (2008) Testing the reliability of genetic methods of species identification via simulation. Syst Biol 57(2):216–230

  • Saatchi SS, Harris NL, Brown S, Lefsky M, Mitchard ETA, Salas W, Zutta BR, Buermann W, Lewis SL, Hagen S, Petrova S, White L, Silman M, Morel A (2011) Benchmark map of forest carbon stocks in tropical regions across three continents. Proc Natl Acad Sci USA 108(24):9899–9904

  • Sarkar IN, Planet PL, Desalle R (2008) CAOS software for use in character-based DNA barcoding. Mol Ecol Resour 8(6):1256–1259

  • Saslis-Lagoudakis CH, Klitgaard BB, Forest F, Francis L, Savolainen V, Williamson EM, Hawkins JA (2011) The use of phylogeny to interpret cross-cultural patterns in plant use and guide medicinal plant discovery: an example from Pterocarpus (Leguminosae). PLoS One 6:e22275

  • Srivathsan A, Meier R (2012) On the inappropriate use of Kimura-2-parameter (K2P) divergences in the DNA-barcoding literature. Cladistics 28(2):190–194

  • Tanabe AS, Toju H (2013) Two new computational methods for universal DNA barcoding: a benchmark using barcode sequences of bacteria, archaea, animals, fungi and land plants. PLoS One 8:e76910

  • Velzen RV, Weitschek E, Felici G, Bakker FT (2012) DNA barcoding of recently diverged species: relative performance of matching methods. PLoS One 7:e30490

  • Weitschek E, Velzen R, Felici G, Bertolazzi P (2013) BLOG 2.0: a software system for character-based species classification with DNA barcode sequences. What it does, how to use it? Mol Ecol Resour 13(6):1043–1046

  • Weitschek E, Fiscon G, Felici G (2014) Supervised DNA barcodes species classification: analysis, comparisons and results. BioData Min 7:4

  • Wiedenhoeft AC (2014) Curating xylaria. In: Salick J, Konchar K, Nesbitt M (eds) Curating biocultural collections: a handbook. Kew Publishing, London, pp 127–134

  • Xu C, Dong W, Shi S, Cheng T, Li C, Liu Y, Wu P, Wu H, Gao P, Zhou S (2015a) Accelerating plant DNA barcode reference library construction using herbarium specimens: improved experimental techniques. Mol Ecol Resour 15(6):1366–1374

  • Xu S, Li D, Li J, Xiang X, Jin W, Huang W, Jin X, Huang L (2015b) Evaluation of the DNA barcodes in Dendrobium (Orchidaceae) from mainland Asia. PLoS One 10:e0115168

  • Yan L, Liu J, Möller M, Zhang L, Zhang X, Li D, Gao L (2015) DNA barcoding of Rhododendron (Ericaceae), the largest Chinese plant genus in biodiversity hotspots of the Himalaya-Hengduan Mountains. Mol Ecol Resour 15(4):932–944

  • Yao H, Song J, Chang L, Luo K, Han J, Li Y, Pang X, Xu H, Zhu Y, Xiao P, Chen S (2010) Use of ITS2 region as the universal DNA barcode for plants and animals. PLoS One 5:e13102

  • Yassin A, Markow TA, Narechania A, O’Grady PM, DeSalle R (2010) The genus Drosophila as a model for testing tree- and character-based methods of species identification using DNA barcoding. Mol Phylogenet Evol 57(2):509–517

  • Yu M, Jiao L, Guo J, Wiedenhoeft AC, He T, Jiang X, Yin Y (2017) DNA barcoding of vouchered xylarium wood specimens of nine endangered Dalbergia species. Planta 246(6):1165–1176

  • Yu N, Wei Y, Zhang X, Zhu N, Wang Y, Zhu Y, Zhang H, Li F, Yang L, Sun J, Sun A (2018) Barcode ITS2: a useful tool for identifying Trachelospermum jasminoides and a good monitor for medicine market. Sci Rep 7:5037

  • Zeng C, Hollingsworth PM, Yang J, He Z, Zhang Z, Li D, Yang J (2018) Genome skimming herbarium specimens for DNA barcoding and phylogenomics. Plant Methods 14:43

  • Zhang AB, Sikes DS, Muster C, Li SQ (2008) Inferring species membership using DNA sequences with back-propagation neural network. Syst Biol 57(2):202–215

  • Zhang A, Muster C, Liang H, Zhu C, Crozier R, Wan P, Feng J (2012) A fuzzy-set-theory-based approach to analyse species membership in DNA barcoding. Mol Ecol 21(8):1848–1863

  • Zhang AB, Hao MD, Yang CQ, Shi ZY (2017) BarcodingR: an integrated R package for species identification using DNA barcodes. Methods Ecol Evol 8(5):627–637

  • Zou S, Li Q, Kong L, Yu H, Zheng X (2011) Comparing the usefulness of distance, monophyly and character-based DNA barcoding methods in species identification: a case study of Neogastropoda. PLoS One 6:e26619

Acknowledgements

This work was financially supported by the National Natural Science Foundation of China (Grant No. 31600451), the National High-level Talent for Special Support Program of China (Grant No. W02020331), and the China Scholarship Council (Grant No. 2017-3109). We thank Professor Xiaomei Jiang, Dr. Min Yu, Dr. Bo Liu, and Dr. Prabu Ravindran for their assistance and suggestions on this study, and Sarah Friedrich for her help with the figures.

Author information

Corresponding author

Correspondence to Yafang Yin.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Fig. S1

The confusion matrix generated by the BP-neural network for all the single barcodes and their combinations (PDF 42 kb)

Fig. S2

The criteria for assessing the SMO classifier based on the four single barcodes and their combinations (PDF 65 kb)

Fig. S3

The decision trees constructed by the diagnostic position of DNA sequences based on the four barcodes and their combinations (PDF 323 kb)

Fig. S4

Identification success rates of four barcodes and their combinations based on “best match” and “best close match” functions of TaxonDNA (PDF 19 kb)

Fig. S5

Phylogenetic trees generated from the four barcodes and their combinations based on neighbor-joining analysis (PDF 367 kb)

Table S1

DNA sequences generated from our previous study (Jiao et al. 2018) and downloaded from the NCBI GenBank (XLSX 18 kb)

Table S2

The formulae generated by BLOG for discrimination of six Pterocarpus timber species based on the four barcodes and their combinations (XLSX 16 kb)

About this article

Cite this article

He, T., Jiao, L., Wiedenhoeft, A.C. et al. Machine learning approaches outperform distance- and tree-based methods for DNA barcoding of Pterocarpus wood. Planta 249, 1617–1625 (2019). https://doi.org/10.1007/s00425-019-03116-3

  • DOI: https://doi.org/10.1007/s00425-019-03116-3
