Introduction

The world faces an enormous challenge in feeding a growing population amid huge economic disparity and uneven development across the globe. Among all the continents, Africa is predicted to contribute half of the world’s population growth by 2050 (FAO 2017). Nigeria will be the third most populous country by 2050 and, along with the Democratic Republic of Congo, Ethiopia, Tanzania, and Uganda, will account for most of the world’s population growth (United Nations 2017). At the same time, climate change scenarios threaten current production levels, with anticipated declines in the production of staple crops (Dinesh et al. 2015). Together, these factors will aggravate the growing problem of nutritional insufficiency in the African population. At present, among the top 36 countries with a high hidden hunger index (HHI, a combination of iron deficiency, zinc deficiency, and vitamin A deficiency), 31 are in sub-Saharan Africa (SSA), and they harbor 91% of the world’s HHI-affected pre-school children (Muthayya et al. 2013). Among adults, one in four Africans above the age of 15 years has suffered some form of hunger (Food and Agriculture Organization 2017). Non-communicable diseases (NCDs), which mainly affect adults and the majority of which are exacerbated by malnutrition, will be the major cause of death in SSA by 2030 (Food and Agriculture Organization 2017). Most of these countries are ill-prepared to tackle these problems, as 33 of the 44 resource-poor, least developed countries are on the African continent (Food and Agriculture Organization 2017). The economic disparity, which is closely linked with development and health indices, therefore has a huge impact on present and future human capital, a prime driver of a country’s growth and development. 
These problems must be tackled urgently to meet the ambitious Sustainable Development Goals (SDGs) of ‘zero hunger’ (SDG 2) and ‘good health and well-being’ (SDG 3) by 2030, as agreed by the United Nations General Assembly (United Nations 2015). It is well accepted that access to good-quality, diverse, healthy, and nutritious food can greatly improve quality of life at all stages of life (Micronutrient Initiative 2009).

The Green Revolution, which substantially increased agricultural production in the last century, was based mainly on improved grain yield and focused primarily on wheat and rice. While effective in saving lives, it also resulted in monoculture, decreased dietary diversity, and destruction of natural habitats and forests (Pingali 2012). Similarly, African countries have traditionally over-incentivized the production and consumption of maize, to the point where calorie intake is out of balance with other dietary components such as minerals, vitamins, and proteins (Madzivhandila et al. 2016). To overcome the devastating effects of hunger, malnutrition, and hidden hunger, solutions must go beyond field-based productivity measures and calorie sufficiency. Traditional African crops can provide the balance needed in the African agri-food system through a systems and landscape approach, as they are productive, nutrient dense, diverse, resilient to climate change, and culturally linked with the food habits of local communities.

Africa and the African Orphan Crops Consortium (AOCC)

Africa is endowed with an immense diversity of traditional food crops that have been consumed by local communities since time immemorial. These crops are scattered across farming landscapes but most often give low yields because of little or no investment (monetary, human resources, organizational, policy, government, crop improvement, etc.) in research, development, marketing, or any other portfolio. As a result, these neglected, under-researched, “orphan” crops have not benefitted from modern scientific and technological advances. Nevertheless, because of their potential to address food, nutritional, and economic security in developing and underdeveloped parts of the world, they are appropriately referred to as crops for the future (Baldermann et al. 2016). Most African orphan crops have been consumed by local communities without extensive selection and domestication, thereby retaining genetic diversity for stress and disease tolerance, adaptability, production traits, and nutrition. They are rich in micro- and macronutrients and can adapt to changing climatic conditions as well as to diseases and pests (Baldermann et al. 2016). These crops therefore offer immense scope for innovation in research, capacity building, social empowerment, and food value chains (i.e., production, processing, consumption, marketing, and product development), but they need a nonconventional approach that is open, inclusive, and welcoming to involvement and investment from public, private, national, and international partners. Recognizing this as an important African agenda, the African Orphan Crops Consortium (AOCC) was established in 2011 with the full support of the African Union, mandated to work on 101 selected crops that originated in or are naturalized in Africa (http://africanorphancrops.org), by investing in training, products, tools, services, practices, and processes to mainstream them into the African agri-food system.

Selection of AOCC crops

The list of 101 species of African orphan crops (http://africanorphancrops.org/meet-the-crops) important to African agriculture and agroforestry systems was drawn from an Africa-centric survey conducted by the African Union’s (AU) New Partnership for Africa’s Development (NEPAD), with participation from African agricultural scientists, sociologists, anthropologists, nutritionists, policy makers, farmers, government representatives, universities, and various other stakeholders (Hendre and Van Deynze 2015). The three primary selection criteria used for prioritization were that a species (i) is rich in micro- and macronutrients, (ii) is relevant to Africa, and (iii) needs breeding resources to be developed. The list was judiciously spread across woody trees/shrubs (50), non-woody annuals (33), non-woody fruits (3), one palm, two each of woody climber fruits, perennial vegetables, and perennial rhizomes, and one each of a perennial root, a non-tree fruit, and a perennial succulent species. It comprises 28 orders and 45 families. Table 1 shows the stage of resource generation for 60 species and the areas in Africa where they are grown or found. In addition, Supplementary Table 1 lists the nutritional importance of some of the AOCC trees and crops. The list of crops is generally fixed but can evolve with time and need.

Table 1 Progress of the African Orphan Crops Consortium (AOCC) in the three major activities of the genomics workflow

Application of genomics

Genomics has contributed to the production of quality seed stocks with desirable traits by providing tools and methods to develop varieties with high productivity, high nutrient content, and tolerance to various biotic and abiotic stresses (Kole et al. 2015). Genomic resources for a crop include the genome sequence, the annotation of gene functions, and the extant diversity in the gene pool. Together, these help in developing single nucleotide polymorphism (SNP) and other variant panels, which can be used to associate and introgress traits of economic and agronomic importance. To date, 236 plant genomes of varying quality have been sequenced using next-generation sequencing technologies (Chen et al. 2018). Some crops that were considered orphan crops in the last century have entered a new era of breeding and improvement now that their genomes have been sequenced, and molecular breeding tools have been adopted for many of them, e.g., sorghum (Paterson et al. 2009; Fernandes et al. 2018), pearl millet (Varshney et al. 2017; Liang et al. 2018), peanut (Bertioli et al. 2016; Varshney 2016; Janila et al. 2016), chickpea (Varshney et al. 2013; Li et al. 2018), foxtail millet (Zhang et al. 2012), tef (Cannarozzi et al. 2014), finger millet (Hittalmani et al. 2017), cowpea (Boukar et al. 2018), bitter gourd (Urasaki et al. 2017), and cucurbits (Zheng et al. 2019). Genomes of AOCC-mandated species that have already been sequenced and published will be assessed for quality and, if needed, corroborated using complementary sequencing technologies.
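The text above notes that previously published genomes will be assessed for quality but does not say which metrics are used. As an illustration only, assembly contiguity is commonly summarized by N50 (the contig length at which contigs of that length or longer cover at least half of the assembly) and NG50 (the same idea, but relative to an expected genome size, such as one estimated from a k-mer survey). The Python sketch below computes both; the function names and example lengths are hypothetical and are not part of the AOCC pipeline.

```python
def _length_at_half(lengths, target_total):
    """Walk contigs from longest to shortest and return the contig length at which
    the running total first reaches half of target_total (None if never reached)."""
    half = target_total / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return None


def n50(contig_lengths):
    """N50: at least half of the total assembly length lies in contigs this long or longer."""
    lengths = [n for n in contig_lengths if n > 0]
    if not lengths:
        raise ValueError("assembly contains no contigs")
    return _length_at_half(lengths, sum(lengths))


def ng50(contig_lengths, expected_genome_size):
    """NG50: like N50 but measured against an expected genome size (e.g. from a
    k-mer survey); returns None if the assembly covers less than half of it."""
    lengths = [n for n in contig_lengths if n > 0]
    return _length_at_half(lengths, expected_genome_size)


if __name__ == "__main__":
    contigs = [400, 300, 200, 100]
    print(n50(contigs))         # 300
    print(ng50(contigs, 2000))  # 100
```

For example, contigs of 400, 300, 200, and 100 bp give an N50 of 300 bp, while NG50 against a hypothetical 2,000 bp genome is 100 bp; NG50 is often preferred when comparing assemblies of the same species because it does not reward an assembly for being smaller.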

The need to attract funding by building investment cases for research and development programs on these orphan crops was aptly explained by Dawson et al. (2018). Their report showed that crops that had received some investment in research and development achieved increased yields. Including genomic tools such as quantitative trait locus (QTL) mapping, genome-wide association studies (GWAS), and genomic selection (GS) in the design of breeding programs therefore has huge potential, by reducing the length of varietal improvement programs to as little as a third of that of existing conventional pipelines, which do not use trait–marker associations (Hickey et al. 2017).
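To make the idea of a trait–marker association concrete, the sketch below performs a naive single-marker scan: for each SNP, coded as allele dosage (0, 1, or 2 copies of the alternate allele), it regresses a phenotype on dosage and reports the slope and the proportion of phenotypic variance explained (r²). This is an assumption-laden illustration, not the method of any study cited here: real GWAS pipelines typically use mixed models that correct for population structure and kinship, and genomic selection fits all markers jointly. The function name and data layout (one dosage list per marker) are hypothetical.

```python
from statistics import mean


def single_marker_scan(dosages_per_marker, phenotypes):
    """Naive single-marker association scan (hypothetical helper).

    dosages_per_marker: one list per SNP giving the alternate-allele dosage
    (0, 1 or 2) of each individual, in the same order as phenotypes.
    Returns one (slope, r_squared) tuple per marker from an ordinary
    least-squares regression of phenotype on dosage.
    """
    y_bar = mean(phenotypes)
    ss_total = sum((y - y_bar) ** 2 for y in phenotypes)
    results = []
    for dosages in dosages_per_marker:
        if len(dosages) != len(phenotypes):
            raise ValueError("each marker needs one dosage per individual")
        x_bar = mean(dosages)
        sxx = sum((x - x_bar) ** 2 for x in dosages)
        if sxx == 0 or ss_total == 0:
            # Monomorphic marker or constant phenotype: nothing to estimate.
            results.append((0.0, 0.0))
            continue
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(dosages, phenotypes))
        slope = sxy / sxx                        # additive effect per allele copy
        r_squared = sxy ** 2 / (sxx * ss_total)  # share of phenotypic variance explained
        results.append((slope, r_squared))
    return results
```

Markers with the largest r² would be the candidates for follow-up; in practice, significance thresholds must also account for the very large number of markers tested.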

With the publication of the first set of five genomes (Chang et al. 2018a, b, c, d, e, f), we also published the standard data generation, analysis, and annotation pipeline used by AOCC in its sequencing projects. As explained elsewhere, however, these pipelines will evolve with technological and analytical advances. Because some genomes are complex, with subtle or major duplications, they pose significant challenges for genome assembly and call for an upgrade to long-read sequencing technologies, which the consortium has already taken note of. A general roadmap for breeding orphan crops using genomic tools has also been laid out by Sogbohossou et al. (2018), using Gynandropsis gynandra (cleome) as an example. This underlines and justifies the initial investment needed to generate genomic resources that aid in modernizing breeding programs. Using a similar model to develop improved varieties/clones while encouraging diversified farming landscapes will limit some drawbacks associated with Green Revolution technologies, such as monoculture, high inputs, and high-emission practices. Publication of the first set of five orphan crop genomes was considered an important step in boosting research on and improvement of these species (Tena 2019).

Partnership

The consortium is an uncommon partnership among public and private institutions, academic institutes, universities, and non-governmental, international, and development organizations (Hendre and Van Deynze 2015). The consortium has a healthy mix of partners who bring firsthand scientific knowledge (BGI, Shenzhen, China; University of California, Davis, USA (UCD); World Agroforestry (ICRAF), Nairobi, Kenya; Agriculture Research Council of South Africa (ARC), Pretoria, South Africa; Wageningen University and Research, Wageningen, Netherlands; Ghent University, Ghent, Belgium; CyVerse, Tucson, USA); offer analytical and computational capacity (BGI, Shenzhen, China; Benson Hill Biosystems, St. Louis, USA; CyVerse, Tucson, USA; Ghent University, Ghent, Belgium); provide sequencing, genotyping, and other ancillary technologies, supplies, data logistics, and infrastructure (Illumina Inc., San Diego, USA; LGC Genomics, Hoddesdon, UK; Google Inc., Mountain View, USA; Ghent University, Ghent, Belgium; KeyGene Inc., Rockville, USA; Oxford Nanopore Technologies, London, UK; Thermo Fisher Scientific, Waltham, USA); share experience in industrial food processing, technological scale-up, marketing, and building seed value chains (Mars Incorporated, McLean, USA; Corteva Agriscience, Johnston, USA; Benson Hill Biosystems, St. Louis, USA); work on African development questions, advocacy, and policy frameworks (World Wildlife Fund (WWF), Gland, Switzerland; New Partnership for Africa’s Development (NEPAD), Midrand, South Africa; Alliance for a Green Revolution in Africa (AGRA), Nairobi, Kenya; Food and Agriculture Organization (FAO), Rome, Italy; United Nations International Children’s Emergency Fund (UNICEF), Nairobi, Kenya); build training and capacity; provide funding support; and include ground-level organizations working on agricultural questions, germplasm repositories/gene banks, crop/tree improvement and breeding, socioeconomic context, and translational research (UC Davis African Plant Breeding Academy (AfPBA); Biosciences eastern and central Africa-International Livestock Research Institute (BecA-ILRI), Nairobi, Kenya; Mars Incorporated; ICRAF; UNICEF; Integrated Breeding Platform, El Batan, Mexico). The partners and their contributions to the consortium are summarized in Supplementary Fig. 1 and described in Supplementary Table 2. The AOCC has grown from 5 founding members (NEPAD, WWF, Mars Incorporated, ICRAF, and UCD) to 24 core members who are involved in strategizing and developing the AOCC roadmap and who bring substantial, broad-based expertise and complementary skill sets cutting across all the crops. The sequencing work is coordinated from the AOCC genomics laboratory at ICRAF, Nairobi, Kenya. Initially, four Ion Proton systems donated by Thermo Fisher Scientific in 2014 were used for re-sequencing; in 2017, these were replaced by a HiSeq 4000 donated by Illumina Inc. An open network of more than 25 institutes and organizations, primarily handling crop-specific portfolios, also supports ground activities related to field experimentation, crop breeding, and translational research (http://africanorphancrops.org/partners-and-networks/).
Most importantly, the AOCC currently includes 116 African plant breeders from 28 countries who are trained in the AfPBA and are leaders in their institutions.

Timelines

The timelines for reaching the sequencing targets depend largely on funding support, technology upgrades, building national crop breeders’ networks, successful advocacy, and policy initiatives. Whole genome sequencing is expected to be completed by 2020 and re-sequencing by 2022.

Major constraints

Working on orphan crops in the African context poses significant challenges, such as the scarcity of reliable cultivation and production data, nutritional quality data, information on breeding technologies, and data on local and regional markets and value chains (Dawson et al. 2018). The AOCC will gather this information by working with local and regional partners and by encouraging administrative bodies to collect these data through advocacy and policy measures. Apart from these important considerations, funding support is a major deciding factor. The plant breeders trained under the AfPBA, as well as other collaborators and networking partners, are encouraged to support sequencing and re-sequencing activities through bilateral projects. Efforts to secure support from national, international, and private funding bodies are ongoing.

Important achievements of the consortium

The consortium has succeeded in bringing together world-class expertise from diverse and unusual partners and collaborators across the research, science, technology, social, commercial, and development sectors of the agri-food system under one umbrella. A genomics laboratory was established at ICRAF, which also serves as the secretariat of the consortium. This laboratory hosts a HiSeq 4000 sequencing system donated by Illumina Inc., along with all the necessary instrumentation and workflows. The shared responsibilities of all the partners are listed in Supplementary Table 2 and described in Supplementary Fig. 1. Various partners are involved in building investment cases, budgets, and projections of deliverables with timelines.

Present status and progress

As an immediate goal, AOCC started developing genomic resources (reference genome sequencing, transcriptome sequencing, and re-sequencing of 100 accessions per species) for these 101 crops. Extraction of nucleic acids posed particular challenges, as many of the species contain large quantities of mucilage, phenolics, and other secondary metabolites, so the protocols were modified to meet quality requirements. DNA was extracted primarily using extensively modified manual CTAB-based methods, whereas RNA was extracted using any suitable kit, with modifications where required. Reference genome sequencing was initially attempted using short-read NGS technologies (Illumina and BGISEQ) but has since been expanded to long-read and scaffolding technologies. Transcriptome sequencing is carried out using Illumina short-read technology and has now been extended to Oxford Nanopore. Re-sequencing was initially done on Ion Proton machines (generously donated by Life Technologies, Carlsbad, CA, USA, now part of Thermo Fisher Scientific Inc.) but has shifted to Illumina short-read NGS technology on a HiSeq 4000 (generously donated by Illumina Inc.). Standard protocols recommended by the respective suppliers are used for all these activities unless modifications are needed to accommodate genome complexity or unexpected problems.

The three activities, outlined in Fig. 1, run in parallel. The first step was species prioritization, explained above in “Selection of AOCC crops”. Material procurement involved obtaining DNA from the respective collaborator(s), or obtaining clones or seeds and growing them in a nursery. The accessions used for reference genome sequencing were selected by the researchers/breeders working on the respective crops/trees so that they were physically accessible and had at least minimal passport data. The first step of whole genome sequencing (WGS) is a genome survey, in which k-mer analysis of shallow sequencing data (5–10 Gb) is used to assess heterozygosity and genome size. This analysis reveals genome complexity and informs the course of sequencing and analysis (amount of data needed, analysis pipeline, etc.). The second activity is the generation of transcriptome data by sequencing an RNA pool from 8–12 tissues/developmental stages. The third activity, re-sequencing, involves sequencing 100 selected accessions per species at 8–15× depth per accession. The re-sequencing panel is selected to represent the genetic, trait, and/or geographic diversity present within Africa, or diversity important for trait improvement. The material is selected primarily by the breeders or groups working on these crops, with input from the AOCC team. Standard processes were followed for genome assembly, transcriptome assembly (Chang et al. 2018a), and mapping of re-sequencing reads. The three activities proceed independently, and their outputs are merged once the individual results are available.
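The genome survey step can be illustrated with a minimal k-mer sketch. Under the simplifying assumption of a roughly homozygous genome, the number of k-mers in reads (after discarding low-depth, mostly erroneous k-mers) divided by the depth of the main peak of the k-mer frequency histogram approximates genome size, and that estimate in turn sets the amount of data needed, for example for re-sequencing 100 accessions at 8–15× depth. The code below is a toy Python illustration with hypothetical function names; it ignores the heterozygous half-depth peak and repeat structure that dedicated tools (for example, Jellyfish for counting and GenomeScope for model fitting) account for, and it is not the AOCC survey pipeline.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = set("ACGT")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement,
    so that both strands of the same locus are counted together."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_histogram(reads, k):
    """Count canonical k-mers in the reads and return a histogram mapping
    depth (times a k-mer was seen) -> number of distinct k-mers at that depth."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= _VALID:  # skip k-mers containing N or other ambiguity codes
                counts[canonical(kmer)] += 1
    return Counter(counts.values())


def estimate_genome_size(histogram):
    """Estimate genome size as (total solid k-mers) / (peak depth).

    Low-depth k-mers, mostly sequencing errors, are excluded by cutting at the
    first local minimum of the histogram. Returns (genome_size, peak_depth).
    """
    max_depth = max(histogram)
    cutoff = None
    for depth in range(1, max_depth):
        if histogram.get(depth + 1, 0) > histogram.get(depth, 0):
            cutoff = depth  # first point where the histogram starts rising again
            break
    if cutoff is None:
        raise ValueError("no error valley found; data may be too shallow")
    solid = {d: n for d, n in histogram.items() if d > cutoff}
    peak_depth = max(solid, key=solid.get)
    total_kmers = sum(d * n for d, n in solid.items())
    return total_kmers / peak_depth, peak_depth


def resequencing_gigabases(genome_size_bp, depth, accessions):
    """Raw data (Gb) needed to sequence `accessions` individuals at `depth`x coverage."""
    return genome_size_bp * depth * accessions / 1e9
```

For a hypothetical genome estimated at 500 Mb, `resequencing_gigabases(500e6, 10, 100)` gives 500 Gb of raw data for 100 accessions at 10× depth, before any allowance for read trimming or duplicates.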

Fig. 1

Workflow used by the African Orphan Crops Consortium (AOCC) to generate genomic resources for 101 African orphan crops. The three workflows run in parallel and independently of each other, each starting as soon as material is available; however, SNP panels are developed using information from all three sources
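As a rough sketch of how re-sequencing data might feed a SNP panel, the code below calls a genotype per accession from reference and alternate read counts at a candidate site and keeps sites with an adequate call rate and minor allele frequency (MAF). The function names and thresholds (minimum depth 4, call rate 0.8, MAF 0.05, 10% allele-fraction bounds for homozygous calls) are illustrative assumptions, not AOCC parameters; at 8–15× depth, count-based heterozygote calls are noisy, and production pipelines use likelihood-based variant callers such as GATK or bcftools.

```python
def call_genotype(ref_reads, alt_reads, min_depth=4, hom_fraction=0.1):
    """Call an alternate-allele dosage (0, 1 or 2) from read counts, or None if
    depth is too low. Allele fractions within hom_fraction of 0 or 1 are homozygous."""
    depth = ref_reads + alt_reads
    if depth < min_depth:
        return None
    alt_fraction = alt_reads / depth
    if alt_fraction <= hom_fraction:
        return 0
    if alt_fraction >= 1 - hom_fraction:
        return 2
    return 1


def select_panel_sites(sites, min_call_rate=0.8, min_maf=0.05):
    """Keep candidate SNP sites that are called in enough accessions and are
    polymorphic enough across the panel.

    sites: dict mapping site_id -> list of (ref_reads, alt_reads), one per accession.
    Returns the list of retained site_ids.
    """
    panel = []
    for site_id, counts in sites.items():
        calls = [call_genotype(ref, alt) for ref, alt in counts]
        called = [g for g in calls if g is not None]
        if not called or len(called) / len(counts) < min_call_rate:
            continue  # too much missing data at this site
        alt_allele_freq = sum(called) / (2 * len(called))
        minor_allele_freq = min(alt_allele_freq, 1 - alt_allele_freq)
        if minor_allele_freq >= min_maf:
            panel.append(site_id)
    return panel
```

Filtering on call rate removes sites that are poorly covered across the panel, while the MAF filter drops near-monomorphic sites that carry little information for association or selection.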

The present status of AOCC in the three major activity workflows is summarized in Table 1. Six reference genome sequences have already been published: Lablab purpureus, Vigna subterranea, Faidherbia albida, Sclerocarya birrea, and Moringa oleifera (Chang et al. 2018a) and Solanum aethiopicum (Song et al. 2019). The curated whole genome sequences are freely available in the GigaScience database (GigaDB) (Chang et al. 2018b, c, d, e, f, g; Tena 2019; Song et al. 2019). Six other genomes are in the final stages of assembly (Eleusine coracana, Digitaria exilis, Gynandropsis gynandra, Annona cherimola, Artocarpus heterophyllus, and Artocarpus altilis). Twenty-one species are in the pipeline for WGS, 17 for transcriptome sequencing, and 15 for re-sequencing. Re-sequencing has been partially completed for five species (Eleusine coracana, Vigna subterranea, Faidherbia albida, Moringa oleifera, and Gynandropsis gynandra). A species is shown as “in the pipeline” as soon as its DNA/RNA is logged into the workflow. In total, AOCC has initiated at least one of the three activities for 46 species. In addition, 19 species have been sequenced by our collaborators or other partners.

African Plant Breeding Academy (AfPBA)

The UC Davis African Plant Breeding Academy (AfPBA, http://pba.ucdavis.edu/PBA_in_Africa/) is a capacity-building arm of AOCC and plans to train 150 mid-career African plant breeders by 2021. The training is designed to equip African plant breeders with the skills and tools to use genomics-based data in their breeding programs, increasing the efficiency of cultivar development and speeding the release of improved varieties to farmers. It aims to empower African plant breeders to put DNA sequence information on African orphan crops into action in developing new, improved varieties that meet farmer, consumer, and processor needs. To date, 80 scientists have completed the intensive 6-week training, and 36 are participating in the current class. Together, these scientists represent 28 countries across the African continent and work with over 105 crop species, including 55 orphan crops. Most are mid-career scientists employed in national agricultural research programs; over 80% hold PhDs, and 33% are women.

Although plant breeding is a long-term effort, some outcomes of the training are already evident (Sogbohossou et al. 2018). Collaborations have been established among the cohorts, and these collaborations have secured substantial funding for research and graduate student training. A community of practice has developed among AfPBA graduates and has given rise to the African Association of Plant Breeders, which is intended to extend continuing professional development beyond the formal AfPBA training. The AfPBA provides a channel for translating sequence data into crop varieties that can underpin food and nutritional security in Africa.

Open data access policy

Open access to, and democratization of, sequence information and other genomic data are considered an important part of finding new, innovative solutions to emerging scientific, social, and political challenges (Pauwels 2017). This involves placing the data in the public domain and making translational technologies accessible and affordable to general users with a minimal skill set. The AOCC regards this as an important commitment to the African agricultural research landscape. All data and accessions created by the AOCC will be made publicly available through gene banks, partners and partners’ websites, publications such as GigaScience, and databases such as NCBI and/or the CNGB Nucleotide Sequence Archive (CNSA: https://db.cngb.org/cnsa).

Conclusion

The AOCC was established with the goal of supporting the food, nutrition, and income-generation capacity of the African population, mainly smallholder farmers, by providing diversified options through locally available, underutilized crops/trees on farming landscapes, driven by a nutrition-sensitive food systems approach. The AOCC is committed to using high-end, technology-driven solutions, including next-generation technologies such as genomics and genomics-assisted breeding, to mainstream 101 under-researched, underinvested African crops referred to as orphan or neglected crops. In the current phase, AOCC is sequencing and generating genomic resources for these 101 target African orphan crops using next-generation sequencing technologies through a vast network of core partners and collaborators. The list of crops was drawn up considering African needs for food and nutrition targets as well as the economic empowerment of smallholder farmers. The AOCC has made reasonable progress in generating genomic resources for 60 species. To date, the AOCC, through the AfPBA, has empowered 116 of Africa’s top plant breeders to expedite the development of improved varieties of African orphan crops and other food plants. The AOCC solutions will increase dietary diversity, create new avenues for income generation, and improve the health and standard of living of the African population, including smallholder farmers.

Author contribution statement

ICRAF hosts the AOCC genomics lab and UC Davis hosts the AfPBA; the ICRAF and UCD teams were involved in overall workflow management. The authors shared responsibilities as follows: PSH wrote the manuscript with input from the other authors; PSH, AM, AVD, RJ: supervision, coordination, management of genomics laboratory workflows, and logistics; AM: germplasm acquisition and logistics; PSH, RK, SM: DNA/RNA extractions, logistics, and re-sequencing workflow. The BGI team was involved in planning and executing the WGS pipelines as follows: YF, BS, YC: designing and implementing genome surveys and assembly; YC, YF, ML, XL, SW, LL: genome annotations; HL, SP: NGS libraries and primary data generation; BS, SKS: manuscript revision; HL, SC, XX, HY, JW, XL: initiation, supervision, and management of the overall WGS pipeline; RM: Director, AfPBA; RM and AVD: core instructors, AfPBA; H-YS, TS, AVD, RJ: conceptualization and establishment of a functional AOCC consortium, collaborations, and partnerships. All the authors read and approved the final manuscript.