Introduction

Silicon (Si) is the second-most abundant (28.8%) element in Earth’s continental crust after oxygen (Kaur and Greger 2019). The uptake and accumulation of Si in plants is associated with a number of benefits (Fig. 1). For example, Si enhances plant tolerance to various abiotic (e.g., drought, salinity, metal toxicity) and biotic (e.g., pathogen, herbivory) stressors (Souri et al. 2020; Gaur et al. 2020; Ahanger et al. 2020). The accumulation of Si in plants as carbon-containing phytoliths (Si-based structures in plant cells) also contributes to long-term carbon sequestration (Hans Wedepohl 1995). However, Si is considered a non-essential element for most plants. Plants acquire Si as silicic acid [Si(OH)4] from the soil through the root system. Silicic acid is then translocated to the aboveground tissues and organs and deposited as phytoliths through the process of biosilicification (Kaur and Greger 2019).

Fig. 1
figure 1

Molecular mechanism of silicon uptake and its effects in plants

The uptake, transport and deposition of Si from soil to a plant is mediated by a cooperative action of various transporter genes, such as channel-type influx (e.g., Lsi1/Lsi6) and efflux (e.g., Lsi2) transporters (Kaur and Greger 2019; Nawaz et al. 2020). The first Si transporter gene identified in plants was rice (Oryza sativa L.) low silicon 1 (OsLsi1) (Ma et al. 2006). Functional homologs of OsLsi1 have since been identified in many plant species, including in barley (Hordeum vulgare L.) (Chiba et al. 2009), pumpkin (Cucurbita moschata Duch.) (Mitani et al. 2011), wheat (Triticum aestivum L.) (Montpetit et al. 2012), soybean (Glycine max L.) (Deshmukh et al. 2013), maize (Zea mays L.) (Mitani et al. 2009b) and cucumber (Cucumis sativus L.) (Sun et al. 2017). Lsi1 belongs to the Nodulin 26-like intrinsic protein (NIP) III subgroup of the aquaporin gene family (Ma and Yamaji 2015). This gene is involved in Si uptake from the soil into root cells in both eudicots and monocots. Functional characterization of the Lsi1 protein in different plant species shows that Si specificity in plants is linked to the two highly conserved Asn-Pro-Ala (NPA) motifs and an aromatic/arginine (ar/R) region formed by four amino acid (AA) residues that act as a selectivity filter (SF). The SFs consist of Gly-Ser-Gly-Arg (GSGR) or Ser-Thr-Ala-Arg (STAR) residues (Mitani-Ueno et al. 2011b; Grégoire et al. 2012). In addition, a precise spacing of 108 AAs between the two NPA motifs is essential for Si permeability in plants (Deshmukh et al. 2015). Lsi2, an efflux transporter gene which was first identified in rice, transports Si from root cells to the root xylem tissue. It belongs to an uncharacterized putative anion transporter protein family. Homologs of Lsi2 have been reported in several plants, such as barley, maize (Mitani et al. 2009a) and pumpkin (Mitani-Ueno et al. 2011a).

Si in xylem sap is deposited as monosilicic acid which is then unloaded by Lsi6, a homolog of Lsi1 (Yamaji et al. 2008). Lsi6 subsequently distributes Si to the aboveground tissues where it is deposited as amorphous Si(SiO2) in phytoliths (Kumar et al. 2017). In some plants, there are specialized cells called silica cells (a leaf epidermal cell) where Si is deposited. The molecular mechanism of silica deposition has recently been uncovered in Sorghum with the identification of the protein Siliplant 1 (Slp1) (Kumar et al. 2017). Slp1 contains a short peptide (HKKPVPPKPKPEPK) that controls the silicification process in Sorghum (Kumar et al. 2020). A generalized scheme of Si uptake, transport and deposition in plants is depicted in Fig. 1.

The expression and subcellular localization of Lsi1 and Lsi2 in roots varies in different plant species, despite these transporters having similar functions. In rice roots, OsLsi1 is found on the distal side of root exodermal and endodermal cells (Ma et al. 2006, 2007). On the other hand, the maize homolog of OsLsi1, ZmLsi1, is found in seminal and crown roots on the distal side of root epidermal and hypodermal cells, as well as in the cortical cells of lateral roots (Mitani et al. 2009b). In barley, HvLsi1 is localized on the distal side of epidermal and cortical cells in seminal roots and on the distal side of hypodermal cells in lateral roots (Chiba et al. 2009). Like Lsi1, subcellular localization of Lsi2 varies by plant species. For example, in rice, OsLsi2 is localized to the proximal side of root exodermal and endodermal cells (Ma et al. 2006, 2007), while maize and barley, Lsi2 transporters are only localized to root endodermal cells and show no polar distribution (Mitani et al. 2009a). The difference in cellular localizations of Si transporters in different plant species may be associated with their Si uptake ability. Using a mathematical simulation model, Sakurai et al. (2015, 2017) concluded that the most efficient Si uptake strategy is with OsLsi1 and OsLsi2 localized to the exodermis and endodermis with a polar distribution. This model simulation is, therefore, predicting that rice a high Si accumulator. To test such predictions, Si transporters need to be experimentally identified and characterized across multiple plant species to better understand how localization affects Si uptake and accumulation.

As described above, previous studies on Si accumulation have focused on food crops (e.g., barley, rice, wheat, maize, soybean) (Souri et al. 2020; Ahanger et al. 2020). However, our knowledge on the genetic variability and molecular mechanisms of Si accumulation in bioenergy crops such as switchgrass (Panicum virgatum L.) and poplar (Populus trichocarpa Torr. & A. Gray ex W. Hook) is limited. While Si uptake by plants can have multiple benefits (Fig. 1), there may be negative impacts on the utilization of these crops for bioenergy production as Si in plant tissues with high Si content may inhibit the bioprocessing of bioenergy feedstocks. Thus, manipulating the phytolith-forming pathway, including uptake, transport, and translocation of Si in plants, will directly influence plant productivity and potentially extend to influence bioprocessing and ecosystem-level processes such as carbon sequestration (Fig. 1). In this study, we conducted an in-silico genome analysis to identify the putative transporter genes involved in Si uptake, transport, and deposition in poplar, a predominant bioenergy crop.

Materials and methods

Clustering of protein sequences into tribes

The protein-coding sequences (CDSs) of the primary transcripts annotated in the genomes of six plant species including Arabidopsis thaliana (L.) Heynh. (Araport11), Amborella trichopoda Baill. (v1.0), Oryza sativa L. (v7.0), Populus trichocarpa Torr. & A.Gray ex Hook. (v4.1), Sorghum bicolor (L.) Moench (v3.1.1), and Solanum lycopersicum L. (ITAG3.2) were obtained from Phytozome v13 (https://phytozome-next.jgi.doe.gov/) (Goodstein et al. 2012). The protein sequences translated from the CDSs of individual plant species were combined into a single protein sequence file in fasta format. Then 'all-against-all' sequence alignments in this combined protein sequence dataset were performed using DIAMOND BLASTP (Buchfink et al. 2015) with parameters of “-e 1e-5 -f 6 -k 1000000”. The BLASTP output was used for clustering the protein sequences into tribes using TRIBE-MCL (Enright et al. 2002), with various inflation values (i.e., 1.1, 1.2, 1.3, …, 2.0).

Sequence analysis

To identify Pfam domains (Bateman et al. 2002; Mistry et al. 2021), the protein sequences described in section “clustering of protein sequences into tribes”, were analyzed using InterProScan version 5.48–83.0 (Jones et al. 2014) with parameters of “-f tsv -appl Pfam”. For channel-type silicon-influx transporters (SITs), the SF and NPA motif were manually searched against OsLsi1. The distance between the two NPA motifs was manually calculated. Transmembrane domains (TMDs) were predicted by the TMHMM server (https://services.healthtech.dtu.dk/service.php?TMHMM-2.0) (Krogh et al. 2001) and the topology was visualized with the PROTTER (Omasits et al. 2014) tool. Theoretical isoelectric point (pI) and amino acid frequencies were computed using protein identification and analysis tools on the ExPASy Server (https://web.expasy.org/compute_pi/) (Gasteiger et al. 2005). Repeats in protein sequences were predicted using the RADAR web server (https://ebi.ac.uk/Tools/pfa/radar/) (Madeira et al. 2019). Conserved motifs were found in the peptide sequences using the MEME tool (https://meme.nbcr.net/meme/tools/meme) (Bailey et al. 2006). SignalP was used to predict signal peptide and cleavage sites (Armenteros et al. 2019).

Phylogenetic analysis

Phylogenetic analysis of protein sequences was performed as described previously (Yang et al. 2017). Specifically, protein sequences of the six plant species in each gene family containing known Si transporters were aligned using MAFFT (v7.453) (Katoh and Standley 2013) or CLC sequence viewer (http://www.qiagen.com), with the setting of “auto”. The protein sequence alignment was edited using Jalview software (Waterhouse et al. 2009). The protein sequence alignment for each gene family was used to infer maximum-likelihood phylogenetic tree using IQ-TREE 2 (version 2.1.2) (Minh et al. 2020), with the setting of “-B 1000”. Finally, the phylogenetic trees were visualized using MEGA X (version 10.1.6) (Kumar et al. 2018).

Gene ontology analysis and pathway annotation

Gene ontology (GO) analysis of protein sequences was performed using DeepGOPlus (version 1.0.1) (Kulmanov and Hoehndorf 2020), with a prediction threshold of 0.3. Pathway annotation of protein sequences in each gene family containing known Si transporters was performed on the KEGG Blast KOALA (Kanehisa et al. 2016), with the KEGG GENES database as “family_eukaryotes”.

Prediction of protein subcellular localization

The protein sequences in each gene family containing known Si transporters were analyzed for predicting subcellular localization using three different tools, including Plant-mSubP (http://bioinfo.usu.edu/Plant-mSubP/) (Sahu et al. 2020) with the prediction module set as “PseAACNCCDipep”, BUSCA (http://busca.biocomp.unibo.it/) with the taxonomic origin set as “Eukarya—Plants—16 compartments”, and DeepLoc-1.0 (https://services.healthtech.dtu.dk/service.php?DeepLoc-1.0) (Almagro Armenteros et al. 2017) with parameters set as Profiles (for 50 sequences or less) or BLOSUM62 (for more than 50 sequences).

Gene expression analysis

The gene expression data for the Si transporter genes in P. trichocarpa were obtained from ePlant Poplar (http://bar.utoronto.ca/eplant_poplar/) (Waese et al. 2017). Briefly, Affymetrix expression data were normalized by the GCOS method, with a TGT value of 500. Duplicate or triplicate samples were analyzed from greenhouse-grown or field-grown material (the latter in the case of the catkins). The seedings were grown on moist filter paper. All materials were grown under a diurnal cycle of 12 h light/12 h dark and sampled at midday, except for the xylem samples, which were sampled at midnight.

Protein 3D structure analysis

The three-dimensional (3D) structures of the Si transporter and silicification proteins were determined with AlphaFold 2 on Google Colab’s Notebook (Mirdita et al. 2022). This Colab notebook allows for the prediction of protein structure using a slightly simplified version of AlphaFold v.2.1.0. Colab notebooks are coupled to a special sequence search program, MMSeqs2, thus making this technology more powerful. The predicted structures were downloaded from Google Colab’s server in protein databank (PDB) format and visualized with the FirstGlance in Jmol server (http://firstglance.jmol.org).

Analysis of gene duplications

MCScanX (Wang et al. 2012) was used to detect gene synteny among different plant species as well as segmental duplication within species. The tandem gene duplication information was obtained from PLAZA 4.5 (Van Bel et al 2018).

Analysis of gene networks

The protein sequences of P. trichocarpa were used to search the UniProt database through BLAST (https://www.uniprot.org/blast/) to obtain the corresponding proteins with UniProt accession numbers. Then, the UniProt accession numbers were used as queries to search the STRING (Szklarczyk et al. 2021) database for genes connected to each query gene.

Genome-wide association studies (GWAS)

Mutation profiles for each gene were obtained from whole-genome resequencing of 917 unrelated P. trichocarpa genotypes that were used to assemble the GWAS as described by Muchero et al. (2018). Expression analysis for the GWAS panel was performed using RNA-seq as previously described (Zhang et al. 2018). In the work by Muchero et al. (2018) and Zhang et al. (2018), 533 xylem and 470 leaf samples were collected from 4-year-old trees from the Clatskanie field site, Oregon, USA, while 529 root samples were collected greenhouse-grown plants at Oak Ridge National Laboratory, Tennessee, USA.

Results and discussion

Identification of Si transporters in P. trichocarpa

Channel-type Si influx transporters (SITs)

Until now, two channel-type SIT genes (Lsi1/Lsi6) have been reported in different plant species such as rice, wheat, maize, barley and soybean (Souri et al. 2020). Here, we performed an in-silico analysis to identify the putative channel-type SITs in poplar. We first compared the protein coding genes of A. thaliana, A. trichopoda, O. sativa, P. trichocarpa, S. bicolor and S. lycopersicum at the whole-genome level to identify the putative channel-type SITs in poplar. Phylogenetic analysis of the whole-gene family containing the channel-type SITs (i.e., Lsi1/Lsi6) grouped them into four distinct clades (Fig. 2A). Further analysis of the subfamily containing the known SIT genes of rice (OsLsi1/OsLsi6) revealed that this clade contained 11 putative SITs in poplar (Fig. 2B). Out of the 11 putative SITs found in the poplar genome, only Potri.017G083300.1 (here referred to as PtLsi1) was grouped into the clade containing the rice SIT, indicating that this gene might be the candidate channel-type SIT in poplar. Further sequence analysis showed that PtLsi1 is ~ 55% similar to OsLsi1 and ~ 53% similar to OsLsi6 at the AA level.

Fig. 2
figure 2

Phylogenetic trees of the gene families containing different Si transporters. A The phylogenetic tree of the Lsi1/Lsi6 gene family, which was rooted on midpoint. A subfamily containing the known Si transporters OsLsi1 (LOC_Os02g51110) and OsLsi6 (LOC_Os06g12310) is highlighted in blue. B Phylogenetic tree of the subfamily containing the known Si transporters OsLsi1 (LOC_Os02g51110) and OsLsi6 (LOC_Os06g12310). The clade containing the two known Si transporters is highlighted in purple. C Phylogenetic tree of the gene family containing OsLsi2. The clade containing the known Si transporter OsLsi2 (LOC_Os03g01700) is highlighted in purple. D Phylogenetic tree of the gene family containing the known silicification protein SbSlp1 (Sobic.001G266300.1)

To examine the presence of NPA motifs and the SF in the putative PtLsi1, we aligned the AA sequences of rice channel-type SIT, i.e., OsLsi1/OsLsi6 with the putative PtLsi1 identified in this study. The alignment (Fig. 3A) shows that PtLsi1 contains both NPA motifs and SF, and their locations match the positions of NPA/SF reported in OsLsi1/OsLsi6. Similar to OsLsi1/OsLsi6, the distance between the two NPA motifs in PtLsi1 is 108 AA. The channel-type SITs in plants have six transmembrane domains (TMDs) (Wallace and Roberts 2004; Ma et al. 2006; Nawaz et al. 2020). We also found six TMDs in PtLsi1 (Fig. 3B), indicating that PtLsi1 is a membrane-bound protein and has an active role in transport across cells. Protein topology analysis showed that the N-terminus and C-terminus of PtLsi1 are located inside the intracellular space whereas in OsLsi1, these regions are in the extracellular space (Fig. 3B). Subcellular localization of PtLsi1 was determined using three different tools, including Wolfpsort (Horton et al. 2007), Cello (Yu et al 2006), and DeepLoc (Almagro Armenteros et al. 2017), and all tools predicted that PtLsi1 is a membrane-bound protein. A previous report in pumpkin (Mitani et al. 2011) showed that Si transport activity is hampered by a proline-to-leucin mutation in CmLsi1 at position 242 resulting in a mislocalization to the endoplasmic reticulum instead of the plasma membrane, which is essential for its activity. This mutation, however, was not found in PtLsi1.

Fig. 3
figure 3

Sequence analysis of a channel-type silicon influx transporter (SIT) in poplar. A Protein sequence alignment of the rice channel-type SIT (OsLsi1/OsLsi6) with the putative poplar homolog (PtLsi1) identified in this study. Signature motifs that characterize the channel-type SIT in plants are indicated by boxes, with the blue boxes showing the GSGR filter and the pink boxes showing the NPA motif. B Shows the presence of transmembrane domains in OsLsi1 and PtLsi1

A previous study (Vatansever et al. 2017) found that sequence similarity and Si transport ability can be used to infer accumulation capacities of different plant species. Applying a similar approach to our findings, it is likely that PtLsi1 is a high Si accumulator given sequence similarity to the high accumulating rice homolog (Vatansever et al. 2017). Indeed, a previous study (Deshmukh et al. 2015) identified “Potri.017G083300.1” as the candidate PtLsi1 in P. trichocarpa and showed that it can accumulate high levels of Si in Xenopus (clawed frog) eggs, similar to that observed with rice OsLsi1. However, when assessed in poplar plants, Deshmukh et al. (2015) found moderate levels of Si accumulation. This might be explained by the low expression of PtLsi1 in poplar roots (see section “Gene expression pattern of Si transporters in poplar” for details). One interesting observation is that the number of influx transporters is highly variable in different plant species. For example, rice contains two influx transporters (Lsi1, Lsi6) whereas maize has as many as four homologs, and both rice and maize accumulate high levels of Si. This is also the case for sunflower (Helianthus annuus L.) which accumulates Si and contains more than one channel-type influx transporter. Whether the number of Si transporters affects a plant species’ ability to take up Si has yet to be verified experimentally. Although rice has two Si influx transporter genes, only one homolog of Lsi1 was identified in P. trichocarpa. We did not find any homologs of Lsi6 in P. trichocarpa, which is involved in unloading Si from xylem and transporting it to multiple aboveground tissues. Interestingly, it has been shown that the absence of Lsi6 has no negative effect on Si uptake but can affect the pattern of Si deposition in leaves (Ma et al. 2011; Kaur and Greger 2019).

Si efflux transporter

Lsi2 is the only efflux transporter identified in plants to date. It belongs to an uncharacterized anion transporter family. Our genome-wide analysis identified 22 genes in this family, including five homologs in poplar (Potri.012G144000.5, Potri.015G146900.5, Potri.017G141600.3, Potri.017G 141800.2, Potri.017G141950.1). This phylogenetic analysis grouped the 22 genes into six distinct clades (Fig. 2C). The clade containing the rice Lsi2 (OsLsi2; LOC_Os02g51110) harbors only one poplar gene Potri.012G144000 (here referred to as PtLsi2). We next aligned the protein sequences of five poplar homologs along with OsLsi2 to visualize their conserved regions (Fig. 4A). The alignments showed that these genes are highly conserved over the full-length of the protein sequences except at positions 230–375. The similarity between the OsLsi2 and the putative PtLsi2 is ~ 50% at the AA level. We found five conserved motifs in the putative Lsi2s (except Potri.017G141950.1) in poplar and in OsLsi2 (Fig. 4B).

Fig. 4
figure 4

Sequence alignment and motif analysis of putative Lsi2 homologs in poplar. A Sequence alignment of the putative Lsi2 homolog in poplar identified in our genome-wide study. Transmembrane domains (TMDs) are highlighted with shaded colors. Each different color indicates a different TMD. B Organization of the five highly conserved motifs predicted in the putative Lsi2 homolog in poplar by the MEME tool

Previous studies found that the Lsi2 protein in plants is localized in the plasma membrane, featuring 9–11 TMDs, with extracellular N- and extracellular/cytoplasmic C-terminus regions (Vivancos et al. 2016; Vatansever et al. 2017; Nawaz et al. 2020). Similarly, we identified 9–11 TMDs in the putative poplar Lsi2 homologs except Potri.017G141950.1, which contained five TMDs (Supplementary Fig. 1A). Only Potri.012G144000.5 contained a signal peptide (Supplementary Fig. 1), further supporting the notion that this gene is the putative poplar Lsi2 homolog corresponding to OsLsi2. A recent study (Nawaz et al. 2020) identified a highly conserved region with a consensus sequence “XXTKHXWFXXCXXXXRX” among the Lsi2 proteins in different plant species such as rice, wheat, and Sorghum. This conserved sequence, however, was not found in the Lsi2 homologs in poplar. Many plants with high Si accumulation capacity, such as Equisetum (horsetail) and pumpkin, do not contain this conserved sequence either (Mitani-Ueno et al. 2011a; Grégoire et al. 2012). Therefore, this conserved sequence might not have any functional importance, but further characterization is warranted. Lsi2 is not an extensively characterized protein like Lsi1, and thus there is a little information available for this protein.

Siliplant 1 (Slp1) protein

Our genome-wide analysis identified 27 proteins in the studied plant species which belong to the Slp1 protein family. Out of the 27 proteins, there were only two poplar proteins which will be referred to as PtSlp1a (Potri.004G168600.10) and PtSlp1b (Potri.009G129900.6) in the remaining text. The isoelectric point (pI value) of these proteins was computed as positively charged AAs are involved in biosilicification (Otzen 2012; Nawaz et al. 2020; Kumar et al. 2020). The predicted pI values of the poplar Slp1 homologs were 9.96 and 9.78 (Table 1) for PtSlp1a and Ptslp1b, respectively. Similar to the known Slp1 protein in Sorghum, PtSlp1a and PtSlp1b are rich in proline (23.5% in PtSlp1a and 32.2% in PtSlp1b) and lysine (13.7% in PtSlp1a and 15.4% in PtSlp1b) (Table 1). However, they are poor in histidine (3.7% in PtSlp1a and 2.1% in PtSlp1b), whereas the known Sorghum Slp1 protein is rich in histidine (Kumar et al. 2020). We also tested whether these proteins have signal peptides because Sorghum Slp1 is a secretory protein (Kumar et al. 2020). Both poplar Slp1 homologs contain a signal peptide (Supplementary Fig. 1B).

Table 1 Characteristics of different silicon transporters in poplar and rice

The reported peptide sequence for in vitro silica precipitation is HKKPVPPKPKPEPK (Kumar et al. 2020). In addition, the well-characterized Sorghum Slp1 protein has three conserved domains, namely the A-, B- and C-domains. The A-domain is rich in histidine and aspartic acid (i.e., H, D- rich) and the consensus sequence is DXFHKXHDYHXXXXHFH, whereas the B- and C- domains are rich in proline, lysine, glutamic acid (P, K, E- rich) and proline, threonine, and tyrosine (P, T, Y-rich) with consensus sequences KKPXPXKPKPXPKPXPXPX and YHXPXPTYXSPTPIYHPPX, respectively (Kumar et al. 2020). We looked for the presence of these features in the putative Slp1 poplar homologs. We found that both PtSlp1a and PtSlp1b contain the peptide sequences reported for in vitro silica precipitation, but these sequences are partially conserved. The peptide is also more conserved in PtSlp1a than PtSlp1b. When screening for the conserved A-, B-, and C- domains, we found that PtSlp1a and PtSlp1b do not contain H, D-rich regions (i.e., A-domain). They do contain the P, K, E-rich and P, T, Y-rich regions which are partially conserved relative to the Sorghum Slp1 protein (Fig. 5A). Absence of the H, D-rich domain suggests that PtSlp1a or PtSlp1b might be a moderate Si accumulator as described by (Kumar et al. 2020). Finally, we checked the PtSlp1a/PtSlp1b protein sequences for the presence of any repeat AA sequences as the Sorghum Slp1 contains multiple repeats (Kauss et al. 2003; Kumar et al. 2020). We found that PtSlp1a contains a sequence (KPKPPVFKPPPVPI) repeated six times, whereas PtSlp1b harbors a sequence (PKPKPPIFK) which is repeated three times (Fig. 5B). The AA lengths of PtSlp1a and PtSlp1b are also different. PtSlp1a has 513 AA whereas the length of PtSlp1b is 336 AA. The length of the Sorghum Slp1 protein is 524 AA which is almost equivalent to PtSlp1a. These results suggest that PtSlp1a is the best candidate Slp1 homolog in poplar.

Fig. 5
figure 5

Screening Slp1 signature sequence in the putative poplar homologs. A Alignment of two identified poplar Slp1 homologs along with the known Slp1 protein of Sorghum. The signature domains of Slp1 proteins are indicated with colored boxes (A-domain is in pink, B-domain is in sky blue, C-domain is in red). B Repeat sequences found in the two poplar Slp1 homolog proteins identified in this study PtSlp1a (Potri.004G168600.10) and PtSlp1b (Potri.009G129900.6)

Gene expression pattern of Si transporters in poplar

In plants with high Si accumulation capabilities, channel-type SIT genes (i.e., Lsi1/Lsi6) and Lsi2 are predominantly expressed in root and stem tissue (Jadhao et al. 2020), consistent with their role of Si uptake by roots, whereas Slp1 is expressed in the inflorescence and leaves (Kumar et al. 2020). We examined the expression pattern of the putative channel-type SITs, Lsi2, and Slp1 homologs in poplar using gene expression data from the ePlant poplar (http://bar.utoronto.ca/eplant_poplar/) database. We found that the expression levels of both PtLsi1 (Potri.017G083300) and PtLsi2 (Potri.012G144000) were lower in the root than in the other parts of the plant (Fig. 6A), such as leaves, suggesting that poplar is a weak Si accumulator. PtSlp1a (Potri.004G168600) is highly expressed in the inflorescence and young leaves (Fig. 6A), similar to the Sorghum Slp1 gene (Nawaz et al. 2020; Kumar et al. 2020). Furthermore, the expression of PtSlp1a was found to be much higher than that of PtSlp1b (Potri.009G129900) in multiple tissue types (Fig. 6A). This expression pattern suggests that PtSlp1a might be the best candidate Slp1 homolog in poplar, which is consistent with our inference based on the analysis of protein sequences in section “Siliplant 1 (Slp1) protein”. Similar to observations above, analysis of expression patterns across the P. trichocarpa GWAS population also revealed that expression levels of these four genes was highest in leaves followed by roots and was negligible in xylem tissues (Fig. 6B). Moreover, there was significant variation across genotypes for each gene. When evaluated across the P. trichocarpa tissue Atlas expression profiles, Potri.004G168600 consistently had the highest expression across conditions followed by Potri.009G129900. Across all conditions, Potri.017G083300 consistently showed low expression levels (Fig. 6C).

Fig. 6
figure 6

Gene expression profiles of poplar Si transporters (PtLsi1; Potri.017G083300 and PtLsi2; Potri.012G144000) and the candidate silicification proteins (PtSlp1a; Potri.004G168600 and PtSlp1b; Potri.009G129900). A Digital expression patterns obtained from gene expression data available at http://bar.utoronto.ca/eplant_poplar/. B Expression profiles of different Si transporters across the P. trichocarpa GWAS population. C Expression profiles of Si transporters across different tissue types

Potential interaction partners of PtLsi1 and PtLsi2

We identified the putative interaction partners of PtLsi1/PtLsi2 using public data in the STRING database (Szklarczyk et al. 2021). The putative interaction partners of PtLsi1 (Potri.017G083300) include different types of ion transporters, such as POPTR_0013s05030.2 (Fig. 7A), which is an aquaporin protein. This protein belongs to the major intrinsic protein (MIP) family and small basic intrinsic protein (SIP) subfamily and is involved in the transport of water and ions into the cell (Johanson and Gustavsson 2002; Ishikawa et al. 2005). Other important interaction partners of PtLsi1 are POPTR _0005s20790.1 and POPTR_0002s07550.2, which are involved in a range of metabolic pathways. The interaction partners of PtLsi2 include Delta-1-pyrroline-5-carboxylate synthase (POPTR_0008 s06060.1, POPTR_0010s20590.1), aminomethyltransferase (gdcT1/gdcT2), and the 26S proteasome regulatory subunit family protein. Among these interaction partners, Delta-1-pyrroline-5-carboxylate synthase (P5Cs) plays a key role in proline biosynthesis in plants which helps protect the plant from different biotic and abiotic stressors (Hayat et al. 2012; Dar et al. 2016). This suggests that PtLsi2 may play a role in stress response.

Fig. 7
figure 7

Predicted interaction partners of the putative Si transporters in poplar. A PtLsi1 (Potri.017G083300; UniProt accession# B9IJY1) and B PtLsi2 (Potri.012G144000; UniProt accession# U5FXI4). The network was obtained from the STRING database (Szklarczyk et al. 2021)

Evolution of the Slp1 gene family

Gene duplication plays an important role in plant evolution. Specifically, gene family expansion and genomic evolutionary mechanisms mainly depend on gene duplication events. In this study, gene duplication events were identified in the Slp1 gene family using a bioinformatics analysis. All identified Slp1 genes were mapped to their corresponding chromosomes, as shown in Fig. 8. Some of these genes showed tandem gene duplications such as Sobic.001G265900 versus Sobic.001G266100 in Sorghum, Sobic.001G438400 versus Sobic.001G438500 in Sorghum, and LOC_Os03g14130 versus LOC_Os03g14140 on rice chromosome 3 (Fig. 8). The expansion of the Slp1 gene family partially resulted from segmental duplications, such as a segmental duplication on Sorghum chromosome 1 (e.g., the segment containing Sobic.001G265900 and Sobic.001G266100 versus the segment containing Sobic.001G438400 and Sobic.001G438500) as well as a segmental duplication between the two poplar chromosomes (the segment containing Potri.004G168600 versus the segment containing Potri.009G129900) (Fig. 8). There were clear syntenic genomic blocks conserved among different plant species, such as the block containing LOC_Os03g14130 and LOC_Os03g14140 in rice and the block containing Sobic.001G438400 and Sobic.001G438500 in Sorghum. These results indicate that the evolution and expansion of the Slp1 gene family was mediated by tandem gene duplication and segmental duplications within the same chromosome and between different chromosomes.

Fig. 8
figure 8

Circos plot presenting gene duplication (tandem and segmental) events and synteny of the Slp1 genes. Tandemly duplicated loci are indicated with neighboring locus name with red and blue fonts whereas segmental duplication are indicated by red lines

Mutation profiles of Si transporters across the P. trichocarpa GWAS population

Among the poplar Si transporters identified in this study, Potri.004G168600 had the highest number of mutations with three significant mutations based on SNPeff analysis. This gene had a codon deletion mutation at position Chr04:18772192, a non-synonymous mutation (G > A) at position Chr04:18772384, and another non-synonymous mutation (T > C) at position Chr04:18772686. Potri.009G129900 had the highest impact mutation at position Chr09:10591818 where a cytosine insertion created a frame-shift mutation (GTT > GCTT). The frame-shift mutation occurred only in the heterozygous condition in 103 of the 917 P. trichocarpa genotypes evaluated. The absence of the homozygous state of this allele might suggest deleterious effects if protein function is significantly affected. The second-most impactful mutation occurred in Potri.012G144000 at position Chr12:15654656 with a four-nucleotide deletion from the reference allele creating an alternative splice site (CATATATA > CATA). Of the 917 genotypes, 450 carried the CATA allele in the homozygous state while 366 were homozygous and 101 individuals carried the CATATATA minor allele in a homozygous state. The relative lack of high-impact mutations in these genes might be indicative of their essential role in P. trichocarpa biology. Genotypes exhibiting high-impact mutations identified from this study can be further characterized to evaluate the putative impacts on trait expression.

Protein 3D structure analysis

Figure 9 shows the 3D structures of the putative Si transporter and silicification proteins of poplar. The 3D structures were predicted with Alphafold and the accuracy of these structures was determined by the predicted local distance difference test (pLDDT) score. pLDDT scores range from 0 to 100 and a higher value indicates a more accurate prediction. Another indicator for assessing the Alphafold predicted protein structure is the predicted aligned error (PAE) which assesses the confidence in the domain packing and large-scale topology of the protein. PAE is presented as an interactive 2D plot where dark blue indicates small error and red indicates large error. Alphafold detected a well-defined 3D structure (Fig. 9A) for most of the PtLsi1 with high confidence (pLDDT > 80; Fig. 9H) except the C-terminus (residue 1–32) which was depicted as a ribbon-like structure with a low pLDDT score (pLDDT < 50; Fig. 9G). This ribbon-like structure is likely to be an intrinsically disordered region (IDR). The PtLsi1 protein also exhibited high structural similarity with OsLsi1 except the IDR which is detected in the N-terminus in OsLsi1. Disordered protein domains are often important for intracellular signaling and can transition from a disordered to an ordered state upon binding to other proteins (Ruff and Pappu 2021; David et al. 2022). Therefore, the presence of an IDR in a different location in PtLsi1 versus OsLsi1 indicates that these two proteins may have different binding partners or function differently. PtLsi2 also showed a well-folded structure predicted by Alphafold and was highly similar to the OsLsi2 3D structure (Fig. 9C, D) with a pLDDT score > 80 (Fig. 9G). Interestingly, the IDR detected in both proteins were found in very similar positions (residues 211–267 in PtLsi2 and 211–295 in OsLsi2). The high structural similarity along with a similar location of the IDR implies that PtLsi2 is the candidate homolog for OsLsi2 in poplar. For silicification proteins, both the candidates in poplar (PtSlp1a and PtSlp1b) and the known SbSlp1 in Sorghum showed highly disordered structures in the Alphafold-predicted 3D model (Fig. 9E–G) with pLDDT scores < 50 (Fig. 9G). Comparing PtSlp1a and PtSlp1b with SbSlp1, it appears that PtSlp1a is structurally more similar to SbSlp1 indicating that PtSlp1a may be the candidate silicification protein in poplar.

Fig. 9
figure 9

3D structures of the Si transporter and silicification proteins in poplar and rice. A–G Alphafold predicted 3D structures with the interactive 2D plots of predicted aligned error (PAE). Dark blue in the PAE plot indicates smaller errors whereas red indicates larger errors. H Interactive 2D plot of predicted local difference distance (pLDDT) scores for different models. The pLDDT plot gives the best information on intra-domain confidence with a value > 90 indicative of a high score

The quality and accuracy of the Alphafold prediction model varies. The pLDDT score is a good guide for model quality and it correlates well with model accuracy when assessed against an experimentally determined structure. However, further experimental characterization and structure-based mutational studies are required to validate the structural model. Another limitation of Alphafold is that the predictions yield a static picture of protein structure and contain little or no information on protein dynamics. Conformational changes in proteins are often key to their function and regulation and are frequently triggered by binding to other molecules (e.g., proteins or small molecules/ligands) or by post-translational modifications. The current prediction algorithm is unable to model these conformational changes and is therefore largely unable to provide a crucial aspect of understanding the protein structure–function relationship (Jumper et al. 2021; van Breugel et al. 2022).

Conclusion

Previous studies on Si transporters mainly focused on monocot plants, particularly food crops. In this study, we identified transporter genes putatively involved in Si uptake, transport, and deposition in the woody bioenergy crop, P. trichocarpa. Further, we identified putative interaction partners and expression profiles of Si transporter genes and have described the possible evolution of Si transporter proteins in P. trichocarpa. To the best of our knowledge, this study represents the first attempt to provide an in-depth analysis of Si transporters in P. trichocarpa. The current findings expand our understanding of Si transporters in woody plants and provide knowledge that will greatly enhance genetic engineering strategies aimed at improving carbon sequestration and biomass quality of bioenergy feedstocks. Specifically, different Si transporter genes can be combined and expressed together in P. trichocarpa to examine their effects on phytolith formation. An additional next step would examine if the expression of rice Lsi6, the homolog that was not identified in poplar, improves phytolith formation in P. trichocarpa.