Microsatellites retain phylogenetic signals across genera in eucalypts (Myrtaceae)

Abstract

The utility of microsatellites (SSRs) in reconstructing phylogenies is largely confined to studies below the genus level, owing to the potential for homoplasy resulting from allele size range constraints and to poor SSR transferability among divergent taxa. The eucalypt genus Corymbia has been shown to be monophyletic using morphological characters; however, analyses of intergenic spacer sequences have resulted in contradictory hypotheses, in which the status of the genus is either equivocal or paraphyletic. To assess SSR utility in higher order phylogeny in the family Myrtaceae, phylogenetic relationships of the bloodwood eucalypts Corymbia and related genera were investigated using eight polymorphic SSRs. Analyses of repeat size variation using the average square distance and Nei's distance were congruent and showed Corymbia to be a monophyletic group, supporting morphological characters and a recent combined dataset of the internal and external transcribed spacers. SSRs are selectively neutral and provide data at multiple genomic regions, which may explain why they retained informative phylogenetic signals despite deep divergences. We show that where the problems of size-range constraints, high mutation rates and size homoplasy are addressed, SSRs might resolve problematic phylogenies of taxa that have diverged for as long as three million generations or 30 million years.

microsatellite phylogeny; paraphyletic; homoplasy; incongruence; eucalypts


PLANT GENETICS

RESEARCH ARTICLE

Joel W. Ochieng (I); Dorothy A. Steane (II); Pauline Y. Ladiges (III); Peter R. Baverstock (I); Robert J. Henry (I); Mervyn Shepherd (I)

(I) Centre for Plant Conservation Genetics, Southern Cross University, Military Road, Lismore, NSW, Australia

(II) School of Plant Science, University of Tasmania, Hobart, TAS, Australia

(III) School of Botany, University of Melbourne, Parkville, Victoria, Australia

Send correspondence to: Joel W. Ochieng. Present address: Faculties of Agriculture and Veterinary Medicine, University of Nairobi, P.O. Box 29053, 00625 Nairobi, Kenya. E-mail: jochieng@uonbi.ac.ke.

Key words: microsatellite phylogeny, paraphyletic, homoplasy, incongruence, eucalypts.

Introduction

Phylogenies inferred from independent data partitions may differ from one another in topology even though they are drawn from the same set of organisms (Rodrigo et al., 1993; McCracken and Sorenson, 2005). Incongruence due to statistical, sampling or computational errors can be addressed by expanded and judicious sampling, addition of phylogenetic characters or by modifying analysis and tree reconstruction models (e.g., Udovicic et al., 1995; Steane et al., 1999, 2002; Udovicic and Ladiges, 2000). However, if topological incongruence between morphological and molecular data has its origin in genealogical discordance, the conflict is not easily resolved by modifying the model used in phylogenetic reconstruction, correcting for sampling error, combining data or by other manipulations. Such topological incongruence may arise as a result of hybridization (Dumolin-Lapegue et al., 1997; McKinnon et al., 1999; Avise, 2000), paralogy and lineage sorting (Avise et al., 1990; Maddison, 1997; Avise, 2000; Lu, 2001; Takahashi et al., 2001; Ochieng et al., 2007), or homoplasy (McCracken and Sorenson, 2005). Eucalypts are the dominant forest and woodland trees of Australia, with several species being of major economic importance in Australia and other countries around the world. Phylogenetic relationships within the eucalypts present a case of conflicting datasets, particularly regarding the phylogenetic status of Corymbia in relation to Angophora.

The plant family Myrtaceae includes two large groups in the Australian region: the 'eucalypts' and the 'melaleuca' group (Johnson and Briggs, 1984). The eucalypt group (broad sense) includes seven genera, three of which are closely related (Eucalyptus L'Hér., Corymbia K.D. Hill and L.A.S. Johnson and Angophora Cav.). The other, smaller members are the monotypic genera Arillastrum Pancher ex Baill., Stockwellia D.J. Carr, D.J. Carr and B. Hyland and Allosyncarpia S.T. Blake, and Eucalyptopsis C.T. White, which includes two species (Ladiges et al., 2003). Previously, Pryor and Johnson (1971) proposed the division of the genus Eucalyptus into seven subgenera: Blakella, Corymbia, Eudesmia, Gaubaea, Idiogenes, Monocalyptus, and Symphyomyrtus, based on morphological and ecological characters, and on the lack of crossability among the subgenera. In a major taxonomic revision of the bloodwoods, two of these subgenera, Corymbia and Blakella, were included in a new genus Corymbia, classified into seven sections (Fundoria, Rufaria, Apteria, Ochraria, Politaria, Cadagaria, Blakearia; Hill and Johnson, 1995). However, Brooker (2000) presented an alternative view regarding the monophyly and hence generic recognition of Corymbia (but see Ladiges and Udovicic, 2000). Phylogenetic analysis by Hill and Johnson (1995), based largely on morphological characters, showed Corymbia to be a monophyletic taxon, sister to Angophora. However, molecular data from chloroplast DNA (trnL, trnH, psbA) (Udovicic and Ladiges, 2000) and ITS (Steane et al., 1999, 2002) suggested that Angophora is nested within Corymbia, making the latter paraphyletic. Increased taxon sampling for the ITS region by Steane et al. (2002) did not resolve the question of paraphyly. Very recently, analyses of the external transcribed spacers (ETS; Parra-O et al., 2006) showed Corymbia to be monophyletic; however, analysis of ITS alone by the same group supported the earlier ITS results.

The ITS locus has recently been reported to exist in paralogs within eucalypt genomes (Ochieng et al., 2007; Bayly et al., 2007). It is possible that paralogous sequences confound phylogenetic resolution at this locus in eucalypts. We are currently cloning and sequencing the nrITS to investigate if gene duplication was the cause of tree incongruence in the eucalypts. So far, three ITS riboforms, two of them widespread, have been recovered within some genomes. Compelling evidence suggested that one of the divergent riboforms was a pseudogene. Phylogenies from the apparently functional riboform retained Corymbia in an apparent paraphyly, whereas the putative pseudogene recovered a phylogeny showing Corymbia as a monophyletic genus (Ochieng et al., 2007). We explained that phylogenetic signals are obscured when functional constraints in nrITS necessitate compensatory mutations in the secondary structure helices involved in RNA transcription, whereas pseudogenes mutate under neutrality. However, other explanations such as hybridization and computational problems cannot be ruled out.

If functional constraints on nrITS were the cause of apparent paraphyly for the genus Corymbia, then a neutral molecular locus with adequate phylogenetic signals should support the genus as a clade (monophyletic). One limitation of phylogenetic reconstructions using a single gene region is the potential to obtain a biased hypothesis when genomic regions differ in their history. We revisited the unresolved phylogenetic relationships among the eucalypt genera with neutrally evolving microsatellites, which may fairly represent different genomic regions in a single dataset. Microsatellites, also referred to as simple sequence repeats (SSRs), are segments of DNA with tandem repeats of short sequence motifs, each generally less than 5 bp in length (Bruford and Wayne, 1993). SSRs have many advantages over DNA sequencing, including a greater representation of different genomic regions and faster evolution that may lead to more informative characters. However, the utility of SSRs in reconstructing phylogenetic relationships, especially among divergent taxa, is a matter of current debate. Apart from the technical difficulty in amplifying SSRs across taxa, they are believed to possess three interrelated attributes that may limit their use in reconstructing phylogenies of divergent taxa: (1) a constraint on allele size range (Goldstein and Pollock, 1997), (2) high mutation rates, and (3) size homoplasy (Bruford and Wayne, 1993). Another limitation in SSR analyses is that confident assessment of orthology for each allele pair would involve sequencing each of the alleles, a very expensive exercise, particularly for multilocus genotyping. As such, orthology is presumed when fragments are of the same or similar length.
These reasons partly explain why many phylogenetic studies utilizing microsatellites have been restricted to infra-specific relationships (e.g., Goldstein et al., 1999), or to the use of the SSR flanking sequence in higher order phylogenies (e.g., Streelman et al., 1998; Zhu et al., 2000). However, some notable cases exist for the use of repeat sequence variation in highly divergent taxa: (1) Richard and Thorpe (2001) used SSR size variation to analyse the phylogenetic relationships among the Western Canary Island lizards, a group that diverged five million years ago (MYA). This divergence time corresponds to five million generations given their short generation time of one year (Richard and Thorpe, 2001). (2) Ritz et al. (2000) applied repeat size variation at SSR loci to resolve the relationships among four genera (Bos, Bison, Bubalus and Syncerus) in the sub-family Bovini. To overcome issues of homoplasy, the authors used the average square genetic distance measure, (δμ)² (Goldstein et al., 1995). They found that the measure was robust to fluctuations in population size and retained linearity with increasing time. The tree topology was retained when data were reanalysed with Cavalli-Sforza and Edwards' (1967) chord distance (DC) which is, interestingly, based on the infinite allele model. (3) Microsatellite length variation has been used in reconstructing the phylogeny of Darwin's finches (Petren et al., 1999). Although considered to be congeneric, these birds are believed to have radiated at least three MYA (Petren et al., 1999, and references therein). With their short generation time of four months to one year (Zink, 2002), they have evolved for over five million generations.

Although these examples are mainly from animals, the rarity of SSR use in phylogenies of plant taxa may be due mainly to low levels of transferability (e.g., Peakall et al., 1998) and a low level of SSR conservation among many plant taxa (e.g., Whitton et al., 1997), rather than to concerns relating to high mutation rates or other evolutionary considerations. Where the problems of range constraints, high mutation rates and size homoplasy are addressed, SSRs may be utilised in phylogenetic studies, even among divergent taxa, so long as SSR primers amplify across such taxa. In eucalypts, cross-genera SSR transferability has recently been reported to be high (Shepherd et al., 2006). We used eight polymorphic SSRs, seven of them isolated from Corymbia variegata (F. Muell.) Hill and Johnson clones, to genotype Corymbia and Angophora samples previously analysed for ITS (Steane et al., 2002), to test the hypothesis that Corymbia is monophyletic.

Material and Methods

Plant material and DNA isolation

This study utilized a total of 32 DNA samples representing Corymbia (20), Angophora (8), Eucalyptus (3), Allosyncarpia (2), Eucalyptopsis (1) and Stockwellia (1) (Table 1). Within Corymbia, nine species were sampled from the red bloodwood group (sections Rufaria and Apteria) (Hill and Johnson, 1995), seven from the yellow bloodwoods assemblage (Ochraria, Politaria, Cadagaria) and two from the paper-fruited bloodwoods (section Blakearia) (Hill and Johnson, 1995). C. eximia and C. torelliana were each represented by two samples. Our analysis retained the same individual DNA samples analysed by Steane et al. (2002) for comparison, but we included a new section (Corymbia sect. Cadagaria), and new series and species not included in the previous DNA phylogeny of these genera. Herbarium voucher numbers or origins for new samples are indicated in Table 1. For new samples, total genomic DNA was extracted from 10 mg of leaf tissue using a DNeasy plant kit (QIAGEN, Germany) according to the manufacturer's protocol. Leaf tissue was ground using tungsten carbide beads (QIAGEN) and a RETSCH MM300 Mixer Mill at a frequency of 1/30 s for three lots of one minute. DNA was eluted from the filter membranes with 150 µL of elution buffer and was stored at -20 °C.

PCR amplification and fragment separation

The eight polymorphic SSRs used in this study have been published previously: EMCRC26, EMCRC32, EMCRC39 (Jones et al., 2001); EMCRC46, EMCRC51, EMCRC54, EMCRC93 (Shepherd et al., 2006); Eg126 (Thamarus et al., 2002). For each primer pair, the forward primer was fluorescently labelled with a dye. PCR was performed in 10 µL volumes comprising 1x PCR buffer (10 mM Tris-HCl pH 8.3, 50 mM KCl, 0.001% gelatin (Sigma, St Louis, MO, USA), 0.25% Nonidet P40 (BDH, Poole, UK) and 2 mM MgCl2), approximately 0.5 ng of genomic DNA, 0.125 mM of each dNTP, 0.15 µM of each primer, and 0.5 units of Platinum Taq (Invitrogen). All amplifications were carried out on an ABI 9700 Thermocycler (Applied Biosystems) with an initial denaturation of 7 min at 95 °C, followed by 10 cycles of denaturation at 95 °C, touchdown annealing from 60 °C to 55 °C (decreasing by 0.5 °C each cycle) and 1 min extension at 72 °C. This was followed by 25 cycles of denaturation at 95 °C, annealing at 55 °C and extension for 1 min at 72 °C. A final extension at 72 °C for 10 min was applied to all reactions. For each sample, one microliter of the PCR product was separated on a 3730 DNA analyser (Applied Biosystems; SCPG, Lismore, Australia) and raw data were imported into ABI Prism GeneMapper Software v 3.0 (Applied Biosystems) for size calling. All samples amplified successfully at the eight SSR loci, except for the three Eucalyptus species (E. urophylla, E. camaldulensis, E. globulus), which amplified at only four of the loci (EMCRC26, EMCRC39, EMCRC46 and Eg126). These three samples were therefore removed from subsequent analyses. Diploid allele size data from SSRs were exported to an Excel spreadsheet for statistical analyses.
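The touchdown profile described above can be written out cycle by cycle. The following is a minimal sketch only: it assumes that the first touchdown cycle anneals at 60 °C and that each of the 10 touchdown cycles is 0.5 °C cooler than the previous one, so that the 25 plateau cycles begin at 55 °C. The function name and parameters are hypothetical and are not part of any thermocycler software.

```python
def touchdown_schedule(start=60.0, end=55.0, step=0.5, plateau_cycles=25):
    """Return the annealing temperature (deg C) of every PCR cycle.

    Assumption: the touchdown phase starts at `start` and drops by `step`
    per cycle; the plateau phase then runs at `end`.
    """
    n_touchdown = int(round((start - end) / step))  # 10 cycles for 60 -> 55
    touchdown = [start - step * i for i in range(n_touchdown)]
    return touchdown + [end] * plateau_cycles


schedule = touchdown_schedule()
print(len(schedule))                   # total number of cycles
print(schedule[:3], schedule[9:11])    # first touchdown cycles and the transition
```

Under these assumptions the programme comprises 35 cycles in total, with the last touchdown cycle annealing at 55.5 °C.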

Statistical methods

Allelic counts were estimated for each informal group, i.e., the yellow bloodwoods assemblage, the red bloodwoods, and Angophora, using the FSTAT computer programme, V 2.9 (Goudet, 2001), while the variance in allele size for each locus per group was computed in an MS Excel spreadsheet. Cumulative variance was the sum of single locus variances, taking allele sizes (in bp) as values. Genetic distances based on allele size variation are modelled on the premise that when a mutation occurs, the new mutant is related to the allele from which it was derived. In this case, the difference in length between alleles contains phylogenetic information (Goldstein et al., 1995). Two measures were employed to estimate the between-individual genetic distance: the average square distance (D1) of Goldstein et al. (1995), and Nei's (1972) standard genetic distance (D). The average square distance accounts for size homoplasy, and is suitable for reconstructing trees that include more distantly related taxa. Both distances were computed using the MICROSAT programme available from the Human Population Genetics Laboratory (HPGL), Stanford University, with the option of either exhaustive or 100 bootstrap replicates. The allele sizes analysed were nucleotide counts rather than repeat scores, using the option that allows for repeat lengths = 1. Duration of linearity was calculated for each locus and averaged over loci. The primer error (size of the region flanking the SSR) was entered and corrected for by assuming a default of no error (i.e., 0 nucleotides). Genetic distance matrices were imported into the computer programme PHYLIP (Felsenstein, 1995) for phylogenetic tree reconstruction. Neighbour-Joining (NJ) trees were drawn using NEIGHBOR with 100 bootstrap replications, using the Eucalyptopsis group (Eucalyptopsis, Stockwellia, Allosyncarpia) as an outgroup. All phylogenetic trees were displayed using TREEVIEW Version 1.5, available from the Department of Zoology, University of Glasgow.
To take the small sample size into account, a second analysis was conducted for samples pooled into five main groups: three within Corymbia (the yellow bloodwoods assemblage, the red bloodwoods and the paper-fruited bloodwoods, Blakearia), Angophora and the outgroups. In subsequent discussion, the yellow bloodwoods will be termed Corymbia A, whereas the red bloodwoods will be referred to as Corymbia B, following the informal grouping by Steane et al. (2002).
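As an illustration of the size-based distances described above, the sketch below computes two quantities for a pair of diploid individuals: the average squared difference in size over all allele pairs (in the spirit of D1) and the squared difference in mean allele size, (δμ)². It is a minimal sketch with hypothetical locus names and allele sizes, taking sizes in bp as in the repeat length = 1 option mentioned above; it is not the MICROSAT implementation, and averaging over loci with equal weights is an assumption.

```python
from itertools import product
from statistics import mean


def average_square_distance(geno_a, geno_b):
    """Mean squared size difference over all allele pairs, averaged over loci."""
    per_locus = []
    for locus in geno_a:
        pairs = product(geno_a[locus], geno_b[locus])
        per_locus.append(mean((x - y) ** 2 for x, y in pairs))
    return mean(per_locus)


def delta_mu_squared(geno_a, geno_b):
    """Squared difference between mean allele sizes, averaged over loci."""
    return mean((mean(geno_a[l]) - mean(geno_b[l])) ** 2 for l in geno_a)


# Hypothetical diploid genotypes: locus -> (allele size 1, allele size 2) in bp.
ind_a = {"locus_1": (100, 104), "locus_2": (200, 200)}
ind_b = {"locus_1": (102, 106), "locus_2": (204, 208)}

print(average_square_distance(ind_a, ind_b))
print(delta_mu_squared(ind_a, ind_b))
```

Both quantities grow with the size difference between alleles, which is the property that lets size-based measures carry information about how many mutational steps separate two alleles.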

Results and Discussion

Variability of SSRs

The eight markers used in this study were polymorphic with a total of 189 unique alleles obtained from 32 samples representing 29 different species. The most variable locus was EMCRC39 with 34 unique alleles, while the least variable was EMCRC93 with 14 alleles (Table 2). Corymbia A had greater intragroup diversity in terms of both the cumulative variance and the mean number of alleles (cum. variance = 204; MNA = 9.7) compared to either Corymbia B (cum. variance = 170; MNA = 9.6) or Angophora (cum. variance = 184; MNA = 7.0). However, eliminating C. torelliana (Cadagaria) from Corymbia A lowered the variance to 192, which was, nevertheless, still higher than the other groups. Our sampling of Corymbia A included three sections (Ochraria, Politaria and Cadagaria) and three series (Eximiae, Maculatae and Torellianae), while Corymbia B included two sections (Rufaria and Apteria) and seven series (see Table 1). The higher diversity within Corymbia A may relate to the fact that seven out of the eight markers used in this analysis were isolated from a clone of C. variegata, a species in Corymbia A, possibly making the Maculatae series (spotted gums: C. maculata, C. citriodora, C. henryi, C.variegata) more variable than species more distant from C. variegata, consistent with the principle of ascertainment bias.
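The two diversity summaries compared above can be sketched as follows. This is an illustration with hypothetical allele sizes, not the data in Table 2; it assumes the sample variance (as returned by a spreadsheet VAR function) for each locus and counts distinct allele sizes per locus for the mean number of alleles (MNA).

```python
from statistics import mean, variance


def cumulative_variance(alleles_by_locus):
    """Sum of single-locus sample variances of allele size (bp)."""
    return sum(variance(sizes) for sizes in alleles_by_locus.values())


def mean_number_of_alleles(alleles_by_locus):
    """Mean count of distinct allele sizes per locus."""
    return mean(len(set(sizes)) for sizes in alleles_by_locus.values())


# Hypothetical allele sizes observed in one group: locus -> sizes in bp.
group = {"locus_1": [100, 102, 104], "locus_2": [200, 200, 206]}

print(cumulative_variance(group))
print(mean_number_of_alleles(group))
```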

Ascertainment bias

Ascertainment bias describes the observation that when the size distribution of microsatellite alleles across different species is compared, the absolute allele sizes in the species from which the microsatellite was derived are often greater than those found in closely related species (Ellegren et al., 1995; Forbes et al., 1995; Rubinstein et al., 1995). Ascertainment bias may result from either directional evolution occurring within different species (Rubinstein et al., 1995) or bias in the selection of clones for sequencing and primer development (Ellegren et al., 1995). Although there is no precedent for the utility of ascertainment bias as a phylogenetic probe, our data suggest that the means of allele size, averaged over loci, reflected the expected taxonomic distance, with the closest relatives of Corymbia A (from which the SSRs were developed; C. variegata) being Corymbia B, followed by Angophora, then the Eucalyptopsis group (Stockwellia/Eucalyptopsis/Allosyncarpia; Table 2). Although the locus Eg126 would be expected to show a different pattern, since it is based on Eucalyptus globulus, this was not observed, perhaps due to the lower polymorphism at this locus compared to the other loci used in this study. Whereas Stockwellia, Allosyncarpia and Eucalyptopsis successfully amplified at all eight loci, the three Eucalyptus species (E. urophylla, E. camaldulensis, E. globulus) failed to amplify at half of the loci studied (four out of eight). By morphology and fossil record, Eucalyptus is the closest clade to Corymbia and Angophora. It is not clear whether this failure to amplify Corymbia-specific SSRs in Eucalyptus, while successfully amplifying all the loci in the Eucalyptopsis group, reflects relative evolutionary distances, since the branch lengths for the Eucalyptopsis group and Eucalyptus relative to Corymbia were inconsistent between datasets (e.g., Hill and Johnson, 1995; Steane et al., 2002; Wilson et al., 2001; Parra-O et al., 2006).
This observation may indicate that Eucalyptus is a faster evolving clade, thereby accumulating more mutations in the flanking sequences of the SSRs. SSR analysis excluded Arillastrum because available morphological and molecular data (Hill and Johnson, 1995; Udovicic and Ladiges, 2000; Wilson et al., 2001; Steane et al., 2002) put this genus the farthest from Corymbia among the eucalypts; Ladiges et al. (2003) suggested, based on biogeography, that the divergence of Arillastrum from the other eucalypt genera may be as old as Late Cretaceous (70 MYA; see also Crisp et al., 2004). These data suggested a low prospect of transferring Corymbia SSRs to Arillastrum.

Phylogenetic relationships among the investigated genera

In this study, Neighbour-Joining phylogenetic trees using both the average square and the standard genetic distances from the 189 alleles showed Corymbia to be monophyletic (Figure 1). The topologies and bootstrap values for trees from the two distance measures were nearly identical, so only one such tree is presented. The tree had three major clades in the ingroup. Angophora and Corymbia both formed monophyletic groups with moderate (71% and 81%, respectively) statistical confidence. Corymbia split into three major clades, two of which corresponded to Corymbia A and B of Steane et al. (2002). However, samples of Corymbia that were new in this study (C. torelliana, C. bella) clustered with Corymbia B rather than Corymbia A. The bootstrap support values for the partitioning of these three clades were, however, low. By pooling individual species into their traditional taxonomic groups (according to Hill and Johnson, 1995), similar phylogenetic relationships (Figure not shown) were recovered with high bootstrap support (97%). A simulated inclusion of species in the wrong (taxonomic) group caused group paraphyly, indicating that taxonomic aberrations such as lumping or oversplitting can cause paraphyly in phylogenetic assemblages.
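The bootstrap values reported above come from repeating the distance and tree-building steps on resampled data. The sketch below illustrates only the resampling step, under the assumption that loci are the units drawn with replacement; the helper names, the use of a (δμ)²-style distance, and the genotypes are all hypothetical, and the tree-building and consensus steps performed in PHYLIP are not reproduced here.

```python
import random
from statistics import mean


def dmu2(geno_a, geno_b, loci):
    """Squared difference in mean allele size, averaged over the given loci."""
    return mean((mean(geno_a[l]) - mean(geno_b[l])) ** 2 for l in loci)


def bootstrap_matrices(genotypes, n_reps=100, seed=0):
    """Return one pairwise distance matrix per bootstrap replicate of loci."""
    rng = random.Random(seed)
    names = sorted(genotypes)
    loci = sorted(next(iter(genotypes.values())))
    replicates = []
    for _ in range(n_reps):
        sample = rng.choices(loci, k=len(loci))  # loci drawn with replacement
        replicates.append({(p, q): dmu2(genotypes[p], genotypes[q], sample)
                           for p in names for q in names})
    return replicates


# Hypothetical genotypes: taxon -> locus -> (allele 1, allele 2) in bp.
genotypes = {
    "taxon_A": {"locus_1": (100, 104), "locus_2": (200, 202), "locus_3": (150, 150)},
    "taxon_B": {"locus_1": (102, 106), "locus_2": (204, 208), "locus_3": (152, 156)},
    "taxon_C": {"locus_1": (120, 124), "locus_2": (220, 222), "locus_3": (170, 174)},
}
reps = bootstrap_matrices(genotypes)
print(len(reps), reps[0][("taxon_A", "taxon_B")])
```

Each replicate matrix would then be passed to a tree-building method, and the proportion of replicate trees containing a given clade gives its bootstrap support.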


The Eucalyptopsis group (Stockwellia, Eucalyptopsis, Allosyncarpia) was used as an outgroup because Eucalyptus, which would be an alternative outgroup for Corymbia-Angophora phylogeny, failed to amplify at four of the eight loci. The outgroup taxa formed a clade at the base of the tree, with the relationship (Stockwellia, Eucalyptopsis + Allosyncarpia). On flower development, the clade of these three rainforest genera had the relationship: Allosyncarpia, (Stockwellia + Eucalyptopsis) (Carr et al., 2002; Ladiges et al., 2003), which was also supported by Parra-O et al. (2006) based on a combined data set of nrETS and ITS. The monophyly of Corymbia has previously been proposed based on morphological and anatomical characters (Hill and Johnson, 1995; Ladiges et al., 1995), and recently, by DNA data from the ETS (Parra-O et al., 2006). However, data from ITS (Steane et al., 1999, 2002) and other regions of nrDNA and cpDNA (Udovicic et al., 1995; Udovicic and Ladiges, 2000; Wilson et al., 2001; Whittock et al., 2003) were either equivocal or suggested that the group may be paraphyletic.

Intrageneric relationships

SSR data resolved Corymbia B as a monophyletic group (Figure 1) and the topology within the group was similar to that obtained from ITS data (Steane et al., 2002). The nesting of section Apteria (C. trachyphloia) within Rufaria was in agreement with results based on the ETS (Parra-O et al., 2006) and ITS (Steane et al., 2002) data. However, our results differ slightly from the ETS results with regard to the relationships between sections Politaria, Ochraria and Blakearia. Whereas the ETS data show Ochraria and Blakearia as sister taxa relative to Politaria, SSR data support the position of Ochraria as more closely related to Politaria than to Blakearia (Figure 1), as revealed by morphological data analyses (Hill and Johnson, 1995). Parra-O et al. (2006) attribute this discrepancy to taxon sampling and the absence of C. torelliana from their dataset. Our study included Cadagaria (C. torelliana) and still supported the closer relationship between Politaria and Ochraria. As in all molecular data so far (ITS, trnL, trnH, psbA, ETS), SSR data suggest that Corymbia would be paraphyletic without the inclusion of Blakearia, contrary to the classification of Brooker (2000).

The relationships within Politaria are inconsistent with previously published datasets (Hill and Johnson, 1995; Asante et al., 2001; Steane et al., 2002; McDonald et al., 2000; King, 2004; Parra-O et al., 2006), possibly reflecting a high rate of interspecific hybridization among these taxa. In the SSR dataset, C. maculata was not the closest relative of C. variegata and C. henryi. ITS data (Steane et al., 2002) showed the four spotted gums as being a clade, although C. maculata was highly divergent, with eight base differences, whereas C. citriodora and C. variegata were shown to be indistinguishable. Compared with ITS sequences (Steane et al., 2002), the SSR data were more effective in resolving the relationships of Angophora species. ETS combined with ITS, however, were more informative than ITS alone (Parra-O et al., 2006), and corroborated the SSR phylogeny.

SSRs were useful in eucalypt phylogeny

It is a widely held view that SSRs may not be useful in phylogenetic studies above the species level (e.g., Streelman et al., 1998; Zhu et al., 2000). Hence it is not expected that SSRs would resolve among-genera phylogenetic relationships in the eucalypts. However, analysis of SSRs in this study recovered a tree topology congruent to those based on analyses of morphological characters and a combined ETS/ITS dataset. The following factors may explain why SSRs appear to retain informative phylogenetic signals superior to some genomic regions such as the ITS:

Appropriate genetic distance measures

Homoplasy is expected under the stepwise mutation model (SMM; Kimura and Ohta, 1978), which assumes loss or gain, with equal probability, of a single repeat unit through mutation. In contrast, the infinite allele model (IAM; Kimura and Crow, 1964) expects no homoplasy because a mutation is assumed to result in an allelic state not previously encountered in the population. Several genetic distances that make different assumptions have been developed for use with microsatellite data; however, the appropriateness of each of these distance methods will vary from case to case, depending on the model of microsatellite evolution, mutation rates, effective population size, and time since divergence. The ideal distance measure will therefore depend on the characteristics of the SSRs and on the phylogenetic question being addressed. Since it was not clear under what model the SSRs used in this study evolved, we used two genetic distance measures: the SMM-based average square distance ((δμ)², analogous to D1) of Goldstein et al. (1995), and Nei's (1972) IAM-based standard genetic distance (D). The average square distance (Goldstein et al., 1995) addresses size range constraints, thereby accounting for homoplasy. The distance retains linearity with increasing evolutionary distance, and hence is suitable for reconstructing trees that include more distantly related taxa (Goldstein et al., 1995; Pollock et al., 1998; Petren et al., 1999; Ritz et al., 2000; Richard and Thorpe, 2001). This distance has been successfully used in recovering well-corroborated phylogenetic hypotheses in a number of studies involving divergent taxa (e.g., Petren et al., 1999; Ritz et al., 2000; Richard and Thorpe, 2001). On the other hand, Nei's (1972) distance is expected to become more linear, while the linearity of the average square distance wanes, as the SSR mutations become more like the IAM model (Goldstein et al., 1995).
In the Bovini study (Ritz et al., 2000) reviewed earlier, the authors used the genetic distance measure (δμ)² (Goldstein et al., 1995) to account for size homoplasy. They found that the measure was robust to fluctuations in population size and retained linearity with increasing time. In our analysis, both distances recovered a similar tree topology. One way to explain this observation is that the data comprised a minimal proportion of homoplasious alleles. This may also suggest that SSRs in eucalypts (with the possible exception of Eucalyptus) evolve at a lower rate and are highly conserved, both in the repeats and in the flanking regions.
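To make the contrast between the two models concrete, the sketch below computes Nei's (1972) standard genetic distance from allele frequencies, using the usual form D = -ln(Jxy / sqrt(Jx Jy)) with the J terms averaged over loci. The function name and the example frequencies are hypothetical. Because the measure depends only on whether alleles are shared, and not on how different their sizes are, it treats a 2 bp and a 20 bp difference identically, which is the IAM-style behaviour discussed above.

```python
import math


def nei_standard_distance(freqs_x, freqs_y):
    """Nei's (1972) D from per-locus allele frequency dictionaries."""
    loci = list(freqs_x)
    jx = sum(sum(p * p for p in freqs_x[l].values()) for l in loci) / len(loci)
    jy = sum(sum(p * p for p in freqs_y[l].values()) for l in loci) / len(loci)
    jxy = sum(sum(p * freqs_y[l].get(allele, 0.0)
                  for allele, p in freqs_x[l].items())
              for l in loci) / len(loci)
    if jxy == 0:
        return math.inf  # no shared alleles at any locus
    return -math.log(jxy / math.sqrt(jx * jy))


# Hypothetical heterozygous individuals: locus -> {allele size: frequency}.
x = {"locus_1": {100: 0.5, 104: 0.5}}
y_near = {"locus_1": {100: 0.5, 102: 0.5}}   # unshared allele differs by 2 bp
y_far = {"locus_1": {100: 0.5, 120: 0.5}}    # unshared allele differs by 16 bp

print(nei_standard_distance(x, y_near))
print(nei_standard_distance(x, y_far))
```

A size-based measure would separate `y_far` from `y_near`, whereas this distance does not; agreement between the two kinds of measure, as reported above, is therefore informative about the amount of homoplasy in the data.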

Range constraint and size homoplasy

Homoplasy may arise from (i) mutations in the microsatellite repeat region that make alleles identical in state but not by descent; (ii) a constraint on the upper (and sometimes lower) bound on the number of repeat units at a locus, which may exacerbate homoplasy in the repeat region because these size limits allow only a finite number of character states; and (iii) insertions and deletions in the flanking region that make alleles identical in state but not by descent. At longer time intervals, homoplasy is expected to increase, while phylogenetic signals become obscured as saturation is approached (Takezaki and Nei, 1996). Our data and results do not support a likelihood of phylogenetic signal saturation, for the following reasons. (i) The average of the mean allele size ranges for each clade (Corymbia A, Corymbia B, Angophora, Eucalyptopsis group), considered separately across all loci, was 33.6, while the mean for all species combined was about 1.5 times that value (51; Table 2). This suggested that saturation of phylogenetic signal through homoplasy due to range constraint was minimal, because the allele size range of subgroups did not reach the total observed allele size range. (ii) The sizes of most alleles in the dataset differed by a number divisible by their repeat unit length, implying a low likelihood of homoplasy due to mutations in the regions flanking the repeats, since insertions and deletions should be equally likely to involve odd and even numbers of bases. (iii) In theory, variation in the amount of size homoplasy is expected among SSR loci because variation in mutation rates reflects the stochasticity among loci of the coalescence process (Garza and Freimer, 1996). However, the bootstrap support for the tree topology recovered in our analysis of eight SSRs reflected concordance among loci.
Bootstrapping characters from loci with varied levels of homoplasy is expected to recover discordant phylogenetic hypotheses, usually signified by low bootstrap values on the consensus tree.
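The first two checks described above (allele size ranges within groups versus overall, and whether alleles differ by whole numbers of repeat units) can be sketched as follows. The allele sizes, group names and the 2 bp repeat unit are hypothetical, and measuring register relative to the smallest allele at the locus is an assumption made for illustration.

```python
from statistics import mean


def size_range(sizes):
    """Allele size range (bp) at one locus."""
    return max(sizes) - min(sizes)


def in_register_fraction(sizes, repeat_len):
    """Share of alleles offset from the smallest allele by whole repeat units."""
    ref = min(sizes)
    return sum((s - ref) % repeat_len == 0 for s in sizes) / len(sizes)


# Hypothetical allele sizes (bp) at one dinucleotide locus, by group.
groups = {
    "group_1": [150, 152, 154, 160],
    "group_2": [170, 172, 177, 180],
}
pooled = [s for sizes in groups.values() for s in sizes]

mean_group_range = mean(size_range(s) for s in groups.values())
print(mean_group_range, size_range(pooled))
print(in_register_fraction(pooled, repeat_len=2))
```

A mean within-group range well below the pooled range indicates that groups have not each filled the available size space, and a high in-register fraction indicates that most size differences are attributable to changes in repeat number rather than to indels in the flanking regions.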

'Below threshold' number of generations

Corymbia and Angophora have diverged for about 30 million years (Crisp et al., 2004), which corresponds to three million generations if a natural generation time (without human selection) of 10 years (L.D. Pryor, FAO Corporate Document Repository) is assumed. The properties that limit SSR use in phylogenetics (mutation rates, size constraint and homoplasy) relate to the number of generations since the divergence of taxa, rather than to their classification. If SSRs correctly resolved phylogenies of lizards that have diverged for five million generations (Richard and Thorpe, 2001), then they may recover the correct phylogeny for eucalypt genera that have diverged for three million generations, assuming the mutation rates are comparable. Notably, Richard and Thorpe (2001) analysed only five SSR loci, and the results corroborated the true and confirmed organismal phylogeny. Apart from the average square distance of Goldstein et al. (1995), the authors utilized other distances such as Nei's (1972) Gst and the allele sharing statistic (PSA) for comparison. Their data contradicted the expectation that SSR genetic distances may lose linearity after several thousands of generations, essentially due to range constraints in allele sizes (Feldman et al., 1997). As the authors noted, the fact that the essentials of a well-corroborated tree can be reconstructed from such a relatively small number of loci argues for their utility in this area. As stated in the introduction, SSR length variation has also been used in reconstructing the phylogeny of Darwin's finches, which are believed to have radiated at least three MYA, corresponding to over five million generations (Petren et al., 1999). Apart from the factors discussed above, eucalypts are tree species with temporal heterogeneity in outcrossing rates (Moran and Brown, 1980) and flowering asynchrony that affects pollinator behaviour (Southerton et al., 2004). This may lower their effective population sizes (Ne).
The risk of homoplasy would be less for taxa with small effective population sizes (Estoup et al., 2002).
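The conversion from divergence time to generations used above is simple arithmetic, and can be written out as a minimal Python sketch. The function name is hypothetical, and a constant generation time is assumed for illustration; the inputs are the 30 million years and 10-year generation time quoted in the text.

```python
def generations_since_divergence(divergence_years, generation_time_years):
    """Number of generations elapsed, assuming a constant generation time."""
    if generation_time_years <= 0:
        raise ValueError("generation time must be positive")
    return divergence_years / generation_time_years


# Values quoted in the text for Corymbia and Angophora.
eucalypt_generations = generations_since_divergence(30_000_000, 10)
print(f"{eucalypt_generations:,.0f} generations")  # 3,000,000 generations
```

Because the limiting properties of SSRs scale with generations rather than years, the same divergence time gives a very different figure for short-lived organisms; the generation time is therefore the assumption that most affects this estimate.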

Sampling variance and phylogenetic reconstruction

We analysed the genetic distances among species, each represented by a single individual. There has been considerable discussion regarding the optimal sample size in population genetic analyses, with some workers recommending large sample sizes to account for sampling variance (e.g., Nei, 1978; Ruzzante, 1998). In this study, pairwise genetic distances between individuals, rather than allele frequencies, are relevant. Kalinowski (2005) recently simulated the relationship between sample size, polymorphism, and the coefficient of variation of genetic distances derived from microsatellite markers. He found that when the differentiation among the taxonomic units to be measured is large, one or two samples per group would give similar results to a large sample size. Increasing sample size under a large FST scenario had a diminishing effect on the coefficient of variation of the genetic distance. Kalinowski's simulated data showed that the rate at which increasing sample size decreased the coefficient of variation was determined principally by the amount of differentiation between populations. This means that more individuals are necessary only when the degree of differentiation is low. In the case of the eucalypt genera Corymbia and Angophora, the differentiation in question is among species rather than just between populations of the same species. Hence the between-species and between-genera FST values are expected to be large, since the two genera have diverged for tens of millions of years (Ladiges et al., 2003). Apart from SSRs, proteins have been used in phylogenetic reconstruction. Demastes and Remsen (1994) analysed allozyme variation to reconstruct the phylogeny of eight bird genera in the family Cardinalinae, using a single individual to represent each genus in the family. Their tree topologies supported phylogenetic analyses of morphological characters. As the authors noted, in a phylogenetic context the priority switches from more samples to more phylogenetic characters (Demastes and Remsen, 1994, and references therein). We are aware that allozymes are less polymorphic than SSRs; however, Kalinowski's (2005) simulation addresses this difference in variability and its implications. When we pooled samples into their prevailing taxonomic groups (according to Hill and Johnson, 1995) and conducted phylogenetic analysis as described for ungrouped samples, using the same distance measures and tree methods, the tree topology recovered was congruent with that obtained for ungrouped samples. In part, grouping samples into larger taxonomic assemblages compensated for the few samples per species (most species were represented by a single sample) analysed with the individual-specific distance measures. For grouped samples, we also wanted to estimate the group effect for each taxonomic assemblage. If, for some reason, a species were classified under an assemblage where it does not belong in a molecular genetic sense, then we would expect to see a shift in its position in the tree topology.
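To illustrate how a distance can be computed between single individuals and then between pooled groups, the following Python sketch implements one common form of an average square distance: the mean squared difference in repeat number over all pairs of alleles drawn one from each group, averaged over loci. It is not the MICROSAT implementation used in this study; the function names, locus names and allele sizes are all hypothetical and chosen only for illustration.

```python
from itertools import product


def average_square_distance(group_a, group_b):
    """Mean squared difference in repeat number between all pairs of alleles
    (one from each group), averaged over the loci scored in both groups.

    Each group maps a locus name to a list of allele sizes in repeat units.
    """
    per_locus = []
    for locus, alleles_a in group_a.items():
        alleles_b = group_b.get(locus, [])
        if not alleles_a or not alleles_b:
            continue  # skip loci not scored in both groups
        squares = [(x - y) ** 2 for x, y in product(alleles_a, alleles_b)]
        per_locus.append(sum(squares) / len(squares))
    if not per_locus:
        raise ValueError("no loci scored in both groups")
    return sum(per_locus) / len(per_locus)


def pool(individuals):
    """Combine the alleles of several individuals into one group."""
    pooled = {}
    for individual in individuals:
        for locus, alleles in individual.items():
            pooled.setdefault(locus, []).extend(alleles)
    return pooled


# Hypothetical diploid genotypes (allele sizes in repeat units).
ind1 = {"locusA": [10, 12], "locusB": [8, 8]}
ind2 = {"locusA": [11, 13], "locusB": [9, 9]}
ind3 = {"locusA": [20, 22], "locusB": [15, 17]}

print(average_square_distance(ind1, ind2))                # 2.0
print(average_square_distance(ind1, ind3))                # 83.5
print(average_square_distance(pool([ind1, ind2]), ind3))  # 75.0
```

In this toy data set the two similar individuals remain far from the third whether they are compared singly or pooled, which mirrors the argument that grouping changes little when differentiation between groups is large.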

Acknowledgment

We thank Dean Nicolle of Currency Creek Arboretum, ACT and David Lee, DPI Queensland, Australia, for supplying tissues of some of the new samples used in this study. This work forms part of J.W. Ochieng's Doctoral Research, supported by a Commonwealth Scholarship and an Australian Research Council Linkage grant LP0455522.

Data analysed in this study can be obtained on arrangement with the communicating author.

Internet Resources

MICROSAT programme: Human Population Genetics Laboratory (HPGL), Stanford University http://hpgl.stanford.edu/.

TREEVIEW Version 1.5 http://taxonomy.zoology.gla.ac.uk/rod/.

Received: February 23, 2007; Accepted: July 3, 2007.

Associate Editor: Márcio de Castro Silva Filho

  • Asante KS, Brophy JJ, Doran JC, Goldsack RJ, Hibbert DB and Larmour JS (2001) A comparative study of the seedling leaf oils of the spotted gums: Species of the Corymbia (Myrtaceae), section Politaria. Aust J Bot 49:55-66.
  • Avise JC (2000) The History and Formation of Species. Harvard University Press, Cambridge, Massachusetts, 439 pp.
  • Avise JC, Ankney CD and Nelson WS (1990) Mitochondrial gene trees and the evolutionary relationship of mallard and black ducks. Evolution 44:1109-1119.
  • Bayly MJ and Ladiges PY (2007) Divergent paralogues of ribosomal DNA in eucalypts (Myrtaceae). Mol Phylogenet Evol 44:346-356.
  • Brooker MIH (2000) A new classification of the genus Eucalyptus L'Her. (Myrtaceae). Aust Syst Bot 13:79-148.
  • Bruford MW and Wayne RK (1993) Microsatellites and their application to population genetic studies. Curr Opin Genet Dev 3:939-943.
  • Carr DJ, Carr SGM, Hyland BPM, Wilson PG and Ladiges PY (2002) Stockwellia quadrifida (Myrtaceae), a new genus and species in the eucalypt group. Bot J Linn Soc 139:415-421.
  • Cavalli-Sforza LL and Edwards AWF (1967) Phylogenetic analysis: Models and estimation procedures. Evolution 21:550-570.
  • Crisp M, Cook L and Steane DA (2004) Radiation of the Australian flora: What can comparisons of molecular phylogenies across multiple taxa tell us about the evolution of diversity in present-day communities? Phil Trans R Soc B 359:1551-1571.
  • Demastes JW and Remsen Jr. JC (1994) The genus Caryothraustes (Cardinalinae) is not monophyletic. Wilson Bull 106:733-738.
  • Dumolin-Lapegue S, Demesure B, Fineschi S, Le Corre V and Petit RJ (1997) Phylogeographic structure of white oaks throughout the European continent. Genetics 146:1475-1487.
  • Ellegren H, Primmer CR and Sheldon BC (1995) Microsatellite evolution: Directionality or bias in locus selection. Nat Genet 11:360-362.
  • Estoup A, Jarne P and Cornuet J-M (2002) Homoplasy and mutation model at microsatellite loci and their consequences for population genetics analysis. Mol Ecol 11:1591-1604.
  • Feldman MW, Bergman A, Pollock DD and Goldstein DB (1997) Microsatellite genetic distances with range constraints: Analytical description and problems of estimation. Genetics 145:207-216.
  • Felsenstein J (1995) PHYLIP: Phylogeny Inference Package. University of Washington, Seattle.
  • Forbes SH, Hogg JT, Buchanan FC, Crawford AM and Allendorf FW (1995) Microsatellite evolution in congeneric mammals: Domestic and Bighorn sheep. Mol Biol Evol 12:1106-1113.
  • Garza JC and Freimer NB (1996) Homoplasy for size at microsatellite loci in humans and chimpanzees. Genome Res 6:211-217.
  • Goldstein DB and Pollock DD (1997) Launching microsatellites: A review of mutation processes and methods of phylogenetic inference. J Hered 88:335-342.
  • Goldstein DB, Roemer G, Smith D, Reich DE, Bergman A and Wayne R (1999) The use of microsatellite variation to infer population structure and demographic history in a natural model system. Genetics 151:797-801.
  • Goldstein DB, Ruiz LA, Cavalli-Sforza LL and Feldman MW (1995) An evaluation of genetic distances for use with microsatellite loci. Genetics 139:463-471.
  • Goudet J (2001) A computer program to calculate F-statistics. Heredity 86:485-456.
  • Hill KD and Johnson LAS (1995) Systematic studies in the eucalypts 7. A revision of the bloodwoods, genus Corymbia (Myrtaceae). Telopea 6:185-504.
  • Johnson LAS and Briggs BG (1984) Myrtales and Myrtaceae - A phylogenetic analysis. Ann Mo Bot Gard 71:700-756.
  • Jones ME, Stokoe RL, Cross MJ, Scott LJ, Maguire TL and Shepherd M (2001) Isolation of microsatellite loci from spotted gum (Corymbia variegata), and cross-species amplification in Corymbia and Eucalyptus. Mol Ecol Notes 1:276-278.
  • Kalinowski ST (2005) Do polymorphic loci require large sample sizes to estimate genetic distances? Heredity 94:33-36.
  • Kimura M and Crow JF (1964) The number of alleles that can be maintained in a finite population. Genetics 49:725-738.
  • Kimura M and Ohta T (1978) Stepwise mutation model and distribution of allelic frequencies in a finite population. Proc Natl Acad Sci USA 75:2868-2872.
  • King R (2004) Spatial structure and population genetic variation in a Eucalypt species complex. PhD Thesis, Griffith University, Australia.
  • Ladiges PY, Udovicic F and Drinnan AN (1995) Eucalypt phylogeny - Molecules and morphology. Aust Syst Bot 8:483-497.
  • Ladiges PY and Udovicic F (2000) Comment on a new classification of the Eucalypts. Aust Syst Bot 13:149-152.
  • Ladiges PY, Udovicic F and Nelson G (2003) Australian biogeographical connections and the phylogeny of large genera in the plant family Myrtaceae. J Biogeogr 30:989-998.
  • Lu Y (2001) Roles of lineage sorting and phylogenetic relationship in the genetic diversity at the self-incompatibility locus of Solanaceae. Heredity 86:195-205.
  • Maddison WP (1997) Gene trees in species trees. Syst Biol 46:523-536.
  • McCracken KG and Sorenson MD (2005) Is homoplasy or lineage sorting the source of incongruent mtDNA and nuclear gene trees in the stiff-tailed ducks (Nomonyx-Oxyura)? Syst Biol 54:35-55.
  • McDonald MW, Butcher PA, Bell JC and Larmour JS (2000) Intra- and interspecific allozyme variation in eucalypts from the spotted gum group, Corymbia, section 'Politaria' (Myrtaceae). Aust Syst Bot 13:491-507.
  • McKinnon GE, Steane DA, Potts BM and Vaillancourt RE (1999) Incongruence between chloroplast and species phylogenies in Eucalyptus subgenus Monocalyptus (Myrtaceae). Am J Bot 86:1038-1046.
  • Moran GF and Brown AHD (1980) Temporal heterogeneity of outcrossing rates in alpine ash (Eucalyptus delegatensis R.T. Bak.). Theor Appl Genet 57:101-105.
  • Nei M (1972) Genetic distance between populations. Am Nat 106:283-292.
  • Nei M (1978) Estimation of average heterozygosity and genetic distance from a small number of individuals. Genetics 89:583-590.
  • Ochieng JW, Henry RJ, Baverstock PR, Steane DA and Shepherd M (2007) Nuclear ribosomal pseudogenes resolve a corroborated monophyly of the eucalypt genus Corymbia despite misleading hypotheses at functional ITS paralogs. Mol Phylogenet Evol 44:752-764.
  • Parra-O C, Bayly M, Udovicic F and Ladiges P (2006) ETS sequences support the monophyly of the eucalypt genus Corymbia (Myrtaceae). Taxon 55:653-663.
  • Peakall R, Gilmore S, Keys W, Morgante M and Rafalski A (1998) Cross-species amplification of soybean (Glycine max) simple sequence repeats (SSRs) within the genus and other Legume genera: Implications for the transferability of SSRs in plants. Mol Biol Evol 15:1275-1287.
  • Petren KB, Grant R and Grant PR (1999) A phylogeny of Darwin's finches based on microsatellite DNA length variation. Proc R Soc London B 266:321-329.
  • Pollock DD, Bergman A, Feldman MW and Goldstein DB (1998) Microsatellite behaviour with range constraints: parameter estimation and improved distances for use in phylogenetic reconstruction. Theor Pop Biol 53:256-271.
  • Pryor LD and Johnson LAS (1971) A Classification of the Eucalypts. Australian National University Press, Canberra, 102 pp.
  • Richard M and Thorpe RS (2001) Can microsatellites be used to infer phylogenies? Evidence from population affinities of the western Canary Island lizard (Gallotia galloti). Mol Phylogenet Evol 20:351-360.
  • Ritz LR, Glowatzki-Mullis ML, MacHugh DE and Gaillard C (2000) Phylogenetic analysis of the tribe Bovini using microsatellites. Anim Genet 31:178-185.
  • Rodrigo AG, Kelly-Borges M, Bergquist PR and Bergquist PL (1993) A randomization test of the null hypothesis that two cladograms are sample estimates of a parametric phylogenetic tree. New Zeal J Bot 31:257-268.
  • Rubinsztein DC, Amos W, Leggo J, Goodburn S, Jain S, Li S-H, Margolis RL, Ross CA and Ferguson-Smith MA (1995) Microsatellite evolution - Evidence for directionality and variation in rate between species. Nat Genet 10:337-343.
  • Ruzzante DE (1998) A comparison of several measures of genetic distance and population structure with microsatellite data: Bias and sampling variance. Can J Fish Aquat Sci 55:1-14.
  • Shepherd M, Kasem S, Lee D and Henry R (2006) Construction of microsatellite genetic linkage maps for Corymbia. Silvae Genet 55:228-238.
  • Southerton SG, Birt P and Ford HA (2004) Review of gene movement by bats and birds and its potential significance for eucalypt plantation forestry. Aust For 67:44-53.
  • Steane DA, McKinnon GE, Vaillancourt RE and Potts BM (1999) ITS sequence data resolves higher level relationships among the eucalypts. Mol Phylogenet Evol 12:215-223.
  • Steane DA, Nicolle D, McKinnon GE, Vaillancourt RE and Potts BM (2002) Higher-level relationships among the eucalypts are resolved by ITS-sequence data. Aust Syst Bot 15:49-62.
  • Streelman JT, Zardoya R, Meyer A and Karl SA (1998) Multi-locus phylogeny of cichlid fishes (Pisces, Perciformes): Evolutionary comparison of microsatellite and single-copy nuclear loci. Mol Biol Evol 15:798-808.
  • Takahashi K, Terai Y, Nishida M and Okada N (2001) Phylogenetic relationships and ancient incomplete lineage sorting among cichlid fishes in Lake Tanganyika as revealed by analysis of the insertion of retroposons. Mol Biol Evol 18:2057-2066.
  • Takezaki N and Nei M (1996) Genetic distances and reconstruction of phylogenetic trees from microsatellite DNA. Genetics 144:389-399.
  • Thamarus KA, Groom K, Murrel J, Byrne M and Moran GF (2002) A genetic linkage map for Eucalyptus globulus with candidate loci for wood, fibre, and floral traits. Theor Appl Genet 104:379-387.
  • Udovicic F and Ladiges PY (2000) Informativeness of nuclear and chloroplast DNA regions and the phylogeny of the eucalypts and related genera (Myrtaceae). Kew Bull 55:633-645.
  • Udovicic F, McFadden GI and Ladiges PY (1995) Phylogeny of Eucalyptus and Angophora based on 5S rDNA spacer sequence data. Mol Phylogenet Evol 4:247-256.
  • Whittock S, Steane DA, Vaillancourt RE and Potts BM (2003) Molecular evidence shows that the tropical boxes (Eucalyptus subgenus Minutifructus) are over-ranked. Trans R Soc S Aust 127:27-32.
  • Whitton J, Rieseberg LH and Ungerer MC (1997) Microsatellite loci are not conserved across the Asteraceae. Mol Biol Evol 14:204-209.
  • Wilson PG, O'Brien MM, Gadek PA and Quinn CJ (2001) Myrtaceae revisited: A reassessment of infrafamilial groups. Am J Bot 88:2013-2025.
  • Zhu Y, Queller DC and Strassmann JE (2000) A phylogenetic perspective on sequence evolution in microsatellite loci. J Mol Evol 50:324-338.
  • Zink RM (2002) A new perspective on the evolutionary history of Darwin's finches. The Auk 119:864-871.
  • Send correspondence to:
    Joel W. Ochieng
    Present address: Faculties of Agriculture and Veterinary Medicine
    University of Nairobi
    P.O. Box 29053, 00625 Nairobi, Kenya
  • Publication Dates

    • Publication in this collection
      13 Dec 2007
    • Date of issue
      2007

    Sociedade Brasileira de Genética Rua Cap. Adelmio Norberto da Silva, 736, 14025-670 Ribeirão Preto SP Brazil, Tel.: (55 16) 3911-4130 / Fax.: (55 16) 3621-3552 - Ribeirão Preto - SP - Brazil
    E-mail: editor@gmb.org.br