Introduction

Ribosomal DNA (rDNA) encodes the four rRNAs essential for ribosome function: the 5S, 5.8S, 18S, and 28S rRNAs. These genes have been intensively studied at the cytogenetic and molecular levels. Probes derived from their conserved regions hybridise to the chromosomes of distantly related taxa, which makes rDNA a first-choice chromosome marker. This is probably why molecular cytogenetic approaches became popular among (cyto)taxonomists in systematic studies. The development and widespread use of fluorescence in situ hybridisation (FISH) techniques (Pinkel et al. 1986; Leitch et al. 1994) have made it possible to map rDNA loci on the chromosomes of thousands of species over the past decades. rDNA-FISH may also provide information about the condensation status of rDNA chromatin. It is therefore a useful complement to molecular and cytogenetic (silver staining) expression studies.

rDNA evolves by concerted evolution (Zimmer et al. 1981; Dover 1982). This process maintains its homogeneity and functionality and is believed to be mediated by homologous and non-homologous recombination and by gene conversion. One puzzling feature is that, despite overall sequence conservation (Averbeck and Eickbush 2005), rDNA tends to change its copy number rapidly (McTaggart et al. 2007; Wang et al. 2017). Its chromosomal position also changes rapidly (Schubert and Wobus 1985; Dubcovsky and Dvorak 1995; Roy et al. 2005). Occasionally, studies have detected changes in the chromosomal location and size of specific rDNA arrays (loci). For example, the location of arrays differs between sibling species within the Drosophila melanogaster complex (Lohe and Roberts 1990). Rapid changes have also been suggested in populations of the brown trout, Salmo trutta (Castro et al. 2001), and of the grasshoppers Eyprepocnemis plorans (Cabrero et al. 2003) and Podisma pedestris (Veltsos et al. 2009).

The amount of literature containing cytogenetic rDNA data has been increasing steadily in recent years (Fig. 1). For illustration, we searched the WOS (Web of Science) database and Google Scholar with keywords such as “*rDNA* and *chromosome* and *in situ hybridisation* and *animal*”. These searches yielded more than 500 results, which together receive more than 1100 citations annually. The literature is probably even more extensive, because our search conditions were quite stringent and some works are published in non-indexed journals, conference proceedings, doctoral theses, and various monographs. Interest in such data is growing, as shown by the number of recent publications: approximately 50% of the publications listed in the database are from the last 6 years, and more than 60 papers related to the topic were published in 2016 alone. There is therefore a need to assemble, store, and analyse this information. We have constructed the animal rDNA database for this purpose. It provides a tool for a better and easier use of the available animal rDNA cytogenetic information on the number and position of loci. The resource is freely accessible at www.animalrdnadatabase.com. It parallels the plant rDNA database (www.plantrdnadatabase.com), which our team created previously to provide the same information for plants (Garcia et al. 2012). We have also used the database to look for relationships between the numbers of 5S and 45S (nucleolus organiser region, NOR) loci and for any preferential position of these loci on chromosomes.

Fig. 1 Number of publications included in the database between 1965 and 2016, shown for nine successive 5-year periods and the final 6-year period (2011–2016)

Methods

Data assembly

The database comprises information on the number and position of rDNA loci in animal species, compiled from 541 publications available up to the end of 2016. Papers were retrieved from Thomson Reuters WOS, MEDLINE/PubMed, Scopus, and Google Scholar with the queries “rDNA and chromosome,” “rRNA genes and chromosome,” “rDNA and karyotype,” “rRNA and karyotype,” and “rDNA and in situ hybridisation.” Most journals fell within the categories Genetics and Heredity, Biochemistry and Molecular Biology, Zoology, and Multidisciplinary Journals. The majority (~ 95%) of the data come from fluorescence in situ hybridisation with 45S (18S, 28S, and internal transcribed spacer) and 5S rDNA probes. A smaller proportion (~ 5%) of entries was obtained from older studies. These relied on radioactive hybridisation or on morphological observation of secondary constrictions after staining with classical histochemical dyes. We also aimed to include as many representative model species as possible. The resource gives basic information on the number, position, and linked/unlinked arrangement of rRNA genes. It also supplies data on chromosome number and genome size (taken from Gregory 2017) and on whether rDNA occurs on B chromosomes. Diploid locus numbers (sites) are presented as a range and a mean. Three main categories of rDNA position on chromosomes were distinguished:
(i) pericentromeric (= proximal) sites, counting pericentromeric and centromeric positions together;
(ii) distal (= terminal) sites, counting telomeric and subtelomeric positions together;
(iii) interstitial sites.
In the few cases where the hybridisation signals occupy a whole chromosome arm, the database returns a “whole arm” position.
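
The position scheme above can be expressed as a small lookup table. The sketch below is a hypothetical illustration, not the database's actual code. The term list in `POSITION_CATEGORIES` and the helper names are assumptions. The sketch also shows how several diploid site counts reported for one species could be summarised as a range and a mean.

```python
from dataclasses import dataclass
from statistics import fmean

# Hypothetical lookup from position terms used in source papers to the
# database's three main categories, plus the special "whole arm" case.
POSITION_CATEGORIES = {
    "centromeric": "proximal",
    "pericentromeric": "proximal",
    "subtelomeric": "distal",
    "telomeric": "distal",
    "interstitial": "interstitial",
    "whole arm": "whole arm",
}


def classify_position(reported: str) -> str:
    """Map a reported rDNA position term onto a database category."""
    key = reported.strip().lower()
    if key not in POSITION_CATEGORIES:
        raise ValueError(f"unrecognised position term: {reported!r}")
    return POSITION_CATEGORIES[key]


@dataclass
class SiteSummary:
    """Range and mean of the diploid (2C) site counts reported for one species."""
    minimum: int
    maximum: int
    mean: float


def summarise_sites(site_counts: list[int]) -> SiteSummary:
    if not site_counts:
        raise ValueError("at least one site count is required")
    return SiteSummary(min(site_counts), max(site_counts), fmean(site_counts))


print(classify_position("Subtelomeric"))  # distal
print(summarise_sites([2, 4, 6]))          # SiteSummary(minimum=2, maximum=6, mean=4.0)
```

A lookup table keeps the mapping explicit, so any new term found in a source paper fails loudly and is not silently misclassified.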

Web site construction

The tabular database structure holds the information on the number and position of rDNA loci and on the source publications. It was created as SQL (Structured Query Language) tables on a MySQL server, each table with its own field types and sizes. The data were first compiled in a spreadsheet, which was exported to a CSV (comma-separated values) file. Each entry was assigned a unique ID, together with the date and time of the export and the version of the data. The file was then imported into the SQL database (www.animalrdnadatabase.com). The website was programmed in HTML (HyperText Markup Language), CSS (Cascading Style Sheets), and JS (JavaScript) for visualisation. Custom PHP (PHP: Hypertext Preprocessor) functions were written to query the database and to process and display the data.
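
The export-and-import step can be sketched as follows. This is a hypothetical illustration only. Python's built-in `sqlite3` stands in for the MySQL server. The table name `karyotypes`, its columns, and the helper functions are assumptions, not the database's actual schema.

```python
import csv
import io
import sqlite3
from datetime import datetime, timezone

# Hypothetical column layout; the real schema is not described in detail.
FIELDS = ["entry_id", "species", "sites_45s", "sites_5s", "exported_at", "data_version"]


def export_to_csv(rows, version):
    """Serialise spreadsheet rows to CSV, adding a unique ID, export timestamp and data version."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for entry_id, row in enumerate(rows, start=1):
        # None values (missing data) are written as empty fields.
        writer.writerow({**row, "entry_id": entry_id, "exported_at": stamp, "data_version": version})
    return buffer.getvalue()


def import_csv(conn, csv_text):
    """Create the table if needed and load the CSV records into it."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS karyotypes (
               entry_id INTEGER PRIMARY KEY,
               species TEXT NOT NULL,
               sites_45s INTEGER,
               sites_5s INTEGER,
               exported_at TEXT NOT NULL,
               data_version TEXT NOT NULL)"""
    )
    # Empty CSV fields become SQL NULL; numeric strings are converted by column affinity.
    records = [
        {key: (value if value != "" else None) for key, value in record.items()}
        for record in csv.DictReader(io.StringIO(csv_text))
    ]
    conn.executemany(
        "INSERT INTO karyotypes VALUES "
        "(:entry_id, :species, :sites_45s, :sites_5s, :exported_at, :data_version)",
        records,
    )
    conn.commit()


# Example rows using site counts quoted in this article (None = not given here).
rows = [
    {"species": "Salvelinus fontinalis", "sites_45s": 50, "sites_5s": None},
    {"species": "Rhinolophus hipposideros", "sites_45s": None, "sites_5s": 18},
]
conn = sqlite3.connect(":memory:")
import_csv(conn, export_to_csv(rows, version="v1"))
print(conn.execute("SELECT species FROM karyotypes WHERE sites_45s > 40").fetchall())
# [('Salvelinus fontinalis',)]
```

Stamping every row with the export time and data version means any record on the website can be traced back to a specific spreadsheet release.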

Statistical analysis

Basic statistics such as the mode, mean, and median were obtained with Microsoft Excel functions. The Shapiro-Wilk test for normality, Pearson’s correlation, and the Mann-Whitney U test were performed in Microsoft Excel and in RStudio v.0.98.1078, a user interface for R (www.rstudio.com). If a species had rDNA locus numbers that differed between or within populations, each variant was treated as a separate record. Assessments of rDNA numbers, rDNA positions, and chromosome morphologies were taken exclusively from the literature. They are based on the original authors’ evaluation where available. In the few cases where this information was not explicitly stated in the article, we evaluated the published in situ hybridisation images ourselves.
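
The sketch below shows the descriptive statistics and a Mann-Whitney U test using only the Python standard library. It may help readers who want to repeat this kind of comparison outside Excel or R. It is an illustrative re-implementation, not the authors' workflow, and the function names are hypothetical. The p-value uses the large-sample normal approximation without tie or continuity correction. In practice, R's `wilcox.test` or SciPy's `scipy.stats.mannwhitneyu` would be the usual choices. The Shapiro-Wilk test has no standard-library equivalent and is omitted.

```python
import math
from statistics import mean, median, multimode


def describe(values):
    """Mode(s), mean and median of a list of locus or site counts."""
    return {"modes": multimode(values), "mean": mean(values), "median": median(values)}


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of the 1-based ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation
    (no tie or continuity correction); returns (U, p)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U statistic for sample x
    u = min(u1, n1 * n2 - u1)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, p


u, p = mann_whitney_u([1, 2, 3], [4, 5, 6])
print(u, round(p, 3))  # 0.0 0.05
```

The normal approximation is only reasonable for moderately large samples. For small groups or many ties, the exact or tie-corrected versions offered by R or SciPy are preferable.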

Availability of data and materials

All data generated or analysed during this study are included in this published article, its supplementary information files, and the website (http://www.animalrdnadatabase.com/).

Results and discussion

The database (www.animalrdnadatabase.com) includes 1358 karyotypes (Table 1) from 1343 species belonging to eight phyla, which roughly reflect the diversity of the animal kingdom. Yet, despite this wealth of information, cytogenetic data are still missing for many species. These include model species such as Daphnia magna (a small planktonic crustacean) and Columba livia (pigeon). Surprisingly, little cytogenetic information exists even for domesticated animals such as cats. Reports in Felidae are limited to classical karyological studies in the leopard (Tanomtong et al. 2008). Furthermore, interpopulation studies of cytogenetic variability have rarely been attempted, except in a few fish (Mantovani et al. 2005) and insect (Cabrero et al. 2003) genera. Hence, the database content could be a useful source for further research.

Table 1 Species representation of the rDNA database

Number of loci per karyotype

Considering the whole database (karyotypes), the average numbers of 45S and 5S sites per diploid chromosome set (2C) were 3.8 and 4.5, respectively (Supplementary Table S1). The median was two sites (a single locus per 1C) for both 45S and 5S rDNA, indicating that most karyotypes tend to keep locus numbers low. The relatively large differences between means and medians indicated a non-Gaussian distribution of values, which significant Shapiro-Wilk tests also revealed. Indeed, in each group, we identified several karyotypes deviating strongly from the average (Fig. 2). The highest numbers of 45S sites were found in the Amazonian fish Schizodon fasciatus (54/2C; de Barros et al. 2017) and the brook trout Salvelinus fontinalis (50/2C; Fujiwara et al. 1998). In mammals, the highest number of 45S loci was found in the rodent Mus pahari, with 42 sites/2C (Britton-Davidian et al. 2012). The highest numbers of 5S sites were found in two neotropical lizards of the family Teiidae: Kentropyx calcarata (68/2C) and K. pelviceps (74/2C) (Carvalho et al. 2015). These karyotypes apparently account for the relatively high average number of 5S loci in reptiles (Fig. 2). In mammals, the highest number of 5S loci was found in the bat Rhinolophus hipposideros, with 18 sites/2C (Puerma et al. 2008). About 12% of species showed intraspecific variation in locus number (Supplementary Table S2). This variation is explained by several factors:
- rDNA loci on sex chromosomes and supernumerary B chromosomes (both particularly frequent in insects);
- polyploidy (mainly in fish);
- general interpopulation variation.
Variation arising from differences in experimental approaches between laboratories must also be considered.
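
The gap between mean and median described above is typical of right-skewed count data. The toy example below uses hypothetical site counts, not values from the database. It shows how a few karyotypes with many sites raise the mean while the median stays at two sites/2C. Two sites/2C correspond to one locus per haploid set, assuming the sites occur on both homologues.

```python
from statistics import mean, median

# Hypothetical 45S site counts per diploid set (2C) for ten karyotypes:
# most carry one locus per haploid set (2 sites/2C), one carries 50 sites.
toy_sites = [2, 2, 2, 2, 2, 2, 4, 4, 6, 50]


def loci_per_haploid_set(sites_2c):
    """Convert 2C site counts to loci per 1C, assuming each locus is present on both homologues."""
    return sites_2c / 2


print(mean(toy_sites))                          # 7.6
print(median(toy_sites))                        # 2.0
print(loci_per_haploid_set(median(toy_sites)))  # 1.0
```

If the single outlier is removed, the mean drops to about 2.9 while the median is unchanged. This is why the median is the more representative summary here.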

Fig. 2 Number of 5S and 45S rDNA sites in different animal taxa. Values are presented for diploid karyotypes. Black dots indicate the average number of sites per group; lines show the range. The relatively high average number of 5S sites in reptiles has two causes. Some members of the family Teiidae carry exceptionally high numbers of loci (Carvalho et al. 2015), and few data are available for this group in general

Factors influencing rDNA loci multiplicity

About 60% of karyotypes (766/1277) had a single 45S locus, and 57% (358/628) had a single 5S locus. Karyotypes with multiple loci (both 5S and 45S) occurred in almost every group. In insects, multiple loci were found mostly in Orthoptera (e.g. grasshoppers, crickets, and locusts). These species are known to have relatively large genomes (Gregory 2017). Genome size is known to correlate with rDNA copy number (Prokopowich et al. 2003). The dispersion of rDNA across chromosomes in these species may therefore be related to their large genome sizes (~ 10 pg/2C; Rees et al. 1978). However, genome size cannot explain the high number of rDNA loci in actinopterygian fishes, which generally harbour small genomes (~ 1 pg/2C) (e.g. Ráb et al. 2002; Mantovani et al. 2005; Cioffi et al. 2010; da Silva et al. 2011; Lima-Filho et al. 2014; Sember et al. 2015; Symonová et al. 2017). The increased number of loci could also be related to the large number of rDNA pseudogenes reported in some grasshopper genomes (Keller et al. 2006). In contrast, a whole-genome study in Esox lucius (northern pike) did not reveal increased pseudogenisation of its highly amplified 5S genes (> 20,000 copies; Symonová et al. 2017). This suggests that amplification does not automatically lead to pseudogenisation and that the retention of pseudogenes varies between genomes.
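
The percentages above follow directly from the reported counts. The sketch below scores a karyotype as single-locus when it carries exactly two sites per 2C. This is an assumption that breaks down for sites on heteromorphic sex chromosomes or B chromosomes. The function name is hypothetical.

```python
def percent_single_locus(site_counts_2c):
    """Percentage of karyotypes with exactly one locus per haploid set (2 sites/2C)."""
    if not site_counts_2c:
        raise ValueError("no karyotypes given")
    single = sum(1 for n in site_counts_2c if n == 2)
    return 100 * single / len(site_counts_2c)


# Reported counts from the text: 766 of 1277 karyotypes (45S) and 358 of 628 (5S).
for label, single, total in [("45S", 766, 1277), ("5S", 358, 628)]:
    print(label, f"{100 * single / total:.1f}%")
# 45S 60.0%
# 5S 57.0%
```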

The amplification of rDNA has often been attributed to polyploidy (Gornung 2013). However, species with extremely large numbers of chromosomes (> 100/2C) do not necessarily exhibit a high number of loci (Fig. 3 and Supplementary Table S3). For example, members of the arthropod genus Austropotamobius (2n = 176) harbour only four 45S sites (Mlinarec et al. 2016). Similarly, the fish Acipenser baerii and A. transmontanus (2n = 262) show only moderate numbers of 5S (four) and 45S (11) sites (Fontana et al. 2003). The reduction of rDNA in these enlarged karyotypes could be related to the “genomic shock” that follows polyploidy events (Semon and Wolfe 2007; Garcia et al. 2017). Conversely, some karyotypes with moderate chromosome numbers harbour high numbers of rDNA loci. Examples are Ctenogobius smaragdus (emerald goby, 2n = 48; Lima-Filho et al. 2014), S. fontinalis (brook trout, 2n = 84; Fujiwara et al. 1998), and M. pahari (mouse, 2n = 48; Cazaux et al. 2011). In these species, the loci were distributed across 91, 88, and 50% of chromosomes, respectively. Polyploidy clearly cannot explain the large numbers of loci in these species. Other mechanisms should therefore be considered, such as interlocus recombination (Cazaux et al. 2011), transposon activity (Symonová et al. 2013), and integration of extrachromosomally replicated rDNA (Cohen et al. 2010).

Fig. 3 Plots showing the relationship between chromosome number (x-axis) and the number of rDNA sites (y-axis). The arrow marks a typical 2n = 40 karyotype in the genus Mus, showing striking variation in the number of 45S but not 5S rDNA sites

Mutual relationships between 5S and 45S rDNA

Numerous case studies indicate that 5S and 45S rDNA are likely amplified independently. For example, in the genus Mus, large (> 10-fold) variation occurs in 45S locus numbers (Britton-Davidian et al. 2012) without concomitant variation in 5S loci (Matsuda et al. 1994) (Fig. 3). Furthermore, two cytotypes (2n = 52 and 2n = 54) of the Amazonian fish Erythrinus erythrinus (Cioffi et al. 2010) differed as much as 11-fold in the number of 5S loci, while the number of 45S loci was constant. Similarly, grasshopper genomes show extensive but independent variation in the numbers of 5S and 45S rDNA clusters (Cabral-de-Mello et al. 2011). Our comparative analysis of more than 500 karyotypes (Supplementary Fig. S1 and Supplementary Table S4) revealed that the numbers of the two loci are not correlated (Pearson, r = 0.047, p > 0.05). On the other hand, 43% of karyotypes showed the same number of 45S and 5S loci, suggesting a potential relationship. However, most (89%) of these equinumber karyotypes harboured a single locus of each type. Equinumber karyotypes with multiple loci were relatively rare (11%). This pattern can be explained by a general tendency of genomes to keep the numbers of both loci low (Fig. 2).

In plants, equal numbers of 45S and 5S loci were detected in 33% of karyotypes, and a significant correlation (p < 0.005) between the numbers of 5S and 45S loci was observed (Garcia et al. 2017). This can be attributed to frequent whole-genome duplications in plants, which multiply both loci equally. In animals, about 75% of karyotypes had 5S and 45S loci on different chromosomes (separate arrangement). The remaining 25% had at least one chromosome bearing both loci (colocalised arrangement). Thus, the tendency towards colocalisation of 5S and 45S rDNA on the same chromosome does not appear to be as strong as in plants, where colocalisation occurs in 58% of genera (Roa and Guerra 2015). This could be related to the higher number of loci in plants, where the median is 4 sites/2C for both 45S and 5S (Roa and Guerra 2012; Garcia et al. 2017). In animals, medians are lower (2 sites/2C; Supplementary Table S1). Colocalisation may also increase the frequency of recombination between the two loci, possibly leading to their physical linkage and the formation of 45S-5S units. Of note, linked 45S-5S units are relatively common in plants (Garcia et al. 2009; Wicke et al. 2011; Garcia and Kovarik 2013). In animals, they have so far been described only in a few arthropods (Drouin et al. 1992) and crustaceans (Drouin and de Sá 1995). In humans and mice, 5S and 45S rRNA gene copy numbers appear to be harmonised through concerted copy number variation (Gibbons et al. 2015). There may therefore be no simple relationship between the number of loci and the number of copies, since gene content may differ between loci. In this context, nucleolar size has been correlated with the number of 45S rRNA genes in amphibians (Miller and Brown 1969).
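
The separate versus colocalised classification used above can be stated precisely in code. The sketch below represents each karyotype as a hypothetical mapping from chromosome identifier to the set of rDNA types it carries. The function names are illustrative.

```python
def arrangement(karyotype):
    """Return 'colocalised' if at least one chromosome carries both 45S and 5S rDNA,
    otherwise 'separate'."""
    if any({"45S", "5S"} <= rdna_types for rdna_types in karyotype.values()):
        return "colocalised"
    return "separate"


def colocalised_share(karyotypes):
    """Fraction of karyotypes with a colocalised arrangement."""
    labels = [arrangement(k) for k in karyotypes]
    return labels.count("colocalised") / len(labels)


# Hypothetical karyotypes: chromosome pair -> rDNA types detected on it.
examples = [
    {"1": {"45S"}, "4": {"5S"}},        # separate
    {"2": {"45S", "5S"}, "7": {"5S"}},  # colocalised on pair 2
    {"3": {"45S"}, "9": {"5S"}},        # separate
    {"5": {"45S"}, "6": {"5S"}},        # separate
]
print(colocalised_share(examples))  # 0.25
```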

Position of rDNA on chromosomes

There has been considerable debate in the literature over the “randomness” of rDNA chromosomal positions (Hillis and Dixon 1991; Gornung 2013; Roa and Guerra 2015; Garcia et al. 2017). The information gathered in this database allowed us to ask whether rDNA has a preferential position on chromosomes. We selected groups (Fig. 4) containing at least 40 species, allowing robust statistical evaluation. The pie charts (Fig. 4 and Supplementary Table S5) show the distribution of loci along different parts of chromosomes. rDNA can clearly occur at nearly any chromosomal position, but there were significant trends in particular groups of animals. A distal location of 45S rDNA is clearly preferred in mammals, fish, and mollusks, whereas in arthropods its distribution is more balanced. The 5S loci were more evenly distributed along the chromosomes than the 45S loci. This is consistent with previous observations (Baumlein and Wobus 1976; Roa and Guerra 2015; Garcia et al. 2017). In arthropods, proximal positions of rDNA loci (both 5S and 45S) were significantly more common than in other groups (Fig. 4; for statistical support, see Supplementary Table S6). Arthropods are the largest and most diverse phylum, representing around 70% of all animal species (IUCN 2014). Insects are highly represented both within arthropods and in our database (Table 1). We therefore analysed 45S rDNA positions in the two largest insect orders, Coleoptera (beetles) and Orthoptera (mostly grasshoppers and crickets). Strikingly, the two orders differed significantly in 45S rDNA positions (Supplementary Table S7). Coleoptera had mostly distal 45S loci. In Orthoptera, these genes were preferentially located at pericentromeric positions (Fig. 5), and terminal positions were found only exceptionally (Veltsos et al. 2009).
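
The proportions behind pie charts such as those in Fig. 4 amount to counting sites per position category within each group. The sketch below uses hypothetical records and a hypothetical function name. It does not reproduce the significance tests summarised in Supplementary Tables S6 and S7.

```python
from collections import Counter, defaultdict


def position_shares(records):
    """Percentage of rDNA sites in each position category, computed per group.

    records: iterable of (group, position_category) pairs, one per site.
    """
    counts = defaultdict(Counter)
    for group, position in records:
        counts[group][position] += 1
    return {
        group: {pos: 100 * n / sum(c.values()) for pos, n in c.items()}
        for group, c in counts.items()
    }


# Hypothetical 45S site records, not taken from the database.
records = [
    ("Coleoptera", "distal"), ("Coleoptera", "distal"),
    ("Coleoptera", "distal"), ("Coleoptera", "proximal"),
    ("Orthoptera", "proximal"), ("Orthoptera", "proximal"),
    ("Orthoptera", "interstitial"), ("Orthoptera", "proximal"),
]
print(position_shares(records))
# {'Coleoptera': {'distal': 75.0, 'proximal': 25.0}, 'Orthoptera': {'proximal': 75.0, 'interstitial': 25.0}}
```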

Fig. 4 Position of rDNA sites on chromosomes. The numbers of 45S and 5S sites counted in each group are as follows: fish (N = 479 and N = 417, respectively), mammals (N = 156 and N = 40, respectively), arthropods (N = 424 and N = 96, respectively), and mollusks (N = 54 and N = 33, respectively). The source data are given in Supplementary Table S5

Fig. 5 Relationship between chromosome morphology (x-axis) and 45S rDNA position (y-axis) in two of the largest insect orders, Coleoptera (N = 85) and Orthoptera (N = 141). Only chromosomes with well-resolved morphologies were considered in the analysis. Chromosome type: m, metacentric/submetacentric; a/t, acrocentric/telocentric. The source datasets are in Supplementary Table S7

There are several caveats in determining the position of rDNA on chromosomes. First, many karyotypes harbour chromosomes that are too small for the positions of loci to be determined accurately. This is particularly the case for species with a high number of chromosomes and relatively small genomes. Second, FISH may lack the resolution to show whether a site lies closer to the centromere or to the telomere. This applies to telocentric chromosomes and to the short arms of acrocentric chromosomes. In these morphological types, an rDNA site could be considered either distal, as it appears at the end of the chromosome, or proximal, as it lies next to the terminal centromere characteristic of these chromosomes. Hence, “proximal-distal” could be a more appropriate term for “pericentromeric” rDNA in acrocentric and telocentric chromosomes. For these reasons, positional information should be interpreted with great care, since the interpretation of FISH signals may vary between researchers.
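
The proposed relabelling can be made explicit as a simple rule. This is a hypothetical sketch, not part of the database. It assumes each site is described by its position category and by the chromosome-type codes used in Fig. 5 (m for metacentric/submetacentric, a/t for acrocentric/telocentric). The function names are illustrative.

```python
def refine_position(category, chromosome_type):
    """Relabel proximal sites on acrocentric/telocentric ('a/t') chromosomes as
    'proximal-distal'; all other combinations keep their original category."""
    if category == "proximal" and chromosome_type == "a/t":
        return "proximal-distal"
    return category


def proximal_distal_fraction(sites):
    """Fraction of proximal sites that would be relabelled as proximal-distal.

    sites: iterable of (position_category, chromosome_type) pairs.
    """
    proximal = [ctype for category, ctype in sites if category == "proximal"]
    if not proximal:
        return 0.0
    relabelled = sum(refine_position("proximal", ctype) == "proximal-distal" for ctype in proximal)
    return relabelled / len(proximal)


# Hypothetical sites: three proximal (two on a/t chromosomes) and one distal.
sites = [("proximal", "a/t"), ("proximal", "a/t"), ("proximal", "m"), ("distal", "m")]
print(refine_position("proximal", "a/t"))  # proximal-distal
print(proximal_distal_fraction(sites))     # 0.6666666666666666
```

The rule only formalises the terminology. It cannot resolve the underlying uncertainty in reading FISH signals on small or acrocentric chromosomes.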

Are there functional constraints on the maintenance of distinct rDNA positions?

More than 50% of karyotypes in the database had 45S rDNA at distal (subtelomeric) positions. The number of sites close to chromosome ends could be even higher, since many proximal locations (78%; Supplementary Table S8) can be considered proximal-distal. This raises the question of the functional significance (if any) of these observations, which have been made independently in both animals and plants (Lima-de-Faria 1976; Roa and Guerra 2012; Garcia et al. 2017):

  I.

    Position of 45S rDNA close to chromosome ends may be important for the accurate positioning of 45S rDNA chromatin within and around the nucleolus (Gornung 2013). Some nucleolar proteins are known to remain associated with the NORs during mitosis (Schwarzacher and Wachtler 1993). Perhaps the association of partially decondensed rDNA chromatin with these proteins survives mitosis better at distal (or proximal-distal) positions than at interstitial or centromeric ones. If so, distally positioned loci may better ensure that rDNA transcription resumes rapidly after cell division, perhaps via a specific chromatin configuration. However, pericentromeric NORs have been found on metacentric chromosomes in several single-locus karyotypes (e.g. Barth et al. 2013; Singh and Barman 2013). This suggests that these positions, although infrequent (2% of karyotypes; Supplementary Table S8), are probably compatible with expression of the resident rDNA. Furthermore, secondary constrictions, thought to be remnants of activity in the previous interphase, have been identified at interstitial positions in some species (Fagundes et al. 2003). These studies suggest that no functional constraints limit the position of NORs on chromosomes with respect to nuclear topology.

  II.

    Non-coding functions of 45S rDNA should be considered (Kobayashi 2008). Perhaps rDNA heterochromatin fulfils a structural function, contributing to the stabilisation of telomeric and centromeric domains (the latter in proximal-distal positions). In this regard, the nucleolar remodelling complex (NoRC) is the principal determinant of rDNA silencing. It is also important for maintaining genome stability (Guetg et al. 2010) and for heterochromatin formation (Postepska-Igielska and Grummt 2014). Cazaux et al. (2011) proposed that rDNA may predispose chromatin to centromere formation. Indeed, pseudogenised rDNA copies seem to occur regularly in different genomes at variable frequencies (Mentewab et al. 2011; Wang et al. 2016; Robicheau et al. 2017). Such copies may become homogenised and even evolve into independent satellites (Lim et al. 2004; Ferreira et al. 2007).

  III.

    Concerted evolution may be more efficient at chromosome termini than in other regions. It is well established that rDNA evolves by concerted evolution, which maintains the homogeneity of multigene families (Zimmer et al. 1981; Dover 1982). Gene conversion and non-homologous recombination are the major drivers of concerted evolution (reviewed in Nieto Feliner and Rosselló 2012). In several organisms, regions near chromosome ends show higher recombination rates than more centrally located sequences (McKim et al. 1988; Jensen-Seaman et al. 2004). Functional rDNA copies may therefore be preferentially located at sites of intensive recombination in subtelomeric regions, and natural selection would favour these positions. Yet, patterns of 45S rDNA unit divergence seem to be similar in species with distal locations (humans; Gonzalez and Sylvester 2001) and proximal locations (house mouse; Sasaki et al. 1987). However, in the genus Mus, 45S rDNA loci are preferentially positioned on telocentric chromosomes close to the chromosome ends. This corresponds better to the proximal-distal location defined above. Cazaux et al. (2011) proposed that a specific configuration of these domains in interphase may stimulate meiotic recombination between non-homologous loci.