Introduction

For the first 2.5 billion years (Ga) of life on Earth, most species were much smaller than 1 mm, and this size was rarely exceeded (Carroll 2001). Prokaryotes, with few exceptions, have remained unicellular organisms that optimize their size (Carroll 2001), gene and protein content (Ochman and Davalos 2006), and the flexibility of their protein-based gene regulation (Lozada-Chavez et al. 2006) in order to maximize the metabolisms that drive Earth's biogeochemical cycles (DeLong et al. 2010). Plants, animals, fungi, and protozoan seaweeds, on the other hand, are multicellular organisms that have dominated the terrestrial landscapes and the oceans for the last billion years, with a remarkable diversity in genotypic and phenotypic complexity (King 2004; Knoll 2011). A multicellular organism is a collection of self-organized cells that express different phenotypes, despite having the same genotype, in response to the specialization of tasks to perform a cooperative physiological division of labour within an economic organization.

How Difficult is the Transition to Multicellularity?

At least 25 independent transitions to multicellularity have been recorded during the evolution of cellular complexity on Earth (Bonner 1998; Grosberg and Strathmann 2007; Rokas 2008; Knoll 2011). The transition to multicellularity has been repeatedly promoted from unicellular and colonial ancestors (Bonner 1998; Carroll 2001; Kaiser 2001; Medina et al. 2003; King 2004; Grosberg and Strathmann 2007). In several bacterial and eukaryotic organisms, these transitions are an inducible response to environmental stimuli such as predation and starvation (Bonner 1998; Lurling and Van Donk 1999; Kaiser 2001; Kolter et al. 2001; Kong et al. 2009). Surprisingly, multicellularity can revert to a unicellular state in several bacterial lineages (Velicer et al. 1998; Kolter et al. 2001). A special case is that of defectors (i.e., mutant cell lineages that selfishly improve their own fitness and fail to cooperate with the other cell types of the organism) in vertebrates, where non-viral transmissible cancerous cells could in effect become independently evolving unicellular colonies (Banfield et al. 1965; Strathmann 1991; Pearse and Swift 2006; Weiss et al. 2006).

The frequent origination and spread of multicellularity suggests that (1) selection favoring this transition is pervasive across organisms and time, (2) the genetic and developmental obstacles to this transition are relatively “easy” to overcome, and (3) adaptive mechanisms that control defectors and stabilize the transition are widely available in natural populations (Grosberg and Strathmann 2007). According to Grosberg and Strathmann (2007), these pieces of empirical evidence together support the idea that the evolution of multicellularity can itself be considered a significant but minor transition, and that cellular diversity can evolve easily when functionally called for by selective advantages. Nevertheless, multicellularity shows two states that presumably are not just extremes of a continuous spectrum but are fundamentally different: simple or complex (Bonner 1998; King 2004; Rokas 2008; Knoll 2011). Increase of organismal size, diversity of cell types, division of labor and functional specialization are interrelated reflections of multicellular complexity; however, all these factors are differentially represented between simple and complex multicellular organisms, as described below (Table 1). Thus, we here support the hypothesis that transitions to complex multicellularity have required, in addition to the advent of the eukaryotic cell and several other key factors, the development of a pervasive non-protein-coding RNA-based gene regulation. Non-protein-coding RNAs (ncRNAs) and proteins have been used as regulatory molecules in prokaryotes and eukaryotes, and conventional wisdom holds that increased complexity requires more regulators. Nevertheless, we here polarize this major evolutionary transition to the contrary: pervasive non-coding RNA-based regulatory systems are a prerequisite for complex multicellularity. We will argue (1) why pervasive non-coding RNA-based regulatory systems can only be supported and easily explored in higher eukaryotes, and (2) how a vast expansion of regulatory ncRNAs, rather than large numbers of novel protein regulators, can easily contribute to the emergence of complex multicellularity.

Table 1 Some key characters that collectively underpin multicellularity in model organisms

Simple multicellular organisms (SMOs) include filaments, balls or sheets of cells that arise either via mitotic division from a single progenitor with the offspring sticking together (aquatic origin) or when several solitary cells aggregate to form a colony (terrestrial origin). They form a coherent and reproducible morphology by cell-cell adhesion, and differentiation of somatic and reproductive cells is common (Bonner 1998; Wolpert and Szathmary 2002; Grosberg and Strathmann 2007). However, complex differentiation patterns and intercellular signaling are limited, so that every cell lies in direct contact with the environment during active metabolism (Knoll 2011). SMOs are found in both multiple eukaryotic lineages (such as chlorophyceae, dictyostelia and oomycetes) and in some eubacterial clades, e.g., cyanobacteria, myxobacteria and actinobacteria (Bonner 1998; Kaiser 2001; Rokas 2008). In fact, the first signs of cell differentiation come from fossils of filamentous and mat-forming cyanobacteria-like organisms that diverged once between 2.4 and 2.1 Ga (Tomitani et al. 2006).

Complex multicellularity, on the other hand, is limited to Eukarya where it arose independently in at least six clades (Fig. 1): once for the eumetazoan animals (King 2004), but multiple times (with possible secondary losses) in embryophytic land plants, florideophyte red algae, stramenopile brown algae (from the order Laminariales), basidiomycete, and ascomycete fungi (Niklas 2000; Medina et al. 2003; Bonner 2004; Grosberg and Strathmann 2007; Rokas 2008; Cock et al. 2010; Knoll 2011). Complex multicellularity arose relatively late in the history of life, probably less than 1000 million years (Ma) ago (Benton and Ayala 2003), and left an extended fossil record during the Ediacaran and Cambrian periods (~600 Ma ago) (Knoll 2011). Complex multicellular organisms (CMOs) show not only evidence of genes involved in cell-cell and cell-matrix adhesion but also a diverse “toolkit” of genes associated with developmental and cell-death programs comprising intercellular signaling, specialization of cell types, and multiple tissue differentiation patterns mediated by complex regulatory networks. This genetic toolkit (including protein-based gene regulation) has been the product of evolutionary innovations, tinkering and expansions of genetic material from ancestral unicellular organisms (King 2004; Bowman and Floyd 2007; King et al. 2008; Rokas 2008; Specht and Bartlett 2009; Cock et al. 2010; Srivastava et al. 2010). Interestingly, the multicellular genetic toolkit corresponds only to a few hundred genes from a few dozen gene families (Bowman and Floyd 2007; Rokas 2008; Erwin 2009; Specht and Bartlett 2009; Knoll 2011) that belong to the 3,000 novel gene families diverged from the last common ancestor of eukaryotes (Koonin et al. 2004).

Fig. 1

Multiple origins of complex multicellularity (CM) and the C-value and G-value paradoxes among the 100 Eukarya with complete genome sequences. The phylogenetic relationships for higher taxa were obtained from Hedges (2002) and Hampl et al. (2009); main eukaryotic clades with independent origins of CM are underlined and marked with *. Red circles indicate taxa harboring at least one CMO with a completely sequenced genome. There is no clear relationship between multicellularity and genome size, although CMOs tend to have larger genomes (black dots). This observation is, however, at least in part biased by the overrepresentation of vertebrates and flowering plants. Comparable proportions of protein-coding genes (bars) are found among unicellular species, SMOs and CMOs. The total number of nuclear protein-coding genes and the non-organellar genome size (in megabase pairs, Mb) are depicted for each genome project (with 7X or greater fold coverage), and were obtained from the reports of their latest publication and updated databases; when necessary, the gene content was corrected by counting the coding sequences for proteins (CDS) from the corresponding annotation of their best gene models.

The Eukaryotic Cell as a Source for Complex Multicellularity

Large organismal size, the origin of endosymbiotic energy production, and a “passive” increase of non-protein-coding DNA (ncDNA) in the genome may have predisposed the eukaryotic cell as substrate for complex multicellularity.

Increase of Body Size and Cell Differentiation

Organismal size positively correlates with the number of cell types in CMOs (Valentine et al. 1994; Bonner 1998; Carroll 2001; Bonner 2004; McCarthy and Enquist 2005; DeLong et al. 2010) (Table 1). Once size had increased, the putative advantages of this change would follow (Bonner 1998). Knoll (2011) proposed that large three-dimensional sizes of CMOs could have been enhanced by a positive feedback cycle from the availability of ambient oxygen (pO2). The gradual increase of pO2 in the oceans and atmosphere about 2.5 Ga ago (Holland 2006), and the exposure to pO2 during the transition of life from water to land (Bonner 1998; Hedges et al. 2004; Knoll 2011), would have increased the permissible size of diffusion-limited multicellular organisms. Oxygen may not have started all CMOs, but it would have imposed severe constraints on the evolution of macroscopic organisms with high energy demands. A larger size, in turn, would have allowed larger surface-to-interior gradients of oxygen, nutrients, signaling molecules, bulk transport, as well as the formation of reactive oxygen species (in response to environmental cues) which would have been capable of inducing cell differentiation (Blackstone 2000; Aguirre et al. 2005; Lesser 2006; Knoll 2011). Environmental cues selecting cell differentiation more than just larger organismal size should have been crucial for the transition to CMOs, given that the presence of developmentally differentiated cell types in a colony (or in an organism) is what makes it truly multicellular (Wolpert and Szathmary 2002).

The Acquisition of an Endosymbiotic Energy Production

In general terms, a cell type is a cell with a discrete pattern of gene expression driving a distinct morphological or functional cellular shape in comparison to other cell types with the same organismal genotype. Gene expression of cell types not only needs the involvement of signal(s) to start and maintain differentiation and a considerable diversity of regulatory elements to control expression, but it also needs a powerful source of energy to express both conserved and novel genes in a combinatorial manner. Thus, the massive difference in mean genome size between prokaryotes and eukaryotes is most revealing in terms of the energy available to transcribe and translate their genes (Lane and Martin 2010). Whereas the energetic cost of possessing genes is trivial (~2%), the cost of expressing them as RNA transcripts and proteins is not: protein synthesis consumes most (~75%) of the cell's total energy budget (Wagner 2005; Lane and Martin 2010). For example, if a bacterial genome is increased tenfold in size, it could still be replicated, but there is no current known mechanism (such as the number of regulatory proteins or ribosomes, carbon metabolism, respiratory chain or giant polyploids) that can circumvent the energetic barrier to express ten times as many proteins (Lane and Martin 2010). The origin of eukaryotes, however, entails a bioenergetic innovation that is key to sustain multicellular life. Lane and Martin (2010) elegantly argue that the endosymbiosis that gave rise to mitochondria restructured the distribution of DNA in relation to bioenergetic membranes. By enabling oxidative phosphorylation across a wide area of internal membranes, mitochondrial genes permitted a remarkable 200,000-fold expansion of genome size compared to bacteria. 
Thereby, mitochondrial power expanded the genotype that a eukaryotic cell could express, inherit, and evolve by four to six orders of magnitude, affording the cell the possibility (but not the necessity) of becoming complex.

A ncRNA-Based Regulatory Network is Hidden Within ncDNA

Large-scale tandem and block duplications have been extensively reported among eukaryotes, hinting at polyploidy in their ancestry (Gregory 2005). Furthermore, fundamental changes in gene structure, such as the advent of introns, allowed the expansion of the eukaryotic proteome by alternative splicing (Nilsen and Graveley 2010). In contrast to prokaryotes (Konstantinidis and Tiedje 2004), however, there is no clear relationship between eukaryotic complexity and either genome size (C-value) or the number of protein-coding genes (G-value) (Fig. 1). In higher eukaryotes, indeed, the vast majority of nuclear DNA is non-protein-coding (ncDNA) (Lynch and Conery 2003; Mattick et al. 2007; Lynch et al. 2011). Intragenic ncDNA is either intronic sequence or untranslated region (UTR), whereas intergenic ncDNA is composed (in different proportions across species) of repetitive transposable elements from class I (e.g., LINEs, SINEs and LTRs) and class II (e.g., MITEs), simple sequence repeats as well as segmental and pseudogene duplications (Gregory 2005; Lynch et al. 2011). The progressive expansion of ncDNA in higher eukaryotic organisms is thought to be a consequence of the reduced efficiency of selection acting against the passive accumulation of “mutationally hazardous” DNA in taxa experiencing elevated magnitudes of random genetic drift (Lynch et al. 2011). This effect may be explained by reduced effective population sizes (invertebrates and vascular plants ~10^5–10^6, and vertebrates ~10^4–10^5; whilst prokaryotes ~10^8, and unicellular eukaryotes and fungi ~10^7) (Lynch and Conery 2003; Wagner 2005), reduced recombination in large genomes, and a mutational bias toward insertions of large segments of DNA (Lynch 2006; Lynch et al. 2011). The extent and ways by which these ncDNA components contribute to the phenotype of eukaryotic species are still being elucidated (Kazazian 2004; Gregory 2005). 
Nevertheless, some ncDNA components retain several regulatory elements (in particular cis-regulatory ones), and a considerable proportion of them encode a huge diversity of ncRNAs (i.e., functional RNA molecules that are not translated into a protein), at least a substantial fraction of which are thought to have regulatory functions.

Certainly, regulatory ncRNAs were first found in prokaryotes; for instance, they currently represent ~2% of the total number of genes in the unicellular Escherichia coli, which is almost half the proportion of genes encoding protein regulators (~5%) (Storz and Waters 2009). There is increasing evidence of (non-coding) antisense transcripts in bacteria for which regulatory functions are at least suspected (Sharma et al. 2010). More than 100 types of regulatory ncRNAs have been identified throughout the bacterial kingdom, and riboswitches are one of the best known examples. A particular class of riboswitches, namely those responding to the coenzyme thiamine pyrophosphate, has also been identified in some eukaryotic lineages (Cheah et al. 2007; Bocobza and Aharoni 2008); however, its evolutionary origin in Eukarya is uncertain (Sudarsan et al. 2003; Bocobza and Aharoni 2008). With the exception of small nucleolar RNAs (snoRNA) present in Archaea and Eukarya, there is no homologous relationship between the regulatory ncRNAs found in prokaryotes and eukaryotes. Furthermore, the physiological role of bacterial regulatory ncRNAs has been evolutionarily driven to mediate rapid responses to changing environmental conditions by modulating specific metabolic pathways or stress cues, like pathogenesis and SOS response (Repoila and Darfeuille 2009; Storz and Waters 2009).

ncRNAs Dominate the Genotype and Phenotype of Complex Multicellular Organisms

In contrast to prokaryotes, regulatory ncRNAs are encoded basically everywhere in the eukaryotic genome, and in particular they cover the ncDNA regions which dominate the genotypes of CMOs. Regulatory ncRNAs are transcribed from pseudogenes, they are also produced from protein-coding loci both by alternative splicing and as independent transcripts (e.g., anti-sense RNAs, enhancer RNAs, intronic transcripts, UTR associated RNAs, repetitive elements), and from their own “intergenic” genes (Kim et al. 2009; Ponting et al. 2009; Voinnet 2009; Ren 2010; Cabili et al. 2011). It appears generally accepted by now that eukaryotic ncDNA is pervasively transcribed (Deng et al. 2006; Berretta and Morillon 2009; Kapranov et al. 2010; Clark et al. 2011; Tisseur et al. 2011), but see van Bakel et al. (2010, 2011) for a dissenting opinion. Accordingly, it has been extensively reported that regulatory ncRNAs show a differential, widespread and complex transcription that gives rise to a considerable number of functional ncRNA families (Table 2) (Gingeras et al. 2007; Jacquier 2009).

Table 2 A brief description of some regulatory ncRNA types in animals

Different classes of small and large ncRNAs have been reported to regulate a large number of both species-specific and deeply conserved cellular processes in tissue identity and stem cell self-renewal and differentiation through well defined mechanisms within the major eukaryotic clades (Millar and Waterhouse 2005; Lin and Gangaraju 2009; Ponting et al. 2009; Arendt et al. 2010; Bartel and Nodine 2010) (Fig. 2). Furthermore, ncRNAs have been shown to regulate almost every level of gene expression, including the activation and repression of homeotic genes and the targeting of chromatin-remodeling complexes (i.e., epigenesis) (Mattick et al. 2008; Chu et al. 2011; Pauli et al. 2011). Indeed, recent transcriptome analyses and different experimental approaches in cell development provide strong evidence that perturbations in ncRNA regulation are involved in complex developmental disorders, including cancers and neurological diseases in mammals; for review see Costa (2005); Spector and Prasanth (2007). Nevertheless, the loss of ncRNA function rarely results in a lethal phenotype. One of the counter-examples is the elimination of microRNA-1-2 (a miRNA expressed in skeletal muscle in vertebrates), which results in a lethal phenotype with defects in cardiac morphogenesis, electrical conduction, and cell cycle control in mouse (Zhao et al. 2007). However, with the exception of mouse (Bernstein et al. 2003), the genetic inactivation of the central ncRNA processing enzyme Dicer in vertebrates does not dramatically affect cell differentiation and gene expression patterning (Cobb et al. 2005; Giraldez et al. 2005; Harfe et al. 2005). All these lines of evidence, in conjunction with the evolutionary trends of these regulators (described below), support the hypothesis that regulatory ncRNAs are primarily moderators of complex multicellular phenotypes on Earth.

Fig. 2

Regulatory functions of ncRNAs are involved in all the stages of the central dogma of biology. Genetic information flows from DNA (the genotype) to RNA (with thick black lines). RNA (the phenotype) then decodes proteins (from a mRNA) and/or ncRNAs (from a transcript or a mRNA). Regulatory ncRNAs control gene expression (shown in dashed lines), whereas structural ncRNAs (e.g., ribosomal and transfer RNAs) are involved in protein synthesis (shown in thin doubled-line). ncRNA symbology: mi, micro; nc, non-coding; pi, piwi; rasi, repeat-associated small interfering; si, short interfering; sn, small nuclear; sno, small nucleolar; r, ribosomal; t, transfer; SRP, signal recognition particle; TF, transcription factor. Figure is modified with kind permission from Condorelli and Dimmeler (2008)

Why Can ncRNAs Quickly and Selectively Acquire a Regulatory Functionality in Complex Multicellular Organisms?

The observations that i) energy costs constrain the evolution of gene expression in any organism, ii) ncRNAs could be fixed and even conserved in eukaryotic genomes by nearly neutral forces (perhaps without any advantage or disadvantage for the organism in the beginning), and iii) possible pervasive transcription of new ncRNAs could perturb existing regulatory networks, raise the question of how critical changes of gene regulation have arisen during the evolution of complex multicellularity. Wagner (2005) suggested at least three scenarios that may operate at the same time to promote any gene expression change. First, changes in messenger RNA (mRNA) or protein half-life may contribute significantly to reducing the cost of controlling the cellular concentration of a gene product, given that half-lives are energetically less constrained than synthesis rates. Second, a substantial influx of regulatory mutations may balance the total energy consumption by increasing the synthesis of some gene products and decreasing the synthesis of others. Finally, substantial changes in mRNA and protein synthesis rates can only go to fixation in a large population (as in bacteria and unicellular eukaryotes) when they provide an advantage sufficiently large to overcome the cost and thus the effect of selection opposing them. Ohta's nearly neutral theory shows, however, that slightly deleterious mutations can be fixed by genetic drift (Ohta 1992). This effect allows transcripts without an adaptive effect to persist and spread in a population despite their small energetic cost, and to accumulate mutations. The mutation-selection balance is shifted, furthermore, in favour of genetic drift in small populations, as we expect for the higher multicellular organisms. Indeed, apparently non-functional pseudogenes (i.e., degenerate copies of functional protein-coding transcripts) continue to be expressed in measurable numbers from the human genome (Zheng et al. 2007). 
New potential regulatory changes can then open the possibility (but again not the necessity) to become positively selected by exploring and canalizing functional phenotypic novelties.
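
The interplay between energetic cost and drift described above can be illustrated numerically. The following is a minimal sketch, not part of the cited analyses: it uses Kimura's standard diffusion approximation for the fixation probability of a new mutation (a formula brought in here for illustration), assumes that effective and census population sizes are equal, and treats the energetic cost of a useless transcript as a hypothetical selection coefficient (S_COST). The population sizes are the rough orders of magnitude quoted earlier for vertebrates and prokaryotes.

```python
import math

def fixation_probability(s, n_e):
    """Kimura's diffusion approximation for a single new mutation.

    s   : selection coefficient (negative = slightly deleterious)
    n_e : effective population size (assumed equal to the census size)
    """
    if s == 0.0:
        return 1.0 / (2.0 * n_e)      # strictly neutral case
    exponent = -4.0 * n_e * s
    if exponent > 700.0:              # exp() would overflow; fixation is effectively impossible
        return 0.0
    # (exp(-2s) - 1) / (exp(-4*Ne*s) - 1), computed stably with expm1
    return math.expm1(-2.0 * s) / math.expm1(exponent)

# Hypothetical cost of a non-functional transcript, as a selection coefficient.
S_COST = -1e-5

def relative_to_neutral(s, n_e):
    """Fixation probability divided by the neutral expectation 1/(2*Ne)."""
    return fixation_probability(s, n_e) * 2.0 * n_e

if __name__ == "__main__":
    for label, n_e in [("vertebrate-like", 1e4), ("prokaryote-like", 1e8)]:
        print(label, relative_to_neutral(S_COST, n_e))
```

Under these assumed values, the costly transcript fixes almost as often as a neutral variant in the small population, whereas in the large population its fixation probability is numerically indistinguishable from zero. The qualitative contrast, not the particular numbers, is the point of the sketch.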

ncRNA-based regulatory networks have shown their evolutionary potential to guide new critical phenotypes. First, ncRNAs may allow the cell to exert faster control of gene expression and thus to improve adaptation to environmental conditions in a specific space and time through low energetic costs. A major distinction between protein- and RNA-based gene regulations can be found in their mode of action. Transcription factors (TFs) recognize sequence-specific cis-motifs in the regulatory upstream regions of target genes (i.e., the promoter), and regulate transcription when bound to their promoters. After transcription, in contrast, microRNAs (miRNA) recognize complementary cis-acting sites in the 3' UTR of the mRNA, and ultimately control the protein translation of the mRNA by promoting its degradation once bound to a target site (Millar and Waterhouse 2005) (Table 2). On the basis of common sequence motifs and structural features, snoRNAs bind to a transcript via base complementarity to regulate alternative splicing at sequences subject to RNA editing (Kishore and Stamm 2006). Additionally, a large number of different functions have been found for long ncRNAs (lncRNA) which often do not appear to show dependence on particular secondary structures. The control of mRNA populations by lncRNA includes the organization of nuclear bodies, a direct involvement in epigenetic mechanisms and transcriptional regulation as well as the inhibition of several receptors (Mattick et al. 2009; Spector et al. 2009; Chu et al. 2011; Kim and Sung 2011). Furthermore, snoRNAs and lncRNA can be also processed into small regulatory RNAs similar to miRNAs (Spector et al. 2009; Scott and Ono 2011). Therefore, precision in genic output (as measured by the variation in number and half-life of mRNA molecules) is often achieved by ncRNAs rather than by protein regulators. 
ncRNAs could confer robustness (i.e., invariance of the resulting phenotype in the face of perturbation) to regulatory networks by preventing unwanted ectopic protein molecules, buffering fluctuations in expression levels or reducing transcriptional noise (Hornstein and Shomron 2006).
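
The complementarity-based target recognition by miRNAs mentioned above can be sketched in a few lines. This is a deliberately simplified illustration: the sequences are invented toy examples, and the restriction of the match to a "seed" (nucleotides 2-8 of the miRNA) is an assumption added here for illustration, not something stated in the text. Real target prediction also weighs site context, conservation and pairing energy.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Return the reverse complement of an RNA string (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def find_seed_sites(mirna, utr, seed_start=1, seed_end=8):
    """Return 0-based positions in the UTR where a site complementary to the
    miRNA seed begins. The default slice [1:8] is nucleotides 2-8 (1-based)."""
    seed = mirna[seed_start:seed_end]
    site = reverse_complement(seed)
    width = len(site)
    return [i for i in range(len(utr) - width + 1) if utr[i:i + width] == site]

# Toy sequences (hypothetical, not taken from any database).
TOY_MIRNA = "UACGGAUCCAUGCAUGCAUGCA"
TOY_UTR = "AAAAGAUCCGUCCCCGAUCCGUAA"

if __name__ == "__main__":
    print(find_seed_sites(TOY_MIRNA, TOY_UTR))
```

Because a short stretch of complementarity suffices in this model, a single point mutation in the UTR can create or destroy a site, which is one way to picture how new regulatory links could arise at low cost.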

Unlike protein-coding sequences, moreover, ncRNAs do not necessarily require a conserved sequence to perform their functions. In most cases, indeed, the biogenesis or function of the RNA molecule is only possible if the molecule folds into a characteristic two- and three-dimensional structure via formation of intra-molecular base pairs into “stems” or “helices” (Fig. 3b). The disruption of these paired regions through mutations in the primary sequence may result in conformational changes of the structure and can compromise the function of the RNA molecule; however, compensatory mutations that replace one type of base pair by another in the paired regions of the molecule can restore its functional conformation (Fig. 3c). Under the influence of various selection forces, such as purifying (speeding up the loss of mutant alleles), stabilizing (stabilizing the frequency of an allele in a population), or positive (promoting faster fixation of advantageous alleles in a population), an RNA molecule may accumulate nucleotide double-substitutions (i.e., covariations) to maintain a structural-functional class or may store mutations to explore a new one.
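
The logic of compensatory mutations can be made concrete with a small sketch. It assumes a fixed target structure written in dot-bracket notation and treats a stem as intact when every paired position forms a Watson-Crick or G-U wobble pair. The hairpin below is an invented toy example, not one of the 7SK stems shown in Fig. 3, and no folding energies are computed.

```python
# Base pairs treated as compatible with a helix (Watson-Crick plus G-U wobble).
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def paired_positions(structure):
    """Parse a dot-bracket string into a list of (i, j) paired indices."""
    stack, pairs = [], []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            pairs.append((stack.pop(), i))
    return pairs

def broken_pairs(sequence, structure):
    """Return the paired positions whose bases can no longer pair."""
    return [(i, j) for i, j in paired_positions(structure)
            if (sequence[i], sequence[j]) not in CANONICAL]

def mutate(sequence, position, base):
    """Return a copy of the sequence with one substituted base."""
    return sequence[:position] + base + sequence[position + 1:]

# Toy hairpin (hypothetical): a 4-bp stem closing a 4-nt loop.
STRUCTURE = "((((....))))"
WILD_TYPE = "GCAUGAAAAUGC"

if __name__ == "__main__":
    single = mutate(WILD_TYPE, 1, "A")    # C->A disrupts the C-G pair at (1, 10)
    double = mutate(single, 10, "U")      # G->U restores pairing as A-U
    for name, seq in [("wild type", WILD_TYPE), ("single", single), ("double", double)]:
        print(name, seq, broken_pairs(seq, STRUCTURE))
```

The double mutant differs from the wild type at two positions yet is compatible with the same stem, which is the covariation signal that structure-based comparisons of ncRNAs look for.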

Fig. 3

Comparison of the 5′ stems of the regulatory 7SK small nuclear RNAs. a) Phylogenetic distribution of 7SK candidate sequences in Arthropoda. Figure is adapted with kind permission of Gruber et al. (2008). A black dot indicates a match in the genomic sequence; the hexagons refer to partial ESTs. Aligned blocks are shown in black, gray bars indicate gaps in the alignment, and missing sequence data adjacent to EST regions appear white. b) Simplified cartoon model of the functional secondary structure of 7SK RNA (based on Peterlin et al. 2012) showing the position of the M3 stem located at the 5’ stem region. The position of the M3 stem region is shown by a blue block at the top of the alignment in a) highlighting the highly conserved motif GAUC-GAUC. c) Structural models for the M3 stems in several eumetazoan clades. The consensus structural models for Drosophilidae, Neoptera and Arthropoda are based on the alignment in a). Conserved regions are colored in red; two and three compensatory mutations are shown in ochre and green. Lower case letters denote a deletion in some sequences. Helices are highlighted by a gray background. The open rectangle at the M3 stem for Drosophilidae shows the highly conserved motif GAUC-GAUC. Hairpin loops (with variable size and no clear consensus folds) are drawn as dashed ellipses. Figure is reproduced with kind permission of Mosig et al. (2009)

The effect of mutations on structured RNAs is quite well understood. On the one hand, RNAs are robust against mutations in the sense that a large fraction of mutations does not affect the folding (Fig. 3c). These neutral mutants form extended neutral networks in sequence space, on which drift leads to diffusion-like evolutionary dynamics. On the other hand, another large fraction of mutations leads to dramatic structural changes, so that any frequent structure is realized in the vicinity of any arbitrarily chosen sequence, an effect termed shape space covering (Schuster et al. 1994). Taken together, these two competing effects imply that drift leads to a rapid exploration of novel variants at the fringes of the neutral networks (Huynen 1996; Huynen et al. 1996). In small populations, slightly detrimental mutations in ncDNA regions can also be fixed by genetic drift (Ohta 1973), so that the small energy cost of transcribing non-functional ncRNAs does not lead to their rapid elimination from the genome. Furthermore, second order effects, such as an increased probability of producing offspring with a detrimental mutation in a gene that was fitness-neutral in the parent, cannot be selected because drift dominates the very small effective fitness effect. The first order effect of an advantageous mutation is readily selectable, however. Thus, non-functional transcripts can “wait” for rare advantageous mutations to place them under stabilizing selection. The evolutionary dynamics thereby become dominated by the accessibility of advantageous mutations (Fontana and Schuster 1998). This structural accessibility of the RNA molecule could explain in part the structural and functional convergence of the regulatory miRNA-like RNA class in plants, animals and fungi, which is congruent with an independent origin of their multicellular development (Liu et al. 2010; Carrington et al. 2011).

Evolution and Selection of Regulatory ncRNAs in Eukaryotes: The Case of microRNAs

Functional regulatory ncRNAs, even those under strong stabilizing selection, may exhibit rapid evolution at the sequence level. Thus, some subclasses are clearly under purifying selection acting predominantly on the secondary structure in order to keep their regulatory function (Ponting et al. 2007; Pain et al. 2008; Amar et al. 2009; Marques and Ponting 2009; Ponting et al. 2009; Gerstein et al. 2011). Furthermore, the genetic structure involved in the transcription and function of some regulatory ncRNAs, such as introns and splice sites, is also subject to stabilizing selection (Hiller et al. 2009; Rose et al. 2011). Hence several models have been suggested to describe how selection may be driving the birth-and-death evolutionary dynamics of some ncRNAs (Hornstein and Shomron 2006; Rajewsky and Chen 2007; Lai et al. 2008; Peterson et al. 2009; Carrington et al. 2011).

In particular, comparative analyses of miRNA homologs in diverse animal and plant species have revealed both long-term maintenance and taxa-specific occurrence of miRNA families (Carrington et al. 2007; Donoghue et al. 2008; Wang et al. 2008; Carrington et al. 2011). In animals, it has been proposed that new miRNAs could evolve by point mutations from existing hairpin structures in the genome that are transcribed at low levels and in specific cell types. This process is followed by selection against inadequate miRNA/mRNA pairing and expression modifications, and simultaneously is preserving many of the neutral or advantageous targets that increase the expression of the miRNA (without being highly deleterious to the organism) (Rajewsky and Chen 2007; Lai et al. 2008; Peterson et al. 2009; Carrington et al. 2011). Thus, animal miRNAs and fungal miRNA-like RNAs (milRNAs) have exploited the robustness of the hairpin structure both to maintain the regulatory function and to diversify their activity by targeting imperfectly complementary sequences (Lai et al. 2008; Liu et al. 2010). Because miRNA/target pairing is almost perfectly complementary and extends over twice as many nucleotides in plants, when compared to animals and fungi, the evolution of new plant miRNAs has been proposed to be driven by inverted duplication of target gene sequences (i.e., pre-miRNA-encoding regions) (Carrington et al. 2004). Thus, newly formed expressed RNA hairpins may experience successive selective sweeps until they reach an evolutionarily optimized form, allowing a one-to-one miRNA/target relationship to be established, and then be incorporated into new or existing regulatory networks (de Meaux et al. 2008; Carrington et al. 2011).

Both models have been supported by experiments showing that, unlike highly conserved ancient plant and animal miRNAs, evolutionarily young miRNAs are typically expressed at low levels, processed imprecisely, lack targets, and display patterns of neutral variation (Rajewsky and Chen 2007; de Meaux et al. 2008; Lu et al. 2008; Carrington et al. 2011). These trends suggest that young MIRNA loci tend to evolve neutrally, possibly helping to maintain negative pleiotropic effects at low levels, until compensatory and advantageous mutations have emerged (Rajewsky and Chen 2007; Carrington et al. 2011). Exceptional evidence for evolutionary optimization of young pre-miRNA stem-loop structures has recently been presented for two miRNA-encoding loci in Arabidopsis thaliana (de Meaux et al. 2008) as well as for five miRNA-encoding loci in Drosophila melanogaster (Lu et al. 2008). In both works, the authors demonstrated that pervasive variation occurs at miRNA-encoding loci, and that the structural variation among alleles suggests non-random evolution of a thermoresistant substructure in the miRNA precursor, which presumably impacts the processing of the mature form. In A. thaliana, for instance, miR856 shows a weak signature of a selective sweep and miR824 displays signs of balancing selection, whereas the polymorphism pattern of miR310/311/312/313 in D. melanogaster is indicative of hitchhiking under positive selection.

Detailed knowledge of evolutionary trends of regulatory non-coding transcripts is currently restricted to the much better understood microRNAs. Even though some genome projects have estimated the proportion and conservation of some ncRNA families, their extent and evolutionary patterns on a large phylogenetic scale are still being determined. Part of the methodological problem is that ncRNA evolution is characterized by large, inhomogeneous variations in both sequence and structure (Mosig et al. 2009) (Fig. 3a and c), which hamper their discovery and functional description. The assessment of a large compendium of ncRNAs in basal eukaryotes would help us to understand the origin and evolution of ncRNA-based gene regulation in Eukarya. Interestingly, both genomic and functional similarities among regulatory ncRNAs might be the result of convergent evolution due to duplication and transposition mechanisms, similar processing enzymes, and comparable selective pressures across species, but they might also reflect either divergent evolution from a common RNA ancestor or the transition from one ncRNA type to another (Scott and Ono 2011).

Why and How Can a Vast Expansion of RNA Regulators Contribute to the Emergence of Complex Multicellularity Rather than Large Numbers of Novel Protein Regulators?

Many families of regulatory proteins that contribute to the complexity of developmental programs have been conserved, expanded and innovated throughout the evolution of eukaryotes. They are, however, under more stringent constraints than regulatory ncRNAs. Novel transcription factors, for instance, require the independent acquisition of DNA binding sites in all target promoters, while additional paralogs are forced to rapidly diverge from an existing regulatory module to avoid deleterious effects at the time of acquiring a new target or domain of expression (e.g., in a new tissue) (Rajewsky and Chen 2007). Furthermore, for a new protein to come under positive selection, the mRNA must be transcribed (easy with pervasive transcription) and the messenger must be translated (perhaps also easy, but protein synthesis is energy intensive). Hence a neutral coding gene should be selected against more strongly than a neutral non-translated transcript. Indeed, protein-coding genes can lose their coding capacity and persist long enough as pseudogenes to become regulatory ncRNAs (Yano et al. 2004; Guo et al. 2009). One of the best documented examples is the regulatory ncRNA Xist, which is present exclusively in eutherian mammals and is involved in the initiation of X chromosome inactivation (Table 2). Xist evolved by pseudogenization of the protein-coding gene Lnx3, which is conserved in all vertebrate classes except Eutheria. Xist and Lnx3 are thus present in a mutually exclusive manner at a syntenic locus across vertebrates (Duret et al. 2006).

Conversely, multiple aspects of ncRNA function and evolution, such as a) leaky repression of transcripts to allow “fine-tuning” of gene expression, b) reduction of phenotypic variation by buffering genetic noise or avoiding unwanted ectopic molecules, c) a presumably regulatory increase in lineages of CMOs over geologic time, and d) the rarely dramatic impact of loss of function on the phenotype, suggest that regulatory ncRNAs act as key players in canalizing genetic programs (Hornstein and Shomron 2006; Peterson et al. 2009). In this context, canalization can be visualized as the formation of virtual, (relatively) invariant phenotypic canals in which developmental programs flow (Hornstein and Shomron 2006). If the main function of ncRNA regulation is to stabilize gene expression levels, then, almost paradoxically, ncRNAs may increase the heritability of a phenotypic trait through natural selection by decreasing its expression variability about the mean, i.e., by making the phenotype robust (a result) through canalization of the genetic program (a cause). Hence, regulatory ncRNAs could be the evolutionary innovation that canalizes development such that phenotypic variation decreases over geologic time at the cost of increasing developmental precision, consequently enhancing morphological complexity (Peterson et al. 2009).

Here, we suggest a model that explains why novel RNA regulators with advantageous functions are readily accessible in multicellular organisms. Consider a gene that is expressed at some homogeneous level $x$ in an organism. The expression level $x$ will be selected to optimize the organism’s fitness. Now suppose that the organism has two cell types. Figure 4 shows the fitness landscape as a function of the expression levels in the two cell types, $f(x_1, x_2)$. As long as the regulation of the gene is identical in both cell types, only the points on the median line ($x_1 = x_2$) are accessible. If a regulatory ncRNA is innovated in one of the two cell types (regulator + in Fig. 4), then it can modify the expression levels of the gene. Evolution can now adjust the organism’s fitness to optimize $f(x_1, x_2)$ without the constraint that $x_1 = x_2$. Mathematically, we know that $\max f(x_1, x_2) \geq \max f(x, x)$ simply because we now optimize over a larger domain. In general, the optimum for $f(x_1, x_2)$ will be reached for some point $x_1 \neq x_2$, so that the fitness difference $f(x_1, x_2) - \max f(x, x)$ will generically be strictly positive for a non-empty set $A$ of expression values $x_1$ and $x_2$. If the new regulatory ncRNA takes the system into this set $A$ with a small energetic cost ($S_{\mathrm{cost}}$), then it has a “window of opportunity” to be placed under stabilizing selection. Thus, this new regulatory ncRNA could make the phenotypic traits influenced by the regulated gene much more “evolvable”. If the organism’s fitness depends on the heritability of those phenotypic traits and, accordingly, also on the expression levels of that gene, then the new regulatory ncRNA becomes subject to positive selection ($S^*$) to optimize $f(x_1, x_2)$ (relationship shown in Fig. 4 as $S_{\mathrm{cost}} + S^*$).
If the fitness depends continuously on the expression levels, moreover, set A touches the maximal value on the median line and hence is reachable from (x, x) by either a relative increase or a relative decrease of x 1 over x 2 . In summary, for every novel RNA regulatory molecule, there is a “window of opportunity” to be placed under stabilizing selection if it can be used to disentangle expression patterns of other functionally relevant genes in different cell types. We argue that this mechanism makes advantageous innovations easily accessible in differentiated multicellular organisms. The resulting refinement in regulation could in turn lead to the innovation of further cell types, opening the same “window of opportunity” again for the incorporation of further regulators. Since ncRNAs are much easier to generate from genomic DNA and energetically cheaper to produce than functional proteins in eukaryotes, we conclude that complex multicellular organisms should be able to rapidly accumulate ncRNA regulators, as is indeed observed in evolutionary history.
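
The core of this argument, that optimizing over both expression levels can never do worse than optimizing along the median line, can be illustrated numerically. The following is a minimal Python sketch under stated assumptions, not part of the model itself: the Gaussian shape of the hypothetical function `fitness`, its optimum at (0.3, 0.7), the expression range 0 to 1, and the value of `s_cost` are all illustrative choices and are not derived from data.

```python
import numpy as np

def fitness(x1, x2):
    """Hypothetical fitness landscape f(x1, x2); optimum assumed at (0.3, 0.7)."""
    return np.exp(-((x1 - 0.3) ** 2 + (x2 - 0.7) ** 2) / 0.1)

# Expression levels on a regular grid (arbitrary units, assumed range 0..1).
levels = np.linspace(0.0, 1.0, 101)
x1, x2 = np.meshgrid(levels, levels, indexing="ij")
landscape = fitness(x1, x2)

# Without a cell-type-specific regulator, only the median line x1 = x2 is accessible.
best_on_median = np.diag(landscape).max()   # max f(x, x)

# With the regulator, the whole (x1, x2) plane becomes accessible.
best_overall = landscape.max()              # max f(x1, x2)

# Set A: expression states strictly fitter than the best state on the median line.
set_a = landscape > best_on_median

# Assumed small energetic cost of producing the regulator (hypothetical value).
s_cost = 0.01
net_gain = best_overall - best_on_median - s_cost

print(f"max f(x, x)      = {best_on_median:.3f}")
print(f"max f(x1, x2)    = {best_overall:.3f}")
print(f"grid points in A = {int(set_a.sum())}")
print(f"net gain         = {net_gain:.3f}")
```

In this toy landscape the set A is non-empty and contains no point of the median line, so any regulator that moves expression into A at a cost smaller than the fitness difference yields a net gain. A landscape whose optimum happens to lie exactly on the median line would give an empty set A, which is the non-generic case excluded in the text.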

Fig. 4

Model for the acquisition of ncRNA regulators in CMOs. The figure shows the fitness landscape of gene a as a function of its expression levels in two cell types, $f(x_1, x_2)$. As long as the regulation of gene a is identical in both cell types, only the points on the median line ($x_1 = x_2$) are accessible. If a regulatory ncRNA is innovated (regulator +) in one of the cell types, then it can modify the expression levels of gene a in that compartment. If the new regulatory ncRNA takes the system into a new set $A$ of expression values with a small energetic cost ($S_{\mathrm{cost}}$), then it has a “window of opportunity” to be placed under stabilizing selection. Thus, this new regulatory ncRNA could make the phenotypic traits influenced by the regulated gene a much more “evolvable”. If the organism’s fitness depends on the heritability of those traits and, accordingly, on the expression levels of gene a, then the new regulatory ncRNA becomes subject to positive selection ($S^*$) to optimize $f(x_1, x_2)$. The resulting refinement in regulation in turn could lead to the innovation of further cell types, opening the same “window of opportunity” again for the incorporation of further regulators. See text for a detailed description.

This avenue to increasing cellular complexity is much less attractive in single-cell organisms, and in particular in prokaryotes with their small size, in which gene products are distributed much more homogeneously. The mechanism of Fig. 4 critically depends on the availability of at least two cell types with the same genetic background and the same ancestral pattern of gene expression from the parental cell, a situation simply not available either in unicellular organisms or in SMOs organized in colonies during certain life stages. Solitary cells thus do not generically provide an opportunity for the invasion of novel regulators, at least not when they are already well-adapted to their environment. The smaller size and hence the expected larger effective population size, in concert with the strong dependence of genome size on energy constraints, also imply a much larger selective pressure against non-functional transcripts. Accordingly, prokaryotes should also have a much smaller repertoire of nearly neutral transcripts, which persist over much shorter time scales and thus have a much smaller chance to stumble across a functional improvement or innovation. All these constraints have favored protein regulators, rather than ncRNAs, as the molecules that “easily” provide the selective advantages needed to dominate the regulation of phenotypes in prokaryotes. Global protein regulators, acting in conjunction with cis-regulatory elements, protein and RNA co-regulators, and specific environmental signals, control the morphogenesis and metabolic complexity of bacterial species (Martinez-Antonio and Collado-Vides 2003; Lozada-Chavez et al. 2008). Accordingly, bacterial protein regulators are highly flexible in switching their mode of action between global and local targeting and have a propensity to spawn homologs with similar function at short evolutionary time scales (Babu et al. 2006; Lozada-Chavez et al. 2006, 2008).
Furthermore, the total number of transcription factors positively correlates with genome size and life style (Cases et al. 2003; Babu et al. 2006). Therefore, each bacterial species has evolved its own set of transcription factors, suggesting that the emergence of distinct repertoires of protein regulators is a crucial step for the adaptation to new environments and to control the bacterial phenotypic variance.
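
The population-genetic part of this argument, that selection against slightly costly transcripts is far more effective in large populations, can be made concrete with the standard diffusion approximation for the fixation probability of a new mutation (Kimura's formula). The sketch below is only illustrative and rests on assumptions that are not part of the original argument: the selection coefficient `s` standing for the cost of a non-functional transcript and the two effective population sizes are hypothetical values, chosen to contrast a small population with a very large one, and census size is assumed to equal effective size in a diploid population.

```python
import math

def fixation_probability(s, n_e):
    """Fixation probability of a single new mutation with selection
    coefficient s in a diploid population of effective size n_e
    (diffusion approximation; census size assumed equal to n_e)."""
    if s == 0:
        return 1.0 / (2 * n_e)  # neutral expectation
    # (1 - exp(-2s)) / (1 - exp(-4 * n_e * s)), written with expm1 for accuracy.
    return math.expm1(-2 * s) / math.expm1(-4 * n_e * s)

def relative_to_neutral(s, n_e):
    """Fixation probability expressed relative to a neutral mutation."""
    return fixation_probability(s, n_e) * 2 * n_e

# Hypothetical values: a very small fitness cost and two population sizes.
s = -1e-6        # assumed cost of transcribing a non-functional RNA
small_ne = 1e4   # assumed small effective population size
large_ne = 1e8   # assumed very large effective population size

ratio_small = relative_to_neutral(s, small_ne)
ratio_large = relative_to_neutral(s, large_ne)

print(f"small population: {ratio_small:.3g} of the neutral fixation rate")
print(f"large population: {ratio_large:.3g} of the neutral fixation rate")
```

Under these assumed values the slightly costly transcript behaves almost like a neutral variant in the small population, whereas in the very large population it is effectively never fixed. This is consistent with the expectation that a repertoire of nearly neutral transcripts persists much longer where effective population sizes are small.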

Role of RNA Throughout Evolution of Life on Earth

RNA is one of the three major biopolymers that manage the fluxes of biological information of life on Earth. The “RNA world” hypothesis proposes that the biochemical, structural, and catalytic properties of RNA were once sufficient to encode the genotype, to decode the phenotype, and to drive evolution in the early stages of life on Earth. As a corollary, RNA has been replaced to a large extent by DNA and proteins on the way to contemporary life. Its capabilities, however, have been preserved in the core of the most fundamental and highly conserved biological processes across the three domains of life (Joyce 2002). Furthermore, the pervasive regulatory role of RNA in the three domains of life, particularly in eukaryotes, has recently been recognized (Fig. 2).

We suggest here that non-coding RNA-based genetic regulation is a prerequisite for the emergence of multicellular complexity. The evolution of complex multicellularity is based upon the successive innovation and refinement of cell and tissue types, requiring a flexible layer of compartment-specific gene regulation. We have argued here that novel regulators that refine gene expression patterns are easily selected for, and that non-coding transcripts are entirely capable of fulfilling such roles at all levels of cellular regulation. Since they are less costly than peptides, non-functional transcripts are available as evolutionary raw material for much longer time spans, preferably in smaller populations where genetic drift is stronger. Thus, non-coding transcripts have much more time and much more widespread opportunities to find an adaptive role. Functional ncRNAs, therefore, are integrated into the genomes of complex multicellular organisms at a faster pace than paralogous proteins. Consequently, regulatory ncRNAs become the driving force behind increasingly complex modes of regulation. Accordingly, it is reasonable to expect that the total number of regulatory ncRNAs could be higher than the total number of protein regulators in CMOs. If so, regulatory ncRNAs could substantially contribute to the theoretical and practical definition of simple and complex multicellularity. The origin of RNA on early Earth is still controversial (Robertson and Joyce 2010). Nevertheless, it is becoming clear that once RNA emerged into a protocellular system, its relevance within the central dogma of biology has been greater than previously thought.