Introduction

The human gastrointestinal tract harbours complex microbial communities1,2, dominated by bacteria from the phyla Bacteroidetes and Firmicutes3. The composition and diversity of the gut microbiota are affected by numerous factors, including host genetics4, long-term diet5,6, drugs1,7,8, and several other environmental factors9. Evidence suggests that the composition of the microbiota is associated with the development of obesity3,10,11,12, diabetes13,14, inflammatory bowel disease15,16, colorectal cancer17,18, and non-alcoholic fatty liver disease19,20. Therefore, the composition and function of the microbial species living in our gut are of crucial importance for maintenance of health. Short-chain fatty acids (SCFAs) produced by fermentation of dietary fibre by several abundant genera of the intestinal microbiota, including Roseburia, Eubacterium, and Faecalibacterium21, have been reported to elicit beneficial effects on energy metabolism and prevent colonization of pathogens22. Bacteria of the genus Faecalibacterium, abundant butyric acid-producing bacteria colonizing the human gut, display anti-inflammatory effects and may be used as potential probiotics for treatment of gut inflammation23,24.

The genus Faecalibacterium, belonging to the family Ruminococcaceae within the order Clostridiales, comprises only one validated species, Faecalibacterium prausnitzii25, and two non-validated published species, ‘Faecalibacterium moorei’26 and ‘Faecalibacterium hominis’27, all originally isolated from human faeces. F. prausnitzii is a gram-negative, non-spore-forming, and strictly anaerobic rod-shaped bacterium. The genomic G + C content of the genus Faecalibacterium ranges from 47 to 57%28. The fermentation products from glucose are butyrate, D-lactate, and formate. In the present study, we describe two novel species of the genus Faecalibacterium by using a polyphasic taxonomy approach along with whole genome sequence analysis.

Results

Phenotypic and chemotaxonomic characterization

Strains AF52-21T and CM04-06T were isolated from the faeces of two healthy Chinese donors. Both strains were observed to be obligately anaerobic, gram-negative, non-spore-forming, non-motile, and rod-shaped bacteria (Fig. 1). After incubation on MPYG agar at 37 °C for 2 days, the colonies appeared 1.0–2.0 mm in diameter, round, creamy white to yellowish, convex, and opaque with entire margins for AF52-21T, and 2.0 mm in diameter, round, yellowish, slightly convex, and opaque with entire margins for CM04-06T. The growth temperature range was 20–42 °C (optimum 37 °C) for AF52-21T and 30–45 °C (optimum 37 °C) for CM04-06T. Growth was observed at pH 6.0–7.5 (optimum 7.0–7.5) for AF52-21T and pH 5.0–8.0 (optimum 7.0–7.5) for CM04-06T. Strains AF52-21T and CM04-06T grew with 0–1% and 0–3% NaCl, respectively. Both strains were found to be catalase-negative. The major metabolic end products for strains AF52-21T and CM04-06T were acetic acid, formic acid, butyric acid, and lactic acid. Physiological and biochemical characteristics that differentiate strains AF52-21T and CM04-06T from the most closely related species of the genus Faecalibacterium are listed in the species descriptions and in Table 1 (Fig. 2).

Figure 1

Micrographs of strains AF52-21T and CM04-06T after Gram staining. (A) AF52-21T; (B) CM04-06T.

Table 1 Differential phenotypic characteristics of strains AF52-21T, CM04-06T, and the related species F. prausnitzii ATCC 27768T.
Figure 2

Maximum-likelihood phylogenetic tree based on 16S rRNA gene sequences showing the phylogenetic relationships of strains AF52-21T, CM04-06T and the representatives of related taxa within the family Ruminococcaceae. Clostridium butyricum DSM 10702T (AQQF01000149) was used as an out-group. Bootstrap values based on 1000 replications higher than 70% are shown at the branching points. Bar, substitutions per nucleotide position.

The cellular fatty acid profiles of strains AF52-21T and CM04-06T and the related species are shown in Table 2. The major fatty acids (constituting > 5% of the total) in strain AF52-21T were C18:1 ω9c (39.0%), C16:0 (16.3%), iso-C19:0 (12.9%), C18:1 ω7c (8.1%), and C14:0 (5.9%). The predominant fatty acids in strain CM04-06T were C18:1 ω9c (32.5%), C16:0 (25.5%), iso-C17:1 I/anteiso B (9.7%), C18:1 ω7c (7.5%), and iso-C19:0 (5.9%). The most abundant fatty acids, C16:0 and C18:1 ω9c, were present at similar but not identical levels in strains AF52-21T, CM04-06T, and ATCC 27768T. Furthermore, strains AF52-21T, CM04-06T, and ATCC 27768T could be differentiated by less abundant fatty acids, such as C18:1 2OH, anteiso-C15:0, anteiso-C17:0, C13:0 3OH/iso-C15:1 I, C16:1 ω7c/C16:1 ω6c, and anteiso-C18:0/C18:2 ω6,9c (Table 2). Strains AF52-21T and CM04-06T were found to contain meso-diaminopimelic acid as the diamino acid of the peptidoglycan. The polar lipid profiles of strains AF52-21T, CM04-06T, and F. prausnitzii ATCC 27768T are shown in Supplementary Fig. S1. The polar lipid profiles of AF52-21T and CM04-06T were similar to that of the most closely related strain, F. prausnitzii ATCC 27768T, with diphosphatidylglycerol (DPG), phosphatidylglycerol (PG), and several unidentified glycolipids (GL1, GL3) being present in both strains. However, the presence or absence of three unidentified lipids (L, L1, L2), an unidentified phospholipid (PL), unidentified phosphoglycolipids (PGL), and an unidentified glycolipid (GL2) can be used to distinguish strains AF52-21T and CM04-06T from the closest relative. Quinones were not detected in either strain (Table 3).

Table 2 Fatty acid profiles of strains AF52-21T, CM04-06T, and the closest related species F. prausnitzii ATCC 27768T.
Table 3 Levels of 16S rRNA gene sequence similarity and ANI values (in percentages) based on BLAST for strains AF52-21T, CM04-06T, and the phylogenetically related species F. prausnitzii ATCC 27768T and the unrecognized species ‘Faecalibacterium hominis’ 4P-15.

Genome analysis

The assembled draft genomes of strains AF52-21T and CM04-06T comprised total lengths of 2,851,918 bp and 3,011,178 bp with 73 and 47 scaffolds, respectively (Table 4). The G + C contents calculated from the genome sequences were 57.77% and 57.51%, which are slightly higher than the range reported previously for the genus Faecalibacterium (47–57 mol%)25. CheckM analysis of the genomes showed high completeness (> 90%) and low contamination (< 5%) (Table 4), indicating that these are high-quality genome sequences. The genome comparison between strains AF52-21T, CM04-06T, ATCC 27768T, and ‘Faecalibacterium hominis’ 4P-15 showed ANI values ranging from 82.53% to 90.19% (Table 3), which are well below the proposed cutoff value of 95–96% for delineating bacterial species, indicating that strains AF52-21T and CM04-06T represent novel species in the genus Faecalibacterium. Circular maps of the two strains AF52-21T and CM04-06T are shown in Fig. 3.
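The comparison of genomic G + C content with the previously reported genus range can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, ambiguous bases are assumed to be excluded from the denominator, and the toy sequence is invented. Only the two percentages and the 47–57 mol% range come from the text.

```python
def gc_content(sequence: str) -> float:
    """Return G+C content (%) over unambiguous bases (A, C, G, T) only."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / (gc + at)


def above_reported_range(gc_percent: float, upper: float = 57.0) -> bool:
    """True if a value lies above the previously reported genus range (47-57 mol%)."""
    return gc_percent > upper


# Values reported in the text for the two draft genomes.
reported = {"AF52-21T": 57.77, "CM04-06T": 57.51}
for strain, gc in reported.items():
    print(strain, gc, above_reported_range(gc))

# Toy sequence for illustration only; it is not taken from either genome.
print(gc_content("ATGCGGCCNNAT"))
```

Both reported values fall just above the upper bound, which is consistent with the wider range given in the emended genus description.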

Table 4 Genome properties of F. butyricigenerans AF52-21T and F. longum CM04-06T.
Figure 3

Circular maps of AF52-21T and CM04-06T. Innermost circle, GC skew; circle 2, G + C content; circle 3, contigs; circle 4, predicted prophage remnants; circle 5, tmRNA, tRNA and rRNA genes; circle 6, CDS; circles 7–9, (A) homologous genomic segments from CM04-06T, F. prausnitzii ATCC 27768T and ‘F. hominis’ 4P-15, (B) homologous genomic segments from AF52-21T, F. prausnitzii ATCC 27768T and ‘F. hominis’ 4P-15.

16S rRNA gene sequence extraction and phylogenetic analysis

The almost complete 16S rRNA gene sequences of strains AF52-21T and CM04-06T were extracted from the genomes at locations Scaf2_220520-222018 and Scaf13_51882-53380, respectively. The 16S rRNA gene sequences were 1499 bp long for both strains. BLAST analysis of the 16S rRNA gene sequences against the EzBioCloud server showed that the two strains are most closely related to F. prausnitzii ATCC 27768T, which is the sole valid species of the genus Faecalibacterium, with similarity values of 97.27% and 96.51%, respectively. Strains AF52-21T and CM04-06T share 16S rRNA gene sequence similarities of 98.65% and 97.68%, respectively, with ‘Faecalibacterium hominis’ 4P-15. The 16S rRNA gene sequence similarity between strains AF52-21T and CM04-06T is 98.53% (Table 3). All these values are lower than the recommended threshold (98.7%) for classification of human-associated bacterial isolates at the species level29. Phylogenetic analysis based on the maximum-likelihood, neighbour-joining, and minimum-evolution methods (Fig. 2, Supplementary Figs. S2 and S3, respectively) confirmed the affiliation of the novel isolates with the genus Faecalibacterium, revealing that the two isolates form a distinct cluster with F. prausnitzii ATCC 27768T, supported independently of the treeing method by a high bootstrap value.
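The arithmetic behind the reported gene length and the threshold comparison can be checked with a short sketch. It assumes that the scaffold coordinates are 1-based and inclusive, which is consistent with the reported 1499 bp but is not stated in the text. The names `feature_length` and `SPECIES_THRESHOLD` are hypothetical; all numbers are taken from the paragraph above.

```python
SPECIES_THRESHOLD = 98.7  # 16S rRNA gene similarity threshold (%) cited in the text


def feature_length(start: int, end: int) -> int:
    """Length in bp of a feature with 1-based, inclusive coordinates (assumed convention)."""
    return end - start + 1


# Scaffold coordinates reported in the text.
locations = {
    "AF52-21T": ("Scaf2", 220520, 222018),
    "CM04-06T": ("Scaf13", 51882, 53380),
}

# Pairwise 16S rRNA gene similarities (%) reported in the text.
similarities = {
    ("AF52-21T", "ATCC 27768T"): 97.27,
    ("CM04-06T", "ATCC 27768T"): 96.51,
    ("AF52-21T", "4P-15"): 98.65,
    ("CM04-06T", "4P-15"): 97.68,
    ("AF52-21T", "CM04-06T"): 98.53,
}

for strain, (scaffold, start, end) in locations.items():
    print(strain, scaffold, feature_length(start, end), "bp")

for pair, value in similarities.items():
    status = "below threshold" if value < SPECIES_THRESHOLD else "at or above threshold"
    print(pair, value, status)
```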

Function annotation

For genome annotation, the distributions of the genes into clusters of orthologous groups (COGs) functional categories are depicted in Supplementary Fig. S4 and Table S1. Strains AF52-21T and CM04-06T share identical COG functional categories but differ in the number of genes in each category. Genes annotated by RAST as associated with synthesis of diaminopimelic acid (DAP), teichoic and lipoteichoic acids, lipopolysaccharides, and metabolism of polar lipids and polyamines, compared among strains AF52-21T, CM04-06T, and ATCC 27768T, are shown in Table S2. For strain AF52-21T, 11 genes/proteins were observed to be associated with biosynthesis of DAP, 18 genes/proteins with biosynthesis of polar lipids, 12 genes/proteins with biosynthesis of polyamines, 3 genes/proteins with biosynthesis of teichoic and lipoteichoic acids, and 14 genes/proteins with biosynthesis of quinones. For strain CM04-06T, 12 genes/proteins were found to be associated with biosynthesis of DAP, 19 genes/proteins with biosynthesis of polar lipids, 13 genes/proteins with biosynthesis of polyamines, 2 genes/proteins with biosynthesis of teichoic and lipoteichoic acids, and 16 genes/proteins with biosynthesis of quinones. We detected no genes involved in the biosynthesis of lipopolysaccharides or mycolic acids in strains AF52-21T and CM04-06T.

The functional annotation showed that AF52-21T, CM04-06T, and ATCC 27768T contain a complete acetyl-CoA to butyrate synthesis pathway, but possess butyryl-CoA:acetate CoA-transferase activity only in the final step (Fig. 4), as discussed previously30,31. The antiSMASH analysis of biosynthetic gene clusters (BGCs) showed that strains AF52-21T and CM04-06T both contain two potential BGCs, encoding a bacteriocin and a sactipeptide, whereas ATCC 27768T contains BGCs encoding a microcin and a sactipeptide (Supplementary Fig. S5). Prophages were identified using the PHAST software, and the results are shown in Supplementary Fig. S6. Two incomplete phage sequences were detected in the AF52-21T genome, one of which encodes the Phd_YefM protein, an antitoxin component. Three incomplete phage sequences and two intact prophages were detected in the CM04-06T genome, encoding the Phd_YefM protein, relaxase/mobilisation nuclease domain, bacterial mobilisation protein (MobC)/ribbon-helix-helix protein, helix-turn-helix, and predicted transcriptional regulators. Moreover, the antibiotic resistance analysis indicated that strain AF52-21T contains macrolide, lincosamide, and streptogramin resistance genes, while strains CM04-06T and ATCC 27768T contain aminoglycoside resistance genes (Fig. 5). To better understand the biosynthetic pathways contributing to the in vitro characteristics of strains AF52-21T and CM04-06T, we explored genes related to important pathways involved in carbohydrate metabolism. The comparison of in vitro and in silico characteristics is presented in Table 5.

Figure 4

The synthesis pathways from acetyl-CoA to butyrate. Strains AF52-21T, CM04-06T and ATCC 27768T are presented as blue, red, and yellow, respectively. Thl, thiolase; Hdb, β-hydroxybutyryl-CoA dehydrogenase; Cro, crotonase; Bcd, butyryl-CoA dehydrogenase; But, butyryl-CoA:acetate CoA transferase; Ptb, phosphate butyryltransferase; Buk, butyrate kinase.

Figure 5

Comparison of antibiotic resistance genes in strains AF52-21T, CM04-06T, and F. prausnitzii ATCC 27768T.

Table 5 Comparison of in vitro and in silico characteristics.

Discussion

16S rRNA gene phylogeny, genome sequence comparison, and physiological results showed that the two new isolates AF52-21T and CM04-06T represent two novel species. The ANI values between AF52-21T, CM04-06T and the closest related species ATCC 27768T were found to be 82.54% and 90.09%, respectively, which supports the delineation of new species. The results of biochemical and genomic functional analyses showed that both strains AF52-21T and CM04-06T are butyric acid-producing bacteria.

Most strains in the genus Faecalibacterium exhibit a common ability to produce butyric acid, bioactive peptides, and other anti-inflammatory substances with immunomodulatory effects23,24,32. Several studies have confirmed that a decreased abundance of this genus is related to the occurrence and development of inflammatory bowel diseases33,34,35. Accordingly, bacteria of the genus Faecalibacterium are receiving much attention as possible candidate next-generation probiotics (NGPs), which may be used for disease treatment36,37.

Previous studies based on comparative genomics of isolates suggested a wide diversity of this genus, with the presence of at least two phylotypes in F. prausnitzii26. A recent study analysing Faecalibacterium-like metagenome-assembled genomes (MAGs) proposed that Faecalibacterium from the human gut can be divided into 12 clades37. These studies have expanded the diversity of the genus Faecalibacterium and proposed that different phylotypes have different functions with potentially different contributions in relation to health or diseases.

Moreover, as a candidate taxon for NGPs, bacteria of the genus Faecalibacterium can be used for in vitro functional verification and animal model experiments to further explore possible probiotic functions and, ultimately, in clinical disease intervention trials.

Emended description of the genus Faecalibacterium

The genus description is as given by Duncan et al.25 with the following changes. Cells are able to produce formic acid, acetic acid, and butyric acid. The major polar lipids are diphosphatidylglycerol, phosphatidylglycerol and several unidentified glycolipids. Genomic DNA G + C content is 47–63 mol%. Genome size is 2.68–3.32 Mb.

Emended description of Faecalibacterium prausnitzii

Cells are able to produce formic acid, acetic acid, butyric acid, and lactic acid. The major fatty acids (constituting > 5% of the total) are C16:0, C18:1 ω7c, and C18:1 ω9c. The rest of the species characteristics are as described by Cato et al.38, Duncan et al.25, and Fitzgerald et al.26. The type strain is Faecalibacterium prausnitzii ATCC 27768T (= NCIMB 13872T).

Description of Faecalibacterium butyricigenerans sp. nov.

Faecalibacterium butyricigenerans (bu.ty.ri.ci.ge′ne.rans. N.L. n. acidum butyricum butyric acid; L. part. adj. generans, producing; N.L. adj. butyricigenerans, butyric acid-producing; referring to its production of butyric acid).

Cells are gram-negative, non-motile, non-spore-forming and rod-shaped. Strictly anaerobic and catalase negative. Colonies on PYG agar are round, creamy white to yellowish, convex, and opaque with entire margins, and colony size is approximately 1.0–2.0 mm in diameter after incubation at 37 °C for 2 days. Cells are able to grow at 20–42 °C with optimum temperature at 37 °C. The pH range for growth is 6.0–7.5 (optimum at 7.0–7.5). Growth occurs at NaCl concentrations 0–1%. Indole is not produced. Positive for hydrolysis of esculin and negative for gelatin. Formic acid, acetic acid, butyric acid, and lactic acid are the fermentation products. The major fatty acids are C14:0, C16:0, C18:1 ω7c, C18:1 ω9c, and iso-C19:0.

The type strain, AF52-21T (= CGMCC 1.5206T = DSM 103434T), was isolated from human faeces. The G + C content of the genomic DNA is 57.77 mol% as calculated from whole genome sequencing.

Description of Faecalibacterium longum sp. nov.

Faecalibacterium longum (lon′gum. L. neut. adj. longum long, referring to the shape of the cells).

Cells are gram-negative, non-motile, non-spore-forming, and long rod-shaped. Strictly anaerobic. Catalase- and urease-negative. Colonies are round, yellowish, slightly convex, and opaque with entire margins, and 2.0 mm in diameter after incubation on PYG agar at 37 °C for 48 h under anaerobic conditions. The strain shows growth at 30–45 °C (optimum temperature is 37 °C). Growth is observed at pH 5.0–8.0 (optimum pH is 7.0–7.5). NaCl is tolerated at concentrations up to 3%. Indole is not produced. Gelatin is hydrolysed, but aesculin is not. Major end products are acetic acid, formic acid, butyric acid, and lactic acid. The major fatty acids (constituting > 5% of the total) are C16:0, C18:1 ω7c, C18:1 ω9c, iso-C19:0, and iso-C17:1 I/anteiso B.

The type strain, CM04-06T (= CGMCC 1.5208T = DSM 103432T), was isolated from human faeces. The G + C content of the genomic DNA is 57.51 mol% as calculated from whole genome sequencing.

Methods

Origin of bacterial strains

Faecal samples were collected from two healthy donors living in Shenzhen, Guangdong province, China: one donor was an adult female (AF) and the other a male child (CM). The samples were stored refrigerated and kept anaerobically until processed. The collection of the samples was approved by the Institutional Review Board on Bioethics and Biosafety of BGI under number BGI-IRB17005-T2. All protocols were in compliance with the Declaration of Helsinki and explicit informed consent was obtained from the participant and the parents of the male child. One gram of faecal sample was diluted with 0.1 M PBS (pH 7, supplemented with 0.5% cysteine) and spread onto modified peptone-yeast extract-glucose (MPYG, supplemented with 5 g/L sodium acetate in DSMZ 104 medium) agar plates in an anaerobic box (Bactron Anaerobic Chamber, BactronIV-2, shellab, USA). The plates were incubated at 37 °C under anaerobic conditions (90% N2, 5% CO2, and 5% H2, v/v) for 3–5 days. Single colonies were randomly picked and purified by repeated subculturing on new plates containing the same medium and incubated under the same conditions as described above. Among the pure cultures, two isolates, designated AF52-21T and CM04-06T, were obtained and subsequently maintained in 20% (v/v) glycerol and frozen at −80 °C.

Phenotypic characterization

The morphological characteristics of strains AF52-21T and CM04-06T were examined using cultures grown on MPYG medium at 37 °C. Bacterial cell shape was examined by phase contrast microscopy (Olympus BX51, Japan) during the exponential phase of growth. Cell motility was examined using semi-solid MPYG medium containing 0.5% agar39. The Gram reaction was carried out using a Gram-staining kit (Solarbio, China). Spore formation and presence of flagella were determined by staining using a spore stain kit and a flagella stain kit supplied by Solarbio (China) following the manufacturer’s instructions. Colony morphology was observed following growth of the cultures on PYG agar for 2 days at 37 °C. Optimal temperature for growth was determined using growth in MPYG medium at 4, 10, 20, 25, 30, 35, 37, 45, and 50 °C for 7 days. The pH range for growth was also measured in MPYG medium covering the range of pH 3.0–10.0 (at an interval of 0.5 pH units) at 37 °C for 7 days, with the pH of the test medium stabilized using the appropriate buffers as described by Sorokin40. Tolerance to NaCl was determined by testing growth at various NaCl concentrations (0–6%, in increments of 1.0%). Catalase activity was assessed by gas formation after dropping fresh cells into 3% H2O2 solution. Biochemical properties, including utilization of substrates, acid production from carbohydrates, enzyme activities, and hydrolytic activities, were determined using the API 20A, API 50CHL, and API ZYM systems (bioMérieux Inc., Marcy-l’Étoile, France) according to the manufacturer’s instructions, modified by adding sodium acetate at a concentration of 0.5% in all tests. The reference type strain was tested under the same conditions as used for strains AF52-21T and CM04-06T. In all tests, the strains were incubated under anaerobic conditions.

Chemotaxonomic characteristics

Chemotaxonomic features were investigated by analysing cellular fatty acids, cell wall composition, polar lipids, and quinones. Biomasses of strains AF52-21T, CM04-06T, and ATCC 27768T were harvested from cells growing in MPYG at 37 °C under anaerobic conditions for 2 days. Whole cell fatty acid methyl esters (FAMEs) were extracted, separated and identified according to the MIDI Microbial Identification System by the CGMCC (China General Microbiological Culture Collection Center, Beijing, China) identification service. The diagnostic isomer of diaminopimelic acid in whole-cell hydrolysates was identified by TLC as described by Zou et al.41. The polar lipids of strains AF52-21T, CM04-06T, and ATCC 27768T were extracted from lyophilized bacterial cells and analysed using two-dimensional TLC as described42. Menaquinone components were extracted and identified by HPLC (LC-20AD; Shimadzu) coupled with a single quadrupole mass spectrometer (LCMS-2020; Shimadzu) as described42.

Fermentation products analysis

To analyse the metabolic end products from glucose fermentation, including SCFAs and organic acids, cells were cultured in MPYG broth at 37 °C under anaerobic conditions for 2 days. Supernatant harvested from the cultures after centrifugation at 10,000 × g for 10 min was used for determining SCFAs and organic acids. SCFA detection was performed using a gas chromatograph (GC-7890B, Agilent) equipped with a flame ionization detector (FID) and a capillary column packed with Agilent 19091N-133 HP-INNOWax porapak HP-INNOWax (30 m × 0.25 mm × 0.25 μm). Organic acids were analysed using a capillary column packed with Agilent 122-5532G DB-5 ms (40 m × 0.25 mm × 0.25 μm).

Genome sequencing, assembly, and annotation of isolates

For genome sequencing of strains AF52-21T and CM04-06T, genomic DNA was extracted following the method described above. The draft genomes were sequenced on an Ion Proton Technology (Life Technologies) platform at BGI-Shenzhen (Shenzhen, China) after constructing a paired-end DNA library with insert size of 500 bp. The resulting reads were assembled using the SOAPdenovo 2 package43. CheckM (v1.1.2) was used to estimate genome completeness and contamination44. Genome assemblies were visualized using CGView Server45 (http://stothard.afns.ualberta.ca/cgview_server/index.html). Annotation of the assembled genomes was performed using the Rapid Annotation Using Subsystem Technology (RAST) server46 and COG database47. The G + C content in genomic DNA was calculated from the whole genome sequence. The genes in known pathways from acetyl-CoA to butyrate were annotated by BLAST (evalue = 1e−5, identity ≥ 60%, coverage ≥ 90%)30. AntiSMASH 5.0 was used to predict BGCs. A search for prophages was performed by PHAST (http://phast.wishartlab.com/)48. Antibiotic resistance was analysed using the CARD database49. Carbohydrate-active enzyme genes were annotated using dbCAN250. The dbCAN-PUL51 database was used to determine genes related to important carbohydrate metabolism pathways.
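The BLAST cutoffs used for the butyrate pathway genes can be illustrated with a small filter. This is a sketch under stated assumptions rather than the authors' script: the `Hit` record and its field names are hypothetical, coverage is assumed to mean the percentage of the query covered by the alignment, the e-value cutoff is assumed to be an upper bound, and the example hits are invented.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One BLAST hit; field names are hypothetical, not a BLAST output format."""
    query: str
    subject: str
    evalue: float
    identity: float        # percent identity
    aligned_length: int    # number of query positions covered by the alignment
    query_length: int

    @property
    def coverage(self) -> float:
        """Query coverage in percent (assumed definition)."""
        return 100.0 * self.aligned_length / self.query_length


def passes_cutoffs(hit: Hit, max_evalue: float = 1e-5,
                   min_identity: float = 60.0, min_coverage: float = 90.0) -> bool:
    """Apply the three cutoffs quoted in the text to a single hit."""
    return (hit.evalue <= max_evalue
            and hit.identity >= min_identity
            and hit.coverage >= min_coverage)


# Hypothetical hits for illustration; these are not results from the study.
hits = [
    Hit("gene_001", "But", 1e-80, 85.0, 440, 450),   # passes all three cutoffs
    Hit("gene_002", "Buk", 1e-20, 55.0, 350, 360),   # fails identity
    Hit("gene_003", "Thl", 1e-30, 72.0, 200, 390),   # fails coverage
    Hit("gene_004", "Cro", 1e-3, 90.0, 250, 260),    # fails e-value
]

kept = [h.query for h in hits if passes_cutoffs(h)]
print(kept)
```

Requiring all three criteria together keeps only near full-length, reasonably conserved matches, which suits a presence/absence call for pathway enzymes.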

Average nucleotide identities

Genome relatedness was investigated by calculating average nucleotide identity (ANI)52, with a value of 95–96% proposed for delineating bacterial species, corresponding to the traditional 70% DNA-DNA reassociation standard53,54. The ANI values between strains AF52-21T, CM04-06T, and closely related species were determined using the FastANI55.
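The species-delineation rule described here reduces to a threshold comparison, sketched below. The function and constant names are hypothetical, and the lower bound of the 95–96% range is assumed as the cutoff. The two ANI values are those reported in the Discussion for AF52-21T and CM04-06T against F. prausnitzii ATCC 27768T.

```python
ANI_SPECIES_CUTOFF = 95.0  # lower bound of the 95-96% range (assumed as the cutoff)


def same_species(ani_percent: float, cutoff: float = ANI_SPECIES_CUTOFF) -> bool:
    """True if an ANI value meets the species cutoff; False suggests distinct species."""
    return ani_percent >= cutoff


# ANI (%) against F. prausnitzii ATCC 27768T, as reported in the Discussion.
ani_vs_type_strain = {"AF52-21T": 82.54, "CM04-06T": 90.09}

for strain, ani in ani_vs_type_strain.items():
    verdict = "same species" if same_species(ani) else "distinct species"
    print(f"{strain}: ANI {ani}% -> {verdict}")
```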

Phylogenetic analysis based on 16S rRNA gene sequences

16S rRNA gene sequences were extracted from the genomes using RNAmmer56. The obtained 16S rRNA gene sequences of strains AF52-21T and CM04-06T were compared with the sequences of type strains retrieved from the EzBioCloud database (https://www.ezbiocloud.net/)57 and an unrecognized species ‘Faecalibacterium hominis’ 4P-1527 using the BLAST program to determine the nearest phylogenetic neighbours and 16S rRNA gene sequence similarity values. Phylogenetic trees were reconstructed by using the neighbour-joining method58 (K2 + G model of substitution), maximum-likelihood method59 (GTR + G + I model of substitution) and minimum-evolution method60 (K2 + G model of substitution) with the MEGA X program package61, after Clustal W multiple alignment of the sequences. In total, 1548 nucleotide positions were used for tree construction. Robustness of the phylogenetic trees was evaluated by using the bootstrap resampling method (1000 resamplings) of Felsenstein62.
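The resampling step of the bootstrap can be sketched as below: alignment columns are drawn with replacement to build pseudo-alignments of the original length, and a tree would then be inferred from each replicate. This illustrates the general technique only, not MEGA X's implementation. The function name, the toy alignment, and the random seed are hypothetical, and tree inference is omitted.

```python
import random
from typing import Dict


def bootstrap_replicate(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Resample alignment columns with replacement, keeping the original length."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_sites = lengths.pop()
    # The same sampled column indices are applied to every sequence.
    columns = rng.choices(range(n_sites), k=n_sites)
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


# Toy alignment for illustration only; the study used 1548 aligned positions.
toy_alignment = {
    "taxon_A": "ACGTACGTAC",
    "taxon_B": "ACGTTCGTAC",
    "taxon_C": "ACGAACGTGC",
}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_replicate(toy_alignment, rng) for _ in range(1000)]
print(len(replicates), len(replicates[0]["taxon_A"]))
```

The bootstrap value at a branch is then the percentage of replicate trees that recover that branch, which is what the 70% display cutoff in the Fig. 2 caption refers to.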