Journal of Biological Chemistry
Volume 295, Issue 44, 30 October 2020, Pages 14826-14839

Computational Biology
Global analysis of adenylate-forming enzymes reveals β-lactone biosynthesis pathway in pathogenic Nocardia

https://doi.org/10.1074/jbc.RA120.013528
Open access under a Creative Commons license.

Enzymes that cleave ATP to activate carboxylic acids play essential roles in primary and secondary metabolism in all domains of life. Class I adenylate-forming enzymes share a conserved structural fold but act on a wide range of substrates to catalyze reactions involved in bioluminescence, nonribosomal peptide biosynthesis, fatty acid activation, and β-lactone formation. Despite their metabolic importance, the substrates and functions of the vast majority of adenylate-forming enzymes remain unknown, and tools to predict them accurately have been lacking. Given the crucial roles of adenylate-forming enzymes in biosynthesis, this gap also severely limits our ability to predict natural product structures from biosynthetic gene clusters. Here we used machine learning to predict adenylate-forming enzyme function and substrate specificity from protein sequences. We built a web-based predictive tool and used it to comprehensively map the biochemical diversity of adenylate-forming enzymes across >50,000 candidate biosynthetic gene clusters in bacterial, fungal, and plant genomes. Ancestral phylogenetic reconstruction and sequence similarity networking of enzymes from these clusters suggested that the adenylate-forming superfamily diverged from a core enzyme scaffold most closely related to contemporary CoA ligases toward more specialized functions, including those of β-lactone synthetases. Our classifier predicted β-lactone synthetases in uncharacterized biosynthetic gene clusters conserved in >90 different strains of Nocardia. To test this prediction, we purified a candidate β-lactone synthetase from Nocardia brasiliensis and reconstituted the biosynthetic pathway in vitro, linking the gene cluster to the β-lactone natural product nocardiolactone. We anticipate that this machine learning approach will aid in the functional classification of enzymes and advance natural product discovery.
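The abstract describes predicting enzyme function and substrate specificity directly from protein sequence. As a rough illustration of that idea only, the sketch below classifies short, pre-aligned residue "signatures" by majority vote among the k nearest labeled examples under Hamming distance. This is not the model used in this work: the k-nearest-neighbor classifier, the eight-residue signature length, the function names `hamming` and `predict`, and every training string are hypothetical placeholders invented for the example, not real enzyme data.

```python
from collections import Counter

# Hypothetical toy training set: invented, pre-aligned residue "signatures"
# (NOT real enzyme sequences) labeled with class abbreviations used in this article.
TRAINING = [
    ("GSTGKPKG", "SACS"),
    ("GSTGRPKG", "SACS"),
    ("GTSGSPKG", "NRPS"),
    ("GTSGTPKA", "NRPS"),
    ("YTSGSTGL", "BLS"),
    ("YTSGTTGL", "BLS"),
]


def hamming(a: str, b: str) -> int:
    """Count aligned positions at which two equal-length signatures differ."""
    if len(a) != len(b):
        raise ValueError("signatures must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def predict(signature: str, k: int = 3) -> str:
    """Return the majority label among the k closest training signatures."""
    # sorted() is stable, so equal distances keep their training-set order.
    ranked = sorted(TRAINING, key=lambda item: hamming(signature, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    # most_common() orders equal counts by first appearance, so a tie
    # goes to the class of the closest neighbor.
    return votes.most_common(1)[0][0]


print(predict("YTSGSTGA"))
```

A real predictor would need far more training data, a principled way of choosing which aligned positions form the signature, and proper validation; the nearest-neighbor vote is used here only because it is short and transparent.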

natural product biosynthesis
acetyl-CoA synthetase
enzyme catalysis
bioinformatics
coenzyme A (CoA)
β-lactone synthetases
adenylate-forming enzymes
machine learning
Nocardia
substrate specificity

This article contains supporting information.

Author contributions—S. L. R., M. H. M., and L. P. W. conceptualization; S. L. R., S. J. P., T. P. S., and M. H. M. resources; S. L. R. and B. R. T. data curation; S. L. R. and B. R. T. software; S. L. R. formal analysis; S. L. R., M. H. M., and L. P. W. funding acquisition; S. L. R., B. R. T., and S. J. P. validation; S. L. R., B. R. T., M. D. S., and S. J. P. investigation; S. L. R. visualization; S. L. R., B. R. T., and M. H. M. methodology; S. L. R. writing-original draft; S. L. R., T. P. S., M. H. M., and L. P. W. project administration; S. L. R., B. R. T., M. D. S., S. J. P., T. P. S., M. H. M., and L. P. W. writing-review and editing; M. H. M. and L. P. W. supervision.

Funding and additional information—S. L. R. is supported by the National Science Foundation Graduate Research Fellowship under NSF grant number 00039202 and by a Graduate Research Opportunities Worldwide (GROW) fellowship to the Netherlands supported by the NSF and the Netherlands Organization for Scientific Research (NWO), grant number 040.15.054/6097 (to S. L. R. and M. H. M.).

Conflict of interest—M. H. M. is a co-founder of Design Pharmaceuticals and on the scientific advisory board of Hexagon Bio.

Present address for Serina L. Robinson: Institute of Microbiology, ETH Zürich, Zürich, Switzerland.

Abbreviations—The abbreviations used are: NRPS, nonribosomal peptide synthetase; A domain, adenylation domain; SACS, short-chain acyl-CoA synthetase; MACS, medium-chain acyl-CoA synthetase; LACS, long-chain acyl-CoA synthetase; FAAL, fatty acyl-AMP ligase; LUC, luciferase; BLS, β-lactone synthetase; ARYL, aryl-CoA ligase; pHMM, profile Hidden Markov Model; FSI, fatty acyl-AMP ligase-specific insertion; AUROC, area under the receiver operating characteristic curve; aa, amino acid(s).
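AUROC, listed among the abbreviations above, is a standard metric for scoring classifiers. It equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with tied scores counting one half. A minimal sketch of that pairwise definition follows, assuming plain Python lists of scores and binary labels. The function name `auroc` is hypothetical and is not taken from the article's tool.

```python
def auroc(scores, labels):
    """Pairwise (Mann-Whitney) AUROC: the fraction of positive/negative pairs
    in which the positive example is scored higher, with ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Positives scored 0.9 and 0.4; negatives scored 0.8 and 0.3.
# Three of the four positive/negative pairs are ranked correctly.
print(auroc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

The double loop runs in O(P·N) time; for large evaluation sets, a rank-based formula computed after a single sort gives the same value more efficiently.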