ABSTRACT
Enzymes that cleave ATP to activate carboxylic acids play essential roles in primary and secondary metabolism in all domains of life. Class I adenylate-forming enzymes share a conserved structural fold but act on a wide range of substrates to catalyze reactions involved in bioluminescence, nonribosomal peptide biosynthesis, fatty acid activation, and β-lactone formation. Despite their metabolic importance, the substrates and catalytic functions of the vast majority of adenylate-forming enzymes remain unknown, and no tools are available to predict them accurately. Because adenylate-forming enzymes play crucial roles in biosynthesis, this gap also severely limits our ability to predict natural product structures from biosynthetic gene clusters. Here we used machine learning to predict adenylate-forming enzyme function and substrate specificity from protein sequence. We built a web-based predictive tool and used it to comprehensively map the biochemical diversity of adenylate-forming enzymes across >50,000 candidate biosynthetic gene clusters in bacterial, fungal, and plant genomes. Ancestral enzyme reconstruction and sequence similarity networking revealed a ‘hub’ topology, suggesting radial divergence of the adenylate-forming superfamily from a core enzyme scaffold most closely related to contemporary aryl-CoA ligases. Our classifier also predicted β-lactone synthetases in novel biosynthetic gene clusters conserved across >90 different strains of Nocardia. To test our computational predictions, we purified a candidate β-lactone synthetase from Nocardia brasiliensis and reconstituted the biosynthetic pathway in vitro, linking the gene cluster to the β-lactone natural product nocardiolactone. We anticipate that our machine learning approach will aid in the functional classification of enzymes and advance natural product discovery.
Footnotes
Updated references, minor grammatical changes, and links to code repositories for reproducibility of figures and analysis.