Introduction

The structural features and folding mechanisms of polypeptides and proteins arise from a complex and subtle balance of different interactions, which matter not only for the structure as a whole but also at the molecular level. Their architecture is sustained by a delicate combination of inter- and intramolecular noncovalent interactions in the polypeptide chains1. Many of these interactions are very well known and include hydrogen bonds, Coulombic interactions, van der Waals interactions and hydrophobic effects2.

Another type of interaction, in which two carbonyl groups attract each other through delocalization of a lone pair (n) of one carbonyl oxygen into the π* antibonding orbital of a nearby carbonyl group, has been termed the n → π* interaction. This interaction was first discussed in the early 1970s but has only recently attracted significant attention; it is hypothesized to impart substantial stability to proteins, mainly because of the abundance of carbonyl groups3,4,5,6. Specifically, the Raines group has suggested that numerous residues in folded proteins are oriented to take advantage of this stabilization, implying that n → π* interactions between carbonyls could contribute significantly to the three-dimensional structure and conformational stability of proteins7,8,9,10,11,12,13.

Recent crystallographic studies are consistent with the presence of such interactions14. Their evaluation is nevertheless challenging because they are obscured by the solvent environment or, in solids, by crystal packing forces15,16,17,18. Moreover, the current challenges in de novo structure prediction and protein design show that our understanding of these interactions is still incomplete19,20,21. This is reflected in the small number of experimental studies that directly probe n → π* interactions22,23,24. These typically employ either esters22 or alkenes22,23,24 as surrogates for the peptide bond, or rely on synthesizing residue mimics and altering one of their substituents to explore the trans/cis isomer ratio7,25. Because the n → π* interaction is possible only in the trans isomer, the isomer ratio (Ktrans/cis) reports on the energy of the interaction.
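
The link between the isomer ratio and an energy follows from the standard relation ΔG = −RT ln K. A minimal Python sketch, assuming an ideal two-state equilibrium between the trans and cis isomers; the function name and the example ratios are illustrative, not data from the studies cited above:

```python
import math

# Gas constant in kcal mol^-1 K^-1
R_KCAL = 1.987204e-3


def delta_g_trans_cis(k_trans_cis: float, temperature_k: float = 298.15) -> float:
    """Return G(trans) - G(cis) in kcal/mol from the equilibrium ratio K_trans/cis.

    Uses dG = -RT ln K; a negative value means the trans isomer is favored.
    The result lumps together every contribution to the trans/cis balance,
    not only the n -> pi* term.
    """
    if k_trans_cis <= 0:
        raise ValueError("K_trans/cis must be positive")
    return -R_KCAL * temperature_k * math.log(k_trans_cis)


if __name__ == "__main__":
    # Hypothetical ratios for illustration; not measured values
    for k in (2.0, 4.0, 8.0):
        print(f"K_trans/cis = {k:.1f} -> dG = {delta_g_trans_cis(k):+.2f} kcal/mol")
```

The value obtained this way is the net free-energy difference between the isomers, of which the n → π* interaction is only one contribution.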

Despite the important information obtained thus far, none of these methods gives a direct evaluation of the interactions involved. Furthermore, all of these methods are approximate and necessarily alter the interactions they aim to measure26,27: they tend either to overestimate the strength of the n → π* interaction or to let it be obscured by other interactions. Evaluating n → π* interactions in solution is also difficult because their effects are attenuated in polar solvents, which suggests a contribution from polar interactions rather than purely orbital ones28,29. What if we could take a simple dipeptide with such competing intramolecular interactions, one that mimics structures found in crystals, and isolate it from any external influence? What if we then probed it with a powerful spectroscopic technique that provides accurate structural information? This combination should allow us to provide evidence for n → π* interactions in an unperturbed medium and to evaluate their intrinsic importance.

On this basis, we studied the HGlyProOH dipeptide, a very common sequence in proteins. For example, collagen, the most abundant protein in vertebrates, consists of three peptide chains forming a triple helix. Each chain is composed of about 1000 amino acid residues with more than 300 repeats of the –Gly-Pro– sequence30,31. Furthermore, in a collagen triple helix all of the peptide bonds are in the trans configuration, suggesting that the strength of the n → π* interaction, which may stabilize the trans configuration, correlates with the thermostability of collagen7. Therefore, determining the structure of the HGlyProOH sequence is important both for understanding protein structure and for gaining insight into the nature of these interactions. Under jet-cooled conditions, gas phase spectroscopy of peptides offers superior spectral resolution compared with solvent-broadened condensed phase studies, and it probes the intrinsic conformational propensities of peptide residues without any external perturbation. Combining this environment with Fourier transform microwave spectroscopy32,33,34,35, we obtain accurate rotational constants and nitrogen nuclear quadrupole coupling constants, from which accurate structural parameters can be derived. Only very recently was a solid sample of the simplest dipeptide, HGlyGlyOH, transferred into the gas phase by laser ablation36, opening up the possibility of measuring a new range of biologically important molecules by high-resolution rotational spectroscopy. Given the interest in the –Gly-Pro– sequence in proteins, we tackle here the challenging problem of determining the structure of the HGlyProOH dipeptide.

Here, we present gas phase evidence that unambiguously confirms the relevant role of the n → π* interaction between carbonyls, which manifests itself even in a system as small as the HGlyProOH dipeptide. The results show not only that the trans arrangement is significantly stabilized over the cis configuration by an n → π* interaction, but also that this stabilization is substantial: the conformer that bears it is even more stable than the hydrogen-bonded form. Furthermore, the crystal structure and the experimentally determined gas phase structure agree remarkably well, which further underlines the significance of these conclusions.

Results

The rotational spectrum of HGlyProOH dipeptide

Briefly, neutral molecules of HGlyProOH (m.p. > 183 °C) were transferred into the gas phase by laser ablation (LA) of finely powdered samples, using the third harmonic (355 nm) of a picosecond Nd:YAG laser. The vaporized products were seeded in neon and supersonically expanded into a vacuum chamber. A rich conformational behavior can be anticipated, arising from all plausible interactions between the polar groups, the torsional flexibility of the side chains and the endo or exo configurations that the pyrrolidine ring can adopt (Fig. 1a). Indeed, ab initio calculations predict up to 10 conformers within 1000 cm−1 of the global minimum (see Fig. 1b, Supplementary Table 1 and Supplementary Data 1–10). Fortunately, two-body collisions with the carrier gas should cool the stable conformers to very low temperatures, trapping them in their energy minima. Those with sufficient population can then be probed in the supersonic expansion by Fourier transform microwave spectroscopy. Note also that, whereas in solution the carboxyl group of proline is ionized, in the gas phase the molecules are in their neutral form, with the carboxyl group protonated.
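
To see why only some of the predicted conformers are expected to be observable, their relative populations can be estimated from the calculated relative Gibbs free energies with a Boltzmann distribution. This is a sketch under stated assumptions: the energies in the usage lines are hypothetical placeholders rather than the values of Fig. 1b, and in a supersonic expansion conformers separated by low barriers can relax to lower-energy forms, so the jet populations need not follow a Boltzmann law at any single temperature.

```python
import math

# Boltzmann constant in cm^-1 K^-1
K_B_CM = 0.6950348


def boltzmann_populations(energies_cm, temperature_k):
    """Fractional populations of conformers from relative (free) energies in cm^-1."""
    e_min = min(energies_cm)
    kt = K_B_CM * temperature_k
    weights = [math.exp(-(e - e_min) / kt) for e in energies_cm]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical relative Gibbs energies (cm^-1), for illustration only
    energies = [0.0, 150.0, 600.0, 1000.0]
    for t in (298.15, 100.0):
        pops = boltzmann_populations(energies, t)
        print(f"T = {t:6.2f} K:", ", ".join(f"{p:.3f}" for p in pops))
```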

Fig. 1

Chemical structure and most stable conformers of the HGlyProOH dipeptide. a The torsional flexibility of the side chain and the endo/exo configurations of the pyrrolidine ring of the HGlyProOH dipeptide give rise to several conformers. The arrows in the sketch indicate the hindered single-bond rotations that govern the conformational equilibrium. b The calculated low-lying conformers at the MP2/6-311++G(d,p) level. Atom colors: red, oxygen; blue, nitrogen; gray, carbon; white, hydrogen. Relative energies and Gibbs free energies at 298 K (ΔE/ΔG, in cm−1) are also indicated

More than 100,000 free induction decays were averaged in the time domain to obtain the broadband chirped pulse Fourier transform microwave (CP-FTMW) spectrum in the 3.0–8.0 GHz frequency range. We have previously shown36 that, with careful control of the experimental parameters, rotational signals of moderately large species can be obtained in the gas phase. Figure 2a displays a portion of the broadband spectrum; the whole spectrum is shown in Supplementary Figure 1. The spectrum contains a large number of rotational transitions, pointing to the presence of more than one conformer. After the lines of photofragment species were identified and removed, three sets of a-type R-branch transitions could be recognized on first inspection as belonging to three distinct rotamers of the HGlyProOH dipeptide, labeled 1, 2 and 3. Further predictions and measurements allowed the assignment of b- and/or c-type R-branch lines, confirming the rotational assignments (see Supplementary Note 1 for details). The main difficulty in observing the rotational spectrum of HGlyProOH was the nuclear quadrupole coupling produced by the two ¹⁴N nuclei (¹⁴Nₚ and ¹⁴Nₐ), which splits each rotational level into several sublevels37. Consequently, the overall intensity of each rotational transition is spread over many hyperfine components, which are not resolved at the resolution attainable with our broadband LA-CP-FTMW technique. This is illustrated for the 4₁₄ ← 3₁₃ rotational transition in the insets of Fig. 2a and in Supplementary Figure 2.
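
The number of hyperfine sublevels involved can be counted directly. A sketch, assuming the coupling scheme implied by the I, F labels of Fig. 2 (I = I1 + I2 for the two spin-1 ¹⁴N nuclei, then F = J + I) and the dipole selection rules ΔI = 0, ΔF = 0, ±1. I is only an approximate label because the quadrupole interaction can mix states of different I, and the code computes no frequencies or intensities; the function names are illustrative.

```python
# Angular-momentum bookkeeping for two spin-1 nuclei (14N), coupled as
# I = I1 + I2 and F = J + I, matching the I, F labels of Fig. 2.
N14_SPIN = 1


def coupled_levels(j):
    """Return the (I, F) labels of the hyperfine sublevels of rotational level J."""
    levels = []
    for i_tot in range(0, 2 * N14_SPIN + 1):  # I = 0, 1, 2
        for f in range(abs(j - i_tot), j + i_tot + 1):
            levels.append((i_tot, f))
    return levels


def hyperfine_components(j_upper, j_lower, strong_only=False):
    """List (upper, lower) label pairs allowed by dI = 0 and dF = 0, +/-1.

    With strong_only=True, keep only dF = dJ, the components that typically
    carry most of the intensity of an R-branch line.
    """
    d_j = j_upper - j_lower
    components = []
    for i_u, f_u in coupled_levels(j_upper):
        for i_l, f_l in coupled_levels(j_lower):
            d_f = f_u - f_l
            # dI = 0, |dF| <= 1, and F = 0 <-> F = 0 is forbidden
            if i_u != i_l or abs(d_f) > 1 or (f_u == 0 and f_l == 0):
                continue
            if strong_only and d_f != d_j:
                continue
            components.append(((i_u, f_u), (i_l, f_l)))
    return components


if __name__ == "__main__":
    print(len(coupled_levels(4)), "sublevels for J = 4")
    print(len(hyperfine_components(4, 3)), "allowed components for J = 4 <- 3")
    print(len(hyperfine_components(4, 3, strong_only=True)), "with dF = dJ")
```

For J = 4 the sketch finds 9 sublevels and 19 allowed components for a J = 4 ← 3 transition, 9 of which have ΔF = ΔJ; distributing one transition's intensity over this many closely spaced lines is what produces the unresolved clusters in the broadband spectrum.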

Fig. 2

Rotational spectrum of HGlyProOH. a Portion of the broadband spectrum of HGlyProOH in the 4.5–5.5 GHz region (see Supplementary Figure 1 for the whole spectrum). The insets show the 4₁₄ ← 3₁₃ rotational transitions of the three detected conformers, whose hyperfine structure cannot be resolved with the LA-CP-FTMW spectrometer. b Predicted spectrum for each of the detected conformers. Because multi-resonance excitations52 affect the relative intensities of the experimental transitions, only the peak positions should be compared; for a correct interpretation of the intensities, the high-resolution LA-MB-FTMW spectra must be considered. c, d, e The 4₁₄ ← 3₁₃ rotational transitions of the three detected conformers with their hyperfine structure fully resolved using the LA-MB-FTMW spectrometer. Each hyperfine component, labeled with the quantum numbers I′, F′ ← I″, F″, is split into a doublet by the Doppler effect. The predicted components including the Doppler splitting (bars) and the spectra obtained by convolving each line with a Gaussian function are in excellent agreement with experiment. The label at the top right gives the factor used to scale the intensity
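
The simulated traces in panels c–e combine two ingredients: a Doppler doublet for each hyperfine component and a Gaussian line shape. A minimal sketch, assuming a coaxial beam/resonator geometry in which each line splits symmetrically by 2ν₀v/c about its rest frequency; the beam velocity, linewidth and component list in the usage lines are hypothetical inputs, not the experimental parameters.

```python
import numpy as np

C_M_PER_S = 299_792_458.0  # speed of light


def doppler_doublet(nu0_mhz, beam_velocity_m_s):
    """Two Doppler components (MHz) of a line at rest frequency nu0_mhz.

    In a coaxial beam/resonator arrangement the molecules see the two
    counter-propagating waves at nu0 * (1 -/+ v/c), so the doublet is
    centered on nu0 and split by 2 * nu0 * v / c.
    """
    half_split = nu0_mhz * beam_velocity_m_s / C_M_PER_S
    return nu0_mhz - half_split, nu0_mhz + half_split


def simulate_spectrum(components, grid_mhz, beam_velocity_m_s, fwhm_mhz):
    """Sum of Gaussians; each (nu0_mhz, intensity) component becomes a Doppler doublet."""
    sigma = fwhm_mhz / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    spectrum = np.zeros_like(grid_mhz, dtype=float)
    for nu0, intensity in components:
        for nu in doppler_doublet(nu0, beam_velocity_m_s):
            # each half of the doublet carries half of the component intensity
            spectrum += 0.5 * intensity * np.exp(-0.5 * ((grid_mhz - nu) / sigma) ** 2)
    return spectrum


if __name__ == "__main__":
    # Hypothetical inputs: two hyperfine components, ~800 m/s neon beam, 5 kHz linewidth
    comps = [(5000.000, 1.0), (5000.060, 0.6)]
    grid = np.linspace(4999.95, 5000.12, 3401)
    spec = simulate_spectrum(comps, grid, beam_velocity_m_s=800.0, fwhm_mhz=0.005)
    print(f"strongest feature at {grid[np.argmax(spec)]:.4f} MHz")
```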

Initially, the frequencies of the rotational transitions were measured roughly at the centers of the line clusters and fitted to a rigid rotor Hamiltonian, yielding a preliminary set of rotational constants for the three rotamers (see Supplementary Note 1 for details). These values were compared with those predicted theoretically for the most stable conformers in Table 1. Unfortunately, the differences between the rotational constants are not large enough to discriminate completely between the observed rotameric species (Supplementary Note 1). However, whereas the rotational constants depend mainly on the mass distribution, the diagonal elements of the nuclear quadrupole coupling tensor, also included in Table 1, depend critically on the electronic environment, position and orientation of the ¹⁴N nuclei. Hence, the predicted diagonal elements of the nuclear quadrupole coupling tensor (χaa, χbb and χcc) for the ¹⁴N nuclei could provide an independent means of discriminating between the observed species, provided that the nuclear quadrupole hyperfine structure is well resolved and analyzed.
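
The rigid rotor model used for the preliminary fit predicts each transition frequency from A, B and C by diagonalizing the asymmetric-top Hamiltonian for each J. A sketch in Python with NumPy, assuming the prolate (I^r) representation and no centrifugal distortion; the rotational constants in the usage lines are hypothetical, and an actual fit would wrap this forward model in a least-squares adjustment of A, B and C to the measured cluster centers.

```python
import numpy as np


def rotor_levels(j, a, b, c):
    """Rigid asymmetric-rotor energies for one J, sorted ascending.

    Prolate (I^r) representation with the a axis as quantization axis; the
    energies come out in the units of A, B and C (e.g. MHz). No centrifugal
    distortion is included.
    """
    ks = np.arange(-j, j + 1)
    jj = j * (j + 1)
    h = np.zeros((ks.size, ks.size))
    for n, k in enumerate(ks):
        h[n, n] = 0.5 * (b + c) * jj + (a - 0.5 * (b + c)) * k * k
        if n + 2 < ks.size:
            # <J, K+2 | (B - C)/4 (J+^2 + J-^2) | J, K>
            h[n, n + 2] = h[n + 2, n] = 0.25 * (b - c) * np.sqrt(
                (jj - k * (k + 1)) * (jj - (k + 1) * (k + 2))
            )
    return np.linalg.eigvalsh(h)


def level_energy(j, ka, kc, a, b, c):
    """Energy of level J_{Ka,Kc}; the levels of one J are ordered by tau = Ka - Kc."""
    return rotor_levels(j, a, b, c)[ka - kc + j]


def transition_frequency(upper, lower, a, b, c):
    """Frequency of upper <- lower, each given as a (J, Ka, Kc) tuple."""
    return level_energy(*upper, a, b, c) - level_energy(*lower, a, b, c)


if __name__ == "__main__":
    # Hypothetical rotational constants in MHz (not the values of Table 2)
    A, B, C = 1500.0, 700.0, 550.0
    nu = transition_frequency((4, 1, 4), (3, 1, 3), A, B, C)
    print(f"4_14 <- 3_13 predicted at {nu:.2f} MHz")
```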

Table 1 Calculated spectroscopic parameters for the low-lying conformers of HGlyProOH

At this point, we took advantage of our narrowband LA-MB-FTMW technique38, which provides sufficient resolution to resolve the complicated nuclear hyperfine structure. An example is shown in Fig. 2c–e, where the same 4₁₄ ← 3₁₃ transitions identified in the broadband spectrum of Fig. 2a are now fully resolved (see also Supplementary Figure 3). Thus, in a second stage of the investigation, a selected set of rotational transitions of the three rotamers was analyzed with this high-resolution technique. All measured hyperfine components and a detailed explanation of the fitting procedure39 are given in Supplementary Tables 2–4.

Conformational characterization and assignment

The main goal of this paper is to obtain meaningful information about the interactions within the HGlyProOH dipeptide, specifically to validate the n → π* interaction in the absence of any external perturbation. If this interaction can be identified in these structures, this serves as a proof of principle that such interactions are fundamental. An accurate structural determination is therefore mandatory. Table 2 lists the spectroscopic parameters obtained from the analysis. They can be compared directly with the in vacuo ab initio predictions of Table 1 and clearly discriminate between the different rotamers. These values are unique and can be considered the fingerprint of each structure. Thus, rotamers 1, 2 and 3 are conclusively identified as conformers 1, 2 and 10, respectively, depicted in Fig. 3 and Supplementary Data 13. These assignments are further confirmed by the consistency of the observed selection rules and intensities with the predicted electric dipole moment components in Table 1. Finally, taking into account the predicted dipole moment components and averaging the intensities of selected transitions measured with the LA-MB-FTMW spectrometer, a good qualitative estimate of the relative abundances can be made. With this approach, the relative abundances of the characterized structures are conformer 1 ≥ 2 ≥ 10. The absence of the other conformers can be readily explained by conformational cooling, as described in Supplementary Note 2 and Supplementary Figure 4.
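
The abundance estimate rests on the fact that, for comparable transitions, the observed intensity scales with the population and with the relevant dipole moment component. A sketch, assuming intensity ∝ N·μⁿ, with n = 2 in the weak-excitation limit and n ≈ 1 when the excitation pulse is optimized for each transition; the function name and the numbers in the usage lines are hypothetical.

```python
def relative_abundances(intensities, dipoles_debye, exponent=2.0):
    """Relative populations N_k ∝ I_k / mu_k**exponent, normalized to the first species.

    intensities: averaged intensities of comparable transitions (same quantum
    numbers and dipole type) for each conformer; dipoles_debye: predicted
    dipole moment component that drives those transitions.
    exponent=2 corresponds to the weak-excitation limit; with excitation
    pulses optimized for each transition, exponent=1 is the closer choice.
    """
    if len(intensities) != len(dipoles_debye):
        raise ValueError("need one dipole component per intensity")
    scaled = [i / abs(mu) ** exponent for i, mu in zip(intensities, dipoles_debye)]
    return [s / scaled[0] for s in scaled]


if __name__ == "__main__":
    # Hypothetical numbers for three conformers, for illustration only
    print(relative_abundances([1.00, 0.80, 0.15], [2.5, 2.0, 3.0]))
```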

Table 2 Experimental spectroscopic parameters for the detected conformers of HGlyProOH
Fig. 3

Determined structures of HGlyProOH. The three experimentally determined structures of the HGlyProOH dipeptide. The intramolecular interactions are highlighted

Noncovalent interactions

We have unequivocally characterized conformers 1, 2 and 10 experimentally. Because the predicted rotational constants and quadrupole coupling constants match the accurate experimental values so closely, the actual structures are very close to the calculated ones. Therefore, the intramolecular interactions in the HGlyProOH dipeptide can be analyzed through the structures adopted by the detected conformers. Figure 3 shows that all the conformers are stabilized by a bifurcated N-H•••O=C hydrogen bond, i.e., a C5 hydrogen bond, similar to those observed in α-amino acids40,41,42. Conformer 2 possesses an additional O-H•••O=C hydrogen bond that forces a trans-COOH arrangement. This interaction is absent in conformer 1, which also adopts a trans arrangement. Conformer 10 also shows the bifurcated N-H•••O=C interaction, but its C=O group is rotated opposite to the COOH group in a cis arrangement. This leads to the first important observation: why is conformer 1 more stable than conformer 10? In principle, both adopt a similar structure and are stabilized solely by the same bifurcated N-H•••O=C hydrogen bond. The explanation must therefore lie in an interaction that is less apparent.

To solve this puzzle and gain insight into the nature of the interactions, we carried out calculations at the MP2/6-311++G(d,p) level of theory together with natural bond orbital (NBO) analysis43. The marked energy difference between conformers 1 and 10 is attributed to an n → π* interaction in conformer 1 between a lone pair of the amide carbonyl oxygen and the π* orbital of the carbonyl of the carboxylic group. This interaction, shown in Fig. 3, directly indicates why the trans disposition is preferred in proteins. Indeed, second-order perturbation theory analysis yields a surprisingly large stabilization energy of 0.77 kcal/mol (269 cm−1) for this interaction. Furthermore, our results show that conformer 1 is more abundant than conformer 2, even though the latter has an additional O-H•••O interaction, which again highlights the importance of the n → π* interaction. It is remarkable that, in HGlyProOH, the C5-membered conformer 1 showing the n → π* interaction is more stable than the C7-membered hydrogen-bonded conformer 2, analogous to a “γ-turn”. Overlap between the lone pair (n) of the donor carbonyl oxygen and the π* antibonding orbital of the second carbonyl group is possible when the putative donor forms a sub-van der Waals contact with the acceptor (d < 3.22 Å) along the Bürgi–Dunitz trajectory for nucleophilic addition (95° < θ < 125°)44. The distance found in this work is 2.93 Å, well below this limit, and the angle of approach of the donor oxygen to the acceptor carbonyl is 93.3°, close to the lower bound of the Bürgi–Dunitz range; this geometry is consistent with an n → π* interaction. It is also interesting to note that the bifurcated C5 hydrogen bond and the n → π* interaction share the same p orbital of the carbonyl oxygen, which could weaken the n → π* interaction. The same is true for conformer 2 and the C7 hydrogen bond.
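
The geometric criteria quoted above can be applied to any set of coordinates. A sketch, assuming Cartesian coordinates in Å for the donor oxygen, the acceptor carbonyl carbon and the acceptor carbonyl oxygen; θ is taken as the O(donor)···C=O(acceptor) angle, and the 3.22 Å cutoff as the sum of the O and C van der Waals radii. The function names are illustrative.

```python
import numpy as np

# Sum of the O and C van der Waals radii (1.52 + 1.70 Å)
D_MAX_ANGSTROM = 3.22
BD_RANGE_DEG = (95.0, 125.0)


def n_pi_star_geometry(o_donor, c_acceptor, o_acceptor):
    """Return (d, theta): the O_d...C_a distance in Å and the O_d...C_a=O_a angle in degrees."""
    o_d, c_a, o_a = (np.asarray(p, dtype=float) for p in (o_donor, c_acceptor, o_acceptor))
    v_donor = o_d - c_a
    v_carbonyl = o_a - c_a
    d = float(np.linalg.norm(v_donor))
    cos_theta = np.dot(v_donor, v_carbonyl) / (d * np.linalg.norm(v_carbonyl))
    theta = float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))
    return d, theta


def meets_criteria(d, theta, d_max=D_MAX_ANGSTROM, theta_range=BD_RANGE_DEG):
    """Apply the sub-van der Waals distance and Bürgi–Dunitz angle criteria strictly."""
    return d < d_max and theta_range[0] < theta < theta_range[1]


if __name__ == "__main__":
    # Synthetic geometry: acceptor C at the origin, C=O along +x (1.21 Å),
    # donor O placed 2.93 Å away at 107° from the C=O bond
    t = np.radians(107.0)
    d, theta = n_pi_star_geometry(
        [2.93 * np.cos(t), 2.93 * np.sin(t), 0.0], [0.0, 0.0, 0.0], [1.21, 0.0, 0.0]
    )
    print(f"d = {d:.2f} Å, theta = {theta:.1f}°, criteria met: {meets_criteria(d, theta)}")
    # Values reported in the text
    print(meets_criteria(2.93, 93.3))
```

Applied literally to the values reported here, the distance criterion is satisfied while the 93.3° angle falls just below the 95° bound, so the strict test returns False even though the geometry lies close to the Bürgi–Dunitz range.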

Discussion

A striking observation emerges when the observed conformers, particularly the most abundant species, conformer 1, are compared with the structure found in collagen peptide crystals. Figure 4 compares the crystal and molecular structure of a collagen-like peptide at 1.9 Å resolution45 with the dominant structure of HGlyProOH obtained in this work. The results speak for themselves: the gas phase structure is almost identical to the crystal structure. We note that in Fig. 4 the atoms have not been manipulated; only the views have been reoriented for easier visualization. We attribute this close correspondence between the crystal and conformer 1 to the n → π* interaction, confirming its importance even at the molecular level. In protein structures, especially helices, n → π* interactions between amides are estimated to contribute about 0.27 kcal/mol of stabilization energy per interaction25. The present results show that the stabilization energy of this interaction in HGlyProOH is nearly three times larger, highlighting its importance. Indeed, the structure of the isolated dipeptide is retained almost unchanged in the crystal, where interactions with other peptide chains and the surrounding solvent are expected. This is well illustrated in Fig. 4b.
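
The energy comparison in this paragraph reduces to two lines of arithmetic: converting the NBO value to wavenumbers and taking its ratio to the per-interaction amide estimate. A sketch in Python; the two numbers come from different sources (the NBO E(2) of this work and the estimate of ref. 25), so the ratio is indicative rather than exact.

```python
# 1 kcal/mol corresponds to 349.755 cm^-1 per molecule
KCAL_MOL_TO_CM = 349.755


def kcal_mol_to_wavenumber(e_kcal_mol):
    """Convert an energy from kcal/mol to cm^-1."""
    return e_kcal_mol * KCAL_MOL_TO_CM


E_GLYPRO = 0.77  # kcal/mol, NBO E(2) for HGlyProOH quoted in this work
E_AMIDE = 0.27   # kcal/mol, per-interaction estimate for amides quoted from ref. 25

if __name__ == "__main__":
    print(round(kcal_mol_to_wavenumber(E_GLYPRO)))  # 269
    print(round(E_GLYPRO / E_AMIDE, 2))            # 2.85
```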

Fig. 4

Comparison between crystal and gas phase. a Comparison between the crystal and molecular structure of a collagen-like peptide at 1.9 Å resolution (PDB: 1CAG) (http://www.rcsb.org/pdb/explore/explore.do?structureId=1cag) and the most relevant structure of the HGlyProOH dipeptide in a reoriented view. b Comparison of a fragment extracted from the crystal structure with experimental conformer 1 from this work (circled spheres, showing the n → π* interaction), overlaid on their pyrrolidine rings, highlighting the close resemblance between the two experimental structures. This supports the presence of the n → π* interaction

The accurate structural determination of HGlyProOH, whose structure is maintained in the crystal, together with the observation of the n → π* interaction in the most abundant conformer of this simple dipeptide, confirms the importance of this interaction, which could have considerable implications for protein structure. For example, formation of the triple helix conformation in collagen is known to require a repeated –Gly-X-Y– sequence, the most common being –Gly-Pro-Hyp–, as it is the most stabilizing tripeptide unit for the triple helix conformation46,47,48,49. A consequence of this folding pattern is that every third residue in each polypeptide chain lies where the three chains pack closely together, and only Gly is small enough to fit there. The hydrogen bonds in the triple helix form between the N-H group of glycine in one polypeptide α-chain and the carbonyl (C=O) group of a proline residue in another chain (N-H•••O=C). The results of this work show that in HGlyProOH the glycine and proline skeletons lie in the same plane, as in the crystal structure, and that the COOH group, whose OH would be replaced by the next amino acid, is stabilized by an n → π* interaction. This interaction could not only help to stabilize the peptide or protein but also shape each peptide into a helical arrangement owing to the perpendicular and fixed disposition of the carboxylic group. Therefore, the carbonyl groups in these amino acids could serve not only to template the ideal backbone dihedral angles of the collagen triple helix, but also to exploit the attraction between adjacent backbone carbonyl groups through an n → π* interaction. This structural adaptation could allow close association of the chains within the collagen molecule, facilitating hydrogen bonding and the formation of intermolecular cross-links. Notably, the stabilization energy amounts to 0.77 kcal/mol.
This interaction is so significant that, in a simple dipeptide, the structure with such an interaction is as stable as the structure with an additional hydrogen bond. Therefore, owing to the isolated conditions of the gas phase, the results presented in this work are probably among the most accurate for evaluating the importance of these stabilizing interactions, and they are in line with the hypotheses and findings regarding the importance of n → π* interactions4,7,50.

Methods

Experimental

A commercial sample of HGlyProOH was used without further purification. A solid rod was prepared by pressing the compound's fine powder mixed with a small amount of commercial binder and placed in the ablation nozzle. A picosecond Nd:YAG laser (355 nm, 20 mJ per pulse, 20 ps pulse width) was used as the vaporization tool. The laser ablation products were supersonically expanded in a flow of carrier gas (Ne, 8 bar) and characterized by chirped pulse and molecular beam Fourier transform microwave spectroscopies (LA-CP-FTMW and LA-MB-FTMW), using a recently constructed instrument38 optimized for the 2–8 GHz range. This instrument is well suited to recording the rotational spectra of large molecules such as HGlyProOH, providing the high resolution needed to analyze the hyperfine structure arising from the two ¹⁴N nuclei in the molecule.

Simulations

Geometry optimizations of HGlyProOH were performed with the Gaussian suite of programs51, using second-order Møller–Plesset (MP2) perturbation theory in the frozen core approximation with Pople's 6-311++G(d,p) basis set. Harmonic frequency calculations were also performed to confirm that the optimized geometries are true minima and to calculate the Gibbs free energies. The NBO analysis was carried out with the same program, and the value for the n → π* interaction was taken from the second-order perturbation theory analysis of the donor–acceptor interaction. See ref. 44 and references therein for more details.
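
The NBO donor–acceptor energy follows from the standard second-order expression E(2) = q_i F(i,j)² / (ε_j − ε_i), where q_i is the donor orbital occupancy, F(i,j) the off-diagonal NBO Fock matrix element and ε_i, ε_j the orbital energies. A sketch, assuming inputs in atomic units (hartree); the function name and the example inputs are hypothetical, not the n → π* values of this work.

```python
# 1 hartree in kcal/mol
HARTREE_TO_KCAL_MOL = 627.5095


def nbo_second_order_energy(q_donor, f_ij, e_donor, e_acceptor):
    """Second-order donor-acceptor stabilization E(2) in kcal/mol.

    E(2) = q_i * F(i,j)**2 / (e_j - e_i), with the donor occupancy q_i, the
    off-diagonal NBO Fock element F(i,j) and the orbital energies in hartree.
    """
    gap = e_acceptor - e_donor
    if gap <= 0:
        raise ValueError("acceptor orbital must lie above the donor orbital")
    return q_donor * f_ij**2 / gap * HARTREE_TO_KCAL_MOL


if __name__ == "__main__":
    # Hypothetical inputs for a weak lone-pair -> pi* interaction (not values from this work)
    print(f"{nbo_second_order_energy(1.95, 0.025, -0.65, 0.10):.2f} kcal/mol")
```

Recomputing E(2) from rounded values printed in an NBO output reproduces the tabulated energy only approximately.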