The dipeptide conformations of all twenty amino acid types in the context of biosynthesis
© Bywater and Veryazov. 2015
Received: 11 August 2015
Accepted: 12 October 2015
Published: 4 November 2015
There have been many studies of dipeptide structure at a high level of accuracy using quantum chemical methods. Such calculations are resource-consuming (in terms of memory, CPU and other computational imperatives), which is why most previous studies were restricted to the two simplest amino-acid residue types, glycine and alanine. We improve on this by extending the scope to include all 20 naturally occurring residue types. Our results reveal differences in secondary structure preferences among the residue types. In most cases there are very deep energy troughs corresponding to the polyproline II (collagen) helix, the α-helix, or both. The β-strand was not strongly favoured energetically: its depression in the energy surface is not as deep as those of the other two types of secondary structure, although it covers a wider area. There is currently great interest in the question of cotranslational folding, that is, the extent to which the nascent polypeptide begins to fold prior to emerging from the ribosome exit tunnel. Accordingly, while most previous quantum studies of dipeptides were carried out in the (simulated) gas or aqueous phase, we wished to consider the first step in polypeptide biosynthesis on the ribosome, where neither gas nor aqueous conditions apply. We used a dielectric constant that would be compatible with the water-poor macromolecular (ribosome) environment.
There are many reasons why there has been so much interest in calculating peptide conformations (Gould et al. 1994; Wu et al. 2010; Bellesia et al. 2010; Hovmoller et al. 2002; Bywater and Veryazov 2013; Carrascoza et al. 2014). These include the need to understand the preferred conformations of physiologically active peptides, the way peptides are incorporated into polypeptide and protein structures, and the conformation adopted during de novo peptide formation in the ribosome. Most previous studies (Gould et al. 1994; Wu et al. 2010) were concerned with small peptides per se, in the gas or aqueous phase, while we addressed the last of these questions, that of peptide biosynthesis.
Throughout this and our previous work, and in keeping with the usage adopted by previous authors (Gould et al. 1994; Bywater and Veryazov 2013; Carrascoza et al. 2014), we study constructs that we refer to as primitive dipeptides, with N-acetyl-(XXX)(2)-N′-methylamine as the generic structure, in which XXX represents the defining amino acid residue type for the particular dipeptide. In this context, N-acetyl is employed as a surrogate for the first amino acid residue in the dipeptide. Although it is not a true amino acid residue as such, it is needed, together with the C-terminal amide group, to provide the correct electronic arrangement for a dipeptide and to block zwitterion formation. We chose to study all twenty members of the canonical set of amino acid types. A previous publication (Carrascoza et al. 2014) also reported studies of the entire set of amino acids, with somewhat different results, as discussed below.
As referred to in earlier papers (Bywater et al. 2001; Bellesia et al. 2010; Hovmoller et al. 2002; Bywater and Veryazov 2013; Carrascoza et al. 2014), different residue types have different propensities to adopt one or other of the regularly repeating polypeptide structures: α-helix, 3₁₀ helix, polyproline II helix (here abbreviated as PP-helix) or β-strand (Bywater et al. 2001; Liljas et al. 2009). These preferences are, however, not absolute. They can vary according to context: both near neighbours and internal 3D contacts can affect the outcome. In many different areas of protein science it is of interest to know the energetic differences between these conformations. In the area we wish to investigate, that of the conformation adopted by newly synthesized peptides on the ribosome, it has previously been proposed (Lim and Spirin 1984, 1986) that the α-helix is the predominant structure. Our earlier results (Bywater and Veryazov 2013) support this prediction, but an alternative, the PP-helix, emerged as an equally likely and in some cases stronger contender. Furthermore, any extended α-helix would be vulnerable to disruption upon the appearance of a proline residue (Bywater et al. 2001). These reflections added to the importance of studying all twenty amino-acid types so as to see how these preferences are distributed throughout the entire set. It is important to note that the α-helix and the β-strand are, either singly or in combination, by far the most predominant secondary structure types found in globular proteins (membrane proteins are either all-α-helix or all-β-strand). For fibrous proteins the converse is true: these are typically proline- and glycine-rich structures in which a conformation similar to the PP-helix plays a prominent role. The protein biosynthesis machinery must be able to cater for both classes of protein.
We constructed the starting structures for each of the many thousands of calculations in the same way as before (Bywater and Veryazov 2013) using the Yasara protein modelling program package (Krieger et al. 2002). A complete set of conformers was constructed for each amino acid type, whereby the C(i−1)–N(i)–CA(i)–C(i) angle (φ) was stepped through at intervals of 3° (120 steps) while for each φ rotamer the N(i)–CA(i)–C(i)–N(i+1) angle (ψ) was stepped through 120 steps of 3°. This produced a total of 1681 structures for each amino acid type (41 for the special case of proline). In contrast to certain other studies (e.g. Carrascoza et al. 2014) there was no attempt to optimize these input structures. Instead a so-called rigid scan regime was imposed whereby, for each amino acid type, the rotameric state of the side chain was maintained while the φ,ψ angles were changed. This was considered essential in order to be able to make like-for-like comparisons for each amino acid type at these different backbone angles. If the side chain rotameric state for each different backbone geometry were allowed to relax, that would produce an energy-minimized structure, but that would be a rather uninteresting object of study because it could not be compared with the thousands of other backbone geometries. Furthermore, the minimum side chain energy state may or may not be relevant at all. For all but the “smallest” side chain types there are multiple rotameric states that are accessible (Ponder and Richards 1987; Pupo and Moreno 2009). It would be impossible to cater for all of them.
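The enumeration of backbone geometries in a rigid scan can be sketched as follows. This is only an illustration, not the procedure used with Yasara. The text quotes both a 3° step and totals of 1681 and 41 structures, so the step is left as a parameter; the 9° value and the inclusion of both end points (−180° and 180°) are assumptions made here because they reproduce the quoted totals (41 values per angle). The fixed φ used for proline is a hypothetical value.

```python
from itertools import product


def angle_values(step_deg, start=-180.0, stop=180.0):
    """Return evenly spaced angles from start to stop, both ends included."""
    n_points = int(round((stop - start) / step_deg)) + 1
    return [start + i * step_deg for i in range(n_points)]


def rigid_scan_grid(step_deg):
    """All (phi, psi) pairs of a rigid scan; the side-chain rotamer is not varied."""
    values = angle_values(step_deg)
    return list(product(values, values))


def proline_scan(fixed_phi, step_deg):
    """Proline: phi is constrained by the ring, so only psi is stepped."""
    return [(fixed_phi, psi) for psi in angle_values(step_deg)]


STEP = 9.0       # assumption: chosen so that each axis has 41 values
PRO_PHI = -66.0  # hypothetical fixed phi for proline, for illustration only

grid = rigid_scan_grid(STEP)
pro_grid = proline_scan(PRO_PHI, STEP)
print(len(grid), len(pro_grid))
```

Each (φ, ψ) pair would correspond to one input structure for a single-point energy calculation, with all other internal coordinates left unchanged.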
For each of these conformers, DFT calculations with the B3LYP functional and the ANO-L-VDZP basis set were performed using Molcas 7.8 (Aquilante et al. 2010). The PCM model was used to simulate solvation effects (Karlström et al. 2003; Pomelli and Tomasi 1997). As before (Bywater and Veryazov 2013) we selected a dielectric constant of 2.5 to reflect the water-poor environment of the peptidyltransferase site and the extremely slow tumbling rate of an object as large as a ribosome.
Consensus ranges for secondary structure types
α-helix: −72° < φ < −60°; −45° < ψ < −39°
3₁₀ helix: −75° < φ < −74°; −5° < ψ < −4°
PP-helix: −76° < φ < −72°; 144° < ψ < 145°
β-strand: 100° < φ < 180°; 90° < ψ < 180°
Description of the topography of the energy surface in φ, ψ space with remarks concerning the secondary structure preferences for each residue type
A: The Ramachandran plot shows a classical pattern with a forbidden zone at −30° < φ < 30°. The β-strand and PP-helix are well populated while the α-helix is not strongly favoured. The right-hand region, which is fully accessible to glycine (see G), is weakly accessible to A compared to other (non-G) amino acid types.
C: Very similar to A, but an additional forbidden zone shows up on the far right (φ ≈ 152°). As with A, the β-strand and PP-helix are well populated while the α-helix is not strongly favoured.
D: Very similar to C, but the right-hand forbidden zone is now broader owing to the bulk of the side chain carboxylate moiety. D prefers the PP-helix and 3₁₀ helix rather than the “more famous” α-helix and β-strand.
E: Very much like D, except that the α-helix now comes more into prominence (polyglutamate or glutamate-rich peptides are known to favour the α-helix). E also has a side chain carboxylate moiety, but it is displaced further away from the peptide chain by an additional methylene, so the forbidden band is narrower than that for D.
F: Even distribution amongst all secondary structure types, but very reminiscent of E. F is likewise known to favour the α-helix. The 120° < φ < 152° region is strongly forbidden owing to the bulky aromatic side chain.
G: Essentially symmetrical distribution about the universal −30° < φ < 30° barrier. The β-strand and PP-helix dominate. G can occur within α-helices, but this residue type uniquely does not favour the standard right-handed geometry over a left-handed one: both isomers are equally possible [for A that would be an extremely rare event (but not unknown)]. Because of its rotational flexibility, G is an important turn motif.
H: Similar to F, but with the α-helix not strongly favoured. Like F, H has a very prominent restricted region at 120° < φ < 152°.
I: This residue type offers a major difference to most of the others. A “new” restricted region, −140° < ψ < −120°, appears, indicative of significant steric clashes due to β-branching [NB. In this context “β” means that the branching occurs at the CB atom, as with T and V (qv)]. As for the allowed regions, the polyproline region φ = −72°, ψ = 144° is evident, while the α-helix region φ = −72°, ψ = −45° is considerably eroded (this was already reported in Bywater and Veryazov 2013 and similar findings were reported recently in Ilawe et al. 2015). The β-strand region shows up very prominently. This is as expected from experimental data (Bellesia et al. 2010; Hovmoller et al. 2002). We put the ranking for this residue type in the order polyproline > α-helix ∼ β-strand ≫ 3₁₀ helix.
K: Resembles E in many ways, but now there is a clear gap between the β-strand/PP-helix region and the (favoured) α-helix region. The 3₁₀ helix region seems to be excluded (this may have significance for protein folding since 3₁₀ helices can play a role in this process). One very striking feature, which it shares only with S (see below), is that the large barrier to rotation of the φ angle (usually 120° < φ < 152°) is shifted to φ ≈ 108°.
L: Similar to C and distinctly different from its position isomer I (qv). The absence of the −140° < ψ < −120° steric clash accounts for the different secondary structure propensities of L and I. In particular, L is amenable to the α-helix geometry while I is not. L does not seem to favour the PP-helix, and the β-strand and α-helix regions are discontiguous.
M: M is similar to F. For example, it is “α-helix friendly”. The 120° < φ < 152° forbidden zone shows up prominently. This restricted zone is due to the bulkiness of the side chain [in the interior of proteins, M often “behaves like” F (and W, Y), due partly to this bulkiness but also to quantum chemical considerations concerning the somewhat similar behaviour of d-orbitals compared with the π-orbitals of aromatic side chains. These allow opportunities for orbital overlap, which confers directionality].
N: N is similar to D, favouring the PP-helix and 3₁₀ helix rather than the more famous α-helix and β-strand. But, while D is not regarded as being “α-helix preferring” exactly, N has an even greater aversion and can be considered “α-helix forbidding”. The only difference between N and D is the amido-terminal group of the side chain instead of a carboxylate.
P: The plot for P is necessarily restricted to a very narrow strip in the φ dimension due to its cyclic structure. As expected, P favours the PP-helix almost by definition, but the α-helix is a good runner-up. The notion that P is “helix-breaking” needs to be revised. P can sit at the beginning of an α-helix and even in the middle of such a helix (Bywater et al. 2001), although there will be disruptions at the (i − 3)rd residue (so-called “kinks”). For P, where only the −72° < φ < −60° region is relevant (see Fig. 5), one can clearly discern the order of preference as PP-helix > α-helix ⋙ anything else.
Q: One might expect Q to be similar to E, but it is not. Compared to E there is almost no preference for the α-helix. This has to be a most significant result. How can amidation of a side chain make such a difference? But it mirrors exactly the difference between N and D.
R: One might expect R to resemble K. But, unlike K, there is no divide between the α-helix and the PP-helix/β-strand region. These areas are effectively contiguous and 3₁₀ helices would be accessible.
S: One might expect S to be similar to C (qv), but it is not. There is a much more pronounced 36° < φ < 136° zone and the α-helix propensity is greatly diminished. The explanation probably has to do with intraresidue hydrogen-bonding. As noted with K (qv), the φ rotation barrier is shifted, this time to ≈96°.
T: Similar to S in the 36° < φ < 136° zone and almost exactly like I (and V) (qv) in the −140° < ψ < −120° region, indicative of significant β-branching causing steric clashes. Enhanced α-helix propensity compared with S.
V: As with I and T, the −140° < ψ < −120° region is highly restricted. Greatly diminished α- and 3₁₀ helix propensity; β-strand dominant. There is an interesting incursion into the −120° < ψ < −80° region that is not really seen with any other residue type.
W: Very similar to H (and Y, F, M) due to the bulky side chain.
Y: Very similar to H (and W, F, M) due to the bulky side chain.
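When reading an individual (φ, ψ) point against the regions discussed above, one simple approach is to assign it to the nearest reference conformation. The sketch below is a hypothetical helper, not part of the published analysis. The α-helix and PP-helix reference points are the values quoted in the text (φ = −72°, ψ = −45° and φ = −72°, ψ = 144°); the 3₁₀ helix point is assumed to be the midpoint of its consensus range; the 30° cut-off is an arbitrary assumption. The β-strand is left out because it is described as a broad region rather than a point.

```python
import math

# Reference (phi, psi) points in degrees. The alpha-helix and PP-helix values
# are those quoted in the text; the 3_10 value is an assumed midpoint of the
# consensus range.
REFERENCE_POINTS = {
    "alpha-helix": (-72.0, -45.0),
    "PP-helix": (-72.0, 144.0),
    "3_10 helix": (-74.5, -4.5),
}


def angular_difference(a, b):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def nearest_reference(phi, psi, max_distance=30.0):
    """Return the closest reference label, or None if none lies within max_distance."""
    best_label, best_dist = None, float("inf")
    for label, (ref_phi, ref_psi) in REFERENCE_POINTS.items():
        # Periodic distance in the (phi, psi) plane.
        dist = math.hypot(angular_difference(phi, ref_phi),
                          angular_difference(psi, ref_psi))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= max_distance else None


print(nearest_reference(-65.0, -42.0))
print(nearest_reference(-75.0, 150.0))
print(nearest_reference(60.0, 60.0))
```

The periodic difference matters near ±180°, where, for example, 179° and −179° are only 2° apart.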
Our previous results for residue types G, A, I and L provided support for established ideas (Lim and Spirin 1984; Lim and Spirin 1986) that the α-helix is a “default” conformation for the de novo generation of polypeptides on the ribosome, but also demonstrated a clear alternative or rival: the PP-helix was given comparable, and in some cases greater, prominence. We see further examples of that here, in the now extended repertoire of residue types. This is important because there is, for any species, only a single class of ribosome, which has to cater for both globular (requiring α-helix and/or β-strand) and fibrous (strongly PP-helix preferring) proteins. Recent DFT studies on a restricted set of GXG model peptides (Ilawe et al. 2015) confirm the prominence of the PP-helix, while also finding a preference for β-strand. The latter is understandable since these authors were focusing on X = I/V/L, and the first two of these residue types are known (and shown here) to be β-strand preferring. All of these “preferential” states (α-helix, β-strand, PP-helix) must be regarded as at least potentially accessible for most amino acid types. I and V do turn up in α-helices, albeit less frequently than in β-strands. Note should be taken of the fact that while the α- and PP-helix occupy a relatively small area of φ,ψ space, these two structural types are characterised by very deep depressions, which render them enthalpically favoured. The β-strand, in contrast, covers a wide area (alternatively: there is greater tolerance to distortions) although the depression is not as deep. Located between the α-helix and β-strand zones is a region that corresponds to the 2.2₇ ribbon structure. This was discussed at length in Carrascoza et al. 2014 and, indeed, our results do not rule out that some of the amino acid types might dwell in that region. But it is not normally found in proteins and it is an unlikely contender as part of a biosynthesis process.
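The distinction drawn above between deep but narrow depressions and a shallower but wider one can be made concrete by measuring, for each basin on a φ,ψ grid, both the lowest energy and the area lying within a fixed threshold of that minimum. The sketch below uses a synthetic surface built from two Gaussian wells. All well positions, depths, widths, the threshold and the (arbitrary) energy units are hypothetical and are not taken from the calculations reported here.

```python
import math

# Two made-up wells (arbitrary energy units): one deep and narrow,
# one shallow and wide. Every number here is hypothetical.
WELLS = {
    "deep_narrow": {"centre": (-72.0, -45.0), "depth": 10.0, "width": 8.0},
    "shallow_wide": {"centre": (-72.0, 120.0), "depth": 6.0, "width": 25.0},
}
STEP = 3.0
ANGLES = [-180.0 + i * STEP for i in range(int(360 / STEP) + 1)]


def gaussian_well(phi, psi, centre, depth, width):
    """Negative energy contribution of a single Gaussian well."""
    d2 = (phi - centre[0]) ** 2 + (psi - centre[1]) ** 2
    return -depth * math.exp(-d2 / (2.0 * width ** 2))


def energy(phi, psi):
    """Synthetic energy surface: the sum of all wells."""
    return sum(gaussian_well(phi, psi, **w) for w in WELLS.values())


def basin_stats(threshold=2.0):
    """Lowest energy and area (deg^2) within `threshold` of it, per basin.

    Each grid point is assigned to the well whose centre is nearest.
    """
    energies = {name: [] for name in WELLS}
    for phi in ANGLES:
        for psi in ANGLES:
            owner = min(
                WELLS,
                key=lambda n: math.hypot(phi - WELLS[n]["centre"][0],
                                         psi - WELLS[n]["centre"][1]),
            )
            energies[owner].append(energy(phi, psi))
    stats = {}
    for name, values in energies.items():
        lowest = min(values)
        n_low = sum(1 for e in values if e <= lowest + threshold)
        stats[name] = {"min_energy": lowest, "area_deg2": n_low * STEP * STEP}
    return stats


stats = basin_stats()
for name, s in stats.items():
    print(name, round(s["min_energy"], 2), s["area_deg2"])
```

With these made-up parameters the deeper well has the lower minimum but the smaller low-energy area, which is the qualitative pattern described for the helices versus the β-strand.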
The apparent propensities for an α-helix geometry have to be viewed in the light of the fact that we are considering dipeptides, and a true α-helix will not actually form in stretches shorter than 4 residues, the length at which the first of the hydrogen bonds that stabilize the helix can be established. This suggests that there is something that intrinsically favours this helix regardless of the assistance provided by hydrogen bonds. The answer almost certainly resides in the need to “remove bumps”, i.e., steric repulsions between the atoms at certain key side chain torsion angles. Similar remarks might be made about the β-strand. There is a very wide range of backbone torsion angles available to this geometry. In this case too there are no stabilising hydrogen bonds in the dipeptide, but in proteins β-strands are always incorporated into β-sheets, held together by hydrogen bonds. These β-sheets exhibit a very large variety of “shapes” and contortions, which are allowed because of the very wide range of torsion angles accessible to the constituent β-strands. Lastly, mention should be made of 3₁₀ helices. There are clear hints of distinct differences in their prevalence between different amino acid residue types, and this can have repercussions for how protein folding takes place. Now that we have energy calculations for the entire set of 20 residue types, it is easier to survey the whole family and see what patterns of secondary structure preferences might emerge.
The results presented here can be used by protein chemists as a guide to the most likely secondary structure propensities for each of the amino acid types. But certain caveats need to be issued. Firstly, the structures studied are not in the strict chemical sense “correct” structures for the dipeptides in gas phase or solution. This is in any case not an endeavour of compelling interest. Here, we have attempted to mimic an environment that the incipient polypeptide chain might encounter in the interstices of the ribosome, or indeed anywhere inside the cell, which is known to be very “crowded”, but we can only do that with a very primitive solvation model. We do not know which neighbouring residues are in contact with the newly synthesized peptide or what the precise geometric arrangement is. We only allow the two backbone angles φ and ψ to change. Given the uncertainties about the environment, it does not make sense to allow all other angles to relax and to conduct energy minimizations of these structures. We think that conducting the study in this way has at least thrown some light on the question of how each residue type behaves in comparison with the others, and has provided some information concerning secondary structure propensities. Obtaining structural information about longer peptides is of course also of great interest, but different methodologies are needed for that (molecular dynamics rather than quantum chemical methods), and recent work (Nilsson et al. 2015) reports the results of such cotranslational folding studies. These data do not in any way contradict our results, quite the converse, but the example given was of a small protein with a tendency to form α-helical structure. It would be interesting to see if any attempt is made to detect cotranslational folding of a fibrous protein, in which case the collagen PP-helix would come into play.
There has been much interest in determining the structure of dipeptides. Usually these efforts have been restricted to the case of primitive dipeptides where the central residue type is glycine or alanine, and no account was taken of the effect of solvent: gas-phase conditions were assumed. Our previous work extended this coverage of the residue type repertoire to two further cases, that of leucine and its position isomer isoleucine. Simulated solvent conditions corresponding approximately to the water-poor environment and large particle size of a ribosome (or elsewhere in the crowded interstices of the cell) were applied. Already at that stage, major differences were seen between the four residue types, particularly between the two isomers. This encouraged further research into the entire set of 20 standard residue types. We have produced a compendium that protein chemists can use as a guide to the most likely secondary structure propensities for each of the amino acid residue types. Most amino acid residue types can access all three of the major secondary structures (α-helix, β-strand, PP-helix), but there are individual preferences which were known from experimental and bioinformatics studies. Our plots map out these preferences. In reference to ribosomes, we recall that the same ribosomes have to cater for all 20 amino acid types and also enable both globular and fibrous proteins to be formed within, and emerge from, the peptide synthesis tunnel. We have not considered cotranslational folding as such, but our work should be helpful as a starting point for such studies.
Both authors made distinct but equivalent contributions to this work. RPB conducted bioinformatics work and constructed the many thousands of input structures. VV conducted the quantum mechanical calculations and produced the dihedral angle plots. The manuscript was largely written by RPB but the authors had joint control over its content. Both authors read and approved the final manuscript.
Elmar Krieger and Gert Vriend are thanked for kindly making the Yasara modelling program available under an academic license. The computations were performed on resources provided by the Swedish National Infrastructure for Computing (SNIC) at LUNARC supercomputer center.
All coordinates are available from authors on application.
The authors declare that they have no competing interests.
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
- Aquilante F, De Vico L, Ferré N, Ghigo G, Malmqvist PÅ, Neogrády P, Pedersen TB, Pitoňàk M, Reiher M, Roos BO, Serrano-Andrés L, Urban M, Veryazov V, Lindh R (2010) MOLCAS 7: the next generation. J Comput Chem 31:224–247. doi:https://doi.org/10.1002/jcc.21318
- Bellesia G, Jewett AI, Shea JE (2010) Sequence periodicity and secondary structure propensity in model proteins. Protein Sci 19:141–154. doi:https://doi.org/10.1002/pro.288
- Bywater RP, Veryazov V (2013) The preferred conformation of dipeptides in the context of biosynthesis. Naturwissenschaften 100:853–859. doi:https://doi.org/10.1007/s00114-013-1085-7
- Bywater RP, Thomas D, Vriend G (2001) A sequence and structural study of transmembrane helices. J Comput Aided Mol Des 15:533–552. doi:https://doi.org/10.1023/A%3A1011197908960
- Carrascoza F, Zaric S, Silaghi-Dumitrescu R (2014) Computational study of protein secondary structure elements: Ramachandran plots revisited. J Mol Graph Model 50:125–133. doi:https://doi.org/10.1016/j.jmgm.2014.04.001
- Gould R, Cornell WD, Hillier IH (1994) A quantum mechanical investigation of the conformational energetics of the alanine and glycine dipeptides in the gas phase and in aqueous solution. J Am Chem Soc 116:9250–9256. doi:https://doi.org/10.1021/ja00099a048
- Hollingsworth SA, Karplus PA (2010) A fresh look at the Ramachandran plot and the occurrence of standard structures in proteins. Biomol Concepts 1:271–283. doi:https://doi.org/10.1515/BMC.2010.022
- Hovmoller S, Zhou TP, Ohlson T (2002) Conformations of amino acids in proteins. Acta Cryst D58:768–776
- Ilawe NV, Raeber AE, Schweitzer-Stenner R, Toal SE, Wong BM (2015) Assessing backbone solvation effects in the conformational propensities of amino acid residues in unfolded peptides. PCCP 17(38):24917–24924. doi:https://doi.org/10.1039/c5cp03646a
- Karlström G, Lindh R, Malmqvist P-Å, Roos BO, Ryde U, Veryazov V, Widmark P-O, Cossi M, Schimmelpfennig B, Neogrady P, Seijo L (2003) MOLCAS: a program package for computational chemistry. Comput Mat Sci 28:222
- Krieger E, Koraimann G, Vriend G (2002) Increasing the precision of comparative models with YASARA NOVA—a self-parameterizing force field. Proteins 47:393–402. doi:https://doi.org/10.1002/prot.10104
- Liljas A, Liljas L, Piskur J, Lindblom G, Nissen P, Kjeldgaard M (2009) Textbook of structural biology. World Scientific, Singapore. ISBN 978-981-277-207-7
- Lim VI, Spirin AS (1984) Stereochemistry of the transpeptidation reaction in the ribosome: the ribosome generates the α-helix during synthesis of the polypeptide chain of the protein. Doklady Akad Nauk 280:235–238
- Lim VI, Spirin AS (1986) Stereochemical analysis of ribosome conformation of nascent peptide. J Mol Biol 188:565–577
- Nilsson OB, Hedman R, Marino J, Wickles S, Bischoff L, Johansson M, Müller-Lucks A, Trovato F, Puglisi JD, O’Brien EP, Beckmann R, Von Heijne G (2015) Cotranslational protein folding inside the ribosome exit tunnel. Cell Reports 12:1533–1540. doi:https://doi.org/10.1016/j.celrep.2015.07.065
- Pomelli CS, Tomasi J (1997) A new formulation of the PCM solvation method. Theor Chem Accounts 96:39–43. doi:https://doi.org/10.1007/s002140050201
- Ponder JW, Richards FM (1987) Tertiary templates for proteins. Use of packing criteria in the enumeration of allowed sequences for different structural classes. J Mol Biol 193(4):775–791. doi:https://doi.org/10.1016/0022-2836(87)90358-5
- Pupo A, Moreno E (2009) Do rotamer libraries reproduce the side-chain conformations of peptidic ligands from the PDB? J Mol Graph Model 27:611–619. doi:https://doi.org/10.1016/j.jmgm.2008.10.002
- Wu H, Canfield A, Adhikari J, Huo S (2010) Quantum mechanical studies on model alpha-pleated sheets. J Comput Chem 31:1216–1223. doi:https://doi.org/10.1002/jcc.21408