
Mutational pressure dictates synonymous codon usage in freshwater unicellular α-cyanobacterial descendant Paulinella chromatophora and β-cyanobacterium Synechococcus elongatus PCC6301



A comparative study of synonymous codon usage variation, and of the factors influencing its diversification, in the α-cyanobacterial descendant Paulinella chromatophora and the β-cyanobacterium Synechococcus elongatus PCC6301 has not been reported so far. In the present study, we investigated various factors associated with synonymous codon usage in the genomes of P. chromatophora and S. elongatus PCC6301 and discuss the findings.


Mutational pressure was identified as the major force behind codon usage variation in both genomes. Correspondence analysis, however, revealed that the intensity of mutational pressure was higher in S. elongatus than in P. chromatophora. Living habitats were also found to influence synonymous codon usage variation across the genomes of P. chromatophora and S. elongatus.


Whole-genome sequencing of α-cyanobacteria in the Cyanobium clade would facilitate understanding of synonymous codon usage patterns, and of the factors contributing to their diversification, in the presumed ancestors of the photosynthetic endosymbionts of P. chromatophora.


Nucleotide triplet codons that differ only at the third position (or, rarely, at the second position) but encode the same amino acid are termed synonymous codons (Ermolaeva 2001). Synonymous mutations do not alter amino acid sequences, but synonymous codons are not used at uniform frequencies within or between organisms, resulting in species-specific codon usage bias (Grantham et al. 1980; Sharp et al. 1995). Synonymous codon usage (SCU) bias favours a specific subset of codons (preferred codons) within each amino acid family (Agashe et al. 2013). Weak selection of preferred codons has been recognized as an important evolutionary force (Carlini et al. 2001), as SCU bias affects the overall fitness of a cell by influencing the level of gene expression and cellular processes such as RNA processing, protein translation and protein folding (Parmley and Hurst 2007; Hershberg and Petrov 2008; Plotkin and Kudla 2011). Synonymous codons also maintain the functional integrity of the genetic code (Biro 2008). Population genetic studies reveal that biased codon usage evolves mainly through either genome-wide AT/GC-biased mutational pressure or weak selection acting on a specific subset of codons (preferred codons) (Bulmer 1991; Yang and Nielsen 2008; Agashe et al. 2013). Other major factors include interactions between codons and anticodons (Kurland 1993), site-specific codon biases (Smith and Smith 1996), efficacy of replication (Deschavanne and Filipski 1995), usage of codon pairs (Irwin et al. 1995) and evolutionary time scale (Karlin et al. 1998).

The forces that influence the evolution of SCU bias have been extensively analyzed in various organisms (Ikemura 1982; Moriyama and Powell 1997; Nair et al. 2012; Seva et al. 2012; Sharp and Cowe 1991), as SCU bias is important for estimating evolutionary rates and for phylogenetic reconstruction (Sarmer and Sullivan 1989; Wall and Herbeck 2003). Previous studies revealed that codon usage bias is stronger in highly expressed genes, as selection pressure may be acting on those genes (Ikemura 1985). However, the strength of selection appears to vary among sites: evolutionarily conserved amino acid residues exhibit stronger bias, whereas evolutionarily variable residues often exhibit weaker bias (Akashi 1995; Drummond and Wilke 2008). Mutational pressure is another important factor shaping SCU variation (Plotkin and Kudla 2011; Akashi 2001). The lifestyle of prokaryotic organisms also plays an important role in SCU variation (Botzman and Margalit 2011). However, the role of physiological processes in the evolution of biased codon usage is yet to be unravelled (Agashe et al. 2013).

Endosymbiotic associations have significant impacts on cellular evolution and diversity (Bodyl et al. 2007). Extensive research on plastid genomes has revealed that plastids evolved from a single primary endosymbiotic event in which a unicellular eukaryote acquired a cyanobacterium (Nowack et al. 2008). In endosymbiosis research, Paulinella chromatophora, a filose thecamoeba, is regarded as an outstanding model for primary plastid origin, as it is the only known case of an independent primary cyanobacterial acquisition (Chan et al. 2011; Marin et al. 2005; Yoon et al. 2006). Sequencing of the chromatophore genome has shed light on how eukaryotes acquired photosynthesis (Nowack et al. 2008). The chromatophores of P. chromatophora are monophyletic with α-cyanobacteria (Cyanobium clade) (Marin et al. 2007), whereas plastids evolved from a β-cyanobacterial ancestor (Nowack et al. 2008).

SCU bias in various primary endosymbionts and plastid genomes has been extensively studied (Nair et al. 2012; Morton 1993, 1997, 1998; Sablok et al. 2011). The factors that frame SCU variation in the phylogenetically close marine Prochlorococcus and Synechococcus clades within the PS (Prochlorococcus/Synechococcus) clade (Marin et al. 2007) have also been studied: the SCU pattern of Prochlorococcus was shaped by mutational pressure and nucleotide compositional constraints, whereas in marine Synechococcus, translational selection determines the SCU pattern (Yu et al. 2012). However, no complete cyanobacterial genome has so far been reported from the Cyanobium clade, the third major lineage of the PS clade (Figure 1). Hence, the factors that frame SCU in the chromatophore genome could not be compared with those of its presumed ancestor. Since the habitat of microorganisms plays a crucial role in SCU variation across genes (Botzman and Margalit 2011), the unicellular freshwater β-cyanobacterium Synechococcus elongatus PCC6301 (SELONG clade) (Marin et al. 2007) was selected for comparing SCU patterns and for eliciting the factors that determine SCU variation in the evolutionarily young (P. chromatophora) and evolutionarily old (S. elongatus) genomes.

Figure 1

Diagrammatic representation of three clades in the Prochlorococcus / Synechococcus clade. SCU variation in marine Synechococcus is shaped by selection but in marine Prochlorococcus, mutational pressure shapes the SCU pattern. SCU: Synonymous codon usage.


I. Compositional properties

a) Chromatophore genome of P. chromatophora

Comparison of total A, T, G and C contents in the genome of P. chromatophora revealed higher A and T contents than G and C contents. Among A3, T3, G3 and C3, T3 content was the highest and C3 the lowest, with a mean and S.D. of 39.14% and 4.06% for T3 and 12.94% and 3.67% for C3. GC3 ranged from 16.15% to 54.38%, with a mean and S.D. of 27.40% and 4.69%, respectively. Correlation analysis between total nucleotide contents and silent base contents revealed strong negative correlations between A3 and GC (Table 1). Similarly, a strong negative correlation was found between A and GC3 (Table 1). This suggests that A and GC contents play an important role in SCU bias in the chromatophore genome. The strong positive correlation between C and G3 might also have a profound effect in framing SCU patterns. However, no correlations were found between G and T3 or between T and G3, suggesting that individual T and G contents do not influence codon usage bias. Since A3 content was strongly negatively correlated with all total nucleotide contents (Table 1), A3 content can be inferred to play an important role in shaping SCU patterns across the 786 PCG of the chromatophore genome.

Table 1 Spearman’s rank correlation analysis of nucleotide contents in P. chromatophora

b) Genome of S. elongatus

Contrary to the observations in P. chromatophora, G and C contents were higher than A and T contents in the genome of S. elongatus, and G3 and C3 contents were significantly higher than A3 and T3 contents. Among the silent base contents, C3 was the highest and A3 the lowest, with a mean and S.D. of 31.12% and 6.02% for C3 and 16.43% and 4.12% for A3. GC3 varied from 26.12% to 76.90%, with a mean and S.D. of 60.19% and 7.45%, respectively. Correlation analysis between total A, T, G and C contents and A3, T3, G3 and C3 contents revealed that A3 was negatively correlated with G, C and GC. Similarly, T3 was strongly negatively correlated with G, C and GC3, and GC composition at the silent site (GC3) was negatively correlated with both A and T contents (Table 2). Hence, all silent base contents (A3, T3, G3 and C3) might influence SCU variation among the protein coding genes (PCG) of S. elongatus.

Table 2 Spearman’s rank correlation analysis of nucleotide contents in S. elongatus

II. Characteristics of relative synonymous codon usage

a) Chromatophore genome of P. chromatophora

The overall codon usage patterns of the 786 PCG in the chromatophore genome of P. chromatophora were analyzed (Table 3). All amino acids used A- and T-ending codons most frequently (RSCU greater than one), consistent with the AT-rich composition of the chromatophore genome. All C-ending codons except AGC (Ser) and CGC (Arg), and all G-ending codons except TTG (Leu), were rare (RSCU less than 0.66). CTA (Leu) was the only intermediate codon (RSCU between 0.66 and 1) among the A-ending codons. ENC values of the 786 PCG ranged from 33.43 to 61, with a mean and S.D. of 47.57 and 3.77, respectively, indicating considerable variation in codon usage among the genes of this organism. GC3 values ranged from 16.2% to 54.40%, with a mean and S.D. of 27.40% and 4.69%, respectively. Chi-square analysis of codon counts in the 5% of genes at each extreme of axis 1 revealed 16 codons that were statistically over-represented (putative optimal codons) in genes located at the extreme left of axis 1: ten A-ending codons (62.5%) and six T-ending codons (37.5%). Interestingly, most of the over-represented T-ending codons belonged to two-codon families, except for Gln, for which CAA was statistically over-represented. These results suggest that factors other than compositional constraints might influence codon usage in this organism.

Table 3 Overall codon usage in P. chromatophora

b) Genome of S. elongatus

The overall codon usage patterns of the 2342 PCG in the genome of S. elongatus were analyzed (Table 4). All amino acids except the two-fold degenerate Phe, Glu, Asp and Lys used G- or C-ending codons most frequently; Phe used TTT, Glu GAA, Asp GAT and Lys AAA most often. Rare codons were TTA, CTT and CTA for Leu, ATA for Ile, GTA for Val, ACA for Thr and GGA for Gly. Intermediate codons were predominantly A- or T-ending, except ACG for Thr, AAG for Lys, GAC for Asp, GAG for Glu, AGG for Arg and GGG for Gly. The 14 statistically over-represented codons of genes at the extreme left of axis 1 comprised eight C-ending (57.1%) and six G-ending (42.9%) codons (Table 5). For the 2342 PCG of the S. elongatus genome, ENC values varied from 39.80 to 56.65, with a mean and S.D. of 51.29 and 2.14, respectively, indicating marked variation in codon usage among genes. GC3 varied from 26.12% to 76.90%, with a mean and S.D. of 60.19% and 7.45%, respectively, suggesting that GC compositional constraints have a major influence on codon usage across genes in this genome.

Table 4 Overall codon usage in S. elongatus
Table 5 Putative optimal codons

III. Influence of GC composition on SCUO

a) Chromatophore genome of P. chromatophora

Overall GC content and local GC compositions (GC1, GC2 and GC3) of the 786 PCG were estimated and plotted against the corresponding SCUO values (Figure 2). GC3 showed two horns (Figure 2d), whereas overall GC and the other local GC compositions (GC1 and GC2) did not. The relationship between GC3 and SCUO was linear (SCUO = −0.004 (GC3) + 0.324, r = −0.325, p < 0.001). GC2 content was also significantly correlated with SCUO (r = −0.114, p < 0.001). These results suggest that GC3 is more important than GC, GC1 and GC2 in shaping SCU bias. Thus, mutational bias plays an important role in SCU variation in the chromatophore genome of P. chromatophora.

Figure 2

Relationship between SCUO and GC composition in P. chromatophora. (a) Relationship between SCUO and the overall GC composition, (b) Relationship between SCUO and GC1, (c) Relationship between SCUO and GC2, (d) Relationship between SCUO and GC3. SCUO: Synonymous codon usage order.

b) Genome of S. elongatus

In the genome of S. elongatus, total GC content and GC compositions at the three codon positions (GC1, GC2 and GC3) were calculated and plotted against the corresponding SCUO values (Figure 3). GC and GC3 showed two horns (Figures 3a and 3d). SCUO was positively correlated with GC (r = 0.063, p < 0.01) and GC3 (r = 0.308, p < 0.001), but negatively correlated with GC1 (r = −0.113, p < 0.001) and GC2 (r = −0.08, p < 0.001), indicating that GC1 and GC2 also contribute to SCU variation. The relationship between SCUO and GC3 was linear (SCUO = 0.001(GC3) + 0.052, r = 0.308, p < 0.001). Because GC3 showed the highest correlation with SCUO, it likely has more influence on SCU variation than the other local GC compositions. Hence, GC mutational pressure may be the key factor shaping SCU variation in the S. elongatus genome.

Figure 3

Relationship between SCUO and GC composition in S. elongatus. (a) Relationship between SCUO and the overall GC composition, (b) Relationship between SCUO and GC1, (c) Relationship between SCUO and GC2, (d) Relationship between SCUO and GC3.

IV. ENC vs GC3 plot

a) Chromatophore genome of P. chromatophora

ENC vs GC3 plots are generally used for analyzing SCU patterns across genes, as the axes of this plot are independent of the data and display both intraspecific and interspecific SCU patterns (Wright 1990). If a gene is subject only to GC3 compositional constraints, it lies on or just below the expected GC3 curve; if its SCU pattern is influenced by translational selection, it lies considerably below the curve (Wright 1990). ENC values of the 786 PCG were plotted against the corresponding GC3 values (Figure 4a), and the majority of genes clustered on the left side of the curve. Although some genes lie on or just below the expected curve, most genes clustered below it. This indicates that forces other than GC3 compositional constraints shape SCU patterns in the chromatophore genome of P. chromatophora. However, the significant correlation between GC12 and GC3 (r = 0.207, p < 0.001) in the neutrality plot (Figure 5a) suggests that selection plays only a limited role in framing the codon usage pattern of chromatophore genes. Further, the influence of GC3 mutational pressure on PCG was analyzed using the PR2 bias plot (Figure 6a), which showed that synonymous A, T and G, C were used proportionally (y = 0.182x + 0.362, r = 0.236), confirming the role of GC3-biased mutational pressure in shaping SCU across the 786 PCG in the chromatophore genome of P. chromatophora.
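The PR2 bias plot places each gene at x = G3/(G3 + C3) and y = A3/(A3 + T3); a gene with no bias between complementary bases at synonymous sites falls at the centre (0.5, 0.5). The source does not state which codons were counted, so the sketch below assumes, as is common, that only fully four-fold degenerate codon boxes contribute (Ala, Gly, Pro, Thr, Val and the four-fold boxes of Leu, Arg and Ser). The function name `pr2_coordinates` is hypothetical.

```python
# Codon boxes whose third position is fully four-fold degenerate:
# Ala GCN, Gly GGN, Pro CCN, Thr ACN, Val GTN, Leu CTN, Arg CGN, Ser TCN.
FOURFOLD_PREFIXES = {"GC", "GG", "CC", "AC", "GT", "CT", "CG", "TC"}


def pr2_coordinates(cds: str) -> tuple[float, float]:
    """Return (G3/(G3+C3), A3/(A3+T3)) over four-fold degenerate third positions."""
    counts = {"A": 0, "T": 0, "G": 0, "C": 0}
    seq = cds.upper()
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon[:2] in FOURFOLD_PREFIXES and codon[2] in counts:
            counts[codon[2]] += 1
    gc = counts["G"] + counts["C"]
    at = counts["A"] + counts["T"]
    if gc == 0 or at == 0:
        raise ValueError("no four-fold degenerate G/C or A/T third positions")
    return counts["G"] / gc, counts["A"] / at
```

Plotting these pairs for all genes, and fitting y on x, gives the kind of regression line reported for Figure 6.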

Figure 4

ENC vs GC3 plots. (a) ENC vs GC3 plot of 786 PCG in P. chromatophora. (b) ENC vs GC3 plot of 2342 PCG in the S. elongatus genome. ENC: Effective number of codons.

Figure 5

Neutrality plots. (a) Neutrality plot of 786 PCG in P. chromatophora. (b) Neutrality plot of 2342 PCG in S. elongatus.

Figure 6

PR2 bias plots. (a) PR2 bias plot of 786 PCG in P. chromatophora. (b) PR2 bias plot of 2342 PCG in the S. elongatus genome.

b) Genome of S. elongatus

Most genes were grouped considerably below the expected GC3 curve (Figure 4b), indicating the influence of forces other than GC compositional constraints. In the neutrality plot (Figure 5b), GC12 was significantly correlated with GC3, indicating that selection plays only a weak role in SCU variation. The influence of GC3 on SCU variation was analyzed using the PR2 bias plot (Figure 6b), which revealed that A, T and G, C were used proportionally (y = 0.127x + 0.350, r = 0.140), reflecting GC3 compositional constraints on SCU variation across the 2342 PCG of the S. elongatus genome.

V. Correspondence analysis (COA)

a) Chromatophore genome of P. chromatophora

Axes 1, 2, 3, 4 and 5 accounted for 7.31%, 5.15%, 4.43%, 4.32% and 3.89% of the total variation, respectively (Figure 7); no single major explanatory axis was identified. Spearman’s rank correlation analysis between the five COA axes and various codon usage indices revealed that all axes except axes 3 and 5 were significantly correlated with silent base contents (Table 6): axis 1 with A3, G3 and C3, axis 2 with A3 and T3, and axis 4 with A3, T3, C3 and GC3. The strong negative correlations of axes 1 and 2 with A3, and of axis 4 with T3, suggest that compositional constraints shape codon usage in chromatophore genes. Complex correlations were observed between the 59 synonymous codons and the five COA axes. Interestingly, the Cys codons (TGT and TGC) had the highest correlation with axis 2 (Table 7), so they may strongly influence the separation of PCG along axis 2. Axes 1 and 4 showed significant negative correlations with ENC and CAI; hence, genes distributed along axes 1 and 4 might be subject to some selection. CDS length was correlated only with axis 1; since axis 1 did not account for much of the variation, CDS length cannot be considered an important factor framing SCU across genes. Aromaticity and protein gravy scores were not correlated with any axis, indicating that they do not influence the codon usage patterns of chromatophore genes in P. chromatophora.

Figure 7

Correspondence analysis. Correspondence analysis on RSCU values of 786 PCG in the chromatophore genome of P. chromatophora.

Table 6 Spearman’s rank correlation analysis between COA axes and codon usage indices
Table 7 Correlation analysis between COA axes and synonymous codons

b) Genome of S. elongatus

Axes 1, 2, 3, 4 and 5 accounted for 12.22%, 7.93%, 5.24%, 4.80% and 4.30% of the total variation, respectively (Figure 8); none of the axes accounted for the majority of the variation. All PCG were separated into three clusters along axis 2. All C-ending codons had strong negative correlations with axis 2, and the clusters were formed on the basis of the RSCU values of the C-ending codons. Correlation analysis was performed between the COA axes and codon usage indices (Table 8). Axes 1, 2, 3 and 4 were significantly negatively correlated with GC3, and, interestingly, axes 1, 2 and 3 were negatively correlated with CDS length. Thus, GC3 compositional constraints and CDS length might influence SCU patterns across genes in the S. elongatus genome. Among the silent base contents, positive correlations existed between axis 1 and A3 and T3; axis 2 and A3, T3 and G3; axis 3 and A3 and T3; axis 4 and A3, T3 and C3; and axis 5 and A3. This suggests that nucleotide compositional constraints influence SCU variation in the S. elongatus genome. ENC was positively correlated with axes 1, 2 and 3, whereas CAI was positively correlated with axis 1 but negatively correlated with axis 3. Thus, weak selection might influence the SCU of genes in S. elongatus. Axes 2 and 3 were positively correlated with the protein gravy score, but axis 4 was negatively correlated, indicating a possible influence of the hydropathic character of proteins on SCU variation across genes in the S. elongatus genome.

Figure 8

Correspondence analysis. Correspondence analysis on RSCU values of 2342 PCG in S. elongatus.

Table 8 Correlation analysis between COA axes and codon usage indices


The chromatophore genome of P. chromatophora has typical cyanobacterial characteristics (Yoon et al. 2006), as P. chromatophora diverged as a sister to free-living α-cyanobacteria (Marin et al. 2007). It was proposed that the photosynthetic endosymbionts of P. chromatophora evolved from the Cyanobium clade (Marin et al. 2007), which contradicts the earlier finding that chromatophores evolved from the marine clade consisting of Prochlorococcus and Synechococcus (Marin et al. 2005). However, no complete genome has yet been reported from freshwater α-cyanobacteria of the Cyanobium clade with which to compare the factors shaping SCU variation in the photosynthetic endosymbionts (chromatophores) of P. chromatophora and in their presumed ancestor. In this context, we studied SCU patterns and the factors contributing to their diversification in the chromatophore genome and in the freshwater unicellular β-cyanobacterium S. elongatus (SELONG clade) (Marin et al. 2007). The present findings revealed that mutational pressure due to GC compositional constraints frames the SCU patterns in both genomes, but with varying intensity. A study of marine Prochlorococcus and Synechococcus (Yu et al. 2012) from the PS clade (Marin et al. 2007) revealed that mutational pressure plays an important role in SCU variation of Prochlorococcus, whereas selection dictates the SCU pattern of Synechococcus. In the present study, the ENC vs GC3 plots of chromatophore genes and of genes of freshwater S. elongatus showed that the majority of genes clustered on or just below the expected curve, as observed for Prochlorococcus genes (Yu et al. 2012). In contrast, only a few genes of the marine Synechococcus genome lay on or just below the expected curve, indicating that additional factors shape its codon usage (Yu et al. 2012). The variation in factors influencing SCU patterns between freshwater and marine Synechococcus sp. suggests that the lifestyle of an organism may diversify the factors contributing to SCU variation even within the same genus, consistent with the previous observation that microbial evolution is very often influenced either by environment or by lifestyle (Botzman and Margalit 2011; Paul et al. 2010).

The putative optimal codons detected in the chromatophore and S. elongatus genomes may be useful for improving the expression of heterologous genes in host cells (Wang et al. 2013). An equilibrium between neutral mutational pressure and natural selection maintains the heterogeneity of codon usage among species (Sueoka 1988): if GC12 and GC3 are significantly correlated, codon usage is assumed to be shaped mainly by mutational pressure, and if no such correlation exists, translational selection is assumed to be the major force. In the present study, neutrality plots revealed significant correlations between GC12 and GC3 for genes of both the chromatophore and S. elongatus genomes. Most of the 786 PCG of the chromatophore and the 2342 PCG of S. elongatus were grouped in the upper left of the neutrality plot. The slopes of the regression lines in both plots were not close to zero, indicating that the influence of specific evolutionary pressures such as selection is weak. Thus, we propose that mutational pressure is the key factor shaping the codon usage pattern of both the chromatophore and S. elongatus genomes. Moreover, in the PR2 bias plots of these two genomes, synonymous A, T and G, C were used proportionally, indicating the influence of GC compositional constraints. Interestingly, within the PS clade, a significant correlation between GC12 and GC3 was found only in Prochlorococcus (Yu et al. 2012). Thus, in terms of the factors that diversify SCU patterns, the freshwater P. chromatophora and S. elongatus genomes appear more similar to marine Prochlorococcus than to marine Synechococcus.
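A minimal sketch of the neutrality-plot regression described above: GC12 (the mean of GC1 and GC2) is regressed on GC3 across genes by ordinary least squares. In the usual reading of this plot (Sueoka 1988), a slope near 1 suggests that all codon positions follow the same mutational bias, while a slope near 0 suggests selective constraint. The helper names are hypothetical, and a full analysis would also report r and its p-value.

```python
from statistics import mean


def gc_at_position(cds: str, pos: int) -> float:
    """Fraction of G+C at codon position pos (0, 1 or 2) over complete codons."""
    seq = cds.upper()
    bases = [seq[i + pos] for i in range(0, len(seq) - 2, 3)]
    return sum(b in "GC" for b in bases) / len(bases)


def neutrality_fit(cds_list: list[str]) -> tuple[float, float]:
    """Least-squares fit GC12 = slope * GC3 + intercept over a set of genes."""
    gc3 = [gc_at_position(s, 2) for s in cds_list]
    gc12 = [(gc_at_position(s, 0) + gc_at_position(s, 1)) / 2 for s in cds_list]
    mx, my = mean(gc3), mean(gc12)
    sxx = sum((x - mx) ** 2 for x in gc3)
    sxy = sum((x - mx) * (y - my) for x, y in zip(gc3, gc12))
    slope = sxy / sxx
    return slope, my - slope * mx
```

Each gene contributes one (GC3, GC12) point, so the input should contain genes with differing GC3; identical GC3 values make the slope undefined.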

The relationship between SCUO and GC3 formed a ‘U’ shape with two horns in both genomes, as reported for unicellular microorganisms (Wan et al. 2004), revealing the influence of GC3 on SCU bias. In the chromatophore genome, three COA axes showed higher correlations with silent base contents, confirming the influence of genome-wide compositional constraints. Axes 1 and 4 were also highly correlated with ENC and CAI, indices that reflect the level of gene expression; however, since there was no major explanatory axis, these correlations cannot be linked to the influence of selection. The hydropathic character of proteins (gravy score) was correlated with axes 2, 3 and 4 in the S. elongatus genome, suggesting that silent sites may be affected by protein hydropathy, whereas in the chromatophore genome the gravy score was not correlated with any COA axis. The correlation between CDS length and axes 1, 3 and 4 in the S. elongatus genome indicates an influence of CDS length on SCU variation, but no such correlation existed in P. chromatophora. In the S. elongatus genome, the negative correlation between GC3 and the first four COA axes confirms the effect of GC3 on the SCU pattern, and the significant correlations of ENC and CAI with the first three axes suggest that weak selection may also contribute to SCU variation. The formation of three clusters of PCG along axis 2 in the S. elongatus genome indicates a trend associated with the RSCU values of C-ending codons, which was not observed in the chromatophore genome; there, the Cys codons TGT and TGC instead influence the separation of PCG along axis 2. An influence of Cys codons on SCU patterns has previously been reported in Lactococcus lactis (Gupta et al. 2004) and Rhizobium (Wang et al. 2013). Overall, these results suggest that genome-wide compositional constraints influence the SCU patterns of both the chromatophore and S. elongatus genomes.

The SCU patterns of the chromatophore genome of P. chromatophora and of S. elongatus may be closely associated with their living habitats. P. chromatophora is adapted to submerged vegetation in freshwater. The mud-dwelling habit of this organism may protect it from extrinsic mutagens such as UV-B radiation, which can cause genome-wide mutation, as reported in Prochlorococcus (Partensky et al. 1999). The freshwater β-cyanobacterium S. elongatus PCC6301 is less adaptable to varying environments, as it resides strictly in euphotic zones with relatively low nutrient contents at mesophilic temperatures (Waterbury et al. 1986), unlike marine Synechococcus, which can grow under varying nutrient conditions and temperatures (Moore et al. 1998). Translational selection shapes the codon usage patterns of the more adaptable marine Synechococcus (Yu et al. 2012), whereas mutational pressure frames codon usage in the less adaptable freshwater S. elongatus. Closely related species living in distinct environments may exhibit considerable genomic diversity (Paul et al. 2010), which can lead to differences in the factors behind the diversification of SCU patterns. Mutational pressure was also found to be the major factor influencing the SCU pattern across PCG in the strictly thermophilic cyanobacterium Thermosynechococcus elongatus BP-1 (Prabha et al. 2012), which is poorly adapted to other temperature ranges because the growth of thermophiles is restricted to particular environments at specific temperatures (Botzman and Margalit 2011). These reports support our finding that the SCU patterns of P. chromatophora and S. elongatus are dictated by mutational pressure, owing to their limited adaptation to varying environments.


The SCU patterns of the photosynthetic endosymbiont (chromatophore) genome and the S. elongatus genome are dictated mainly by genome-wide GC mutational pressure. The living habitats of P. chromatophora and S. elongatus may also influence SCU variation across the genes of both genomes. Complete genome sequencing of α-cyanobacteria from the Cyanobium clade would further help to understand SCU patterns, and the factors contributing to their diversification, in the presumed ancestors of the photosynthetic endosymbionts of P. chromatophora.


Gene sequences

Complete coding sequences (CDS) of the chromatophore genome (GenBank: NC_011087.1) of P. chromatophora (Nowack et al. 2008) and of the genome (GenBank: AP008231) of S. elongatus (Sugita et al. 2007) were retrieved from NCBI and CYORF (Cyanobacterial gene annotation database), respectively. CDS integrity was confirmed by checking for a START codon at the beginning and a STOP codon at the end of each sequence, with no internal stop codons. To minimize sampling errors, only CDS longer than 300 nucleotides were analyzed (Zhou and Li 2009; Sablok et al. 2011). Duplicate sequences were identified and excluded from the data set. The final data set of the chromatophore genome thus consists of 786 coding sequences containing 261,350 codons and 784,050 nucleotides, whereas the final data set of the S. elongatus genome contains 2342 coding sequences containing 774,810 codons and 2,324,430 nucleotides.
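The integrity, length and duplicate filters described above can be expressed as a short Python sketch. It assumes that the bacterial start codons ATG, GTG and TTG are all acceptable (the source only says "START codon"), treats "more than 300 nucleotides" as a strict inequality, and keeps the first copy of exact duplicate sequences. `is_valid_cds` and `filter_cds` are illustrative names, not part of any published tool.

```python
START_CODONS = {"ATG", "GTG", "TTG"}  # assumption: bacterial alternative starts accepted
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_valid_cds(seq: str, min_len: int = 300) -> bool:
    """Start codon first, stop codon last, no internal stops, length > min_len."""
    seq = seq.upper()
    if len(seq) <= min_len or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] not in START_CODONS or codons[-1] not in STOP_CODONS:
        return False
    return not any(c in STOP_CODONS for c in codons[1:-1])


def filter_cds(records: dict[str, str], min_len: int = 300) -> dict[str, str]:
    """Keep valid CDS and drop exact duplicate sequences (first occurrence wins)."""
    seen: set[str] = set()
    kept: dict[str, str] = {}
    for name, seq in records.items():
        s = seq.upper()
        if is_valid_cds(s, min_len) and s not in seen:
            seen.add(s)
            kept[name] = s
    return kept
```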

Indices of codon usage

a) Relative synonymous codon usage (RSCU)

To examine SCU variation across PCG independently of amino acid composition, the RSCU values of all PCG were estimated according to Sharp et al. (1986).
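For reference, the RSCU of a codon is its observed count divided by the mean count of all codons in its synonymous family, so a value of 1 means no bias. The sketch below pools codon counts over a set of sequences, excludes Met, Trp and stop codons (leaving the 59 sense codons used in this study) and skips amino acids absent from the input. The `classify` helper applies the frequent, intermediate and rare thresholds quoted in the Results. All function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code in TCAG order (TTT, TTC, TTA, TTG, TCT, ...);
# bacterial table 11 has the same sense-codon assignments.
BASES = "TCAG"
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}


def synonymous_families() -> dict[str, list[str]]:
    """Group the 59 sense codons by amino acid, excluding Met, Trp and stops."""
    families = {}
    for codon, aa in CODE.items():
        if aa not in "*MW":
            families.setdefault(aa, []).append(codon)
    return families


def rscu(cds_list: list[str]) -> dict[str, float]:
    """RSCU = observed count / mean count of the codon's synonymous family."""
    counts = Counter()
    for seq in cds_list:
        s = seq.upper()
        counts.update(s[i:i + 3] for i in range(0, len(s) - 2, 3))
    values = {}
    for codons in synonymous_families().values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU undefined, codons left out
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


def classify(value: float) -> str:
    """Thresholds quoted in the Results: >1 frequent, 0.66 to 1 intermediate, <0.66 rare."""
    if value > 1:
        return "frequent"
    if value >= 0.66:
        return "intermediate"
    return "rare"
```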

b) Effective number of codons (ENC)

ENC is an index widely used for measuring the extent of synonymous codon usage bias (Wright 1990). It takes values from 20 (only one codon is used for each of the 20 amino acids) to 61 (all synonymous codons are used equally). If the calculated ENC value exceeds 61 because of a more even distribution of codon usage, it is adjusted to 61 (Wright 1990). Both selection of preferred codons and mutational pressure may reduce ENC values. The expected ENC under random codon usage is approximated as a function of GC3 and calculated according to Wright (1990).
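A compact sketch of Wright's (1990) estimator, assuming the standard genetic code. For each amino acid with n observed codons, the homozygosity is F = (n Σp² − 1)/(n − 1); F values are averaged within the 2-, 3-, 4- and 6-fold degenerate classes and combined as Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6. The expected ENC under GC3 constraints alone is 2 + s + 29/(s² + (1 − s)²), where s is GC3s. The handling of sparse genes (averaging F2 and F4 when Ile is absent, raising an error otherwise) is a simplifying assumption, and CodonW may treat such edge cases differently. Function names are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Optional

BASES = "TCAG"
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}


def homozygosity(counts: list[int]) -> Optional[float]:
    """Wright's F for one amino acid; None if fewer than two codons were observed."""
    n = sum(counts)
    if n < 2:
        return None
    return (n * sum((c / n) ** 2 for c in counts) - 1) / (n - 1)


def enc(cds: str) -> float:
    """Effective number of codons, Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, capped at 61."""
    s = cds.upper()
    codon_counts = Counter(s[i:i + 3] for i in range(0, len(s) - 2, 3))
    families = {}
    for codon, aa in CODE.items():
        if aa not in "*MW":
            families.setdefault(aa, []).append(codon)
    # Average F within each degeneracy class (2-, 3-, 4- and 6-fold families).
    by_class = {2: [], 3: [], 4: [], 6: []}
    for codons in families.values():
        f = homozygosity([codon_counts[c] for c in codons])
        if f is not None:
            by_class[len(codons)].append(f)
    avg = {k: sum(v) / len(v) for k, v in by_class.items() if v}
    if 3 not in avg and 2 in avg and 4 in avg:
        avg[3] = (avg[2] + avg[4]) / 2  # fill in F3 when Ile is missing
    if len(avg) < 4 or min(avg.values()) == 0:
        raise ValueError("too few codons to estimate ENC")
    nc = 2 + 9 / avg[2] + 1 / avg[3] + 5 / avg[4] + 3 / avg[6]
    return min(nc, 61.0)


def expected_enc(gc3s: float) -> float:
    """Expected ENC if codon usage were shaped by GC3s alone (Wright 1990)."""
    return 2 + gc3s + 29 / (gc3s ** 2 + (1 - gc3s) ** 2)
```

Genes far below `expected_enc(gc3s)` for their own GC3s are the ones the ENC vs GC3 plot flags as influenced by factors beyond compositional constraint.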

c) Codon adaptation index (CAI)

The codon adaptation index (CAI) measures the bias of a PCG towards translationally optimal codons, defined as the codons most represented in a reference set of highly expressed genes (Sharp and Li 1987). CAI values range from zero to one, with higher values indicating stronger bias towards preferred codons. In this study, ribosomal protein genes were used as the reference set, and CAI values were estimated using the equation developed by Sharp and Li (1987).
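As a sketch of the Sharp and Li (1987) calculation: each codon's relative adaptiveness w is its count in the reference set (in this study, the ribosomal protein genes) divided by the count of the most used synonymous codon, and CAI is the geometric mean of w over the scorable codons of a gene. Met, Trp and stop codons are excluded. The 0.5 pseudocount for codons absent from the reference set is an assumed convention, and amino acids absent from the reference are simply not scored. Function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

BASES = "TCAG"
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}


def codon_counts(seqs: list[str]) -> Counter:
    """Pooled counts of complete codons over all sequences."""
    counts = Counter()
    for seq in seqs:
        s = seq.upper()
        counts.update(s[i:i + 3] for i in range(0, len(s) - 2, 3))
    return counts


def relative_adaptiveness(reference_seqs: list[str]) -> dict[str, float]:
    """w = codon count / count of the most used synonymous codon in the reference."""
    counts = codon_counts(reference_seqs)
    families = {}
    for codon, aa in CODE.items():
        if aa not in "*MW":  # Met, Trp and stops offer no synonymous choice
            families.setdefault(aa, []).append(codon)
    w = {}
    for codons in families.values():
        if sum(counts[c] for c in codons) == 0:
            continue  # amino acid absent from the reference: not scored
        # Assumed convention: an unobserved codon gets a pseudocount of 0.5.
        adjusted = {c: counts[c] if counts[c] > 0 else 0.5 for c in codons}
        top = max(adjusted.values())
        for c in codons:
            w[c] = adjusted[c] / top
    return w


def cai(cds: str, w: dict[str, float]) -> float:
    """Geometric mean of w over the codons of cds that have a w value."""
    s = cds.upper()
    codons = (s[i:i + 3] for i in range(0, len(s) - 2, 3))
    logs = [math.log(w[c]) for c in codons if c in w]
    if not logs:
        raise ValueError("no scorable codons")
    return math.exp(sum(logs) / len(logs))
```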

d) Synonymous codon usage order (SCUO)

Synonymous codon usage order measurement was used to analyze the influence of GC composition at various codon positions on SCU. SCUO was computed using the following equation (Wan et al. 2004),

$$\mathrm{SCUO} = 1 + \frac{p}{2}\log_2\frac{p}{2} + \frac{1-p}{2}\log_2\frac{1-p}{2}, \qquad p = \mathrm{GC3}$$
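Two pieces are sketched here. `scuo_expected` evaluates the theoretical SCUO vs GC3 relation given above (taking 0 · log 0 as 0); it equals 0 at GC3 = 0.5 and rises to 0.5 at GC3 = 0 or 1, which produces the U shape with two horns described in the Results. `scuo` computes a per-gene value using an entropy-based definition that we assume matches Wan et al. (2004): for each degenerate amino acid, O = (H_max − H)/H_max, where H is the Shannon entropy of its codon usage and H_max = log2 of its number of synonymous codons, and these values are averaged with weights proportional to the amino acid's codon count. CodonO may differ in details; function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

BASES = "TCAG"
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}


def _xlog2x(x: float) -> float:
    """x * log2(x), with the convention 0 * log2(0) = 0."""
    return x * math.log2(x) if x > 0 else 0.0


def scuo_expected(gc3: float) -> float:
    """Theoretical SCUO for a GC3 fraction between 0 and 1."""
    return 1 + _xlog2x(gc3 / 2) + _xlog2x((1 - gc3) / 2)


def scuo(cds: str) -> float:
    """Per-gene SCUO: composition-weighted mean of normalized entropy deficits."""
    s = cds.upper()
    counts = Counter(s[i:i + 3] for i in range(0, len(s) - 2, 3))
    families = {}
    for codon, aa in CODE.items():
        if aa not in "*MW":
            families.setdefault(aa, []).append(codon)
    weighted, total = 0.0, 0
    for codons in families.values():
        n_aa = sum(counts[c] for c in codons)
        if n_aa == 0:
            continue
        h = -sum(_xlog2x(counts[c] / n_aa) for c in codons)
        h_max = math.log2(len(codons))
        weighted += n_aa * (h_max - h) / h_max  # O_i weighted by its codon count
        total += n_aa
    if total == 0:
        raise ValueError("no degenerate codons in sequence")
    return weighted / total
```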

Sequence analysis

Nucleotide contents of all PCG were calculated using MEGA version 5.1 (Tamura et al. 2011). ENC and CAI values were calculated for all PCG using the online CodonW and CAI Calculator 2 (Wu et al. 2005). SCUO was computed using the standalone CodonO (Wan et al. 2004).
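The study used MEGA 5.1 for nucleotide contents. As a tool-independent illustration only, the sketch below computes the compositional indices used throughout (A3, T3, G3, C3, GC1, GC2, GC3, GC12 and overall GC) from the complete codons of a CDS. The optional `synonymous_only` flag, which drops Met, Trp and stop codons before counting, approximates the "silent site" variants (such as GC3s) reported by codon usage tools. The name `composition` is hypothetical.

```python
NON_DEGENERATE = {"ATG", "TGG", "TAA", "TAG", "TGA"}  # Met, Trp and stop codons


def composition(cds: str, synonymous_only: bool = False) -> dict[str, float]:
    """Base composition indices of a CDS as fractions (multiply by 100 for %).

    With synonymous_only=True, Met, Trp and stop codons are dropped first, so
    every index (including "GC") then refers to the retained codons only.
    """
    s = cds.upper()
    codons = [s[i:i + 3] for i in range(0, len(s) - 2, 3)]
    if synonymous_only:
        codons = [c for c in codons if c not in NON_DEGENERATE]
    if not codons:
        raise ValueError("no codons to analyze")
    n = len(codons)

    def gc_at(pos: int) -> float:
        return sum(c[pos] in "GC" for c in codons) / n

    third = [c[2] for c in codons]
    out = {f"{b}3": third.count(b) / n for b in "ATGC"}
    out.update(GC1=gc_at(0), GC2=gc_at(1), GC3=gc_at(2))
    out["GC12"] = (out["GC1"] + out["GC2"]) / 2
    out["GC"] = (out["GC1"] + out["GC2"] + out["GC3"]) / 3
    return out
```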

Correspondence analysis (COA)

COA is a multivariate statistical method used to identify the major factors shaping SCU patterns across genes and to plot genes according to these factors (Perriere and Thioulouse 2002). It is often employed to plot PCG according to the RSCU values of the 59 synonymous codons (excluding the three stop codons and the Trp and Met codons) (RoyChoudhury and Mukherjee 2010). COA generates a series of orthogonal axes that capture the major trends of variation in the data. In this study, the complete coding region of each PCG was represented as a 59-dimensional vector (excluding Met, Trp and stop codons), each dimension corresponding to the RSCU value of one sense codon (Mardia et al. 1979).
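The sketch below shows one standard way to compute correspondence analysis with NumPy, assuming the input is a genes × codons matrix of non-negative values such as the 59 RSCU values per gene. The matrix is scaled to proportions, centred on the product of row and column masses, divided by the square root of those expected proportions and decomposed by singular value decomposition. Squared singular values give each axis's share of the total inertia (the kind of percentages reported for axes 1 to 5), and the row principal coordinates give each gene's position on each axis. This is not necessarily the implementation used by the authors, and codons with an all-zero column must be dropped first; `correspondence_analysis` is a hypothetical name.

```python
import numpy as np


def correspondence_analysis(table, n_axes: int = 5):
    """Correspondence analysis of a non-negative genes x codons matrix.

    Returns (coords, inertia_share): coords[:, k] is each gene's principal
    coordinate on axis k + 1, and inertia_share[k] is that axis's fraction
    of the total inertia.
    """
    x = np.asarray(table, dtype=float)
    if np.any(x < 0):
        raise ValueError("correspondence analysis needs non-negative values")
    p = x / x.sum()                        # correspondence matrix
    r = p.sum(axis=1)                      # row (gene) masses
    c = p.sum(axis=0)                      # column (codon) masses
    if np.any(r == 0) or np.any(c == 0):
        raise ValueError("drop all-zero rows or columns first")
    expected = np.outer(r, c)
    s = (p - expected) / np.sqrt(expected)  # standardized residuals
    u, sv, _ = np.linalg.svd(s, full_matrices=False)
    inertia = sv ** 2
    share = inertia / inertia.sum()
    coords = (u * sv) / np.sqrt(r)[:, None]  # row principal coordinates
    k = min(n_axes, sv.size)
    return coords[:, :k], share[:k]
```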

Statistical analysis

All correlations were computed using Spearman’s rank correlation method, as this measure of correlation does not require any distributional assumptions about the underlying data (Zhou and Li 2009). To identify putative optimal codons, a chi-square test on a 2 × 2 table was applied to the 5% of genes at the extreme left and the 5% of genes at the extreme right of axis 1 of the COA. For each of the 59 sense codons, the first row of the table contained the observed frequency of the codon and the second row contained the total number of its synonymous alternatives, with the two columns corresponding to the two gene sets. Significance was assessed at the 5% level with one degree of freedom. All these analyses were performed using PAST version 2.12 (Hammer et al. 2001).
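The optimal-codon test and the rank correlation can be sketched in plain Python as follows. This is a minimal illustration, not the PAST implementation. It assumes:

- a Pearson chi-square without continuity correction;
- the critical value 3.841 for one degree of freedom at the 5% level;
- ties in Spearman's correlation handled by average ranks.

The helper names are hypothetical.

```python
import math

CHI2_CRITICAL_1DF_5PCT = 3.841  # chi-square critical value, df = 1, alpha = 0.05


def extreme_gene_sets(axis1_coords, fraction=0.05):
    """Indices of genes at the extreme left and right of COA axis 1."""
    order = sorted(range(len(axis1_coords)), key=lambda i: axis1_coords[i])
    k = max(1, int(len(order) * fraction))
    return order[:k], order[-k:]


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table
                    left set   right set
        codon          a           b
        synonyms       c           d
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the table is empty")
    return n * (a * d - b * c) ** 2 / denom


def average_ranks(values):
    """1-based ranks, with tied values receiving the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

To build the table for one codon, pool the codon counts of the genes in each extreme set. Set `a` and `b` to the codon's count in the left and right sets, and `c` and `d` to the counts of its synonymous alternatives in the same sets. A codon is flagged when the statistic exceeds the critical value.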



Protein coding genes


Relative synonymous codon usage


Synonymous codon usage


Effective number of codons


Codon adaptation index


Synonymous codon usage order


Parity rule 2


GC content at 3rd codon position


GC content at first and second position


Correspondence analysis


  • Agashe D, Gomez NCM, Drummond DA, Marx CJ: Good codons, bad transcript: large reductions in gene expression and fitness arising from synonymous mutations in a key enzyme. Mol Biol Evol 2013, 30: 549-560. 10.1093/molbev/mss273

  • Akashi H: Inferring weak selection from patterns of polymorphism and divergence at silent sites in Drosophila DNA. Genetics 1995, 139: 1067-1076.

  • Akashi H: Gene expression and molecular evolution. Curr Opin Genet Dev 2001, 11: 660-666. 10.1016/S0959-437X(00)00250-1

  • Biro JC: Does codon bias have an evolutionary origin? Theor Biol Med Model 2008, 5: 1-15. 10.1186/1742-4682-5-1

  • Bodyl A, Mackiewicz P, Stiller JW: The intracellular cyanobacteria of Paulinella chromatophora: endosymbionts or organelles? Trends Microbiol 2007, 15: 295-296. 10.1016/j.tim.2007.05.002

  • Botzman M, Margalit H: Variation in global codon usage bias among prokaryotic organisms is associated with their lifestyles. Genome Biol 2011. 10.1186/gb-2011-12-10-r109

  • Bulmer M: The selection-mutation-drift theory of synonymous codon usage. Genetics 1991, 129: 897-907.

  • Carlini DB, Chen Y, Stephan W: The relationship between third-codon position nucleotide content, codon bias, mRNA secondary structure and gene expression in the drosophilid alcohol dehydrogenase genes Adh and Adhr. Genetics 2001, 159: 623-633.

  • Chan CX, Gross J, Yoon HS, Bhattacharya D: Plastid origin and evolution: new models provide insights into old problems. Plant Physiol 2011, 155: 1552-1560. 10.1104/pp.111.173500

  • Deschavanne P, Filipski J: Correlation of GC content with replication timing and repair mechanisms in weakly expressed E. coli genes. Nucleic Acids Res 1995, 23: 1350-1353. 10.1093/nar/23.8.1350

  • Drummond DA, Wilke CO: Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolution. Cell 2008, 134: 341-352. 10.1016/j.cell.2008.05.042

  • Ermolaeva MD: Synonymous codon usage in bacteria. Curr Issues Mol Biol 2001, 3: 91-97.

  • Grantham R, Gautier C, Gouy M, Mercier R, Pave A: Codon catalog usage and the genome hypothesis. Nucleic Acids Res 1980, 8: 49-62.

  • Gupta SK, Bhattacharyya TK, Ghosh TC: Synonymous codon usage in Lactococcus lactis: mutational bias versus translational selection. J Biomol Struct Dyn 2004, 21: 527-536. 10.1080/07391102.2004.10506946

  • Hammer Ø, Harper DAT, Ryan PD: PAST: paleontological statistics software package for education and data analysis. Palaeontol Electron 2001, 4: 1-9.

  • Hershberg R, Petrov DA: Selection on codon bias. Annu Rev Genet 2008, 42: 287-299. 10.1146/annurev.genet.42.110807.091442

  • Ikemura T: Correlation between the abundance of yeast tRNAs and the occurrence of the respective codons in its protein genes. J Mol Biol 1982, 158: 573-579. 10.1016/0022-2836(82)90250-9

  • Ikemura T: Codon usage and tRNA content in unicellular and multicellular organisms. Mol Biol Evol 1985, 2: 13-34.

  • Irwin B, Heck JD, Hatfield GW: Codon pair utilization biases influence translational elongation step times. J Biol Chem 1995, 270: 22801-22806. 10.1074/jbc.270.39.22801

  • Karlin S, Mrazek J, Campbell AM: Codon usages in different gene classes of the Escherichia coli genome. Mol Microbiol 1998, 29: 1341-1355. 10.1046/j.1365-2958.1998.01008.x

  • Kurland CG: Major codon preference: theme and variations. Biochem Soc T 1993, 21: 841-846.

  • Mardia KV, Kent JT, Bibby JM: Multivariate analysis. New York: Academic; 1979.

  • Marin B, Nowack ECM, Melkonian M: A plastid in the making: evidence for a secondary primary endosymbiosis. Protist 2005, 156: 425-432. 10.1016/j.protis.2005.09.001

  • Marin B, Nowack ECM, Glockner G, Melkonian M: The ancestor of the Paulinella chromatophore obtained a carboxysomal operon by horizontal gene transfer from a Nitrococcus-like γ-proteobacterium. BMC Evol Biol 2007. 10.1186/1471-2148-7-85

  • Moore LR, Rocap G, Chisholm SW: Physiology and molecular phylogeny of coexisting Prochlorococcus ecotypes. Nature 1998, 393: 464-467. 10.1038/30965

  • Moriyama EN, Powell JR: Codon usage bias and tRNA abundance in Drosophila. J Mol Evol 1997, 45: 514-523. 10.1007/PL00006256

  • Morton BR: Chloroplast DNA codon use: evidence for selection at the psbA locus based on tRNA availability. J Mol Evol 1993, 37: 273-280.

  • Morton BR: Rates of synonymous substitution do not indicate selective constraints on the codon use of the plant psbA gene. Mol Biol Evol 1997, 14: 412-419. 10.1093/oxfordjournals.molbev.a025777

  • Morton BR: Selection on the codon bias of chloroplast and cyanelle genes in different plant and algal lineages. J Mol Evol 1998, 46: 449-459. 10.1007/PL00006325

  • Nair RR, Nandhini MB, Monalisha E, Murugan K, Sethuraman T, Nagarajan S, Rao NSP, Ganesh D: Synonymous codon usage in chloroplast genome of Coffea arabica. Bioinformation 2012, 8: 1096-1104. 10.6026/97320630081096

  • Nowack ECM, Melkonian M, Glockner G: Chromatophore genome sequence of Paulinella sheds light on acquisition of photosynthesis by eukaryotes. Curr Biol 2008, 18: 410-418. 10.1016/j.cub.2008.02.051

  • Parmley JL, Hurst LD: How do synonymous mutations affect fitness? Bioessays 2007, 29: 515-519. 10.1002/bies.20592

  • Partensky F, Hess WR, Vaulot D: Prochlorococcus, a marine photosynthetic prokaryote of global significance. Microbiol Mol Biol Rev 1999, 63: 106-127.

  • Paul S, Dutta A, Bag SK, Das S, Dutta C: Distinct, ecotype-specific genome and proteome signatures in the marine cyanobacteria Prochlorococcus. BMC Genomics 2010. 10.1186/1471-2164-11-103

  • Perriere G, Thioulouse J: Use and misuse of correspondence analysis in codon usage studies. Nucleic Acids Res 2002, 30: 4548-4555. 10.1093/nar/gkf565

  • Plotkin JB, Kudla G: Synonymous but not the same: the causes and consequences of codon bias. Nat Rev Genet 2011, 12: 32-42. 10.1038/nrg2899

  • Prabha R, Singh DP, Gupta SK, Farooqi S, Rai A: Synonymous codon usage in Thermosynechococcus elongatus (cyanobacteria) identifies the factors shaping codon usage variation. Bioinformation 2012, 8: 622-628. 10.6026/97320630008622

  • RoyChoudhury S, Mukherjee D: A detailed comparative analysis on the overall codon usage pattern in herpesviruses. Virus Res 2010, 148: 31-43. 10.1016/j.virusres.2009.11.018

  • Sablok G, Nayak KC, Vazquez F, Tatarinova TV: Synonymous codon usage, GC3, and evolutionary patterns across plastomes of three pooid model species: emerging grass genome models for monocots. Mol Biotechnol 2011, 49: 116-128. 10.1007/s12033-011-9383-9

  • Starmer WT, Sullivan DT: A shift in the third-codon-position nucleotide frequency in alcohol dehydrogenase genes in the genus Drosophila. Mol Biol Evol 1989, 6: 546-552.

  • Selva KC, Nair RR, Sivaramakrishnan KG, Ganesh D, Janarthanan S, Arunachalam M, Sivaruban T: Influence of certain forces on evolution of synonymous codon usage bias in certain species of three basal orders of aquatic insects. Mitochondr DNA 2012, 23: 447-460. 10.3109/19401736.2012.710203

  • Sharp PM, Cowe E: Synonymous codon usage in Saccharomyces cerevisiae. Yeast 1991, 7: 657-678. 10.1002/yea.320070702

  • Sharp PM, Li WH: The codon adaptation index - a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res 1987, 15: 1281-1295. 10.1093/nar/15.3.1281

  • Sharp PM, Tuohy TMF, Mosurski KR: Codon usage in yeast: cluster analysis clearly differentiates highly and lowly expressed genes. Nucleic Acids Res 1986, 14: 8207-8211.

  • Sharp PM, Averof M, Lloyd AT, Matassi G, Peden JF: DNA sequence evolution: the sounds of silence. Phil Trans R Soc B 1995, 349: 241-247. 10.1098/rstb.1995.0108

  • Smith MJ, Smith NH: Site-specific codon bias in bacteria. Genetics 1996, 142: 1037-1043.

  • Sueoka N: Directional mutation pressure and neutral molecular evolution. Proc Natl Acad Sci USA 1988, 85: 2653-2657. 10.1073/pnas.85.8.2653

  • Sugita C, Ogata K, Shikata M, Jikuya H, Takano J, Furumichi M, Kanehisa M, Omata T, Sugiura M, Sugita M: Complete nucleotide sequence of the freshwater unicellular cyanobacterium Synechococcus elongatus PCC 6301 chromosome: gene content and organization. Photosynth Res 2007, 93: 55-67. 10.1007/s11120-006-9122-4

  • Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S: MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol 2011, 28: 2731-2739. 10.1093/molbev/msr121

  • Wall DP, Herbeck JT: Evolutionary patterns of codon usage in the chloroplast gene rbcL. J Mol Evol 2003, 56: 673-688. 10.1007/s00239-002-2436-8

  • Wan XF, Xu D, Kleinhofs A, Zhou J: Quantitative relationship between synonymous codon usage bias and GC composition across unicellular genomes. BMC Evol Biol 2004. 10.1186/1471-2148-4-19

  • Wang X, Zhu S, Zhao L, Wu L, An W, Zhou P, Chen Y: Analysis of synonymous codon usage patterns in the genus Rhizobium. World J Microb Biot 2013. 10.1007/s11274-013-1364-7

  • Waterbury JB, Watson SW, Valois FW, Franks DG: Biological and ecological characterization of the marine unicellular cyanobacterium Synechococcus. Can B Fish Aquat Sci 1986, 214: 71-120.

  • Wright F: The “effective number of codons” used in a gene. Gene 1990, 87: 23-29. 10.1016/0378-1119(90)90491-9

  • Wu G, Culley DE, Zhang W: Predicted highly expressed genes in the genomes of Streptomyces coelicolor and Streptomyces avermitilis and the implications for their metabolism. Microbiol 2005, 151: 2175-2187. 10.1099/mic.0.27833-0

  • Yang Z, Nielsen R: Mutation-selection models of codon substitution and their use to estimate selective strengths on codon usage. Mol Biol Evol 2008, 25: 568-579. 10.1093/molbev/msm284

  • Yoon HS, Prieto A, Melkonian M, Bhattacharya D: Minimal plastid genome evolution in the Paulinella endosymbionts. Curr Biol 2006, 16: 670-672. 10.1016/j.cub.2006.08.018

  • Yu T, Li J, Yang Y, Qi L, Chen B, Zhao F, Bao Q, Wu J: Codon usage patterns and adaptive evolution of marine unicellular cyanobacteria Synechococcus and Prochlorococcus. Mol Phylogenet Evol 2012, 62: 206-213. 10.1016/j.ympev.2011.09.013

  • Zhou M, Li X: Analysis of synonymous codon usage patterns in different plant mitochondrial genomes. Mol Biol Rep 2009, 36: 2039-2046. 10.1007/s11033-008-9414-1



The authors acknowledge the Department of Science and Technology (Promotion of University Research and Scientific Excellence), Government of India, New Delhi, India, for financial assistance. Part of the financial assistance was provided by the University Grants Commission, New Delhi, under the project ‘Establishment of genetic identity for Indian coffee germplasm using chloroplast genome sequences’ (F.No.41-583/2012 (SR)).

Author information

Authors and Affiliations


Corresponding author

Correspondence to Ganesh Doss.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

RRN and GD conceptualized the study. RRN, TS and MBN performed data analyses. RRN and GD were involved in drafting the manuscript. All authors read and approved the final manuscript.

Rights and permissions

Open Access: This article is distributed under the terms of the Creative Commons Attribution 2.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Nair, R.R., Nandhini, M.B., Sethuraman, T. et al. Mutational pressure dictates synonymous codon usage in freshwater unicellular α-cyanobacterial descendant Paulinella chromatophora and β-cyanobacterium Synechococcus elongatus PCC6301. SpringerPlus 2, 492 (2013).


  • Paulinella chromatophora
  • Synechococcus elongatus
  • Synonymous codon usage
  • Mutational pressure
  • Chromatophore