Mutational pressure dictates synonymous codon usage in freshwater unicellular α-cyanobacterial descendant Paulinella chromatophora and β-cyanobacterium Synechococcus elongatus PCC6301

Background: A comparative study of synonymous codon usage variation, and of the factors influencing its diversification, in the α-cyanobacterial descendant Paulinella chromatophora and the β-cyanobacterium Synechococcus elongatus PCC6301 has not been reported so far. In the present study, we investigated various factors associated with synonymous codon usage in the genomes of P. chromatophora and S. elongatus PCC6301 and discuss the findings. Results: Mutational pressure was identified as the major force behind codon usage variation in both genomes. However, correspondence analysis revealed that the intensity of mutational pressure was higher in S. elongatus than in P. chromatophora. Living habitats were also found to determine synonymous codon usage variations across the genomes of P. chromatophora and S. elongatus. Conclusions: Whole genome sequencing of α-cyanobacteria in the Cyanobium clade would facilitate the understanding of synonymous codon usage patterns and of the factors contributing to their diversification in the presumed ancestors of the photosynthetic endosymbionts of P. chromatophora.


Background
Nucleotide triplet codons that differ only at the third site, or rarely at the second site, but encode the same amino acid are termed synonymous codons (Ermolaeva 2001). Synonymous mutations do not alter amino acid sequences, but synonymous codons are not used at uniform frequencies either within or between organisms, resulting in species-specific codon usage bias (Grantham et al. 1980; Sharp et al. 1995). Synonymous codon usage (SCU) bias favours the use of a specific subset of codons (preferred codons) within each amino acid family (Agashe et al. 2013). Weak selection for preferred codons has been recognized as an important evolutionary force (Carlini et al. 2001), as SCU bias affects the overall fitness of a cell by influencing the level of gene expression and various cellular processes such as RNA processing, protein translation and protein folding (Parmley and Hurst 2007; Hershberg and Petrov 2008; Plotkin and Kudla 2011). The functional integrity of the genetic code is maintained by synonymous codons (Biro 2008). Population genetic studies reveal that the evolution of biased codon usage is mainly due either to genome-wide AT/GC-biased mutational pressure or to weak selection acting on a specific subset of codons (preferred codons) (Bulmer 1991; Yang and Nielson 2008; Agashe et al. 2013). Other major factors include the interaction between codons and anticodons (Kurland 1993), site-specific codon biases (Smith and Smith 1996), efficacy of replication (Deschavanne and Filipski 1995), usage of codon pairs (Irwin et al. 1995) and evolutionary time scale (Karlin et al. 1998).
The forces that influence the evolution of SCU bias have been extensively analyzed in various organisms (Ikemura 1982; Moriyama and Powell 1997; Nair et al. 2012; Seva et al. 2012; Sharp and Cowe 1991), as SCU bias is highly significant for estimating evolutionary rates and for phylogenetic reconstruction (Sarmer and Sullivan 1989; Wall and Herback 2003). Previous studies revealed that biased codon usage is stronger in highly expressed genes, as selection pressure may be acting on those genes (Ikemura 1985). However, the strength of selection appears to vary: evolutionarily conserved amino acid residues exhibit stronger bias, whereas evolutionarily variable residues often exhibit weaker bias (Akashi 1995; Drummond and Wilke 2008). Mutational pressure is another important factor shaping SCU variations (Plotkin and Kudla 2011; Akashi 2001). The lifestyle of prokaryotic organisms also plays an important role in SCU variations (Botzman and Margalit 2011). However, the role of physiological processes in framing the evolution of biased codon usage is yet to be unravelled (Agashe et al. 2013).
Endosymbiotic associations have significant impacts on cellular evolution and diversity (Bodyl et al. 2007). Extensive research on plastid genomes revealed that a single primary endosymbiotic event, in which a cyanobacterium was acquired by a unicellular eukaryote, led to the evolution of plastids (Nowack et al. 2008). In endosymbiosis research, Paulinella chromatophora, a filose thecamoeba, has been regarded as an outstanding model for primary plastid origin, as P. chromatophora is the only known case of independent primary cyanobacterial acquisition (Chan et al. 2011; Marin et al. 2005; Yoon et al. 2006). Sequencing of the chromatophore genome revealed the acquisition of photosynthesis by eukaryotes (Nowack et al. 2008). Chromatophores of P. chromatophora are monophyletic with α-cyanobacteria (Cyanobium clade) (Marin et al. 2007), unlike plastids, which evolved from a β-cyanobacterial ancestor (Nowack et al. 2008).
SCU bias in various primary endosymbionts and plastid genomes has been extensively studied (Morton 1993, 1997, 1998; Sablok et al. 2011). The factors that frame SCU variations in the phylogenetically close marine Prochlorococcus and Synechococcus clades of the PS clade (Prochlorococcus/Synechococcus) (Marin et al. 2007) have been studied: the SCU pattern of Prochlorococcus was found to be shaped by mutational pressure and nucleotide compositional constraints, whereas in marine Synechococcus translational selection determines the SCU pattern (Yu et al. 2012). However, no complete cyanobacterial genome has been reported so far from the Cyanobium clade (the third major lineage of the PS clade) (Figure 1). Hence, the factors that frame SCU in the chromatophore genome could not be compared with those of its presumed ancestor. Since the habitat of microorganisms plays a crucial role in SCU variation across genes (Botzman and Margalit 2011), the unicellular freshwater β-cyanobacterium Synechococcus elongatus PCC6301 (SELONG clade) (Marin et al. 2007) was selected for comparing SCU patterns and for eliciting the factors that determine SCU variations in evolutionarily young (P. chromatophora) and evolutionarily old (S. elongatus) genomes.

Results
I. Compositional properties
a) Chromatophore genome of P. chromatophora
Comparison of the total A, T, G and C contents in the genome of P. chromatophora revealed a higher content of A and T than of G and C. Analysis of the A3, T3, G3 and C3 contents revealed that T3 content was the highest and C3 the lowest, with a mean and S.D. of 39.14% and 4.06% for T3 and 12.94% and 3.67% for C3. GC3 ranged from 16.15% to 54.38% with a mean and S.D. of 27.40% and 4.69% respectively. Correlation analysis between total nucleotide contents and silent base contents revealed a strong negative correlation between A3 and GC (Table 1). Similarly, a high negative correlation was found between A and GC3 (Table 1). This suggests that A and GC contents play an important role in SCU bias in the chromatophore genome. The high positive correlation between C and G3 might also have a profound effect in framing SCU patterns. However, no correlations were found between G and T3 or between T and G3, suggesting that individual T and G contents do not influence codon usage bias. Since A3 content was in strong negative correlation with all total nucleotide contents (Table 1), it can be inferred that A3 content plays an important role in shaping SCU patterns across the 786 protein coding genes (PCG) of the chromatophore genome.

b) Genome of S. elongatus
Contrary to the observations with P. chromatophora, G and C contents were higher than A and T contents in the genome of S. elongatus. G3 and C3 contents were significantly higher than A3 and T3 contents. Among the silent base contents, C3 was the highest and A3 the lowest, with a mean and S.D. of 31.12% and 6.02% for C3, and 16.43% and 4.12% for A3. GC3 varied from 26.12% to 76.90% with a mean and S.D. of 60.19% and 7.45% respectively. Correlation analysis between the total A, T, G, C contents and the A3, T3, G3, C3 contents revealed that A3 was negatively correlated with G, C and GC. Similarly, T3 was in high negative correlation with G, C and GC3. GC composition at the silent site was negatively correlated with both A and T contents (Table 2). Hence, all silent base contents, viz. A3, T3, G3 and C3, might be influencing SCU variations of the protein coding genes (PCG) of S. elongatus.
II. Characteristics of relative synonymous codon usage
a) Chromatophore genome of P. chromatophora
Overall codon usage patterns of 786 PCG in the chromatophore genome of P. chromatophora were analyzed (Table 3). All amino acids were found to use A- and T-ending codons most frequently (codons with an RSCU value greater than one), as the chromatophore genome is richer in AT than in GC. All C-ending codons except AGC (Ser) and CGC (Arg), and all G-ending codons except TTG (Leu), were rare (RSCU values less than 0.66). CTA (Leu) was the only intermediate codon (RSCU value between 0.66 and 1) among the A-ending codons. Among the 786 PCG in the chromatophore genome of P. chromatophora, ENC values ranged from 33.43 to 61 with a mean and S.D. of 47.57 and 3.77 respectively, indicating considerable variation in codon usage among the genes of this organism. GC3 values ranged from 16.2% to 54.40% with a mean and S.D. of 27.40% and 4.69% respectively. Chi-square analysis of codon counts in the 5% of genes located at each extreme of axis 1 revealed 16 codons that were statistically over-represented (putative optimal codons) in genes located on the extreme left of axis 1. Of these codons, ten were A-ending (62.5%) and six were T-ending (37.5%). It is interesting to note that most of the over-represented T-ending codons were found in two-codon families, except for Glu, in which CAA was statistically over-represented. These results suggested that factors other than compositional constraints might be influencing codon usage in this organism.

b) Genome of S. elongatus
Overall codon usage patterns of 2342 PCG in the genome of S. elongatus were analyzed (Table 4). All amino acids except the two-fold degenerate Phe, Glu, Asp and Lys used G- or C-ending codons most frequently, whereas Phe used TTT, Glu used GAA, Asp used GAT and Lys used AAA most often. Rare codons were TTA, CTT and CTA for Leu, ATA for Ile, GTA for Val, ACA for Thr and GGA for Gly. Intermediate codons were predominantly A- or T-ending, except ACG for Thr, AAG for Lys, GAC for Asp, GAG for Glu, AGG for Arg and GGG for Gly. Among the 14 statistically over-represented codons of genes at the extreme left of axis 1,

Figure 1 Diagrammatic representation of three clades in the Prochlorococcus/Synechococcus clade. SCU variation in marine Synechococcus is shaped by selection but in marine Prochlorococcus, mutational pressure shapes the SCU pattern. SCU: Synonymous codon usage.

III. Influence of GC composition on SCUO
a) Chromatophore genome of P. chromatophora
Overall GC content and local GC compositions (GC1, GC2 and GC3) of 786 PCG were estimated and plotted against the corresponding SCUO (Figure 2). GC3 showed two horns (Figure 2d), whereas overall GC and the other local GC compositions (GC1 and GC2) did not show any horns. The relationship between GC3 and SCUO was found to be linear (SCUO = −0.004 (GC3) + 0.324, r = −0.325, p < 0.001). It was also observed that GC2 content was significantly correlated with SCUO values (r = −0.114, p < 0.001). These results suggested that GC3 was more important than GC, GC1 and GC2 in shaping SCU bias. Thus, mutational bias has an important role in SCU variation in the chromatophore genome of P. chromatophora.

b) Genome of S. elongatus
In the genome of S. elongatus, total GC content and GC compositions at the three codon positions (GC1, GC2 and GC3) were calculated and plotted against the corresponding SCUO (Figure 3). GC and GC3 showed two horns (Figures 3a and d). SCUO was positively correlated with GC (r = 0.063, p < 0.01) and with GC3 (r = 0.308, p < 0.001), but negatively correlated with GC1 (r = −0.113, p < 0.001) and with GC2 (r = −0.08, p < 0.001), indicating that GC1 and GC2 also influence SCU variation. In the S. elongatus genome, the relationship between SCUO and GC3 was found to be linear (SCUO = 0.001 (GC3) + 0.052, r = 0.308, p < 0.001). GC3 may have more influence on SCU variation than the other local GC compositions, as it exhibited the highest correlation with SCUO. Hence, GC mutational pressure may be the key factor that shapes SCU variation in the S. elongatus genome.
IV. ENC vs GC3 plot
a) Chromatophore genome of P. chromatophora
ENC vs GC3 plots are generally used for analyzing SCU patterns across genes because the axes of this plot are independent of the data and the plot displays both intraspecific and interspecific SCU patterns (Wright 1990). If a particular gene is under GC3 compositional constraints, it lies on or just below the expected GC3 curve. If the SCU pattern of a gene is influenced by translational selection, it lies considerably below the GC3 curve (Wright 1990). ENC values of 786 PCG were plotted against the corresponding GC3 values (Figure 4a), and the majority of the genes clustered on the left side of the curve. Though some genes lay on or just below the expected GC3 curve, most of the genes clustered below the curve. This indicated the influence of forces other than GC3 compositional constraints in shaping SCU patterns in the chromatophore genome of P. chromatophora. The significant correlation observed between GC12 and GC3 (r = 0.207, p < 0.001) in the neutrality plot (Figure 5a) argued against an influence of selection in framing the codon usage pattern of chromatophore genes. Further, the influence of GC3 mutational pressure on PCG was analyzed using a PR2 bias plot (Figure 6a), which showed that synonymous A, T and G, C contents were used proportionally (y = 0.182x + 0.362, r = 0.236), confirming the role of GC3-biased mutational pressure in shaping SCU across the 786 PCG of the chromatophore genome of P. chromatophora.

b) Genome of S. elongatus
The majority of the genes were grouped considerably below the expected GC3 curve (Figure 4b), indicating the influence of forces other than GC compositional constraints. In the neutrality plot (Figure 5b), GC12 was significantly correlated with GC3, indicating that selection has only a weak role in SCU variation. The influence of GC3 on SCU variation was analyzed by PR2 bias plot (Figure 6b), which revealed that A, T and G, C contents were used proportionally (y = 0.127x + 0.350, r = 0.140), reflecting GC3 compositional constraints in SCU variation across the 2342 PCG of the S. elongatus genome.
V. Correspondence analysis (COA)
a) Chromatophore genome of P. chromatophora
Axis 1, axis 2, axis 3, axis 4 and axis 5 accounted for 7.31%, 5.15%, 4.43%, 4.32% and 3.89% of the total variation respectively (Figure 7). No single major explanatory axis was identified. Spearman's rank correlation analysis between the five axes of COA and various indices of codon usage revealed that all axes except axes 3 and 5 were significantly correlated with silent base contents (Table 6): axis 1 with A3, G3 and C3; axis 2 with A3 and T3; and axis 4 with A3, T3, C3 and GC3. The strong negative correlations of axes 1 and 2 with A3, and of axis 4 with T3, suggested the influence of compositional constraints in shaping the codon usage of chromatophore genes. Complex correlations were observed among the 59 synonymous codons and the five axes of COA. Interestingly, Cys codons (TGT and TGC) had the highest correlation with axis 2 (Table 7). Thus, Cys codons may have a strong influence in separating PCG along axis 2. Axes 1 and 4 showed significant negative correlations with ENC and CAI. Hence, it could be assumed that genes distributed along axes 1 and 4 might be influenced by some amount of selection. Length of CDS was correlated only with axis 1. Since axis 1 did not account for much of the variation, length of CDS could not be considered an important factor framing SCU across genes. Aromaticity and protein gravy scores were not correlated with any of the axes, indicating that they have no influence in shaping the codon usage patterns of chromatophore genes in P. chromatophora.
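The per-gene quantities behind the neutrality and PR2 bias plots can be illustrated with a minimal Python sketch. It is not the procedure used in this study (nucleotide contents were obtained with MEGA). The function names are hypothetical, and as a simplifying assumption every third codon position is counted, whereas PR2 plots are often restricted to four-fold degenerate codons.

```python
from collections import Counter


def position_gc(seq):
    """Return (GC1, GC2, GC3) as fractions for one coding sequence."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    if usable == 0:
        raise ValueError("sequence must contain at least one codon")
    fractions = []
    for pos in range(3):
        bases = seq[pos:usable:3]
        fractions.append(sum(b in "GC" for b in bases) / len(bases))
    return tuple(fractions)


def neutrality_point(seq):
    """Coordinates for a neutrality plot: (GC3, GC12)."""
    gc1, gc2, gc3 = position_gc(seq)
    return gc3, (gc1 + gc2) / 2


def pr2_point(seq):
    """Coordinates for a PR2 bias plot: (G3/(G3+C3), A3/(A3+T3)).

    Assumption: all third positions are used, not only those of
    four-fold degenerate codons.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    third = Counter(seq[2:usable:3])
    gc = third["G"] + third["C"]
    at = third["A"] + third["T"]
    x = third["G"] / gc if gc else None
    y = third["A"] / at if at else None
    return x, y
```

Applied to every CDS, `neutrality_point` gives the points whose GC12 and GC3 values are correlated in the neutrality plot, and `pr2_point` gives the points regressed in the PR2 bias plot.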

b) Genome of S. elongatus
Axis 1, axis 2, axis 3, axis 4 and axis 5 accounted for 12.22%, 7.93%, 5.24%, 4.80% and 4.30% of the total variation respectively (Figure 8). None of the axes contributed the majority of the variation. All PCG were separated into three clusters along axis 2. All C-ending codons had a strong negative correlation with axis 2, and the clusters were formed based on the RSCU values of the C-ending codons. Correlation analysis was performed between the various axes of COA and codon usage indices (Table 8). Axes 1, 2, 3 and 4 were in significant negative correlation with GC3. Interestingly, axes 1, 2 and 3 were negatively correlated with length of CDS. Thus, GC3 compositional constraints and length of CDS might be influencing the SCU patterns across genes in the S. elongatus genome. Among the silent base contents and the various axes of COA, positive correlations existed between axis 1 and A3 and T3; axis 2 and A3, T3 and G3; axis 3 and A3 and T3; axis 4 and A3, T3 and C3; and axis 5 and A3. This suggested the influence of nucleotide compositional constraints on SCU variation in the S. elongatus genome. ENC was positively correlated with axes 1, 2 and 3, whereas CAI was positively correlated with axis 1 but negatively correlated with axis 3. Thus, weak selection might influence the SCU of genes in S. elongatus. Axes 2 and 3 were positively correlated with protein gravy score, but axis 4 was negatively correlated, indicating a possible influence of the hydropathic character of proteins on SCU variation across genes in the S. elongatus genome.

Discussion
The chromatophore genome of P. chromatophora has typical cyanobacterial characteristics (Yoon et al. 2006), as P. chromatophora diverged as sister to free-living α-cyanobacteria (Marin et al. 2007). It was proposed that the photosynthetic endosymbionts of P. chromatophora evolved from the Cyanobium clade (Marin et al. 2007), which contradicts the earlier finding that chromatophores evolved from the marine clade consisting of Prochlorococcus and Synechococcus (Marin et al. 2005). However, no complete cyanobacterial genome has been reported so far from freshwater α-cyanobacteria in the Cyanobium clade with which to compare the factors that shape SCU variation in the photosynthetic endosymbionts (chromatophores) of P. chromatophora and in the genome of their presumed ancestor. In this context, SCU patterns and the factors contributing to their diversification were studied in the genomes of the chromatophore and of the freshwater unicellular β-cyanobacterium S. elongatus (SELONG clade) (Marin et al. 2007). The present findings revealed that mutational pressure due to GC compositional constraints frames the SCU patterns in both genomes, but with varying intensity. A study of the factors influencing SCU variation in marine Prochlorococcus and Synechococcus (Yu et al. 2012) from the PS clade (Marin et al. 2007) revealed that mutational pressure plays an important role in the SCU variation of Prochlorococcus, whereas in Synechococcus selection dictates the SCU pattern.
In the present study, ENC vs GC3 plots of chromatophore genes and of genes of freshwater S. elongatus showed that the majority of genes clustered on or just below the expected curve, as observed in the ENC vs GC3 plot of genes of the Prochlorococcus genome (Yu et al. 2012). In contrast, only a few genes of the marine Synechococcus genome lay on or just below the expected curve, indicating the influence of additional factors in framing codon usage patterns (Yu et al. 2012). The difference in the factors influencing SCU patterns between freshwater Synechococcus sp. and marine Synechococcus sp. reveals that the life pattern of organisms may diversify the factors contributing to SCU variation even within the same genus. This is supported by the previous observation that the evolution of microbes is very often influenced either by environment or by lifestyle (Botzman and Margalit 2011; Paul et al. 2010).
Putative optimal codons detected in the chromatophore and S. elongatus genomes are of great importance, as they improve the expression of heterologous genes in host cells (Wang et al. 2013). Equilibrium between neutral mutational pressure and natural selection is important in maintaining the heterogeneity of codon usage among species (Sueoka 1988). If a significant correlation exists between GC12 and GC3, it can be assumed that the codon usage pattern is mainly framed by mutational pressure; if no such correlation exists, translational selection would be the major force. In the present study, the neutrality plots revealed significant correlations between GC12 and GC3 of genes from the chromatophore and from the genome of S. elongatus. Most of the 786 PCG of the chromatophore and the 2342 PCG of S. elongatus were grouped on the upper left of the neutrality plot. The slopes of the regression lines in both plots were not close to zero, indicating that the influence of specific evolutionary pressures such as selection is weak. Thus, it can be proposed that mutational pressure is the key factor that shapes the codon usage pattern of both the chromatophore and the S. elongatus genome. Moreover, in the PR2 bias plots of these two genomes, synonymous A, T and G, C contents were found to be used proportionally, indicating the influence of GC compositional constraints. Interestingly, in the PS clade, a significant correlation between GC12 and GC3 was found only in Prochlorococcus (Yu et al. 2012). Thus, we can assume that the freshwater P. chromatophora genome and the S. elongatus genome are more similar to marine Prochlorococcus than to marine Synechococcus in terms of the factors that diversify SCU patterns. The relationship between SCUO and GC3 formed a 'U' shape with two horns in both genomes, as reported in unicellular microorganisms (Wan et al. 2004), which reveals the influence of GC3 on SCU bias.
In the chromatophore genome, three axes of COA showed strong correlations with silent base contents, confirming the influence of genome-wide compositional constraints. However, axes 1 and 4 were highly correlated with codon usage indices that indicate the level of gene expression, such as ENC and CAI. Since there were no major explanatory axes, correlation with these indices cannot be linked to the influence of selection. The hydropathic character of proteins (gravy score) was correlated with axes 2, 3 and 4 in the S. elongatus genome, suggesting that silent sites may be affected by the hydropathy of proteins, whereas in the chromatophore genome the gravy score was not correlated with any of the COA axes. The correlation between length of CDS and axes 1, 3 and 4 in the S. elongatus genome indicates an influence of CDS length on SCU variation, but no such correlation existed in P. chromatophora. In the S. elongatus genome, the negative correlation between GC3 and the first four axes of COA confirms the effect of GC3 on the SCU pattern. The significant correlation of ENC and CAI, indices indicating the level of gene expression, with the first three axes of COA suggests that weak selection may take part in SCU variation in S. elongatus. The formation of three clusters of PCG along axis 2 in the S. elongatus genome indicates a trend associated with the RSCU values of C-ending codons that was not observed in the chromatophore genome. In the chromatophore genome, by contrast, the TGT and TGC codons (encoding Cys) influence the separation of PCG along axis 2. The influence of Cys codons in shaping SCU patterns has already been reported in Lactococcus lactis (Gupta et al. 2004) and Rhizobium (Wang et al. 2013). Overall, these results suggest that genome-wide compositional constraints influence the SCU patterns of both the chromatophore genome and the S. elongatus genome.
SCU patterns of the chromatophore genome of P. chromatophora and of S. elongatus may be closely associated with living habitats. The adapted habitat of P. chromatophora is submerged vegetation in freshwater. The mud-loving nature of this organism protects it from potential extrinsic mutagens such as UV-B radiation, which in turn cause genome-wide mutation, as reported in Prochlorococcus (Partensky et al. 1999). The freshwater β-cyanobacterium S. elongatus PCC6301 is less adaptive to varying environments, as it resides strictly in euphotic zones with relatively low nutrient contents at mesophilic temperatures (Waterbury et al. 1986), unlike marine Synechococcus, which is able to grow under varying nutrient conditions and temperatures (Moore et al. 1998). In marine Synechococcus, translational selection shapes the codon usage patterns and makes the organism more adaptive to its environment (Yu et al. 2012), whereas mutational pressure frames codon usage in the less adaptive freshwater S. elongatus. Closely related species living in distinct environments may exhibit considerable genomic diversity (Paul et al. 2010), which leads to differences in the factors behind the diversification of SCU patterns. Mutational pressure was found to be the major factor influencing the SCU pattern across PCG in the strictly thermophilic cyanobacterium Thermosynechococcus elongatus BP-1 (Prabha et al. 2012), which is less adaptive to other temperature ranges, as the growth of thermophiles is restricted to a particular environment at a specific temperature (Botzman and Margalit 2011). These reports support our finding that the SCU patterns of P. chromatophora and S. elongatus are dictated by mutational pressure owing to their limited adaptation to varying environments.

Conclusions
The SCU patterns of the photosynthetic endosymbiont (chromatophore) and of the S. elongatus genome are dictated mainly by genome-wide GC mutational pressure. The living habitats of P. chromatophora and S. elongatus may also influence the SCU variations across the genes of both genomes. However, complete genome sequencing of α-cyanobacteria from the Cyanobium clade would further help to understand the SCU pattern, and the factors contributing to its diversification, in the presumed ancestors of the photosynthetic endosymbionts of P. chromatophora.

Gene sequences
Complete coding sequences (CDS) of the chromatophore genome (GenBank: NC_011087.1) of P. chromatophora (Nowack et al. 2008) and of the genome (GenBank: AP008231) of S. elongatus (Sugita et al. 2007) were retrieved from NCBI and CYORF (Cyanobacterial gene annotation database) respectively. CDS integrity was confirmed by checking for the presence of a start codon at the beginning and a stop codon at the end of each sequence, with no internal stop codons. To minimize sampling errors, only CDS with more than 300 nucleotides were chosen for analysis (Zhou and Li 2009; Sablok et al. 2011). Duplicate sequences were identified and excluded from the data set. Thus, the final data set of the chromatophore genome consists of 786 coding sequences containing 261,350 codons and 784,050 nucleotides, whereas the final data set of the S. elongatus genome consists of 2342 coding sequences containing 774,810 codons and 2,324,430 nucleotides.
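The filtering criteria described above can be sketched in Python as follows. This is an illustrative sketch, not the pipeline used in this study. The function names are hypothetical, and two points are assumptions not stated in the text: the set of accepted start codons (ATG, GTG, TTG, as commonly used in bacteria) and the requirement that the sequence length be a multiple of three.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
# Assumption: common bacterial start codons; the text only says "start codon".
START_CODONS = {"ATG", "GTG", "TTG"}


def is_valid_cds(seq, min_length=300):
    """Check start codon, terminal stop codon, no internal stops, length > 300 nt."""
    seq = seq.upper()
    # Assumption: a complete CDS has a length that is a multiple of three.
    if len(seq) <= min_length or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] not in START_CODONS or codons[-1] not in STOP_CODONS:
        return False
    return not any(codon in STOP_CODONS for codon in codons[1:-1])


def filter_cds(records, min_length=300):
    """Keep valid, non-duplicate sequences from (name, sequence) pairs."""
    seen = set()
    kept = []
    for name, seq in records:
        seq = seq.upper()
        if seq in seen or not is_valid_cds(seq, min_length):
            continue
        seen.add(seq)
        kept.append((name, seq))
    return kept
```

Duplicates are detected here by exact sequence identity, which is the simplest reading of "duplicate sequences"; other definitions (for example duplicate gene identifiers) would need a different key.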

Indices of codon usage
a) Relative synonymous codon usage (RSCU)
To infer the features of SCU variation across PCG independently of amino acid compositional constraints, the RSCU values of all PCG were estimated according to Sharp et al. (1986).
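As an illustration of the index, RSCU for a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally. The Python sketch below assumes the standard codon assignments and excludes Met, Trp and stop codons, leaving 59 codons. The function names are hypothetical, and the `classify` thresholds are the ones used in the Results (greater than 1, 0.66 to 1, less than 0.66).

```python
from collections import Counter
from itertools import product

# Standard codon assignments, bases ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Synonymous families of the 59 codons (Met, Trp and stop codons excluded).
FAMILIES = {}
for codon, aa in CODON_TABLE.items():
    if aa not in "*MW":
        FAMILIES.setdefault(aa, []).append(codon)


def count_codons(seq):
    """Count in-frame codons, ignoring a trailing partial codon."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def rscu(seq):
    """Return {codon: RSCU} for every family observed in the sequence."""
    counts = count_codons(seq)
    values = {}
    for codons in FAMILIES.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU is undefined for this family
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


def classify(value):
    """Label a codon by RSCU using the thresholds quoted in the Results."""
    if value > 1:
        return "frequent"
    if value < 0.66:
        return "rare"
    return "intermediate"
```

For a toy sequence with three TTT and one TTC codon, TTT receives an RSCU of 1.5 and TTC an RSCU of 0.5.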

b) Effective number of codons (ENC)
ENC is an index that is widely used for measuring the extent of synonymous codon usage bias (Wright 1990). It can take values from 20 (only one codon is used for each of the 20 amino acids) to 61 (all synonymous codons are used equally). If the calculated ENC value exceeds 61 because codon usage is more evenly distributed, it is adjusted to 61 (Wright 1990). Selection for preferred codons and mutational pressure may reduce ENC values. The expected ENC under random codon usage is approximated as a function of GC3 and was calculated according to Wright (1990).
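The expected curve in an ENC vs GC3 plot is commonly written as ENC = 2 + s + 29 / (s^2 + (1 − s)^2), where s is the GC3 fraction (Wright 1990). A minimal Python sketch follows. The function names are hypothetical, and GC3 is assumed to be supplied as a fraction between 0 and 1 rather than as a percentage.

```python
def expected_enc(gc3):
    """Expected ENC when codon usage is shaped only by GC3 composition."""
    if not 0.0 <= gc3 <= 1.0:
        raise ValueError("gc3 must be a fraction between 0 and 1")
    return 2.0 + gc3 + 29.0 / (gc3 ** 2 + (1.0 - gc3) ** 2)


def enc_deficit(observed_enc, gc3):
    """Distance of a gene below the expected curve (positive = below)."""
    return expected_enc(gc3) - observed_enc
```

A gene with a deficit close to zero lies on or just below the curve, while a large positive deficit places it considerably below the curve.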

c) Codon adaptation index (CAI)
Codon adaptation index (CAI) is a measure of bias towards preferred codons in a PCG. The translationally optimal codons are defined as those most represented in a reference set of highly expressed genes (Sharp and Li 1987). CAI values range from zero to one, and a higher value indicates increased bias towards preferred codons. For this study, we used ribosomal protein coding genes as the reference set for estimating CAI values on the basis of the equation developed by Sharp and Li (1987).
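In outline, CAI assigns each codon a relative adaptiveness w (its count in the reference set divided by the count of the most used synonymous codon there) and takes the geometric mean of w over the codons of a gene. The Python sketch below illustrates this and is not the CAI calculator used in this study. The function names are hypothetical, and replacing a zero reference count by 0.5 is an assumption made here to avoid taking the logarithm of zero.

```python
import math
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
FAMILIES = {}
for codon, aa in CODON_TABLE.items():
    if aa not in "*MW":  # Met, Trp and stop codons do not contribute to CAI
        FAMILIES.setdefault(aa, []).append(codon)


def split_codons(seq):
    """Split a sequence into in-frame codons, dropping a partial last codon."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def relative_adaptiveness(reference_seqs):
    """Compute w for each of the 59 codons from a reference gene set."""
    counts = Counter()
    for seq in reference_seqs:
        counts.update(split_codons(seq))
    weights = {}
    for codons in FAMILIES.values():
        # Assumption: zero counts are replaced by 0.5 to keep w positive.
        adjusted = {c: counts[c] if counts[c] > 0 else 0.5 for c in codons}
        best = max(adjusted.values())
        for c in codons:
            weights[c] = adjusted[c] / best
    return weights


def cai(seq, weights):
    """Geometric mean of w over the scored codons of a gene."""
    logs = [math.log(weights[c]) for c in split_codons(seq) if c in weights]
    if not logs:
        return 0.0
    return math.exp(sum(logs) / len(logs))
```

In practice `reference_seqs` would hold the ribosomal protein coding genes, and `cai` would then be applied to each PCG in turn.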

d) Synonymous codon usage order (SCUO)
Synonymous codon usage order (SCUO) measurement was used to analyze the influence of GC composition at the various codon positions on SCU. SCUO was computed using the equation of Wan et al. (2004).
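As we understand the definition of Wan et al. (2004), SCUO is an entropy-based measure. For each amino acid i with n_i synonymous codons, the Shannon entropy H_i of its codon frequencies is compared with the maximum entropy log(n_i), giving O_i = (log(n_i) − H_i) / log(n_i). SCUO is the sum of O_i weighted by the fraction F_i of codons in the sequence that encode amino acid i. The Python sketch below follows this reading. It is an illustration under that assumption, not a substitute for CodonO, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
FAMILIES = {}
for codon, aa in CODON_TABLE.items():
    if aa not in "*MW":  # single-codon amino acids and stops carry no entropy
        FAMILIES.setdefault(aa, []).append(codon)


def count_codons(seq):
    """Count in-frame codons, ignoring a trailing partial codon."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def scuo(seq):
    """Entropy-based synonymous codon usage order of one sequence (0 to 1)."""
    counts = count_codons(seq)
    totals = {aa: sum(counts[c] for c in codons) for aa, codons in FAMILIES.items()}
    grand_total = sum(totals.values())
    if grand_total == 0:
        return 0.0
    score = 0.0
    for aa, codons in FAMILIES.items():
        total = totals[aa]
        if total == 0:
            continue  # amino acid absent from this sequence
        entropy = 0.0
        for c in codons:
            if counts[c] > 0:
                p = counts[c] / total
                entropy -= p * math.log(p)
        max_entropy = math.log(len(codons))
        order = (max_entropy - entropy) / max_entropy  # O_i
        score += (total / grand_total) * order         # F_i * O_i
    return score
```

A value of 0 corresponds to equal use of all synonymous codons and a value of 1 to the exclusive use of one codon per amino acid.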

Sequence analysis
Nucleotide contents of all PCG were calculated using MEGA version 5.1 (Tamura et al. 2011). ENC values and CAI were calculated for all PCG by using online CodonW (http://codonw.sourceforge.net) and CAI calculator 2 (Wu et al. 2005). SCUO was computed using standalone CodonO (Wan et al. 2004).

Correspondence analysis (COA)
COA is a multivariate statistical method used to identify the major factors shaping SCU patterns across genes and to plot genes according to the various factors influencing SCU (Perriere and Thioulouse 2002). This method is often employed to plot PCG according to the RSCU values of the 59 synonymous codons (excluding the 3 stop codons and the Trp and Met codons) (RoyChoudhury and Mukherjee 2010). COA constructs a series of orthogonal axes that define the major factors framing SCU patterns in accordance with the variation in the data. In this study, the complete coding region of each PCG was represented as a 59-dimensional vector (excluding Met, Trp and stop codons), in which each dimension corresponds to the RSCU value of one sense codon (Mardia et al. 1979).

Statistical analysis
All correlations were calculated using Spearman's rank correlation method, as this measure of correlation does not require any distributional assumptions about the underlying data (Zhou and Li 2009). To identify putative optimal codons, a chi-square test on a 2 × 2 table was applied to the 5% of genes distributed at the extreme left and the 5% of genes distributed at the extreme right of axis 1 of COA. For each of the 59 sense codons, the first row contains the observed frequency of the codon and the second row contains the total number of its synonymous alternatives. Significance was assessed at the 5% level with one degree of freedom. All these analyses were done using Past version 2.12 (Hammer et al. 2001).
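The 2 × 2 test for one codon can be sketched in Python as follows. The layout assumed here has the two gene groups (extreme left and extreme right of axis 1) as columns, the count of the codon in the first row and the count of its synonymous alternatives in the second row. The function names are hypothetical, no continuity correction is applied (an assumption, since the text does not say), and 3.841 is the chi-square critical value for one degree of freedom at the 5% level.

```python
CRITICAL_5PCT_DF1 = 3.841  # chi-square critical value, df = 1, alpha = 0.05


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # an empty row or column: the test is not informative
    return n * (a * d - b * c) ** 2 / denom


def is_putative_optimal(codon_left, alt_left, codon_right, alt_right):
    """True if the codon is significantly over-represented in the left group."""
    stat = chi_square_2x2(codon_left, codon_right, alt_left, alt_right)
    # Over-representation on the left: codon/alternatives ratio is higher there.
    enriched_left = codon_left * alt_right > codon_right * alt_left
    return stat > CRITICAL_5PCT_DF1 and enriched_left
```

For example, a codon counted 30 times against 10 alternatives in the left group and 10 times against 30 alternatives in the right group gives a statistic of 20, well above the critical value.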