Finding the undiscovered roles of genes: an approach using mutual ranking of coexpressed genes and promoter architecture-case study: dual roles of thaumatin like proteins in biotic and abiotic stresses

Because a single gene can have several functions, identifying its alternative roles is a major challenge. The large amount of available expression data and the central role of the promoter and its regulatory elements provide a unique opportunity to address this issue. The question is how expression data and promoter analysis can be applied to uncover the different functions of a gene. Here we present a computational approach that combines analysis of promoter regulatory elements and coexpressed genes with protein domain and prosite analysis. We applied the approach to thaumatin-like proteins (TLPs) as an example. TLPs form group 5 of the pathogenesis-related proteins, and their antifungal role has been demonstrated previously. In contrast, osmotin-like proteins (OLPs) are the basic form of TLPs, with a demonstrated role only in abiotic stresses. We identified candidate homologues involved in both biotic and abiotic stresses by analyzing 300 coexpressed genes for each Arabidopsis TLP and OLP in biotic, abiotic, hormone, and light microarray experiments based on mutual ranking. In addition, promoter analysis was used to detect transcription factor binding sites (TFBs) and their differences between OLPs and TLPs. A specific combination of five TFBs was found in all TLPs, representing the key structure in the functional response of TLPs to fungal stress. Interestingly, we found the fungal-response TFBs in some salt-responsive OLPs, indicating a possible role of OLPs in biotic stresses. Thirteen TFBs were common to all OLPs, and some were also found in TLPs, suggesting a possible role of these TLPs in abiotic stresses. Multivariate analysis showed that models distinguishing the biotic and abiotic functions of TLPs can be estimated from promoter regulatory elements. This is the first report identifying multiple roles of TLPs and OLPs in biotic and abiotic stresses.
This study provides valuable clues for screening and discovering new genes with possible roles in tolerance to both biotic and abiotic stresses. Interestingly, principal component analysis showed that the promoter regulatory elements of TLPs and OLPs are more variable than their protein properties, reinforcing the prominent role of promoter architecture in altering gene function. Electronic supplementary material: The online version of this article (doi:10.1186/2193-1801-1-30) contains supplementary material, which is available to authorized users.


Introduction
Although non-coding sequences play a key role in transcriptional regulation, most studies have focused on identifying genes and predicting their function from coding sequences. However, gene function is the outcome of both the upstream non-coding promoter region and the downstream coding sequence. Transcription factor binding sites (TFBs, or cis-regulatory elements), which determine the specific timing and location of transcriptional activity, are located in the long non-coding sequence upstream of a gene. Diverse cis-regulatory modules are required for a specific expression pattern (Su et al. 2010). Consequently, identifying regulatory motifs and their organization into modules is an important step toward understanding gene expression and regulation, and promoter analysis can open a new avenue for studying genes of unknown function.
As many phenotypes result from complex gene-gene interactions, there is increasing interest in identifying the gene sets underlying the expression of a given phenotype (Fichlin and FaFFA 2010). Interaction relationships among genes cannot be inferred from individual genes alone. Sharing of genes between different networks (cross talk) is common in systems biology; as a result, one gene can have different functions. For instance, a gene can play a bifunctional role in biotic and abiotic stresses. The large amount of available expression data and recent advances in sequencing promoter regions provide a valuable opportunity for predicting gene functions. However, a well-defined, reliable approach is still needed.
Expression data and computational analysis can reveal coexpressed gene subsets that are highly correlated under one condition but uncorrelated under another (Varadan and Anastassiou 2006). Coexpressed genes should therefore be analyzed as gene subsets rather than as individual genes. Identifying stress-specific coexpressed gene subsets is very useful for finding unfamiliar gene roles (Zhang et al. 2009). In this study, we defined subsets of coexpressed genes based on the Mutual Rank (MR) index. For any given pair, gene A and gene B, the MR is calculated as the geometric mean of the rank of gene B among the genes coexpressed with gene A and the rank of gene A among the genes coexpressed with gene B. MR has been reported to be a better measure of similarity than the correlation value for identifying related genes (Obayashi et al. 2009). This is partly because even a gene pair with low expression similarity can work together if no other genes are highly coexpressed, as in some examples where a gene is highly coexpressed according to the MR although expression similarity is low (Obayashi et al. 2007).
In addition to promoter and coexpressed gene analysis, protein sequence patterns, especially prosite signatures, are becoming a vital tool of sequence analysis for revealing protein function. Prosite describes short, well-conserved regions of proteins (Hulo et al. 2008). These are typically enzyme catalytic sites, prosthetic group attachment sites (haem, pyridoxal phosphate, biotin, etc.), metal ion-binding amino acids, cysteines involved in disulfide bonds, or regions involved in binding a molecule (Hulo et al. 2008). In our previous study, we employed motif and domain analysis to predict different subcellular locations of glutathione reductase proteins (Tahmasebi et al. 2012).
As an example, we analyzed a family of plant defense genes. Plant defense mechanisms are induced by multiple genes during different stresses. Artificially conferring resistance on plants requires manipulation of multiple genes, which is a time-consuming and labor-intensive task. As a result, genes whose transformation can simultaneously up-regulate several resistance genes are of great interest. Apart from transcription factors, thaumatin-like proteins (TLPs) are among the best candidates for this purpose (Breiteneder 2000). TLPs have been categorized as family 5 of the pathogenesis-related proteins (PRs) (Zhong and Shen 2004). The induction of TLPs as part of plant resistance mechanisms during pathogen infection has been demonstrated (Petre et al. 2011). For decades, activation of TLPs by pathogens such as bacteria, viruses and fungi has been described in many higher plants (Liu and Ekramoddoullah 2010; Mukherjee et al. 2010). Although the mechanisms of TLP action remain unclear (Petre et al. 2011), the antifungal activity of some TLPs has been attributed to membrane permeabilization (Vigers and Selitrennikoff 1991), β-glucan binding and degradation (Sakamoto et al. 2006), and inhibition of enzymes such as xylanases (Fierens et al. 2007), α-amylase, or trypsin (Schimoler-O'Rourke and Selitrennikoff 2001). In addition to the participation of TLPs in pathogen defense, functional properties for protection against abiotic stresses have been reported (Rajam et al. 2007).
The basic isoforms of TLPs, osmotin-like proteins (OLPs), with a molecular weight of 24 kDa, have been reported as osmoprotectants in tobacco cells (Abada et al. 1996; Yun et al. 1997). OLP protein and genomic sequences have been isolated from tobacco treated with a high NaCl concentration (Singh et al. 1985). Upregulation of osmotin leads to proline accumulation, conferring tolerance to osmotic stress in transgenic tobacco (Barthakur SBVB 2001). Besides the induction of OLPs during salt stress, evidence shows that a broad range of fungal pathogens can activate these proteins (Abada et al. 1996; Yun et al. 1997).
Given the valuable role of TLPs in resistance to both biotic and abiotic stresses, deciphering the complex mechanisms and functions of these protein homologs is of interest. Bioinformatics provides valuable tools for elucidating the function of poorly characterized genes. In this research, promoter analysis, analysis of coexpressed genes, and prosite analysis were employed to shed light on the diverse functions of TLPs. Specific cis-elements acting as activators, repressors, enhancers and chromatin modifiers are indicators of gene activity and combinatorial transcriptional regulation in plants (Yu et al. 2003). However, the functional differences between TLP and OLP promoters remain unknown. This study identifies key elements potentially responsible for the dual role of TLPs in biotic and abiotic stresses through in silico comparative analysis of TLP and OLP promoter characteristics.
In this study, a variety of bioinformatics tools, including determination of coexpressed genes, in silico promoter analysis, and in silico discovery of domains and prosite patterns, were used to provide clues for better understanding and predicting the diverse functions of TLPs and OLPs in Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa). Furthermore, a statistical approach was developed for predicting and distinguishing different gene functions based on Mutual Ranking of coexpressed genes and multivariate analysis of regulatory elements in promoter regions.

Promoter analysis
Analysis of the 1500 bp promoter sequences of Arabidopsis and rice TLPs and OLPs predicted 34 specific transcription factor binding sites (TFBs) across all promoters. Thirteen TFBs were detected in the TLP promoters, but only 5 TFBs were shared by all TLP genes (Table 1).
Given the demonstrated role of TLPs in fungal/biotic resistance, these 5 elements can be regarded as biotic-defense elements for TLP function. Interestingly, these 5 biotic-defense TFBs were also found in some OLPs (Table 2). As a result, these OLPs may be expressed during both salt (abiotic) stress and fungal (biotic) stress, making them candidate "super resistance" genes. Identifying such genes by common laboratory techniques is time-consuming and expensive, whereas this rapid bioinformatics approach can provide a short list of promising homologs with potential dual resistance properties for further laboratory tests.
The rice OLP isoform (Os01g0839900) does not carry the shared elements of TLPs. In contrast, the majority of OLPs in Arabidopsis contain the shared biotic-response elements of TLPs (Table 2). Consequently, these OLP homologs may be upregulated in both biotic and abiotic stresses. The sequences and predicted cis-elements of the rice OLP (Os01g0839900) and the rice TLP (Os04g0689900) are presented in Figure 1 and Figure 2.
In silico promoter analysis of OLPs detected 21 TFBs, 13 of which were shared by all OLPs (Table 1). The function of these 13 TFBs was mainly related to salt stress. Some TLPs carry this OLP-like structure, suggesting possible roles in salt/abiotic resistance as well as fungal/biotic resistance (Table 2). The TLPs in rice carried the OLP salt-resistance elements, except for 3 of them, which appear to have a role in fungal stress (Table 2). Despite the central role of the promoter and its regulatory elements, it seems that most researchers have overlooked the advantages of promoter analysis for predicting gene function and discovering genes with similar functions. Here, for the first time, we found a conserved combination of regulatory elements on the promoters of TLP fungal resistance genes (ASRC/CCAF/L1BX/NCS1/WBXF), which can be used to screen genes of unknown function and to find new genes effective in fungal and biotic resistance. Similarly, a unique complex combination of regulatory elements (ABRE/CARM/CNAC/GAGA/IDDF/LEGB/MIIG/NACF/OPAQ/SPF1/WNAC) was found for screening genes involved in abiotic salt stress (Table 1).
The results revealed marked differences between the OLPs of rice and Arabidopsis. While most Arabidopsis OLP promoters carry the additional fungal-response TFBs, rice OLPs do not have this structure. In other words, in contrast to Arabidopsis, rice OLPs appear to be mainly involved in salt stress. This finding highlights the importance of considering the source species of the gene and promoter homolog when isolating and transferring genes.
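The screening idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the two element combinations are copied from the text, while the function name `classify_promoter`, the promoter names, and the TFBs sets assigned to them are hypothetical. It assumes that the TFBs families present on each promoter have already been predicted by a promoter analysis tool.

```python
# TFBs family names are taken from the text; the promoter contents below are
# hypothetical and only illustrate the screening logic.
BIOTIC_SIGNATURE = frozenset({"ASRC", "CCAF", "L1BX", "NCS1", "WBXF"})
ABIOTIC_SIGNATURE = frozenset({
    "ABRE", "CARM", "CNAC", "GAGA", "IDDF", "LEGB",
    "MIIG", "NACF", "OPAQ", "SPF1", "WNAC",
})


def classify_promoter(tfbs_families):
    """Label a promoter by which complete element combination it carries."""
    found = set(tfbs_families)
    has_biotic = BIOTIC_SIGNATURE <= found    # every biotic element present
    has_abiotic = ABIOTIC_SIGNATURE <= found  # every abiotic element present
    if has_biotic and has_abiotic:
        return "biotic and abiotic"
    if has_biotic:
        return "biotic"
    if has_abiotic:
        return "abiotic"
    return "unclassified"


# Hypothetical promoters (not real loci).
example_promoters = {
    "promoter_1": BIOTIC_SIGNATURE | {"GAGA"},
    "promoter_2": ABIOTIC_SIGNATURE,
    "promoter_3": BIOTIC_SIGNATURE | ABIOTIC_SIGNATURE,
    "promoter_4": {"ASRC", "WBXF", "ABRE"},
}
calls = {name: classify_promoter(sites)
         for name, sites in example_promoters.items()}
```

Requiring the complete combination is a deliberately strict choice in this sketch; a promoter carrying only part of a combination, such as `promoter_4`, is left unclassified.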

Coexpressed gene analysis
Another in silico tool that can provide valuable clues about the different functions of a gene is the analysis of genes coexpressed with the gene of interest, using transcriptomics data available in databases. Analysis of coexpressed genes using deposited microarray data indicated a role for some Arabidopsis TLPs in abiotic stresses and for some OLPs in biotic stresses (Table 3, Additional file 1). We analyzed 300 coexpressed genes and selected those with MR < 10 for each TLP and OLP in biotic, abiotic, hormone and light microarray experiments using ATTED-II (http://atted.jp). Based on the function of each coexpressed gene in each experiment, we could suggest a role for some TLPs and OLPs in the response to both biotic and abiotic stresses. As presented in Table 3, among the 21 TLPs, just 2 (AT1G19320 and AT4G36000) have no coexpressed gene with MR < 10 in abiotic experiments, suggesting that these two isoforms are upregulated specifically in response to biotic stresses. In contrast, the other 19 TLP isoforms have coexpressed genes with MR < 10 in both biotic and abiotic experiments. This result suggests a bifunctional role of some TLP homologs in the response to biotic and abiotic stresses (Table 3, Additional file 1). In the OLP group, AT2G28790 does not appear to be activated by biotic stresses, because this OLP homolog has no coexpressed gene with MR < 10 in biotic experiments. In contrast, another OLP isoform (At4g11650) has 7 coexpressed genes in biotic microarray experiments (At3g12500/At1g02220/At3g01420/At3g60140/At1g55020/At2g14620/At3g21500). Interestingly, the results of the coexpression analysis were, to some extent, confirmed by the results of the promoter analysis.
As an example, we found fungal and salt response elements on the At4g11650 promoter, and in line with this, coexpression analysis supported the dual expression of At4g11650 and its associated genes in both biotic and abiotic microarray experiments. This finding suggests that coexpressed genes selected by the MR index can be used to support the activity of promoter regulatory elements (TFBs) discovered in silico and to uncover the different functions of genes.
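The cross-check between the two lines of evidence can be expressed as a small Python sketch. It assumes that coexpressed partners and their MR values have already been exported per experiment type; the partner names and MR values below are hypothetical, and the function names are illustrative only. The MR < 10 cutoff is the one used in the text.

```python
MR_CUTOFF = 10.0  # cutoff used in the text: partners are kept if MR < 10


def supported_stresses(hits, cutoff=MR_CUTOFF):
    """Return the experiment types with at least one partner below the cutoff.

    hits maps an experiment type to a list of (partner, mutual_rank) pairs.
    """
    return {
        experiment
        for experiment, partners in hits.items()
        if any(mr < cutoff for _, mr in partners)
    }


def agrees_with_promoter(promoter_call, hits, cutoff=MR_CUTOFF):
    """Check whether coexpression evidence matches a promoter-based call.

    promoter_call is a set drawn from {"biotic", "abiotic"}.
    """
    stress_types = {"biotic", "abiotic"}
    return supported_stresses(hits, cutoff) & stress_types == set(promoter_call)


# Hypothetical coexpression table for one locus (values are invented).
toy_hits = {
    "biotic": [("partner_1", 3.2), ("partner_2", 14.0)],
    "abiotic": [("partner_3", 7.5)],
    "hormone": [("partner_4", 25.1)],
    "light": [],
}
toy_supported = supported_stresses(toy_hits)
toy_agreement = agrees_with_promoter({"biotic", "abiotic"}, toy_hits)
```

In this toy case the locus has partners below the cutoff in both biotic and abiotic experiments, so a promoter-based call of dual function would be considered consistent with the coexpression data.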

Domains and prosite analysis
Differences in gene function can be traced to the coding sequence (which results in different protein structures) or to the promoter region (which results in different expression patterns). In this part of the study, the domains and prosite patterns of OLP and TLP homologs were extracted and compared. Domain analysis did not reveal distinct differences between TLPs and OLPs, as no domain was found in the majority of sequences (Additional file 2). Interestingly, prosite analysis revealed distinct differences between salt and fungal homologs (Figure 3, Additional file 3). Figure 3 shows that some prosite patterns are distributed differently between TLPs and OLPs. CK2_PHOSPHO_SITE, casein kinase II phosphorylation site (PS00006); PKC_PHOSPHO_SITE, protein kinase C phosphorylation site (PS00005); and ASN_GLYCOSYLATION, N-glycosylation site (PS00001), are more abundant in OLP than in TLP homologs (Figure 3). In contrast, THAUMATIN_2, thaumatin family profile (PS51367), and CAMP_PHOSPHO_SITE, cAMP- and cGMP-dependent protein kinase phosphorylation site (PS00004), are more frequent in TLP homologs (Figure 3). It can be concluded that differences in gene function at the protein level can be traced in prosite patterns, which are biologically significant short sequences in comparison to domains. It should be noted that changing or adding domains (larger organization) needs more energy than prosite alteration.
[Table 3, fragment: At3g12500/At1g02220/At3g01420/At3g60140/At1g55020/At2g14620/At3g21500; At3g12500/At4g16260/At3g04720; At3g12500/At1g73260/At5g43580/At4g16260; 265920_s_at/At3g09220/At4g23700/At3g04720/At2g45220/At4g05200/At2g43510; Abiotic/biotic. Coexpressed genes were selected based on Mutual Rank (MR) < 10.]
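Counting short prosite patterns in a protein sequence can be illustrated with regular expressions. The sketch below translates the four pattern-type entries named above into Python regular expressions (PS00001 N-{P}-[ST]-{P}, PS00004 [RK](2)-x-[ST], PS00005 [ST]-x-[RK], PS00006 [ST]-x(2)-[DE]); THAUMATIN_2 (PS51367) is a profile rather than a pattern and cannot be scanned this way. The peptide is a hypothetical toy sequence, and this is not the scanning tool used in the study.

```python
import re

# PROSITE patterns rewritten as regular expressions.
PROSITE_REGEX = {
    "PS00001 ASN_GLYCOSYLATION": r"N[^P][ST][^P]",
    "PS00004 CAMP_PHOSPHO_SITE": r"[RK]{2}.[ST]",
    "PS00005 PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "PS00006 CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",
}


def count_sites(sequence):
    """Count matches of each pattern, including overlapping ones."""
    counts = {}
    for name, pattern in PROSITE_REGEX.items():
        # A lookahead lets the scan restart at every residue, so
        # overlapping sites are all counted.
        counts[name] = len(re.findall(f"(?=({pattern}))", sequence))
    return counts


def mean_counts(sequences):
    """Average the per-pattern counts over a group of sequences."""
    totals = {name: 0 for name in PROSITE_REGEX}
    for seq in sequences:
        for name, n in count_sites(seq).items():
            totals[name] += n
    return {name: total / len(sequences) for name, total in totals.items()}


# Hypothetical toy peptide, not a real TLP or OLP sequence.
toy_peptide = "MNGTKRRASVTPKAASGEDE"
toy_counts = count_sites(toy_peptide)
```

Comparing `mean_counts` for a group of TLP sequences with that for a group of OLP sequences would give the kind of per-pattern frequency comparison summarized in Figure 3.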
In the next part of the study, Discriminant Function Analysis (DFA) was carried out to estimate models for separating TLPs from OLPs based on the TFBs of their promoter regions. Models were developed based on the biotic promoter regulatory elements (Table 1). In these models, TLPs and OLPs have clearly different coefficients for the WBXF and L1BX elements. In other words, WBXF and L1BX are the main TFBs distinguishing specific TLPs from specific OLPs.
The mean discriminant value was -53.2 for TLP homologs and -28.6 for OLP homologs. Like PCA, Discriminant Function Analysis is a valuable technique, since genes with intermediate values can be proposed as genes with dual functional roles. Figure 4 compares the classification of TLPs and OLPs based on promoter regulatory elements and on the prosite motifs of the proteins. As can be inferred from Figure 4, promoter elements are more variable than prosite elements. It can be concluded that promoter elements play a more important role in differentiating TLPs from OLPs and in assigning functions to a gene.
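To make the idea of a discriminant score concrete, the following Python sketch fits a two-class Fisher linear discriminant on two features. It is a stand-in for the Minitab analysis, not a reproduction of it: the feature pairs are hypothetical counts of WBXF and L1BX sites per promoter, and the resulting weights are unrelated to the coefficients or mean scores reported in the text.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length numeric rows."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def pooled_covariance_2d(group_a, group_b):
    """Pooled within-group covariance matrix for two features."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows in (group_a, group_b):
        m = mean_vector(rows)
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    dof = len(group_a) + len(group_b) - 2
    return [[v / dof for v in row] for row in s]


def fisher_weights(group_a, group_b):
    """Weights w = S^-1 (mean_a - mean_b) of the linear discriminant."""
    (a, b), (c, d) = pooled_covariance_2d(group_a, group_b)
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    ma, mb = mean_vector(group_a), mean_vector(group_b)
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


def score(weights, x):
    """Discriminant score of one promoter."""
    return weights[0] * x[0] + weights[1] * x[1]


# Hypothetical (WBXF count, L1BX count) pairs per promoter.
tlp_like = [(4, 3), (5, 2), (3, 3), (4, 4)]
olp_like = [(1, 1), (2, 0), (1, 2), (0, 1)]

w = fisher_weights(tlp_like, olp_like)
midpoint = (score(w, mean_vector(tlp_like)) + score(w, mean_vector(olp_like))) / 2
intermediate = score(w, (2, 2))  # a promoter lying between the two groups
```

A promoter whose score falls close to `midpoint`, like the `(2, 2)` example, corresponds to the intermediate case that the text proposes as a candidate for dual function.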

Importance of promoter elements in the success of genetic transformation
Commonly, in genetic transformation, general promoters such as 35S are used after the gene has been cloned. However, given the key role of the promoter in proper gene function, special attention should be paid to cloning and transforming a suitable promoter together with the gene to obtain satisfactory results. For example, Kim et al. (2008) observed that a seed-specific promoter is a prerequisite for the proper function of fatty acid desaturase genes in altering the unsaturated fatty acid content of oilseeds by genetic manipulation.
Up to now, most researchers have considered only the individual gene when predicting gene function. The approach employed here, which considers genes coexpressed with the gene of interest together with promoter analysis and prosite structure, can reveal valuable findings about protein function in different pathways. In particular, the unique regulatory elements (responding to different sorts of stresses) open a new avenue in genetic engineering through manipulation of cis-acting regulatory elements in the promoter region.

Conclusion
Here, for the first time, we demonstrated that promoter analysis of TLPs and OLPs can explain the multiple roles of TLPs and OLPs in biotic and abiotic stresses. In addition, we showed that analysis of genes coexpressed with the gene of interest can provide valuable insight into the diverse roles of genes. In conclusion, our results indicate that computational tools such as coexpressed gene analysis, cis-regulatory analysis and in silico protein analysis can identify TLP and OLP homologues likely to be involved in the response to both biotic and abiotic stresses. Discovering genes with dual resistance functions in biotic and abiotic stresses is a major advance for genetic transformation. Furthermore, the present methods can be employed to discover the unknown functions of other genes.
To highlight the roles of specific TFBS in promoter activity, the general core promoter elements (such as TATA-box) were disregarded. The number and position of promoter regulatory elements, particularly hormonal, biotic and abiotic ones were compared between TLPs and OLPs.

Coexpressed genes analysis
All TLP and OLP loci of Arabidopsis thaliana were selected from the TAIR database (www.arabidopsis.org). ATTED-II (http://atted.jp) was used to analyze coexpressed genes. This database collects gene expression data in Arabidopsis from a wide range of microarray experiments. Three hundred genes coexpressed with each TLP and OLP locus were extracted from the abiotic, biotic, hormone and light experiments in this database. To avoid discarding potentially important coexpressed gene pairs with low Pearson's correlation coefficients (PCCs), ATTED-II employs a measure of gene coexpression called Mutual Rank (MR). Correlation rank is asymmetric: the rank of gene B from gene A is not the same as the rank of gene A from gene B. The two ranks are therefore geometrically averaged to give the MR: $\mathrm{MR}(AB) = \sqrt{\mathrm{Rank}(A \to B) \times \mathrm{Rank}(B \to A)}$.
For any given pair, gene A and gene B, the MR is thus the geometric mean of the rank of gene B among the genes coexpressed with gene A and the rank of gene A among the genes coexpressed with gene B. In each experiment, we selected the coexpressed genes with MR < 10 (Additional file 1, Table 3).
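The MR calculation can be illustrated with a short, self-contained Python sketch. It is not ATTED-II's implementation: the expression profiles and gene names are hypothetical, ranks are assigned within this toy set of four genes only, and ties are not handled. It shows how the two asymmetric correlation ranks are combined by their geometric mean.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_ranks(expr):
    """ranks[a][b] = rank of gene b among the other genes, ordered by
    decreasing correlation with gene a (1 = most strongly coexpressed)."""
    ranks = {}
    for a in expr:
        others = [g for g in expr if g != a]
        others.sort(key=lambda g: pearson(expr[a], expr[g]), reverse=True)
        ranks[a] = {g: i + 1 for i, g in enumerate(others)}
    return ranks


def mutual_rank(ranks, a, b):
    """Geometric mean of Rank(a -> b) and Rank(b -> a)."""
    return math.sqrt(ranks[a][b] * ranks[b][a])


# Hypothetical expression profiles across four conditions.
toy_expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [1.1, 2.1, 2.9, 4.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 2.0, 4.0],
}
toy_ranks = correlation_ranks(toy_expr)
mr_ab = mutual_rank(toy_ranks, "geneA", "geneB")
mr_bd = mutual_rank(toy_ranks, "geneB", "geneD")
```

In this toy set, geneD ranks second from geneB while geneB ranks first from geneD, so the two directions disagree and the MR lies between them. With real data, partners would then be kept when their MR falls below the chosen cutoff (MR < 10 in this study).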

Multivariate analysis
Principal Component Analysis and Discriminant Function Analysis were performed with the Minitab 16 package (www.minitab.com/). For these analyses, the different promoter regulatory elements and prosite motifs were used as variables (Table 1 and Additional file 3).