Although non-coding sequences play a key role in transcriptional regulation, most studies have focused on identifying genes and predicting their function from coding sequences alone. However, gene function is the outcome of both the upstream non-coding promoter region and the downstream coding sequence. Transcription factor binding sites (TFBSs, or cis-regulatory elements), which determine the specific timing and location of transcriptional activity, are located in the long non-coding sequence upstream of a gene. Diverse cis-regulatory modules are required for a specific expression pattern (Su et al.). Consequently, identifying regulatory motifs and their organization into modules is an important step toward understanding gene expression and regulation, and promoter analysis can open a new avenue for studying genes of unknown function.
As many phenotypes are the result of complex gene-gene interactions, there is increasing interest in identifying the gene sets that underlie the expression of a given phenotype ([Fichlin and FaFFA 2010]). Interaction relationships among genes cannot be attributed to any individual gene. Sharing of genes between different networks (cross talk) is common in systems biology; as a result, one gene can perform different functions. For instance, a gene can play a dual role in biotic and abiotic stresses. The large amount of available expression data and recent advances in the sequencing of promoter regions provide a valuable opportunity for predicting gene function. However, a well-defined and reliable approach for doing so is still needed.
Expression data combined with computational analysis can reveal coexpressed gene subsets, that is, sets of genes that are highly correlated under one condition but uncorrelated under another ([Varadan and Anastassiou 2006]). Coexpression should therefore be analyzed at the level of gene subsets rather than individual genes. Identifying stress-specific coexpressed gene subsets is very useful for uncovering previously unknown gene roles ([Zhang et al. 2009]). In this study, we defined a subset of coexpressed genes based on the Mutual Rank (MR) index. For any given pair, gene A and gene B, the MR is calculated as the geometric mean of the rank of gene B among the genes coexpressed with gene A and the rank of gene A among the genes coexpressed with gene B. MR has been reported to be a better measure of similarity than the correlation value for identifying related genes ([Obayashi et al. 2009]). This is partly because even a gene pair with low expression similarity can work together if no other genes are highly coexpressed with either of them; in such cases the pair is highly coexpressed according to the MR although its expression similarity is low ([Obayashi et al. 2007]).
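The MR calculation described above can be illustrated with a minimal sketch. It assumes Pearson correlation as the coexpression measure and the geometric-mean form of MR, MR(A, B) = sqrt(rank of B from A × rank of A from B); the function names and the toy expression profiles are hypothetical and are not data from this study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles.

    Assumes neither profile is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mutual_rank(expr):
    """Return {(gene_a, gene_b): MR} for every ordered pair of distinct genes.

    expr maps a gene name to its expression values across the same samples.
    """
    genes = list(expr)
    rank = {}
    for a in genes:
        # Sort all other genes by decreasing correlation with gene a;
        # the most strongly coexpressed partner receives rank 1.
        partners = [g for g in genes if g != a]
        partners.sort(key=lambda g: pearson(expr[a], expr[g]), reverse=True)
        rank[a] = {g: i + 1 for i, g in enumerate(partners)}
    # MR is the geometric mean of the two directional ranks, so it is symmetric.
    return {
        (a, b): sqrt(rank[a][b] * rank[b][a])
        for a in genes
        for b in genes
        if a != b
    }


# Hypothetical toy profiles (four genes, five samples), for illustration only.
toy = {
    "g0": [1, 2, 3, 4, 5],
    "g1": [1, 2, 3, 4, 6],
    "g2": [5, 4, 3, 2, 1],
    "g3": [2, 1, 4, 3, 5],
}
for pair, value in sorted(mutual_rank(toy).items()):
    print(pair, round(value, 3))
```

A lower MR indicates a closer coexpression relationship. Because the index depends on ranks rather than on the correlation values themselves, a pair with a modest correlation can still obtain a low MR when neither gene has a stronger partner.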
In addition to promoter and coexpressed gene analysis, the use of protein sequence patterns, especially the discovery of PROSITE signatures, is becoming a vital tool of sequence analysis for revealing protein function. Short, well-conserved regions of proteins are catalogued as PROSITE signatures ([Hulo et al. 2008]). They are typically enzyme catalytic sites, prosthetic group attachment sites (haem, pyridoxal phosphate, biotin, etc.), metal ion-binding amino acids, cysteines involved in disulfide bonds, or regions involved in binding a molecule ([Hulo et al. 2008]). In our previous study, we employed motif and domain analysis to predict the different subcellular locations of glutathione reductase proteins ([Tahmasebi et al. 2012]).
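To make the notion of a sequence signature concrete, the following minimal sketch converts a pattern written in PROSITE-style syntax into a regular expression and scans a protein sequence with it. It covers only the common syntax elements (`x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` for repetition, and `<`/`>` anchors). The pattern and the sequence below are invented for illustration and are not real PROSITE entries; `prosite_to_regex` is a hypothetical helper, not part of any PROSITE tool.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for elem in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if elem.startswith("<"):          # pattern anchored at the N-terminus
            prefix, elem = "^", elem[1:]
        if elem.endswith(">"):            # pattern anchored at the C-terminus
            suffix, elem = "$", elem[:-1]
        # Split off an optional repetition such as x(3) or x(2,4).
        m = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", elem)
        core, rep = (m.group(1), "{" + m.group(2) + "}") if m else (elem, "")
        if core == "x":                   # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # excluded residues
        parts.append(prefix + core + rep + suffix)
    return "".join(parts)


# Invented example pattern and sequence (assumptions, not database entries).
toy_pattern = "C-x(2,4)-C-x(3)-[LIVM]-{P}-H"
toy_sequence = "MKCAACGGGLAHQ"

regex = prosite_to_regex(toy_pattern)
for hit in re.finditer(regex, toy_sequence):
    print(hit.start(), hit.group())
```

Note that `re.finditer` reports non-overlapping matches only, so a production scan that needs overlapping hits would require a different search strategy.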
As an example, we analyzed a family of plant defense genes. Defense mechanisms in plants are induced by multiple genes during different stresses. Artificially conferring resistance on plants therefore requires the manipulation of multiple genes, which is a time-consuming and labor-intensive task. As a result, finding genes whose transformation can up-regulate several resistance genes simultaneously is of great interest. Apart from transcription factors, thaumatin-like proteins (TLPs) are among the best candidates for this purpose ([Breiteneder 2000]). TLPs are classified as family 5 of the pathogenesis-related proteins (PRs) ([Zhong and Shen 2004]). The induction of TLPs as part of the plant resistance mechanism during pathogen infection has been demonstrated ([Petre et al. 2011]). For decades, the activation of TLPs by pathogens such as bacteria, viruses and fungi has been described in many higher plants ([Liu and Ekramoddoullah 2010];[Mukherjee et al. 2010]). Although the mechanisms of TLP action remain unclear ([Petre et al. 2011]), the antifungal activity of some TLPs has been attributed to membrane permeabilization ([Vigers and Selitrennikoff 1991]), β-glucan binding and degradation ([Sakamoto et al. 2006]), and inhibition of enzymes such as xylanases ([Fierens et al. 2007]), α-amylase, or trypsin ([Schimoler-O’Rourke and Selitrennikoff 2001]). In addition to the participation of TLPs in pathogen defense, other functional properties that protect against abiotic stresses have been reported ([Rajam et al. 2007]).
The basic isoforms of TLPs, osmotin-like proteins (OLPs), with a molecular weight of 24 kDa, have been reported as osmoprotectants in tobacco cells ([Abada et al. 1996];[Yun et al. 1997]). OLP protein and genomic sequences have been isolated from tobacco treated with a high NaCl concentration ([Singh et al. 1985]). Up-regulation of osmotin leads to proline accumulation, conferring tolerance to osmotic stress in transgenic tobacco ([Barthakur SBVB 2001]). Besides the induction of OLPs during salt stress, evidence shows that a broad range of fungal pathogens can activate these proteins ([Abada et al. 1996];[Yun et al. 1997]).
Given the valuable role of TLPs in resistance to both biotic and abiotic stresses, deciphering the complex mechanisms and functions of these protein homologs is of considerable interest. Bioinformatics provides valuable tools for elucidating the function of poorly characterized genes. In this research, promoter analysis, analysis of coexpressed genes, and PROSITE signature analysis were employed to shed light on the diverse functions of TLPs. The nature of specific cis-elements as activators, repressors, enhancers and chromatin modifiers is an indicator of gene activity and of combinatorial transcriptional regulation in plants ([Yu et al. 2003]). However, the functional differences between TLP and OLP promoters remain unknown. This study seeks to identify the key elements responsible for the dual role of TLPs in biotic and abiotic stresses through an in silico comparative analysis of TLP and OLP promoter characteristics.
In this study, a variety of bioinformatics tools, including determination of coexpressed genes, in silico promoter analysis, and in silico discovery of domains and PROSITE signatures, were used to provide clues for better understanding and predicting the diverse functions of TLPs and OLPs in Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa). Furthermore, a statistical approach was developed for predicting and distinguishing different gene functions based on the Mutual Rank of coexpressed genes and multivariate analysis of regulatory elements in promoter regions.