Finding the undiscovered roles of genes: an approach using mutual ranking of coexpressed genes and promoter architecture-case study: dual roles of thaumatin like proteins in biotic and abiotic stresses
© Deihimi et al.; licensee Springer. 2012
Received: 11 August 2012
Accepted: 27 September 2012
Published: 5 October 2012
Given that a single gene may have multiple functions, finding the alternative roles of genes is a major challenge. The large amount of available expression data and the central role of the promoter and its regulatory elements provide a unique opportunity to address this issue. The question is how expression data and promoter analysis can be applied to uncover the different functions of a gene. A computational approach is presented here based on analysis of promoter regulatory elements and coexpressed genes, as well as protein domain and prosite analysis. We applied our approach to Thaumatin like proteins (TLPs) as an example. TLPs belong to group 5 of the pathogenesis related proteins, whose antifungal role has been demonstrated previously. In contrast, Osmotin like proteins (OLPs) are the basic form of TLPs, with a demonstrated role only in abiotic stresses. We identified homologues possibly involved in both biotic and abiotic stresses by analyzing 300 coexpressed genes for each Arabidopsis TLP and OLP in biotic, abiotic, hormone, and light microarray experiments based on mutual ranking. In addition, promoter analysis was employed to detect transcription factor binding sites (TFBS) and their differences between OLPs and TLPs. A specific combination of five TFBS was found in all TLPs, presenting the key structure in the functional response of TLPs to fungal stress. Interestingly, we found the fungal response TFBS in some of the salt responsive OLPs, indicating a possible role of OLPs in biotic stresses. Thirteen TFBS were shared by all OLPs, and some were also found in TLPs, suggesting a possible role of these TLPs in abiotic stresses. Multivariate analysis showed the possibility of estimating models for distinguishing biotic and abiotic functions of TLPs based on promoter regulatory elements. This is the first report identifying multiple roles of TLPs and OLPs in biotic and abiotic stresses.
This study provides valuable clues for screening and discovering new genes with possible roles in tolerance to both biotic and abiotic stresses. Interestingly, principal component analysis showed that the promoter regulatory elements of TLPs and OLPs are more variable than their protein properties, reinforcing the prominent role of promoter architecture in determining changes in gene function.
Keywords: Promoter analysis; Domain and prosite analysis; Gene expression; Multivariate analysis; Thaumatin like proteins; Stress
Although non-coding sequences play a key role in transcriptional regulation, most studies have focused on identifying genes and predicting their function based on coding sequences. However, gene function is the outcome of both the upstream non-coding promoter region and the downstream coding sequence. Transcription factor binding sites (TFBS, or cis-regulatory elements), which determine the specific timing and location of transcriptional activity, are located in the long non-coding sequence upstream of a gene. Diverse cis-regulatory modules are required for a specific expression pattern (Su et al.). Consequently, the identification of regulatory motifs and their organization into modules is an important step towards understanding gene expression and regulation, and promoter analysis can open a new avenue for studying genes with unknown function.
As many phenotypes are the result of complex gene-gene interactions, there is increased interest in identifying the gene sets underlying the expression of a given phenotype ([Fichlin and FaFFA 2010]). Such interaction relationships cannot be captured by studying genes individually. Sharing of genes between different networks (cross talk) is common in systems biology; as a result, one gene can have different functions. For instance, a gene can play a bifunctional role in biotic and abiotic stresses. The large amount of available expression data and recent advances in the sequencing of promoter regions provide a valuable opportunity for the prediction of gene functions. However, a well-defined and reliable approach is still required.
Expression data and computational analysis can reveal coexpressed gene subsets, which are described as highly correlated under one condition but uncorrelated under another ([Varadan and Anastassiou 2006]). Coexpressed genes should therefore be analyzed as gene subsets rather than as individual genes. Identification of stress specific coexpressed gene subsets is very useful for finding unfamiliar gene roles ([Zhang et al. 2009]). In this study, we defined a subset of coexpressed genes based on the Mutual Rank (MR) index. For any given pair, gene A and gene B, the MR is calculated as the geometric average of the rank of gene B among the genes coexpressed with gene A and the rank of gene A among the genes coexpressed with gene B. It has been documented that MR is a better measure of similarity than the correlation value for determining related genes ([Obayashi et al. 2009]). This is partly because even a gene pair with low expression similarity can work together if no other genes are highly coexpressed, as in some examples where a gene is highly coexpressed according to the MR although the expression similarity is low ([Obayashi et al. 2007]).
In addition to promoter and coexpressed gene analysis, the use of protein sequence patterns, especially the discovery of prosite signatures, is becoming one of the vital tools of sequence analysis for revealing protein function. Short well-conserved regions of proteins are recorded as prosite patterns ([Hulo et al. 2008]). They are typically enzyme catalytic sites, prosthetic group-attachment sites (haem, pyridoxal phosphate, biotin, etc.), metal ion-binding amino acids, cysteines involved in disulfide bonds, or regions involved in binding a molecule ([Hulo et al. 2008]). In our previous study, we employed motif and domain analysis to predict different subcellular locations of glutathione reductase proteins ([Tahmasebi et al. 2012]).
As an example, we analyzed a family of plant defense genes. Defense mechanisms of plants are induced by multiple genes during different stresses. Artificially conferring resistance to plants requires manipulation of multiple genes, which is a time-consuming and labor-intensive task. As a result, finding genes whose transformation can up-regulate several resistance genes simultaneously is of great interest. Apart from transcription factors, Thaumatin like proteins (TLPs) are among the best candidates for this purpose ([Breiteneder 2000]). TLPs have been categorized as family 5 of the Pathogenesis Related Proteins (PRs) ([Zhong and Shen 2004]). The induction of TLPs in plant resistance mechanisms during pathogen infection has been demonstrated ([Petre et al. 2011]). For decades, activation of TLPs by pathogens such as bacteria, viruses and fungi has been described in many higher plants ([Liu and Ekramoddoullah 2010];[Mukherjee et al. 2010]). Although the mechanisms of TLPs remain unclear ([Petre et al. 2011]), the antifungal activity of some TLPs has been attributed to membrane permeabilization ([Vigers and Selitrennikoff 1991]), β-glucan binding and degradation ([Sakamoto et al. 2006]), and inhibition of enzymes such as xylanases ([Fierens et al. 2007]), α-amylase, or trypsin ([Schimoler-O’Rourke and Selitrennikoff 2001]). In addition to the participation of TLPs in pathogen defense, [Rajam et al. 2007] reported functional properties that protect against abiotic stresses.
The basic isoforms of TLPs, Osmotin like proteins (OLPs), with a molecular weight of 24 kDa, have been reported as osmoprotectants in tobacco cells ([Abada et al. 1996];[Yun et al. 1997]). OLP protein and genomic sequences have been isolated from tobacco treated with a high NaCl concentration ([Singh et al. 1985]). Upregulation of osmotin leads to proline accumulation, conferring tolerance to osmotic stress in transgenic tobacco ([Barthakur SBVB 2001]). Besides the induction of OLPs during salt stress, evidence shows that a broad range of fungal pathogens can activate these proteins ([Abada et al. 1996];[Yun et al. 1997]).
Given the valuable role of TLPs in resistance to both biotic and abiotic stresses, deciphering the complex mechanism and function of these protein homologs is of interest. Bioinformatics provides valuable tools for elucidating the function of poorly characterized genes. In this research, promoter analysis, analysis of coexpressed genes, and prosite analysis were employed to shed light on the diverse functions of TLPs. Specific cis-elements acting as activators, repressors, enhancers and chromatin modifiers are indicators of gene activity and of combinatorial transcriptional regulation in plants ([Yu et al. 2003]). However, the differences between the functions of TLP and OLP promoters remain unknown. This study identifies the key elements responsible for the dual role of TLPs in both biotic and abiotic stresses through in silico comparative model analysis of TLPs and OLPs based on promoter characteristics.
In this study, a variety of bioinformatics tools, including determination of coexpressed genes, in silico promoter analysis, and in silico discovery of domains and prosites, were used to provide clues for better understanding and prediction of the diverse functions of TLPs and OLPs in Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa). Furthermore, a statistical approach was developed for predicting and distinguishing different functions of genes based on Mutual Ranking of coexpressed genes and multivariate analysis of regulatory elements in promoter regions.
Results and discussion
Table 1 Transcription factor binding sites on the promoter region of Thaumatin like proteins (TLPs) and Osmotin like proteins (OLPs)
Column headings: Thirteen cis-acting regulatory elements shared between all OLPs | Five cis-acting regulatory elements shared between all TLPs
Element descriptions: ABA inducible transcriptional activator; Circadian clock associated; Calmodulin binding NAC protein; Nodulin consensus sequence; (GA)n/(CT)n binding proteins; Intermediate zinc finger protein; Activator of flavonoid biosynthesis gene; Transcription factor binding to the iron deficiency-responsive element; SA induction of secreted gene; DNA binding protein that binds to beta amylase; NAC domain DNA binding factor
Table 2 Screening of Thaumatin like proteins that can perform a dual function against fungal (biotic) and salt (abiotic) stresses using the promoter regulatory element (TFBS) model presented in this research
Column headings: Primary resistance function | Extra regulatory elements related to another type of stress (biotic/abiotic) | Secondary predicted resistance function
In silico promoter analysis of OLPs detected 21 TFBS, 13 of which were shared between all OLPs (Table 1). The function of these 13 TFBS was mainly related to salt stress. Some TLPs carry a structure similar to that of OLPs, indicating possible roles in salt/abiotic resistance as well as fungal/biotic resistance (Table 2). Some TLPs in rice carried the OLP salt resistance elements, except for three of them, which indicate a role in fungal stress (Table 2).
Despite the central role of the promoter and its regulatory elements, most researchers appear to have overlooked the advantages of promoter analysis for predicting gene function and discovering genes with similar functions. Here, for the first time, we found a conserved combination model of regulatory elements on the promoters of TLP fungal resistance genes (ASRC/CCAF/L1BX/NCS1/WBXF), which can be used to screen genes with unknown function and to find new effective genes for fungal and biotic resistance. In the same way, a unique complex combination of regulatory elements (ABRE/CARM/CNAC/GAGA/IDDF/LEGB/MIIG/NACF/OPAQ/SPF1/WNAC) was found for screening effective genes involved in abiotic salt stress (Table 1).
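The screening idea described above can be sketched as a set-containment test. This is only an illustrative sketch, not the authors' Genomatix workflow: the two element sets use the family codes listed in the text, while the gene names (`geneX`, `geneY`, `geneZ`), their element sets, and the function names are hypothetical.

```python
# Element family codes as listed in the text for the two combination models.
FUNGAL_MODEL = frozenset({"ASRC", "CCAF", "L1BX", "NCS1", "WBXF"})
SALT_MODEL = frozenset({
    "ABRE", "CARM", "CNAC", "GAGA", "IDDF", "LEGB",
    "MIIG", "NACF", "OPAQ", "SPF1", "WNAC",
})


def screen_promoter(elements):
    """Label a promoter by which complete combination model(s) it contains."""
    found = set(elements)
    fungal = FUNGAL_MODEL <= found  # every fungal-model element is present
    salt = SALT_MODEL <= found      # every salt-model element is present
    if fungal and salt:
        return "fungal and salt"
    if fungal:
        return "fungal"
    if salt:
        return "salt"
    return "neither"


def missing_elements(elements, model):
    """List the model elements that a promoter lacks, in sorted order."""
    return sorted(model - set(elements))


# Hypothetical promoters (gene names and element sets are made up).
promoters = {
    "geneX": set(FUNGAL_MODEL),
    "geneY": set(FUNGAL_MODEL) | set(SALT_MODEL),
    "geneZ": {"ABRE", "NACF", "ASRC"},
}

for gene, elements in promoters.items():
    print(gene, screen_promoter(elements))
```

A promoter that carries both complete combinations would be a candidate for a dual role under this simplified reading; `missing_elements` shows how far a promoter is from matching either model.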
The results revealed dramatic differences between OLPs in rice and Arabidopsis. While most Arabidopsis OLP promoters carry the additional fungal response TFBS, rice OLPs do not have this structure. In other words, in contrast to Arabidopsis, rice OLPs are mainly involved in salt stress. This finding highlights the importance of considering the source species of the gene and promoter homologs at the time of gene isolation and transfer.
Coexpressed gene analysis
Table 3 Coexpressed genes with TLP and OLP loci in different biotic, abiotic, hormone and light microarray experiments
Column headings: Coexpressed genes in abiotic microarray experiments | Coexpressed genes in biotic microarray experiments | Coexpressed genes in hormone microarray experiments | Coexpressed genes in light microarray experiments
Interestingly, the results of the coexpression analysis were, to some extent, consistent with the results of the promoter analysis. As an example, we found fungal and salt response elements on the At4g11650 promoter and, in line with this, the coexpression analysis showed expression of At4g11650 and its associated genes in both biotic and abiotic microarray experiments. This finding suggests that coexpressed genes selected by the MR index can be used to support the activity of promoter regulatory elements (TFBS) discovered in silico and to uncover the different functions of genes.
Domains and prosite analysis
Comparative multivariate analysis of promoter regulatory elements and prosite elements of TLP and OLP homologs
The mean discriminant value for TLP homologs was -53.2, while this value was -28.6 for OLP homologs. Like PCA, Discriminant Function Analysis is a valuable technique, since genes with intermediate values can be proposed as genes with dual functional roles.
Figure 4 compares the classification of TLPs and OLPs based on both promoter regulatory elements and prosite motifs of the proteins. As can be inferred from Figure 4, promoter elements are more variable than prosite elements. It can be concluded that promoter elements play a more important role than prosite elements in differentiating TLPs from OLPs and in assigning functions to a gene.
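The idea of flagging genes with intermediate discriminant values can be sketched as follows. This is an illustration under stated assumptions, not the Minitab procedure used in the study: the two class means come from the text, whereas the `margin` parameter, the labels, and the example scores are hypothetical choices made only to show the logic.

```python
# Mean discriminant values reported in the text.
TLP_MEAN = -53.2
OLP_MEAN = -28.6


def classify_score(score, margin=0.25):
    """Label a discriminant score as TLP-like, OLP-like, or intermediate.

    `margin` (hypothetical) is the fraction of the distance between the two
    class means that is treated as the intermediate zone on each side of
    the midpoint.
    """
    midpoint = (TLP_MEAN + OLP_MEAN) / 2
    half_band = margin * abs(OLP_MEAN - TLP_MEAN)
    if abs(score - midpoint) <= half_band:
        return "intermediate (candidate dual role)"
    return "TLP-like" if score < midpoint else "OLP-like"


# Hypothetical scores for illustration only.
for score in (-53.2, -41.0, -28.6):
    print(score, classify_score(score))
```

With the default margin, scores within about 6 units of the midpoint between the two means fall in the intermediate zone; a different margin would widen or narrow the set of candidate dual-role genes.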
Importance of promoter elements in the success of genetic transformation
Commonly, in genetic transformation procedures, general promoters such as 35S are used after cloning the gene. However, given the key role of the promoter in proper gene function, we suggest that new transformation work pay special attention to cloning and transferring a suitable promoter together with the gene in order to obtain satisfactory results. As an example, [Kim et al. 2008] observed that a seed-specific promoter is a prerequisite for proper function of fatty acid desaturase genes in altering the unsaturated fatty acid content of oilseeds by genetic manipulation.
Up to now, the majority of researchers have considered only the individual gene when predicting gene function. The approach employed in this research, which considers the genes coexpressed with the gene of interest together with promoter analysis and prosite structure, can reveal valuable findings about protein function in different pathways. In particular, the unique regulatory elements (responding to different sorts of stresses) open a new avenue in genetic engineering through manipulation of cis-acting regulatory elements in the promoter region.
Here, for the first time, we demonstrated that promoter analysis of TLPs and OLPs can explain the multiple roles of TLPs and OLPs in biotic and abiotic stresses. In addition, we showed that analysis of the genes coexpressed with a gene of interest can provide valuable insight into the determination of the diverse roles of genes. In conclusion, our results revealed that computational tools such as coexpressed gene analysis, cis-regulatory analysis and in silico protein analysis can identify TLP and OLP homologues involved in the response to biotic and abiotic stresses. Discovering genes with dual resistance functions in biotic and abiotic stresses is a major advance for genetic transformation. Furthermore, the present methods can be efficiently employed in discovering the unknown functions of genes.
Materials and methods
A genome-wide collection of all genes encoding OLPs (acting against salt stress) (AT1G75800, AT2G28790, AT4G36010, ATOSM34 or AT4G11650.1, Os01g0839900) and TLPs (acting against fungal stress) (AT1G73620, AT1G77700, AT4G36010, AT4G38660.1, AT4G38660.2, AT5G02140, AT5G40020, AT1G18250, AT1G75030, OS04G0689900, Os10g0412700) in the Arabidopsis and rice genomes was carried out using the Genomatix (http://www.genomatix.de/en/index.html) and TAIR (http://www.arabidopsis.org/) databases.
Cis-acting regulatory elements of each group of TLPs and OLPs were recognized by in silico promoter analysis using Genomatix (http://www.genomatix.de/en/index.html) and PlantCARE (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/) databases.
To highlight the roles of specific TFBS in promoter activity, the general core promoter elements (such as TATA-box) were disregarded. The number and position of promoter regulatory elements, particularly hormonal, biotic and abiotic ones were compared between TLPs and OLPs.
Coexpressed genes analysis
All TLP and OLP loci of Arabidopsis thaliana were selected from the TAIR database (http://www.arabidopsis.org). ATTED-II (http://atted.jp) was used to analyze the coexpressed genes. This database collects gene expression data in Arabidopsis from a wide range of microarray experiments. Three hundred coexpressed genes for each TLP and OLP locus were extracted from the abiotic, biotic, hormone and light experiments in this database. To avoid discarding potentially important coexpressed gene pairs that have low Pearson’s correlation coefficients (PCCs), ATTED-II employs a measure of gene coexpression called Mutual Rank (MR). Correlation rank is asymmetric: the rank of gene B from gene A is not the same as the rank of gene A from gene B. The two ranks are therefore geometrically averaged to give the MR: MR(AB) = √(Rank(A → B) × Rank(B → A)).
We selected the coexpressed genes in each experiment by MR < 10 (Additional file 1, Table 3).
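The Mutual Rank calculation can be illustrated with a minimal sketch. It is not ATTED-II's implementation, which serves precomputed values from large microarray collections; here the ranks are derived from Pearson correlations on a tiny, made-up expression table, and all gene names, values, and function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_ranks(expr, gene):
    """Rank every other gene by descending correlation with `gene` (1 = best)."""
    others = [g for g in expr if g != gene]
    others.sort(key=lambda g: pearson(expr[gene], expr[g]), reverse=True)
    return {g: i + 1 for i, g in enumerate(others)}


def mutual_rank(expr, a, b):
    """Geometric mean of Rank(A -> B) and Rank(B -> A)."""
    return sqrt(correlation_ranks(expr, a)[b] * correlation_ranks(expr, b)[a])


def coexpressed(expr, gene, mr_cutoff=10):
    """Return {other_gene: MR} for all genes with MR below the cutoff."""
    result = {}
    for other in expr:
        if other == gene:
            continue
        mr = mutual_rank(expr, gene, other)
        if mr < mr_cutoff:
            result[other] = mr
    return result


# Made-up expression profiles across four hypothetical samples.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [1.1, 2.1, 2.9, 4.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 2.0, 4.0],
}

print(correlation_ranks(expr, "geneC")["geneD"])  # rank of geneD seen from geneC
print(correlation_ranks(expr, "geneD")["geneC"])  # rank of geneC seen from geneD
print(coexpressed(expr, "geneA", mr_cutoff=2.5))
```

In this toy table the rank of `geneD` from `geneC` differs from the rank of `geneC` from `geneD`, which is the asymmetry that the geometric average is meant to resolve.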
Domains and prosites
Domains and prosites were identified in order to investigate the protein structure of all TLPs and OLPs. All 14 protein sequences of TLPs and OLPs (10 TLPs and 4 OLPs) were extracted from NCBI (http://www.ncbi.nlm.nih.gov/). Protein domains were extracted from the Pfam database (http://pfam.sanger.ac.uk/) and prosites from the NPS (PROSCAN) database (http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_proscan.html).
Principal Component Analysis and Discriminant Function Analysis were performed using the Minitab 16 package (http://www.minitab.com/). For these analyses, the different promoter regulatory elements and prosite motifs were used as variables (Table 1 and Additional file 3).
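As a rough illustration of how regulatory elements or prosite motifs can serve as variables, the sketch below encodes each gene as a presence/absence vector and computes the total variance of the resulting table. It assumes a binary encoding, which the source does not specify, and it does not reproduce Minitab's analysis; the gene names, element sets, and function names are hypothetical.

```python
def presence_matrix(features_by_gene):
    """Encode each gene as a 0/1 vector over the sorted union of all features."""
    columns = sorted({f for feats in features_by_gene.values() for f in feats})
    rows = {
        gene: [1 if c in feats else 0 for c in columns]
        for gene, feats in features_by_gene.items()
    }
    return columns, rows


def total_variance(rows):
    """Sum of per-column sample variances (needs at least two genes).

    This equals the trace of the covariance matrix, i.e. the sum of the
    eigenvalues that a covariance-based PCA would report.
    """
    data = list(rows.values())
    n = len(data)
    total = 0.0
    for column in zip(*data):
        mean = sum(column) / n
        total += sum((v - mean) ** 2 for v in column) / (n - 1)
    return total


# Hypothetical promoter element sets for three made-up genes.
promoter_elements = {
    "gene1": {"ABRE", "NACF"},
    "gene2": {"ASRC", "NACF"},
    "gene3": {"ASRC", "WBXF"},
}

columns, rows = presence_matrix(promoter_elements)
print(columns)
print(rows)
print(total_variance(rows))
```

Computing the same quantity for a prosite motif table would give a simple way to compare how variable the two kinds of features are across the genes, in the spirit of the comparison reported for Figure 4.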
- TLP: Thaumatin like protein
- PR proteins: Pathogenesis related proteins
- OLP: Osmotin like protein
- TFBS: Transcription factor binding site
We would like to thank the School of Molecular & Biomedical Science of The University of Adelaide, Australia, and the Bioinformatics Research Group of Qom University, Iran, for their valuable help.
- Abada LRDUMP, Liua D, Narasimhan ML, Reuveni M, Zhua JK, Niua X, Singhb NK, Hasegawaa PM, Bressan RA: Antifungal activity of tobacco osmotin has specificity and involves plasma membrane permeabilization. Plant Sci 1996, 118: 11-23. 10.1016/0168-9452(96)04420-2
- Barthakur SBVB KC: Over-expression of osmotin induces proline accumulation and confers tolerance to osmotic stress in transgenic tobacco. Plant Bioch Biotech 2001, 10(1): 31-37.
- Breiteneder HEC: Molecular and biochemical classification of plant-derived food allergens. Allergy Clin Immunol 2000, 106: 27-36. 10.1067/mai.2000.106929
- Varadan V, Anastassiou D: Inference of Disease-Related Molecular Logic from Systems-Based Microarray Analysis. PLoS Comput Biol 2006, 2(6): e68. 10.1371/journal.pcbi.0020068
- Fichlin SP, FaFFA L: The Association of Multiple Interacting Genes with Specific Phenotypes in Rice Using Gene Coexpression Networks. Plant Physiol 2010, 154(1): 13-24. 10.1104/pp.110.159459
- Fierens ERS, Gebruers K, Goesaert H, Brijs K, Beaugrand J, Volckaert G, Van Campenhout S, Proost P, Courtin CM, Delcour JA: TLX1, a novel type of xylanase inhibitor from wheat (Triticum aestivum) belonging to the thaumatin family. Biochem J 2007, 403: 583-591. 10.1042/BJ20061291
- Hulo NBA, Bulliard V, Cerutti L, Cuche BA, De Castro E, Lachaize C, Langendijk-Genevaux PS, Sigrist CJA: The 20 years of PROSITE. Nucleic Acids Res 2008, 36(database): D245-D249.
- Tahmasebi A, Aram F, Ebrahimi M, Mohammadi-Dehcheshmeh M, Ebrahimie E: Genome-wide analysis of cytosolic and chloroplastic isoforms of glutathione reductase in plant cells. Plant Omics 2012, 5(2): 94-102.
- Kim M, Go Y, Ahn S, Chung C-H, Suh M: Functional complementation of a perilla ω3 fatty acid desaturase under the seed-specific SeFAD2 promoter. J Plant Biol 2008, 51(3): 174-179. 10.1007/bf03030695
- Liu JJSR, Ekramoddoullah AKM: The superfamily of thaumatin-like proteins: its origin, evolution, and expression towards biological function. Plant Cell Rep 2010, 29: 419-436. 10.1007/s00299-010-0826-8
- Mukherjee AKCM, Zuchman R, Ziv T, Horwitz BA, Gepstein S: Proteomics of the response of Arabidopsis thaliana to infection with Alternaria brassicicola. Proteomics 2010, 73: 709-720. 10.1016/j.jprot.2009.10.005
- Obayashi THS, Saeki M, Ohta H, Kinoshita K: ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 2009, 37(Database issue): D987-D991.
- Obayashi THS, Shibaoka M, Saeki M, Ohta H, Kinoshita K: COXPRESdb: a database of coexpressed gene networks in mammals. Nucleic Acids Res 2007, 36(Database issue): D77-D82.
- Petre BMI, Rouhier N, Duplessis S: Genome-wide analysis of eukaryote thaumatin-like proteins (TLPs) with an emphasis on poplar. BMC Plant Biol 2011, 11: article 33.
- Rajam MVCN, Saiprasad Goud P, Singh D, Kashyap V, Choudhary ML, Sihachakr D: Thaumatin gene confers resistance to fungal pathogen as well as tolerance to abiotic stresses in transgenic tobacco plants. Biol Plant 2007, 51: 135-141. 10.1007/s10535-007-0026-8
- Sakamoto YWH, Nagai M, Nakade K, Takahashi M, Sato T: Lentinula edodes tlg1 Encodes a Thaumatin-Like Protein That Is Involved in Lentinan Degradation and Fruiting Body Senescence. Plant Physiol Biochem 2006, 141: 793-801.
- Schimoler-O’Rourke RRM, Selitrennikoff CP: Zeamatin Inhibits Trypsin and α-Amylase Activities. Appl Env Microbiol 2001, 67: 2365-2366. 10.1128/AEM.67.5.2365-2366.2001
- Singh NK, Handa AK, Hasegawa PM, Bressan RA: Proteins Associated with Adaptation of Cultured Tobacco Cells to NaCl. Plant Physiol 1985, 79: 118-125. 10.1104/pp.79.1.118
- Su CH, Shih CH, Chang TH, Tsai HK: Genome-wide analysis of the cis-regulatory modules of divergent gene pairs in yeast. Genomics 2010, 96(6): 352-361. 10.1016/j.ygeno.2010.08.008
- Vigers AJRW, Selitrennikoff CP: A new family of plant antifungal proteins. Mol Plant Microbe Interact 1991, 4: 315-323. 10.1094/MPMI-4-315
- Yu L, Niu JS, Ma ZQ, Chen PD, Liu DJ: Cloning, mapping and protein expression of wheat thaumatin protein gene (TaTLP1). Yi chuan xue bao = Acta genetica Sinica 2003, 30(1): 49-55.
- Yun DJZY, Pardo JM, Narasimhan ML, Damsz B, Lee H, Abad LR, D’Urzo MP, Hasegawa P, Bressan RA: Stress proteins on the yeast cell surface determine resistance to osmotin, a plant antifungal protein. Natl Acad Sci 1997, 94(13): 7082-7087. 10.1073/pnas.94.13.7082
- Zhang HSX, Wang H, Zhang X: MIClique: An Algorithm to Identify Differentially Coexpressed Disease Gene Subset from Microarray Data. Biomedicine and Biotechnology; 2009.
- Zhong BX, Shen YW: Accumulation of pathogenesis-related type-5 like proteins in phytoplasma-infected garland chrysanthemum Chrysanthemum coronarium. Acta Biochim Biophys Sin 2004, 36(11): 773-779. 10.1093/abbs/36.11.773
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.