Characterization of x-type high-molecular-weight glutenin promoters (x-HGP) from different genomes in Triticeae

The x-type high-molecular-weight glutenin promoters (x-HGP) from 21 diploid Triticeae species were cloned and sequenced. The lengths of x-HGP varied from 897 to 955 bp, and there were 329 variable sites, including 105 singleton sites and 224 polymorphic sites. Genetic distances between pairwise x-HGP sequences ranged from 0.30 to 16.40% among the 21 species and four outgroup species of Hordeum. All five recognized regulatory elements were present and highly conserved in the x-HGP of the 21 Triticeae species. Most variations were distributed in the regions between regulatory elements. Insertions of 22 bp and 50 bp, each a copy of the adjacent region with minor changes, were found in the x-HGP of Ae. speltoides and Ps. huashanica, and could be regarded as genome-specific indels. The median-joining network and the neighbour-joining tree both supported a topology composed of three separate clusters. In particular, cluster I, comprising the x-HGP sequences of Aegilops, Triticum, Henrardia, Agropyron and Taeniatherum, was strongly supported by both the network and the NJ tree. Because it confers high-level expression with temporal and spatial specificity, x-HGP can be used as a promoter source for constructing transgenic plants that express exogenous genes at a high level specifically in the endosperm. In addition, x-HGP shows both sufficient conservation and sufficient variation, so it should be valuable in phylogenetic analyses of Triticeae members.


Introduction
In wheat and its relatives, high-molecular-weight glutenin subunits (HMW-GSs) are among the most important storage proteins in the seed endosperm because of their significant effects on wheat processing quality (Lawrence and Shepherd 1980; Payne 1987; Shewry et al. 1992). HMW-GSs are critical in determining gluten and dough elasticity because they promote the formation of larger glutenin polymers (Shewry et al. 1995). The genes encoding HMW-GSs are designated as the Glu-1 loci, located on the long arms of the group 1 chromosomes in bread wheat. Each Glu-1 locus consists of two tightly linked genes, one encoding an x-type subunit with a larger molecular weight and the other a y-type subunit with a smaller one (Payne 1987). To date, many studies have identified and functionally analysed HMW-GS genes from wheat and its wild relatives (Anderson and Greene 1989; Forde et al. 1985; Halford et al. 1987; Jiang et al. 2009, 2012a, 2012b; Liu et al. 2003, 2007, 2008, 2010; Sugiyama et al. 1985; Thompson et al. 1985; Wan et al. 2005). HMW-GS genes and other genes encoding seed proteins share a similar pattern of tissue-specific and developmentally regulated expression, even though they have different regulatory elements (Lamacchia et al. 2001; Shewry and Halford 2002). Previous studies indicated that the high-molecular-weight glutenin promoter (HGP) contains five recognized regulatory elements: the transcription start site, the TATA box, the complete HMW enhancer, the partial HMW enhancer, and the prolamin box-like element, which is composed of two relatively conserved motifs, the endosperm motif (E motif) and the GCN4-like motif (N motif) (Hammond-Kosack et al. 1993; Müller and Knudsen 1993). Under the regulation of these elements, the genes encoding HMW-GSs exhibit a higher expression level than those of other seed storage proteins (Lamacchia et al. 2001).
The grasses of the Triticeae tribe include a large number of wheat relatives, which have been widely studied as genetic resources for wheat quality improvement programs. For example, previous reports revealed that wild species have abundant HMW-GS variants whose structural features and expression levels differ from those of common wheat (Jiang et al. 2012a; Liu et al. 2010; Wan et al. 2002, 2005).
In a previous study, we characterized y-type HGP and its cis-regulatory elements from 25 Triticeae species (Jiang et al. 2010). In this study, we report the characterization of the x-type high-molecular-weight glutenin promoter (x-HGP) in 21 diploid Triticeae species. The objectives were to obtain molecular information on x-HGP in these species, to characterize its regulatory elements, and to explore the phylogenetic relationships among the x-HGP of different Triticeae species.

Plant materials
Twenty-one diploid species of Triticeae were investigated in this study, and four Hordeum species were used as outgroup (Table 1). The accessions with PI numbers were kindly provided by USDA-ARS (http://www.arsgrin.gov/npgs/). The accessions with AS numbers are deposited at the Triticeae Research Institute, Sichuan Agricultural University, China.
Isolation and sequencing of x-HGP from Triticeae species
Genomic DNA was extracted from the leaves of a single two-week-old plant using the CTAB extraction method (Murray and Thompson 1980). To design x-type-specific primers, we aligned the published sequences of the HMW glutenin genes 1Ax1 (GenBank: X61009), 1Ax2* (GenBank: M22208), 1Bx7 (GenBank: X13927), 1Bx17 (GenBank: JC2099), 1Dx2 (GenBank: X03346), 1Dx5 (GenBank: X12928), 1Ay (GenBank: X03042), 1By9 (GenBank: X61026), 1Dy10 (GenBank: X12929), and 1Dy12 (GenBank: X03041). Based on the alignment, a pair of primers (HGPF1 and HGPxR1) was designed to specifically amplify x-HGP. The HGPF1 primer (5′-AGGGAAAGACAATGGACATG-3′) was designed from a sequence that is highly conserved in the 5′ upstream regions of both x-type and y-type HGP, whereas the HGPxR1 primer (5′-GTCTCGGAG(C/T)TG(C/T)TGGTC-3′) targeted the sequence coding for six amino acid residues (DQQLRD) that appear only in the N-terminal domain of x-type HMW-GSs (Figure 1). The amplification profile was 94°C for 5 min, followed by 35 cycles of 94°C for 45 sec, 60°C for 1 min, and 72°C for 2 min 30 sec, and a final extension step at 72°C for 10 min. High-fidelity LA Taq polymerase (TaKaRa, Dalian, China) was used in the PCR reactions to avoid introducing errors into the sequence. The amplified products were separated on 1.0% agarose gels, purified, and ligated into the pMD19-T vector (TaKaRa, Dalian, China). The cloned fragments were sequenced in both directions by a commercial company (Invitrogen, Shanghai, China). At least three independent clones were sequenced to determine the final nucleotide sequence for each species. All DNA sequences have been deposited in the NCBI database under GenBank accession numbers KC478921 to KC478941 (Table 1).
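As a quick consistency check on the reverse primer described above, the following minimal Python sketch expands the two degenerate C/T positions of HGPxR1, takes the reverse complement of each variant, and translates it with the standard genetic code. The helper names (expand, reverse_complement, translate) are illustrative and not part of the original study; the sketch assumes only the primer sequence as printed and writes each C/T position as the IUPAC code Y.

```python
from itertools import product

# Reverse primer HGPxR1 as given in the text; "Y" marks each degenerate C/T position.
HGPXR1 = "GTCTCGGAGYTGYTGGTC"

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: aa
               for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO)}


def expand(primer):
    """Return every concrete sequence encoded by a primer with Y (C or T) positions."""
    options = [("C", "T") if base == "Y" else (base,) for base in primer]
    return ["".join(choice) for choice in product(*options)]


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate a DNA sequence in frame 1, ignoring any trailing partial codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


variants = expand(HGPXR1)
peptides = {translate(reverse_complement(v)) for v in variants}
print(len(variants), peptides)
```

Because the primer anneals to the coding strand, the peptide is read from its reverse complement; both degenerate positions fall on the third base of a glutamine codon, so every variant should encode the same six residues.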

Data analyses
Sequence prediction was performed with the DNAman software package (Version 5.2.10; Lynnon Biosoft). The sequences were aligned with Clustal W Version 1.83 (Thompson et al. 1994), and the alignment was further improved by visual examination and manual adjustment. The y-HGP sequences of four Hordeum species were used as outgroup. Genetic distances were calculated in MEGA (Version 4.02) using the Kimura 2-parameter nucleotide model with both transitions and transversions included (Tamura et al. 2007). To enhance the comparison between wheat and its relatives, the sites with informative variations were used to construct a median-joining network in the program Network 4.6.1.1 (http://www.fluxus-engineering.com) with weights = 10, epsilon = 0 and a transversions/transitions ratio of 3:1 (Allaby and Brown 2001; Bandelt et al. 1999). The neighbour-joining (NJ) tree was constructed to evaluate the phylogenetic clades under the Maximum Composite Likelihood substitution model, with gaps treated as missing data. To estimate topological robustness, bootstrap values were calculated from 1000 replications.
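To make the distance measure concrete, the Kimura 2-parameter distance between two aligned sequences can be written as d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of compared sites that differ by a transition and by a transversion. The Python sketch below is a minimal illustration of that formula, not the MEGA implementation; the function name k2p_distance, the choice to skip any column containing a gap or ambiguous base, and the toy sequences are assumptions made for the example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Columns where either sequence has a gap or a non-ACGT symbol are skipped
    (an assumption of this sketch, equivalent to pairwise deletion).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # Raises a math domain error if the sequences are too divergent
    # for the correction to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy example: one C/T transition across ten compared sites.
print(round(k2p_distance("ACGTACGTAC", "ATGTACGTAC"), 4))
```

Multiplying the returned value by 100 gives a percentage comparable in form to the distances reported in this study.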

Sequence variation and structural characteristics of x-HGP
In genomic PCR, a single fragment of approximately 1200 bp was amplified from each of the 21 diploid Triticeae species using the x-HGP-specific primers HGPF1 + HGPxR1 (Figure 2). The PCR fragments were cloned and sequenced, and the final x-HGP sequence of each species was assembled from at least three independent clones. After the sequences encoding the signal peptide and partial N-terminal domain were removed, the lengths of x-HGP varied from 897 to 955 bp. The x-HGP sequences differed from each other by substitutions, insertions and deletions of one or more nucleotides. Despite these differences, x-HGP is highly conserved among the different genomes of Triticeae. Across all sequences, there were 329 variable sites, including 105 singleton sites and 224 polymorphic sites, of which 192 sites were informative (Figure 3). According to the sequence characteristics and locations of the identified elements, we characterized all five recognized regulatory elements and summarized their variations in Table 2. The sequences of these regulatory elements were highly conserved; for example, the N motif was identical among all 21 species of Triticeae. The sequence variations in the remaining elements resulted only from single or few base substitutions  Table 3. The transitions/transversions ratios of the x-HGP sequences varied from 0 to 21, showing that nucleotide substitution rates were unequal within Triticeae. Genetic distances between pairwise x-HGP sequences ranged from 0.30 to 16.40% among the 21 species and four outgroup species of Hordeum (Table 3). The pairwise x-HGP divergence values were low, consistent with the high conservation of x-HGP sequences in different genomes of Triticeae.
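The site categories counted above can be illustrated with a short Python sketch that classifies the columns of an alignment. It assumes definitions commonly used in alignment software: a variable site has at least two different nucleotides, a singleton site is a variable site where at most one nucleotide occurs more than once, and a parsimony-informative site has at least two nucleotides that each occur at least twice. The exact definitions behind the counts reported here are not stated in the text, and the function name classify_sites and the toy alignment are hypothetical.

```python
from collections import Counter


def classify_sites(alignment):
    """Count variable, singleton and parsimony-informative columns.

    `alignment` is a list of equal-length aligned DNA strings. Gaps and
    ambiguous symbols are ignored within each column (an assumption of
    this sketch).
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all sequences must have the same aligned length")
    counts = {"variable": 0, "singleton": 0, "informative": 0}
    for column in zip(*alignment):
        bases = Counter(b for b in (c.upper() for c in column) if b in "ACGT")
        if len(bases) < 2:
            continue  # invariant column
        counts["variable"] += 1
        # Number of distinct nucleotides present in at least two sequences.
        repeated = sum(1 for n in bases.values() if n >= 2)
        if repeated >= 2:
            counts["informative"] += 1
        else:
            counts["singleton"] += 1
    return counts


# Toy alignment of four sequences, six columns.
toy = [
    "ACGTAC",
    "ACGTAT",
    "ATGAAT",
    "ATG-AC",
]
print(classify_sites(toy))
```

In the toy alignment, columns 2 and 6 each carry two nucleotides present twice (informative), while column 4 has a single odd nucleotide alongside a gap (singleton).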

Phylogenetic analyses
The median-joining network analysis of 21 x-HGP and four y-HGP sequences from different Triticeae genomes showed that the phylogeny is composed of three separate clusters (Figure 5). Cluster I included the HGP of Ae. bicornis, Ae. comosa, Ae. longissima, Ae. searsii, Ae. sharonensis, Ae. speltoides, Ae. tauschii, Ae. uniaristata, Ae. umbellulata, T. urartu, T. boeoticum, Henrardia persica, Agropyron cristatum and Taeniatherum caput-medusae. The x-HGP of all Aegilops species formed the biggest subcluster, around which two minor clades emerged, one comprising Triticum and the other comprising Henrardia persica, Agropyron cristatum and Taeniatherum caput-medusae (at the top of Figure 5). The second cluster is composed of Secale sylvestre, Se. strictum, Se. cereale, Th. bessarabicum, Th. elongata and Eremopyrum bonaepartis, and the x-HGP of the Secale and Thinopyrum species each formed a separate clade (in the middle of Figure 5). The third cluster is composed of the y-HGP of four Hordeum species and was further divided into two clades: one includes H. bogdanii and H. brevisubulatum with genome H, and the other contains H. bulbosum and H. spontaneum with genome I (at the bottom of Figure 5). The neighbour-joining (NJ) tree showed a topology highly similar to that of the median-joining network (Figure 6), strongly supporting the placement of the three clusters. In addition, these clusters are supported by high bootstrap values, indicating strong statistical support for the reliability of the phylogeny. See Table 1 for species abbreviations.
The structure variations and evolution of x-HGP
HMW-GS genes differ from other prolamin genes in their higher expression level. Under the regulation of the high-molecular-weight glutenin promoter (HGP), a single active HMW-GS gene encodes a subunit accounting for approximately 2% of the total protein in mature wheat seed. This indicates that HGP confers a high expression level on HMW-GS genes. In our previous study of y-HGP from Triticeae, we found that the regulatory element Partial Enhancer was deleted in eight species: T. urartu, T. boeoticum, Ae. umbellulata, Ae. uniaristata, H. bulbosum, H. spontaneum, H. bogdanii and H. brevisubulatum (Jiang et al. 2010). In this study, the Partial Enhancer was present in the x-HGP of all 21 Triticeae species (Figure 4a). The most obvious variations were two large insertions in the spacer regions between regulatory elements within the x-HGP of Ae. speltoides and Ps. huashanica; the inserted fragments are copies of the adjacent region with minor variations (Figure 4c, d). The 85 bp deletion in the promoter region of the inactive HMW subunit gene 1Ay had been regarded as a possible reason for the silencing of this allele (Halford et al. 1989). However, our previous study revealed that this fragment has also been deleted in active 1Ay genes (Jiang et al. 2009). A previous study also indicated that the 185 bp insertion in the 1Bx7 promoter does not affect the expression of HMW-GS genes (Harberd et al. 1987). We found that HMW-GS genes were usually disrupted by variations in ORFs, such as premature stop codons and large transposon-like elements (Harberd et al. 1987; Jiang et al. 2009, 2012b). Therefore, the 22 bp and 50 bp insertions located in the regions between elements may not affect the expression of HMW-GS genes. We conclude that the high conservation of regulatory elements is consistent with maintaining the tissue specificity and expression level of HMW-GS genes.

Phylogenetic analysis of x-HGP among different species of Triticeae
There is only one D-hordein gene in Hordeum, which is orthologous to the HMW-GS genes of wheat and shows homology to y-type HMW-GS (Gu et al. 2003). Sequence analysis indicated that the y-HGP sequences of Hordeum share the composition of regulatory elements with the x-HGP of the 21 Triticeae species while showing sufficient variation (supported by an average distance of 12.60 between Hordeum and the other species). Therefore, the Hordeum y-HGP sequences were suitable outgroups for the phylogenetic analysis. The median-joining network and the neighbour-joining tree both supported a topology composed of three separate clusters. Cluster I, the biggest group, was strongly supported by both the network and the NJ tree; it mainly included the x-HGP of all nine Aegilops species and two Triticum species, with He. persica, Ag. cristatum and Ta. caput-medusae placed beside them. This group is highly similar to the Aegilops-Triticum complex and the Mediterranean clade identified in y-HGP and ITS phylogenetic analyses, respectively (Hsiao et al. 1995; Jiang et al. 2010). This could be explained by their similar distribution in the Mediterranean and neighbouring regions. The x-HGP of Thinopyrum, Secale and Hordeum each clustered as a subcluster according to their shared genome. The genus Hordeum contains about 31 diploid and polyploid species, and four sections have been defined by morphological characters (von Bothmer et al. 1995). Previous phylogenetic analysis using ITS sequences revealed four major clades that coincide with the four genome designations in Hordeum (Blattner 2004, 2006). In our study, the x-HGP phylogenetic analysis also supported similar clades in Hordeum. Our results confirmed that x-HGP, like y-HGP and ITS, can provide good resolution of phylogenetic relationships within Triticeae.
In conclusion, based on the x-HGP sequences from 21 Triticeae species, we conclude that x-HGP would be useful 1) to drive the expression of exogenous genes in a temporally and spatially specific pattern; and 2) as a valuable candidate for phylogenetic analyses of Triticeae.

Competing interests
The authors declare that they have no competing interests.
Authors' contributions
JQT contributed to the design and execution of the experiments and wrote the draft; WXY, WCS and CX cloned the HMW glutenin promoters; ZQZ, ZS and LXJ carried out the phylogenetic analysis; LZX analysed the data and reviewed the manuscript; ZYL contributed to improving the research program; WYM revised the manuscript. All authors have read and approved the final manuscript.

Figure 5
The median-joining network derived from the x-HGP sequences of 21 diploid species of Triticeae and four y-HGP sequences of Hordeum. The x-HGP of all Aegilops species formed the biggest subcluster, around which two minor clades emerged at the top of the network, one comprising Triticum and the other comprising He. persica, Ag. cristatum and Ta. caput-medusae. The topology was clustered into three main separate groups, with PSHU placed beside group II.

Figure 6
The neighbour-joining (NJ) tree derived from the x-HGP sequences of 21 diploid species of Triticeae. The NJ tree was constructed using the Maximum Composite Likelihood substitution model. Bootstrap values were calculated from 1000 replications to estimate the topological robustness.