- Short report
- Open Access
Discovery of novel plastid phenylalanine (trn F) pseudogenes defines a distinctive clade in Solanaceae
© Poczai and Hyvönen; licensee Springer. 2013
- Received: 27 June 2013
- Accepted: 11 September 2013
- Published: 12 September 2013
The plastome of embryophytes is known for its high degree of conservation in size, structure, gene content and linear order of genes. The duplication of entire tRNA genes, or their arrangement in a tandem array composed of multiple pseudogene copies, is extremely rare in the plastome. Pseudogene repeats of the trn F gene have rarely been described from the chloroplast genome of angiosperms.
We report the discovery of duplicated copies of the original phenylalanine (trn F(GAA)) gene in Solanaceae that are specific to a larger clade within the Solanoideae subfamily. The pseudogene copies are composed of several highly structured motifs that are partial residues or entire parts of the anticodon, T- and D-domains of the original trn F gene.
The Pseudosolanoid clade consists of 29 genera and includes many economically important plants such as potato, tomato, eggplant and pepper.
- Chloroplast DNA (cpDNA)
- Gene duplications
- Plastome evolution
- Tandem repeats
- trn L-trn F
The plastid trn T-trn F region has been widely applied to resolve the phylogeny of embryophytes (Quandt and Stech 2004; Zhao et al. 2011) and to address various questions of population genetics since the development of universal primers by Taberlet et al. (1991). This marker is located in the large single copy region of the chloroplast genome and contains a co-transcribed region consisting of three highly conserved exons that code the transfer RNA (tRNA) genes for threonine (UGU), leucine (UAA) and phenylalanine (GAA). The region is interspersed by two intergenic spacers and by a group I intron intercalated between the first and second exon of the trn L(UAA) gene. Phylogenetic results obtained with the trn T-trn F region (or part of it) should be treated with caution, because several studies (e.g. Vijverberg and Bachmann 1999; Koch et al. 2005; Pirie et al. 2007; Schmickl et al. 2009) have shown that certain parts of this region can be present in several copies. If this is ignored, the basic requirement of homology of the characters used for phylogenetic analyses is easily compromised. This might lead to false hypotheses of phylogeny, especially when they are based on the analysis of this region alone.
Larger structural changes (>50 bp) rarely occur in the plastome. However, duplications of the rpl 2 or rpl 23 genes (Bowman et al. 1988) or even the duplication of tRNAs (pseudogenes) are occasionally reported. The latter are extremely rare in angiosperms and so far they have only been described from Asteraceae (Vijverberg and Bachmann 1999; Wittzell 1999), Annonaceae (Pirie et al. 2007), Brassicaceae (Ansell et al. 2007; Koch et al. 2007; Tedder et al. 2010) and Juncaceae (Drábkova et al. 2004). In our recent study we reported a tandem repeat comprising two to four pseudogene copies upstream of the original trn F gene in four Solanum (Solanaceae) species (Poczai and Hyvönen 2011a). We characterized these structural duplications and showed that they consist of several highly structured motifs, which are partial residues or entire parts of the anticodon, T- and D-domains of the original gene, but all lack the acceptor stems at the 5′ or 3′ end. We were further interested in evaluating the possible occurrence of complete or partial trn F pseudogenes in Solanaceae. This family contains many economically important plant species, e.g., potato (Solanum tuberosum L.), tomato (Solanum lycopersicum L.) and paprika (Capsicum annuum L.). It is under intensive phylogenetic investigation, and the trn T-F plastid marker is commonly used in these studies. These sequences, together with the results of molecular breeding programs, provide a large amount of data that is available in GenBank. During data mining we concentrated on a structured dataset generated in previous phylogenetic studies (Fukuda et al. 2001; Garcia and Olmstead 2003; Santiago-Valentin and Olmstead 2003; Bohs 2004; Clarkson et al. 2004; Levin and Miller 2005; Levin et al. 2005; Weese and Bohs 2007; Olmstead et al. 2008) that contained 195 taxa and 390 sequences.
This dataset provided the basis for the latest robust phylogenetic hypothesis of the Solanaceae, including 89 of the 98 recognized genera (Olmstead and Bohs 2007). A manual search using the anticodon domain of the original trn F gene and automated tRNA recognition by CENSOR (Kohany et al. 2006) indicated the presence of pseudogene repeats in numerous genera of Solanaceae.
Distribution of trn F pseudogenes among Solanaceae and number of multiplicated trn F anticodon domains
The occurrence of pseudogenes provides strong evidence of relationships among some groups that had low support values in the previous analyses (e.g. Olmstead et al. 2008). This event robustly separates the (1) Atropina (Hyoscyameae, Lycieae, Jaborosa, Latua, Nolana and Sclerophylax) and (2) Juanulloeae clades from the Pseudosolanoid clade composed of (3) Solaneae, Capsiceae, Physaleae and Datureae and (4) Salpichroina (Salpichroa Miers and Nectouxia Kunth). In clades (1) and (2) pseudogenes are absent, while they appear at the basal node of clades (3) and (4). The lineage in which pseudogene copies have been found includes 29 genera; it also contains the clade of Solanum L. and Capsicum L., with many economically important plant species. However, sequence information was lacking for the genera Mellissia Hook. f. and Athenaea Adans., so the presence of trn F pseudogenes could not be confirmed for them. This is not surprising, as available plant material of these taxa is very restricted. For example, Mellissia is a genus with a single species, Mellissia begoniifolia (Roxb.) Hook. f., which is critically endangered and endemic to the island of Saint Helena. The larger clade of Solanoideae also includes several branches with low support values composed of small genera (Exodeconus Raf., Mandragora L., Nicandra (L.) Gaertn., Schultesianthus Hunz., Solandra Sw.) in the phylogeny proposed by Olmstead et al. (2008). These lineages stem from the early diversification of the Solanoideae, have no close relatives, and all lack pseudogene repeats that could be informative for tracing their ancestry.
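The reasoning above treats pseudogene presence as a binary character that arose once. A minimal sketch of that logic, using Fitch parsimony on a nested-tuple tree, is given below. The presence/absence states follow the text, but both tree topologies are illustrative assumptions and are not taken from Olmstead et al. (2008); the clade label `SCPD` (Solaneae, Capsiceae, Physaleae and Datureae) and all function names are hypothetical.

```python
# Presence (1) or absence (0) of trn F pseudogenes per clade, as stated in the text.
STATE = {
    "Atropina": 0,
    "Juanulloeae": 0,
    "SCPD": 1,          # hypothetical label: Solaneae, Capsiceae, Physaleae, Datureae
    "Salpichroina": 1,
}

# ASSUMED topology for illustration only: clades (3) and (4) form one group.
TREE_GROUPED = ("Atropina", ("Juanulloeae", ("SCPD", "Salpichroina")))

# ASSUMED alternative in which the pseudogene-bearing clades are not sisters.
TREE_SCATTERED = (("Atropina", "SCPD"), ("Juanulloeae", "Salpichroina"))


def fitch(tree, states):
    """Return (possible root states, minimum number of state changes).

    `tree` is either a leaf name (str) or a pair of subtrees.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # Both subtrees agree on at least one state: no extra change needed.
        return shared, left_changes + right_changes
    # Disjoint state sets: one change must occur on a branch below this node.
    return left_set | right_set, left_changes + right_changes + 1


if __name__ == "__main__":
    print(fitch(TREE_GROUPED, STATE))    # one change, root state absent
    print(fitch(TREE_SCATTERED, STATE))  # two changes
```

Under the grouped topology the character needs a single gain, which is what makes it behave as evidence for one clade; the scattered topology would require two changes.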
The latest large-scale phylogenetic analysis of the Solanaceae (Olmstead et al. 2008) established the major clades of the family, but sampling in some of the lineages can still be improved. Goldberg et al. (2010) analyzed a larger data set, but they focused on the evolution of self-incompatibility rather than on taxonomic relationships. Some studies have attempted to calibrate a molecular clock for various groups within Solanaceae, but all of these used the same (Paape et al. 2008; Poczai and Hyvönen 2011b) or only a few fossil records (Dillon et al. 2009; Tu et al. 2010). The fossil record of the Solanaceae has not been reviewed recently. This calls for a re-assessment of the specimens, which could potentially provide more robust calibration points for the family (Särkinen, personal communication). Current estimates put the age of the Pseudosolanoids at approximately 20 My (Särkinen, personal communication), and thus the origin of the pseudogene duplications of Solanaceae at approximately the same Miocene age as in Brassicaceae (16–21 My; Koch et al. 2005).
Despite extensive studies based on sequence-level characters, the taxonomy of the Solanaceae is not yet completely understood. However, there is ongoing work at different levels by multiple groups to resolve phylogenetic relationships (Fukuda et al. 2001; Garcia and Olmstead 2003; Santiago-Valentin and Olmstead 2003; Bohs 2004; Clarkson et al. 2004; Levin and Miller 2005; Levin et al. 2005; Weese and Bohs 2007; Olmstead et al. 2008). A number of questions should be answered regarding the discovery of trn F pseudogenes, for example: How did the duplications originate? Are the pseudogene copy numbers a useful character for phylogenetic inference? To what extent does the number of pseudogene copies vary within a single species? The evolution and structure of the pseudogenic copies should be compared with those reported from other plant families, especially Brassicaceae. The potential of trn F pseudogenes as phylogenetic markers needs to be investigated further for a better understanding of the evolution of Solanaceae. Such investigations could clarify the wider implications of the pseudogene repeats for Solanaceae studies that utilize the trn L-F spacer region.
Solanaceae sequence dataset
For the Solanaceae and several outgroups we used the trn L-F spacer data assembled by Olmstead et al. (2008). This dataset contained 195 taxa and 390 sequences generated in previous phylogenetic studies (Fukuda et al. 2001; Garcia and Olmstead 2003; Santiago-Valentin and Olmstead 2003; Bohs 2004; Clarkson et al. 2004; Levin and Miller 2005; Levin et al. 2005; Weese and Bohs 2007; Olmstead et al. 2008), and it was used to align and mask pseudogenic copies. The goal was to map the taxonomic distribution of pseudogenes at the family level, sampling as many genera as possible. This dataset and the representative trees used in our study were previously deposited in TreeBASE (ID S2191). This alignment was also used to demonstrate the copy number distribution corresponding to the published phylogenetic hypothesis, which was based not only on the trn L-F spacer information but also on sequence data from the ndh F region.
Recognition and copy number assessment of the trn F(GAA) pseudogenes
The complete chloroplast genome of Solanum bulbocastanum Dunal (DQ347958) was used to select the corresponding loci of the trn L-trn F spacer region (bp positions 48,854 to 49,382), to annotate ambiguous sequence regions, and to ensure that our interpretations are based on homologous positions. Putative pseudogene repeats were identified by screening against Repbase (Jurka 2000) with the “mask pseudogenes” and “report simple repeats” options of the online tool CENSOR (Kohany et al. 2006). This was done to identify repetitive elements by comparing our sequences to known eukaryotic repeats and prototypic sequences stored in Repbase utilizing WU-BLAST. A second search was conducted with FastPCR (Kalendar et al. 2009) using the repeat search option of the program. Under “type of repeats” we checked for simple, direct, inverted, direct antisense, and direct reverse repeats, respectively. Default values were used for the k-mer repeat screening. After each search, repetitive motifs and sequences were recorded and compared with the results obtained from the Repbase search. After repeats were identified in the trn L-F IGS sequences, further structural trn F(GAA) gene elements or residues were annotated manually using the anticodon domain as reference. The annotated sequence alignment is shown in Additional file 1.
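The two ideas behind the repeat screening, locating copies of a reference motif and flagging k-mers that occur more than once, can be sketched in a few lines of Python. This is only an illustration under stated assumptions and not a reimplementation of CENSOR or FastPCR: the motif `ACGTTGA` and the toy spacer are invented placeholders, not the real trn F anticodon domain or any Solanaceae sequence, and the function names are hypothetical.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_motif(seq: str, motif: str) -> list:
    """Return sorted (1-based start, strand) pairs for every motif hit."""
    seq, motif = seq.upper(), motif.upper()
    patterns = {"+": motif}
    rc = reverse_complement(motif)
    if rc != motif:  # avoid reporting palindromic motifs twice
        patterns["-"] = rc
    hits = []
    for strand, pattern in patterns.items():
        start = seq.find(pattern)
        while start != -1:
            hits.append((start + 1, strand))
            start = seq.find(pattern, start + 1)  # allow overlapping hits
    return sorted(hits)


def direct_repeats(seq: str, k: int) -> dict:
    """Map each k-mer that occurs more than once to its 1-based start positions."""
    seq = seq.upper()
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i + 1)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


# Invented toy data: three tandem-like copies of a placeholder motif.
TOY_MOTIF = "ACGTTGA"
TOY_SPACER = "TTT" + TOY_MOTIF + "CC" + TOY_MOTIF + "GG" + TOY_MOTIF + "AAA"

if __name__ == "__main__":
    print(find_motif(TOY_SPACER, TOY_MOTIF))
    print(direct_repeats(TOY_SPACER, k=7))
```

In this toy case the number of motif hits serves as the copy-number estimate. Real pseudogene copies are often partial or diverged, so exact matching would undercount them, which is presumably why the procedure described above combines database screening with manual annotation.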
Sequence annotation and alignment
Masked pseudogenic copies were further edited using Geneious v.4.8.5 (Biomatters Ltd.). We used the Nicotiana tabacum L. complete chloroplast genome (NC001879; bp positions 49,840 to 50,318) for comparisons and to determine the subunits of pseudogenic repeats as this species lacks these gene duplications. Sequence break points were examined manually to determine the cut off points of pseudogenic copies and to identify bordering motifs. Identified copies were aligned with MUSCLE (Edgar 2004) as implemented in Geneious v.4.8.5 using default settings. The sequence alignment in FASTA format is available as Additional file 2.
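For readers who want to reproduce the coordinate-based selection of the spacer from the reference plastomes, the slicing step can be sketched as follows. The sketch assumes that the quoted bp positions are 1-based and inclusive, as in GenBank feature tables; the text does not state this explicitly. The function names are hypothetical and the short string stands in for a real genome sequence.

```python
def region_length(start: int, end: int) -> int:
    """Length of a region given 1-based, inclusive coordinates (assumed convention)."""
    return end - start + 1


def extract_region(genome: str, start: int, end: int) -> str:
    """Return the subsequence from start to end, 1-based and inclusive."""
    if not 1 <= start <= end <= len(genome):
        raise ValueError("coordinates fall outside the sequence")
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return genome[start - 1:end]


if __name__ == "__main__":
    # Region sizes implied by the coordinates quoted in the Methods.
    print(region_length(48854, 49382))  # Solanum bulbocastanum spacer region
    print(region_length(49840, 50318))  # Nicotiana tabacum comparison region
    # Toy stand-in for a genome string.
    print(extract_region("AACCGGTT", 3, 6))
```

Under this assumption the Solanum bulbocastanum region spans 529 bp and the Nicotiana tabacum region 479 bp.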
PP gratefully acknowledges support from the Marie Curie Fellowship Grant (PIEF-GA-2011-300186) under the seventh framework program of the European Union. We thank Neil Bell for discussions on the manuscript.
- Ansell SW, Schneider H, Pedersen N, Grundmann M, Russell SJ, Vogel JC: Recombination diversifies chloroplast trnF pseudogenes in Arabidopsis lyrata. J Evol Biol 2007, 20: 2400-2411. 10.1111/j.1420-9101.2007.01397.x
- Bohs L: A chloroplast DNA phylogeny of Solanum section Lasiocarpa. Syst Bot 2004, 29: 177-187. 10.1600/036364404772974310
- Bowman CM, Barker RF, Dyer TA: The location and possible evolutionary significance of small dispersed repeats in wheat ctDNA. Curr Genet 1988, 10: 931-941.
- Clarkson JJ, Knapp S, Garcia VF, Olmstead RG, Leitch AR, Chase MW: Phylogenetic relationships in Nicotiana (Solanaceae) inferred from multiple plastid DNA regions. Mol Phylogenet Evol 2004, 33: 75-90. 10.1016/j.ympev.2004.05.002
- Dillon MO, Tu T, Xie L, Quipuscoa Silvestre V, Wen J: Biogeographic diversification in Nolana (Solanaceae), a ubiquitous member of the Atacama and Peruvian Deserts along the western coast of South America. J Syst Evol 2009, 47: 457-476. 10.1111/j.1759-6831.2009.00040.x
- Drábkova L, Kirschner J, Vlček Č, Paček V: TrnL-trnF intergenic spacer and trnL intron define major clades within Luzula and Juncus (Juncaceae): importance of structural mutations. J Mol Evol 2004, 59: 1-10. 10.1007/s00239-004-2598-7
- Edgar RC: MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 2004, 32: 1792-1797. 10.1093/nar/gkh340
- Fukuda T, Yokoyama J, Ohashi H: Phylogeny and biogeography of the genus Lycium (Solanaceae): inferences from chloroplast DNA sequences. Mol Phylogenet Evol 2001, 19: 246-258. 10.1006/mpev.2001.0921
- Garcia VF, Olmstead RG: Phylogenetics of tribe Anthocercideae (Solanaceae) based on ndhF and trnL/F sequence data. Syst Bot 2003, 28: 609-615.
- Goldberg EE, Kohn JR, Lande R, Robertson KA, Smith SA, Igic B: Species selection maintains self-incompatibility. Science 2010, 330: 459-460. 10.1126/science.1198063
- Jurka J: Repbase update: a database and an electronic journal of repetitive elements. Trends Genet 2000, 9: 418-420.
- Kalendar R, Lee D, Schulman AH: FastPCR software for PCR primer and probe design and repeat search. Genes Genomes Genomics 2009, 3: 1-14.
- Kanno A, Hirai A: A transcription map of the chloroplast genome from rice (Oryza sativa). Curr Genet 1993, 23: 166-174. 10.1007/BF00352017
- Koch MA, Dobeš C, Matschinger M, Bleeker W, Vogel J, Kiefer M, Mitchell-Olds T: Evolution of the trnF(GAA) gene in Arabidopsis relatives and the Brassicaceae family: monophyletic origin and subsequent diversification of a plastidic pseudogene. Mol Biol Evol 2005, 22: 1032-1043. 10.1093/molbev/msi092
- Koch MA, Dobeš C, Kiefer C, Schmickl R, Klimeš L, Lysak MA: Supernetwork identifies multiple events of plastid trnF(GAA) pseudogene evolution in the Brassicaceae. Mol Biol Evol 2007, 24: 63-73.
- Kohany O, Gentles AJ, Hankus L, Jurka J: Annotation, submission and screening of repetitive elements in Repbase: RepbaseSubmitter and Censor. BMC Bioinf 2006, 7: 474. 10.1186/1471-2105-7-474
- Levin RA, Miller JS: Relationships within tribe Lycieae (Solanaceae): paraphyly of Lycium and multiple origins of gender dimorphism. Am J Bot 2005, 92: 2044-2053. 10.3732/ajb.92.12.2044
- Levin RA, Watson K, Bohs L: A four-gene study of evolutionary relationships in Solanum section Acanthophora. Am J Bot 2005, 92: 603-612. 10.3732/ajb.92.4.603
- Olmstead RG, Bohs L: A summary of molecular systematic research in Solanaceae: 1982–2006. In Solanaceae VI: genomics meets biodiversity. Proceedings of the sixth international Solanaceae conference. Edited by: Spooner DM, Bohs L, Giovannoni J, Olmstead RG, Shibata D. Leuven: Acta Horticulturae 745. International Society for Horticultural Science; 2007:255-268.
- Olmstead RG, Bohs L, Migid HA, Santiago-Valentin E, Garcia VF, Collier SM: A molecular phylogeny of the Solanaceae. Taxon 2008, 57: 1159-1181.
- Paape T, Igic B, Smith SD, Olmstead R, Bohs L, Kohn JR: A 15-myr-old genetic bottleneck. Mol Biol Evol 2008, 25: 655-663. 10.1093/molbev/msn016
- Pirie MD, Vargas MPB, Botermans M, Bakker FT, Chatrou LW: Ancient paralogy in the cpDNA trnL-F region in Annonaceae: implications for plant molecular systematics. Am J Bot 2007, 94: 1003-1016. 10.3732/ajb.94.6.1003
- Poczai P, Hyvönen J: Identification and characterization of plastid trnF(GAA) pseudogenes in four species of Solanum (Solanaceae). Biotechnol Lett 2011, 33: 2317-2323. 10.1007/s10529-011-0701-x
- Poczai P, Hyvönen J: Phylogeny of kangaroo apples (Solanum subg. Archaesolanum, Solanaceae). Mol Biol Rep 2011, 38: 5243-5259. 10.1007/s11033-011-0675-8
- Quandt D, Stech M: Molecular evolution of the trnT(UGU)-trnF(GAA) region in bryophytes. Plant Biol 2004, 6: 545-554. 10.1055/s-2004-821144
- Quandt D, Müller K, Stech M, Frahm J-P, Frey W, Hilu KW, Borsch T: Molecular evolution of the chloroplast trnL-F region in land plants. Monogr Syst Bot Missouri Bot Gard 2004, 98: 13-37.
- Santiago-Valentin E, Olmstead RG: Phylogenetics of the Antillean Goetzeoideae (Solanaceae) and their relationships within the Solanaceae based on chloroplast and ITS DNA sequence data. Syst Bot 2003, 28: 452-460.
- Schmickl R, Kiefer C, Dobeš C, Koch MA: Evolution of trnF(GAA) pseudogenes in cruciferous plants. Plant Syst Evol 2009, 282: 229-240. 10.1007/s00606-008-0030-2
- Taberlet P, Gielly L, Pautou G, Bouvet J: Universal primers for amplification of three non-coding regions of chloroplast DNA. Plant Mol Biol 1991, 17: 1105-1109. 10.1007/BF00037152
- Tedder A, Hoebe PN, Ansell SW, Mable BK: Using chloroplast trnF pseudogenes for phylogeography in Arabidopsis lyrata. Diversity 2010, 2: 653-678. 10.3390/d2040653
- Tu T, Volis S, Dillon MO, Sun H, Wen J: Dispersal of Hyoscyameae and Mandragoreae (Solanaceae) from the New World to Eurasia in the early Miocene and their biogeographic diversification within Eurasia. Mol Phylogenet Evol 2010, 57: 1226-1237. 10.1016/j.ympev.2010.09.007
- Vijverberg K, Bachmann K: Molecular evolution of tandemly repeated trnF(GAA) gene in the chloroplast genomes of Microseris (Asteraceae) and the use of structural mutations in phylogenetic analysis. Mol Biol Evol 1999, 16: 1329-1340. 10.1093/oxfordjournals.molbev.a026043
- Weese TL, Bohs L: A three-gene phylogeny of the genus Solanum (Solanaceae). Syst Bot 2007, 32: 445-463. 10.1600/036364407781179671
- Wittzell H: Chloroplast DNA variation and reticulate evolution in sexual and apomictic sections of dandelions. Mol Ecol 1999, 8: 2023-2035. 10.1046/j.1365-294x.1999.00807.x
- Zhao T, Wang Z-T, Branford-White CJ, Xu H, Wang C-H: Classification and differentiation of the genus Peganum indigenous to China based on chloroplast trnL-F and psbA-trnH sequences and seed coat morphology. Plant Biol 2011, 13: 940-947. 10.1111/j.1438-8677.2011.00455.x
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.