Discovery of novel plastid phenylalanine (trn F) pseudogenes defines a distinctive clade in Solanaceae



The plastome of embryophytes is known for its high degree of conservation in size, structure, gene content and linear gene order. The duplication of entire tRNA genes, or their arrangement in a tandem array composed of multiple pseudogene copies, is extremely rare in the plastome. Pseudogene repeats of the trn F gene have only rarely been described from the chloroplast genome of angiosperms.


We report the discovery of duplicated copies of the original phenylalanine (trn F(GAA)) gene in Solanaceae that are specific to a larger clade within the subfamily Solanoideae. The pseudogene copies are composed of several highly structured motifs that are partial residues or entire parts of the anticodon, T- and D-domains of the original trn F gene.


The Pseudosolanoid clade consists of 29 genera and includes many economically important plants such as potato, tomato, eggplant and pepper.


The plastid trn T-trn F region has been widely applied to resolve the phylogeny of embryophytes (Quandt and Stech 2004; Zhao et al. 2011) and to address various questions in population genetics since the development of universal primers by Taberlet et al. (1991). This marker is located in the large single copy region of the chloroplast genome and contains a co-transcribed region consisting of three highly conserved genes encoding transfer RNAs (tRNAs) for threonine (UGU), leucine (UAA) and phenylalanine (GAA). The region is interspersed by two intergenic spacers and by a group I intron inserted between the first and second exons of the trn L(UAA) gene. Phylogenetic results obtained with the trn T-trn F region (or part of it) should be treated with caution, because several studies (e.g. Koch et al. 2005; Pirie et al. 2007; Schmickl et al. 2009; Vijverberg and Bachmann 1999) have shown that certain parts of this region occur in several copies. If this is ignored, the basic requirement of homology of the characters used in phylogenetic analyses can easily be compromised. This might lead to false hypotheses of phylogeny, especially when these are based on analyses of this region alone.

Larger structural changes (>50 bp) rarely occur in the plastome. However, duplications of the rpl 2 or rpl 23 genes (Bowman et al. 1988), or even duplications of tRNA genes (as pseudogenes), are occasionally reported. The latter are extremely rare in angiosperms and have so far only been described from Asteraceae (Vijverberg and Bachmann 1999; Wittzell 1999), Annonaceae (Pirie et al. 2007), Brassicaceae (Ansell et al. 2007; Koch et al. 2007; Tedder et al. 2010) and Juncaceae (Drábkova et al. 2004). In a recent study we reported a tandem repeat comprising two to four pseudogene copies upstream of the original trn F gene in four Solanum (Solanaceae) species (Poczai and Hyvönen 2011a). We characterized these structural duplications and showed that they consist of several highly structured motifs, which are partial residues or entire parts of the anticodon, T- and D-domains of the original gene, but all lack the acceptor stems at the 5′ or 3′ end. We were further interested in evaluating the possible occurrence of complete or partial trn F pseudogenes across Solanaceae. This family contains many economically important plant species, e.g., potato (Solanum tuberosum L.), tomato (Solanum lycopersicum L.) and paprika (Capsicum annuum L.). It is under intensive phylogenetic investigation, and the trn T-F plastid marker is commonly used in these studies. These sequences, together with the results of molecular breeding programs, provide a large amount of data available in GenBank. During data mining we concentrated on a structured dataset generated in previous phylogenetic studies (Fukuda et al. 2001; Garcia and Olmstead 2003; Santiago-Valentin and Olmstead 2003; Bohs 2004; Clarkson et al. 2004; Levin and Miller 2005; Levin et al. 2005; Weese and Bohs 2007; Olmstead et al. 2008) that contained 195 taxa and 390 sequences.
This dataset provided the basis for the latest robust phylogenetic hypothesis of the Solanaceae, which includes 89 of the 98 recognized genera (Olmstead and Bohs 2007). A manual search using the anticodon domain of the original trn F gene and automated tRNA recognition by CENSOR (Kohany et al. 2006) indicated the presence of pseudogene repeats in numerous genera of Solanaceae.
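The manual search step can be pictured as sliding a query motif along each trn L-F spacer sequence and reporting windows that match with few substitutions. The sketch below is a minimal illustration under stated assumptions. It uses a plain Hamming-distance scan, not CENSOR, which compares sequences against Repbase with WU-BLAST. The function names and the query string in the usage line are hypothetical, and the toy query is not the actual trn F anticodon domain.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_motif(seq: str, motif: str, max_mismatches: int = 1) -> list[tuple[int, int]]:
    """Return (0-based start, mismatches) for every window of `seq` that matches
    `motif` with at most `max_mismatches` substitutions (no indels allowed)."""
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        d = hamming(seq[i:i + k], motif)
        if d <= max_mismatches:
            hits.append((i, d))
    return hits


# Hypothetical toy query and sequence, for illustration only
print(find_motif("AAGATTGAAAATTTGAAAA", "TTGAAAA", max_mismatches=0))  # [(4, 0), (12, 0)]
```

An ungapped scan like this misses copies that carry small indels. That is one reason a BLAST-based screen and manual inspection, as described above, are still needed.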

We used the core trn L-F dataset to map the occurrence of pseudogenic repeats on the phylogenetic tree of Solanaceae. As shown in Figure 1, the distribution of pseudogenic duplications is congruent with the previously published phylogeny of the Solanaceae (Olmstead et al. 2008). The first pseudogenic copy evidently evolved only once, at the base of a highly supported clade within the subfamily Solanoideae. Among the members of this lineage, referred to here as the Pseudosolanoid clade, the anticodon domain of the trn F gene exhibits extensive duplication, with one to seven tandemly repeated copies close to the 5′ end of the original functional gene (Table 1). The size of each pseudogenic copy ranged between 32 and 73 bp, and the anticodon domain was identified as the most conserved element. A common ATT(G)n motif is of particular interest: modifications of it were found to border the 5′ end of the duplicated regions, in the same way as in Brassicaceae (Ansell et al. 2007; Koch et al. 2005 and 2007; Schmickl et al. 2009; Tedder et al. 2010). Other motifs were partial residues or entire parts of the T- and D-domains. Residues of the 3′ and 5′ acceptor stems were rarely found among the copies (see Table 1). The D-domain was more conserved than the T-domain among the copies, and other internal repeats (AT, AAT, ATT, AATCC) were intercalated within this region, for example in the genus Lycianthes (Dunal) Hassl. In addition to these newly discovered pseudogenes, we were also able to characterize putative promoter motifs showing high similarity to a σ70-type bacterial promoter. These two elements (−35 TTGACA / −10 GAGGAT) are consistently found in the trn L-F spacer region of embryophytes and are believed to represent the ancient, original trn F gene promoter (Quandt et al. 2004).
Interestingly, in Solanaceae the pseudogenic repeats were found to be inserted exclusively downstream of the putative promoter motifs. This contrasts with Brassicaceae, where similar pseudogenic repeats were found only between promoter motifs in the trn L-F intergenic spacer region (Koch et al. 2005). The latter finding led Koch et al. (2005) to support the conclusion of Kanno and Hirtai (1993) that these elements should be non-functional, owing to the intercalated position of the pseudogenes between the promoters. However, this conclusion may be challenged by the position of the Solanaceae pseudogenes, which follow the −35 and −10 promoter motifs. These motifs are themselves variable in number and composition.
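The positional contrast between Solanaceae and Brassicaceae can be made concrete with a small scan. The sketch below finds exact −35 TTGACA / −10 GAGGAT pairs and then classifies a copy's start relative to the promoter span. It makes two assumptions: only exact matches are accepted, and the spacer between the boxes is taken to be 15 to 19 bp, the range usually quoted for σ70 promoters. The text gives no spacer length. The function names are hypothetical. In this scheme, "downstream" corresponds to the pattern reported for Solanaceae and "within" to the Brassicaceae pattern.

```python
def find_promoters(seq: str, box35: str = "TTGACA", box10: str = "GAGGAT",
                   min_spacer: int = 15, max_spacer: int = 19) -> list[tuple[int, int]]:
    """Return (start of -35 box, end of -10 box) pairs, 0-based with exclusive end,
    for every exact -35 box followed by an exact -10 box after a spacer of
    min_spacer..max_spacer bp (spacer range is an assumption)."""
    seq = seq.upper()
    hits = []
    i = seq.find(box35)
    while i != -1:
        for spacer in range(min_spacer, max_spacer + 1):
            j = i + len(box35) + spacer  # candidate start of the -10 box
            if seq[j:j + len(box10)] == box10:
                hits.append((i, j + len(box10)))
        i = seq.find(box35, i + 1)
    return hits


def position_relative_to_promoters(copy_start: int,
                                   promoters: list[tuple[int, int]]) -> str:
    """Classify a pseudogene copy start as 'upstream', 'within' or 'downstream'
    of the span covered by all detected promoter motifs."""
    if not promoters:
        raise ValueError("no promoter motifs detected")
    first = min(s for s, _ in promoters)
    last = max(e for _, e in promoters)
    if copy_start >= last:
        return "downstream"
    if copy_start < first:
        return "upstream"
    return "within"


# Toy spacer: -35 box, 17 bp spacer, -10 box, then downstream sequence
toy = "TTGACA" + "C" * 17 + "GAGGAT" + "A" * 10
promoters = find_promoters(toy)                          # [(0, 29)]
print(position_relative_to_promoters(35, promoters))     # downstream
```

Real promoter boxes in the spacer are variable, so a practical version would tolerate mismatches in each box, for example by reusing a Hamming-distance scan.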

Figure 1

Phylogeny of Solanaceae and the distribution and schematic structure of trn F pseudogene copies. a) Recognized suprageneric groups are indicated to the right of the tree, while major clades are collapsed at the base node and named following Olmstead et al. (2008). The new Pseudosolanoid clade, united by the presence of pseudogenic trn F gene duplication, is marked with ‘ψ’ in the Solanoideae subfamily. b) Schematic representation of the plastidic trn L-F spacer region in Solanaceae and the intercalated pseudogene copies (PSC) in the intergenic spacer region close to the 5′ end of the trn F gene. Pseudogene repeats are variable in number and structure and are found after the putative promoter motifs, which are also variable among species. The spacer region between the first PSC and the promoter motifs consists of intergenic repeats of variable length. Each PSC is separated by a common bordering motif (ATTG) at its 5′ end.

Table 1 Distribution of trn F pseudogenes among Solanaceae and number of repeated trn F anticodon domains

The occurrence of pseudogenes provides strong evidence for relationships among some groups that had low support values in previous analyses (e.g. Olmstead et al. 2008). This event robustly separates the (1) Atropina (Hyoscyameae, Lycieae, Jaborosa, Latua, Nolana and Sclerophylax) and (2) Juanulloeae clades from the Pseudosolanoid clade, composed of (3) Solaneae, Capsiceae, Physaleae and Datureae and (4) Salpichroina (Salpichroa Miers and Nectouxia Kunth). Pseudogenes are absent in clades (1) and (2), whereas they appear at the basal node of clades (3) and (4). This lineage, in which pseudogene copies have been found, includes 29 genera, among them the clade of Solanum L. and Capsicum L. with many economically important plant species. However, sequence information was lacking for the genera Mellissia Hook. f. and Athenaea Adans., so the presence of trn F pseudogenes could not be confirmed for them. This is not surprising, as available plant material of these taxa is very restricted. For example, Mellissia is a genus with a single species, Mellissia begoniifolia (Roxb.) Hook. f., which is critically endangered and endemic to the island of Saint Helena. In the phylogeny proposed by Olmstead et al. (2008), the larger clade of Solanoideae also includes several branches with low support values composed of small genera (Exodeconus Raf., Mandragora L., Nicandra (L.) Gaerten., Schultesianthus Hunz., Solandra Sw.). These lineages stem from the early diversification of the Solanoideae and have no close relatives. All of them lack pseudogene repeats, which could otherwise have helped trace their ancestry.
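The claim that the duplication arose once can be phrased as a parsimony count for a single presence/absence character. The sketch below uses Fitch parsimony, a standard technique the paper itself does not name. It runs on a deliberately simplified toy tree. That topology is an assumption for illustration and is not the published tree of Olmstead et al. (2008). Nicotiana serves as the outgroup because tobacco lacks the duplications. A minimum of one change is consistent with a single gain at the base of the Pseudosolanoid clade.

```python
def fitch(tree, states):
    """Fitch parsimony for one character on a rooted binary tree.

    tree   -- a leaf name (str) or a (left, right) tuple of subtrees
    states -- dict mapping each leaf name to its character state
    Returns (set of most-parsimonious root states, minimum number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # Disjoint child sets: take the union and count one change
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical, simplified topology; 1 = trn F pseudogene copies present, 0 = absent
TOY_TREE = ("Nicotiana",
            (("Atropina", "Juanulloeae"),
             ("Solaneae+Capsiceae+Physaleae+Datureae", "Salpichroina")))
PRESENCE = {"Nicotiana": 0, "Atropina": 0, "Juanulloeae": 0,
            "Solaneae+Capsiceae+Physaleae+Datureae": 1, "Salpichroina": 1}

print(fitch(TOY_TREE, PRESENCE))  # ({0}, 1)
```

Copy number (one to seven) could be scored the same way as a multistate character, which bears on the open question of whether copy number is phylogenetically informative.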

The latest large-scale phylogenetic analysis of the Solanaceae (Olmstead et al. 2008) established the major clades of the family, but sampling in some lineages can still be improved. Goldberg et al. (2010) analyzed a larger data set, but they focused on the evolution of self-compatibility rather than on taxonomic relationships. Some studies have attempted to calibrate a molecular clock for various groups within Solanaceae, but these have relied either on the same fossil record (Paape et al. 2008; Poczai and Hyvönen 2011b) or on only a few fossil records (Dillon et al. 2009; Tu et al. 2010). The fossil record of the Solanaceae has not been reviewed recently. A re-assessment of the specimens is therefore needed and could potentially provide more robust calibration points for the family (Särkinen, personal communication). Current estimates place the age of the Pseudosolanoids at approximately 20 My (Särkinen, personal communication). This would make the origin of the pseudogene duplications in Solanaceae approximately of the same Miocene age as in Brassicaceae (16–21 My; Koch et al. 2005).


Despite extensive studies based on sequence-level characters, the taxonomy of the Solanaceae is not yet completely understood. However, multiple groups are working at different levels to resolve phylogenetic relationships (Fukuda et al. 2001; Garcia and Olmstead 2003; Santiago-Valentin and Olmstead 2003; Bohs 2004; Clarkson et al. 2004; Levin and Miller 2005; Levin et al. 2005; Weese and Bohs 2007; Olmstead et al. 2008). The discovery of trn F pseudogenes raises a number of questions, for example: How did the duplications originate? Are pseudogene copy numbers a useful character for phylogenetic inference? To what extent does the number of pseudogene copies vary within a single species? The evolution and structure of the pseudogenic copies should be compared with those reported from other plant families, especially Brassicaceae. The potential of trn F pseudogenes as phylogenetic markers needs to be investigated further for a better understanding of the evolution of Solanaceae. Such investigations could clarify the wider implications of the pseudogene repeats for Solanaceae studies that use the trn L-F spacer region.


Solanaceae sequence dataset

For the Solanaceae and several outgroups we used the trn L-F spacer data assembled by Olmstead et al. (2008). This dataset contained 195 taxa and 390 sequences generated in previous phylogenetic studies (Fukuda et al. 2001; Garcia and Olmstead 2003; Santiago-Valentin and Olmstead 2003; Bohs 2004; Clarkson et al. 2004; Levin and Miller 2005; Levin et al. 2005; Weese and Bohs 2007; Olmstead et al. 2008) and was used to align and mask pseudogenic copies. The goal was to map the taxonomic distribution of pseudogenes at the family level, sampling as many genera as possible. This dataset and the representative trees used in our study were previously deposited in TreeBASE (ID S2191). The alignment was also used to map the copy number distribution onto the published phylogenetic hypothesis, which was based not only on trn L-F spacer data but also on sequence data from the ndh F region.
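Mapping copy numbers at the genus level amounts to grouping per-sequence counts by genus before placing them on the tree. The sketch below is a minimal version of that bookkeeping. It assumes each record is a (taxon name, copy number) pair and that the genus is the first word of the name. The function name and the "GenusA"/"GenusB" records are hypothetical placeholders, not data from this study.

```python
def copy_range_by_genus(records):
    """Summarise pseudogene copy numbers per genus.

    records -- iterable of (taxon name, copy number) pairs; the genus is taken
               to be the first word of the taxon name (an assumption).
    Returns {genus: (minimum copies, maximum copies)}.
    """
    ranges: dict[str, tuple[int, int]] = {}
    for taxon, copies in records:
        genus = taxon.split()[0]
        lo, hi = ranges.get(genus, (copies, copies))
        ranges[genus] = (min(lo, copies), max(hi, copies))
    return ranges


# Hypothetical placeholder records, not results from the paper
toy_records = [("GenusA sp. 1", 2), ("GenusA sp. 2", 4), ("GenusB sp. 1", 0)]
print(copy_range_by_genus(toy_records))  # {'GenusA': (2, 4), 'GenusB': (0, 0)}
```

Keeping a (min, max) range rather than a single value per genus preserves any within-genus variation in copy number.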

Recognition and copy number assessment of the trn F(GAA) pseudogenes

The complete chloroplast genome of Solanum bulbocastanum Dunal (DQ347958) was used to select the corresponding loci of the trn L-trn F spacer region (bp positions 48,854 to 49,382), to annotate ambiguous sequence regions, and to ensure that our interpretations were based on homologous positions. Putative pseudogene repeats were identified by screening against Repbase (Jurka 2000) with the “mask pseudogenes” and “report simple repeats” options of the online tool CENSOR (Kohany et al. 2006). This step identified repetitive elements by comparing our sequences with known eukaryotic repeats and prototypic sequences stored in Repbase using WU-BLAST. A second search was conducted with FastPCR (Kalendar et al. 2009) using the repeat search option of the program. Under “type of repeats” we checked for simple, direct, inverted, direct antisense and direct reverse repeats. Default values were used for the kMers repeat screening. After each search, repetitive motifs and sequences were recorded and compared with the results obtained from the Repbase search. After repeats were identified in the trn L-F IGS sequences, further structural trn F(GAA) gene elements or residues were annotated manually using the anticodon domain as reference. The annotated sequence alignment is shown in Additional file 1.
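The idea behind a k-mer repeat screen can be sketched in a few lines. Index every k-mer in the sequence, then report k-mers seen more than once (direct repeats) and k-mers whose reverse complement also occurs (inverted repeats). This is a generic illustration under stated assumptions and not FastPCR's actual algorithm, which the text does not describe. The function names are hypothetical, and k = 12 is an arbitrary choice, not FastPCR's default.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return s.upper().translate(COMPLEMENT)[::-1]


def kmer_index(seq: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of `seq` to the list of its 0-based start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def direct_repeats(seq: str, k: int = 12) -> dict[str, list[int]]:
    """k-mers that occur at least twice in `seq` (exact direct repeats)."""
    index = kmer_index(seq.upper(), k)
    return {kmer: pos for kmer, pos in index.items() if len(pos) > 1}


def inverted_repeats(seq: str, k: int = 12) -> list[tuple[int, int]]:
    """(i, j) pairs, i < j, where the k-mer at j is the reverse complement
    of the k-mer at i (exact inverted repeats)."""
    seq = seq.upper()
    index = kmer_index(seq, k)
    pairs = []
    for i in range(len(seq) - k + 1):
        for j in index.get(revcomp(seq[i:i + k]), []):
            if j > i:
                pairs.append((i, j))
    return pairs


print(direct_repeats("GATTACAGG" + "TTTT" + "GATTACAGG", k=9))  # {'GATTACAGG': [0, 13]}
```

Only exact repeats are reported here. The degenerate pseudogene copies described in this study would need a mismatch-tolerant comparison, which is one reason the results were cross-checked against the Repbase search.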

Sequence annotation and alignment

Masked pseudogenic copies were further edited using Geneious v.4.8.5 (Biomatters Ltd.). We used the complete chloroplast genome of Nicotiana tabacum L. (NC_001879; bp positions 49,840 to 50,318) for comparison and to determine the subunits of the pseudogenic repeats, as this species lacks these gene duplications. Sequence break points were examined manually to determine the cut-off points of pseudogenic copies and to identify bordering motifs. Identified copies were aligned with MUSCLE (Edgar 2004) as implemented in Geneious v.4.8.5, using default settings. The sequence alignment in FASTA format is available as Additional file 2.
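The reference intervals and the break-point step can be combined in a small sketch. First slice a GenBank-style interval, which uses 1-based inclusive coordinates, out of a genome string. Then cut the region at each ATTG bordering motif and keep only segments within the 32 to 73 bp copy size reported above. This is a naive heuristic standing in for the manual inspection described in the text. The function names are hypothetical, and the code does not download DQ347958 or NC_001879.

```python
import re


def region_length(start: int, end: int) -> int:
    """Length in bp of a 1-based, inclusive interval."""
    return end - start + 1


def extract_region(genome: str, start: int, end: int) -> str:
    """Return the 1-based, inclusive interval [start, end] from a 0-based string."""
    if not 1 <= start <= end <= len(genome):
        raise ValueError("coordinates out of range")
    return genome[start - 1:end]


def split_at_border(region: str, border: str = "ATTG",
                    min_len: int = 32, max_len: int = 73) -> list[tuple[int, str]]:
    """Cut `region` at every occurrence of the 5' bordering motif and return
    (0-based start, segment) pairs whose length lies in [min_len, max_len].
    Sequence before the first border is treated as spacer and ignored; a border
    motif inside a genuine copy would split it, so hits need manual checking."""
    region = region.upper()
    pattern = f"(?={re.escape(border.upper())})"  # lookahead finds every start
    starts = [m.start() for m in re.finditer(pattern, region)]
    segments = []
    for i, s in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else len(region)
        if min_len <= end - s <= max_len:
            segments.append((s, region[s:end]))
    return segments


print(region_length(48_854, 49_382))  # 529 bp (S. bulbocastanum interval)
print(region_length(49_840, 50_318))  # 479 bp (N. tabacum interval)
```

The length filter drops a short trailing fragment, as with a toy region ending in "ATTGCC". Real copies lacking a clean ATTG border would still have to be delimited by eye.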


  • Ansell SW, Schneider H, Pedersen N, Grunmann M, Russell SJ, Vogel JC: Recombination diversifies chloroplast trnF pseudogenes in Arabidopsis lyrata. J Evol Biol 2007, 20: 2400-2411. 10.1111/j.1420-9101.2007.01397.x

  • Bohs L: A chloroplast DNA phylogeny of Solanum section Lasiocarpa. Syst Bot 2004, 29: 177-187. 10.1600/036364404772974310

  • Bowman CM, Barker RF, Dyer TA: The location and possible evolutionary significance of small dispersed repeats in wheat ctDNA. Curr Genet 1988, 10: 931-941.

  • Clarkson JJ, Knapp S, Garcia VF, Olmstead RG, Leitch AR, Chase MW: Phylogenetic relationships in Nicotiana (Solanaceae) inferred from multiple plastid DNA regions. Mol Phylogenet Evol 2004, 33: 75-90. 10.1016/j.ympev.2004.05.002

  • Dillon MO, Tu T, Xie L, Quipuscoa Silvestre V, Wen J: Biogeographic diversification in Nolana (Solanaceae), a ubiquitous member of the Atacama and Peruvian Deserts along the western coast of South America. J Syst Evol 2009, 47: 457-476. 10.1111/j.1759-6831.2009.00040.x

  • Drábkova L, Kirschner J, Vlček Č, Pačes V: TrnL-trnF intergenic spacer and trnL intron define major clades within Luzula and Juncus (Juncaceae): importance of structural mutations. J Mol Evol 2004, 59: 1-10. 10.1007/s00239-004-2598-7

  • Edgar RC: MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 2004, 32: 1792-1797. 10.1093/nar/gkh340

  • Fukuda T, Yokoyama J, Ohashi H: Phylogeny and biogeography of the genus Lycium (Solanaceae): inferences from chloroplast DNA sequences. Mol Phylogenet Evol 2001, 19: 246-258. 10.1006/mpev.2001.0921

  • Garcia VF, Olmstead RG: Phylogenetics of tribe Anthocercideae (Solanaceae) based on ndhF and trnL/F sequence data. Syst Bot 2003, 28: 609-615.

  • Goldberg EE, Kohn JR, Lande R, Robertson KA, Smith SA, Igic B: Species selection maintains self-incompatibility. Science 2010, 330: 459-460. 10.1126/science.1198063

  • Jurka J: Repbase update: a database and an electronic journal of repetitive elements. Trends Genet 2000, 16: 418-420.

  • Kalendar R, Lee D, Schulman AH: FastPCR software for PCR primer and probe design and repeat search. Genes Genomes Genomics 2009, 3: 1-14.

  • Kanno A, Hirtai A: A transcription map of the chloroplast genome from rice (Oryza sativa). Curr Genet 1993, 23: 166-174. 10.1007/BF00352017

  • Koch MA, Dobeš C, Matschinger M, Bleeker W, Vogel J, Kiefer M, Mitchell-Olds T: Evolution of the trnF(GAA) gene in Arabidopsis relatives and the Brassicaceae family: monophyletic origin and subsequent diversification of a plastidic pseudogene. Mol Biol Evol 2005, 22: 1032-1043. 10.1093/molbev/msi092

  • Koch MA, Dobeš C, Kiefer C, Schmickl R, Klimeš L, Lysak MA: Supernetwork identifies multiple events of plastid trnF(GAA) pseudogene evolution in the Brassicaceae. Mol Biol Evol 2007, 24: 63-73.

  • Kohany O, Gentles AJ, Hankus L, Jurka J: Annotation, submission and screening of repetitive elements in Repbase: RepbaseSubmitter and Censor. BMC Bioinf 2006, 7: 474. 10.1186/1471-2105-7-474

  • Levin RA, Miller JS: Relationships within tribe Lycieae (Solanaceae): paraphyly of Lycium and multiple origins of gender dimorphism. Am J Bot 2005, 92: 2044-2053. 10.3732/ajb.92.12.2044

  • Levin RA, Watson K, Bohs L: A four-gene study of evolutionary relationships in Solanum section Acanthophora. Am J Bot 2005, 92: 603-612. 10.3732/ajb.92.4.603

  • Olmstead RG, Bohs L: A summary of molecular systematic research in Solanaceae: 1982–2006. In Solanaceae VI: genomics meets biodiversity. Proceedings of the sixth international Solanaceae conference. Edited by: Spooner DM, Bohs L, Giovannoni J, Olmstead RG, Shibata D. Leuven: Acta Horticulturae 745. International Society for Horticultural Science; 2007:255-268.

  • Olmstead RG, Bohs L, Migid HA, Santiago-Valentin E, Garcia VF, Collier SM: A molecular phylogeny of the Solanaceae. Taxon 2008, 57: 1159-1181.

  • Paape T, Igic B, Smith SD, Olmstead R, Bohs L, Kohn JR: A 15-myr-old genetic bottleneck. Mol Biol Evol 2008, 25: 655-663. 10.1093/molbev/msn016

  • Pirie MD, Vargas MPB, Botermans M, Bakker FT, Chatrou LW: Ancient paralogy in the cpDNA trnL-F region in Annonaceae: implications for plant molecular systematics. Am J Bot 2007, 94: 1003-1016. 10.3732/ajb.94.6.1003

  • Poczai P, Hyvönen J: Identification and characterization of plastid trnF(GAA) pseudogenes in four species of Solanum (Solanaceae). Biotechnol Lett 2011a, 33: 2317-2323. 10.1007/s10529-011-0701-x

  • Poczai P, Hyvönen J: Phylogeny of kangaroo apples (Solanum subg. Archaesolanum, Solanaceae). Mol Biol Rep 2011b, 38: 5243-5259. 10.1007/s11033-011-0675-8

  • Quandt D, Stech M: Molecular evolution of the trn TUGU- trn FGAA region in bryophytes. Plant Biol 2004, 6: 545-554. 10.1055/s-2004-821144

  • Quandt D, Müller K, Stech M, Frahm J-P, Frey W, Hilu KW, Borsch T: Molecular evolution of the chloroplast trnL-F region in land plants. Monogr Syst Bot Missouri Bot Gard 2004, 98: 13-37.

  • Santiago-Valentin E, Olmstead RG: Phylogenetics of the Antillean Goetzeoideae (Solanaceae) and their relationships within the Solanaceae based on chloroplast and ITS DNA sequence data. Syst Bot 2003, 28: 452-460.

  • Schmickl R, Kiefer C, Dobeš C, Koch MA: Evolution of trnF(GAA) pseudogenes in cruciferous plants. Plant Syst Evol 2009, 282: 229-240. 10.1007/s00606-008-0030-2

  • Taberlet P, Gielly L, Pautou G, Bouvet J: Universal primers for amplification of three non-coding regions of chloroplast DNA. Plant Mol Biol 1991, 17: 1105-1109. 10.1007/BF00037152

  • Tedder A, Hoebe PN, Ansell SW, Mable BK: Using chloroplast trnF pseudogenes for phylogeography in Arabidopsis lyrata. Diversity 2010, 2: 653-678. 10.3390/d2040653

  • Tu T, Volis S, Dillon MO, Sun H, Wen J: Dispersal of Hyoscyameae and Mandragoreae (Solanaceae) from the New World to Eurasia in the early Miocene and their biogeographic diversification within Eurasia. Mol Phyl Evol 2010, 57: 1226-1237. 10.1016/j.ympev.2010.09.007

  • Vijverberg K, Bachmann K: Molecular evolution of tandemly repeated trnF(GAA) gene in the chloroplast genomes of Microseris (Asteraceae) and the use of structural mutations in phylogenetic analysis. Mol Biol Evol 1999, 16: 1329-1340. 10.1093/oxfordjournals.molbev.a026043

  • Weese TL, Bohs L: A three-gene phylogeny of the genus Solanum (Solanaceae). Syst Bot 2007, 32: 445-463. 10.1600/036364407781179671

  • Wittzell H: Chloroplast DNA variation and reticulate evolution in sexual and apomictic sections of dandelions. Mol Ecol 1999, 8: 2023-2035. 10.1046/j.1365-294x.1999.00807.x

  • Zhao T, Wang Z-T, Branford-White CJ, Xu H, Wang C-H: Classification and differentiation of the genus Peganum indigenous to China based on chloroplast trnL-F and psbA-trnH sequences and seed coat morphology. Plant Biol 2011, 13: 940-947. 10.1111/j.1438-8677.2011.00455.x

PP gratefully acknowledges support from the Marie Curie Fellowship Grant (PIEF-GA-2011-300186) under the seventh framework program of the European Union. We thank Neil Bell for discussions on the manuscript.

Author information

Corresponding author

Correspondence to Péter Poczai.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

PP conceived the study and drafted the early version of the manuscript and performed database search. JH commented on the manuscript, revised the text and structure, and outlined it several times together with PP. Both authors approved the final manuscript.

Electronic supplementary material


Additional file 1: Annotated sequence alignment of pseudogene repeats found in Solanaceae. Major parts of the trn F gene are marked as D- and T-domains and anticodon in the middle together with bordering 5′ and 3′ acceptor stems. The trn F gene of Nicotiana tabacum is used as a reference sequence to align different pseudogenes. (PDF 4 MB)

Additional file 2: Sequence alignment of pseudogene copies (FASTA 41 KB)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Poczai, P., Hyvönen, J. Discovery of novel plastid phenylalanine (trn F) pseudogenes defines a distinctive clade in Solanaceae. SpringerPlus 2, 459 (2013).



  • Chloroplast DNA (cpDNA)
  • Gene duplications
  • Phylogeny
  • Plastome evolution
  • Tandem repeats
  • trn L-trn F
  • Solanaceae