Differentiation of the Chinese minority medicinal plant genus Berchemia spp. by evaluating three candidate barcodes

The genus Berchemia comprises important Chinese plants with considerable medicinal value; however, these plants are often misidentified in the herbal medicine market. To differentiate the various morphotypes of Berchemia species, universal DNA barcodes were screened in this work. Three candidate barcoding loci, namely, psbA-trnH, rbcL, and the second internal transcribed spacer (ITS2), were evaluated to identify an effective DNA barcode that can differentiate the various Berchemia species. PCR amplification efficiency, sequencing efficiency, intra- and inter-specific divergences, and DNA barcoding gaps were used to assess the ability of each barcode to identify these Berchemia plants reliably; the species were differentiated using the Kimura two-parameter and maximum composite likelihood methods. Sequence data analysis showed that the ITS2 region was the most suitable candidate barcode and exhibited the highest interspecific divergence among the three DNA-barcoding sequences. A clear differentiation was observed at the species level, with a maximum distance of 0.264 between different species. Clustal analysis also demonstrated that ITS2 differentiated the test species more effectively than the two other barcodes at both the hybrid and variety levels. The results indicate that DNA barcoding is well suited to species-level identification of Berchemia and provides a foundation for further molecular identification of other Rhamnaceae medicinal plants.


Background
Berchemia is a genus of plants in the Rhamnaceae family, which comprises 32 species of deciduous woody plants located in Asia, South America, and Africa (Huxley and Griffiths 1999). In China, Berchemia consists of 19 native species (Chen and Dong 2006), which are primarily distributed in the southern, southwestern, and eastern regions (Sinicae 1988). These species include climbing plants or small- to medium-sized trees, several of which are endangered but offer significant medicinal value; these important plants include B. lineata (Shen et al. 2010) and B. berchemiafolia (Kitamura and Murata 1984; Fu and Jin 1992; Ohwi 1984). In Japan, the roots, stems, and leaves of Berchemia plants are used to treat liver diseases, neuralgia, and gall stones; furthermore, these parts are utilized in traditional Chinese medicine (Mukhtar et al. 2004).
The characteristics, transverse-section structure, and powder properties of Berchemia species show distinct features that can be used for microscopic identification. In particular, B. lineata, B. polyphylla, and B. polyphylla var. leioclada are closely related in terms of microstructure and microscopic characteristics. However, these three taxa can be distinguished on the basis of the characteristics of their leaf edge. The leaf edge cells of B. lineata are round, and the cell walls are not thickened, or the thickening is not obvious. The leaf edge cells of B. polyphylla are square or rectangular, and the cell walls are obviously thickened. The leaf edge cells of B. polyphylla var. leioclada are round, and the cell walls show obvious thickening (Ye et al. 2013). These distinctions can provide a basis for the pharmacognostical identification of Berchemia species.
Berchemia species are highly similar in terms of apparent vegetative morphology and are thus often misidentified. In Chinese herb markets, different species are sold under the same name as dried roots. Distinguishing these species merely by sight is impossible for the untrained eye. Although all the Berchemia species provide medicinal value, consuming the wrong one reduces drug efficacy and causes ill effects after prolonged usage. Therefore, an accurate method to prove the authenticity of plant raw materials is necessary because traditional methods, including organoleptic trait evaluation and phytochemical and pharmacognostic methods, cannot accurately identify species.
DNA barcoding is a rapidly developing frontier technology that is gaining worldwide attention. This technology uses a standardized genomic DNA sequence from a standard locus as a species identification tool (Kress et al. 2005) and has become popular in species identification (Gregory 2005; Miller 2007). Barcoding is a convenient tool to identify species for nonprofessional users, such as traditional drug producers, forensic specialists, and customs officers (Xue and Li 2011). Numerous DNA barcodes that can be used to identify species exist in plants and animals. CO1 is a powerful DNA barcode for the discrimination of closely related species in most animals (Hebert et al. 2003). In 2009, the Plant Working Group of the Consortium for the Barcode of Life (CBOL) recommended that the loci rbcL + matK be used as core barcodes to identify plants (CBOL Plant Working Group 2009). The psbA-trnH intergenic spacer and internal transcribed spacer (ITS)/ITS2 were also suggested as barcodes for plant identification at the Third International Barcode Conference in Mexico City (Kress et al. 2005). Yao et al. (2010) proposed that the ITS2 locus, a popular phylogenetic marker, should be used as a universal DNA barcode for plants and as a complementary locus to CO1 for animals. Pang et al. (2012) suggested that the trnH-psbA + ITS2 combination performs better than or equally well as other combinations, such as matK + rbcL, across taxonomic groups.
The present work aimed to distinguish different Berchemia species by screening three candidate loci, namely, rbcL, psbA-trnH, and ITS2, as the core barcodes and by identifying the most suitable barcode to accurately identify the members of the Berchemia genus. Furthermore, this study aimed to provide drug safety references for current medical fields.

Amplification and sequence analysis
Genomic DNA was extracted from 55 samples belonging to seven species of Berchemia. The ITS2, psbA-trnH, and rbcL regions were amplified effectively from the selected samples. All PCR products corresponding to these three barcodes were successfully sequenced, and high-quality bidirectional sequences were obtained. The PCR amplification sizes for ITS2, psbA-trnH, and rbcL were 491-561, 364-470, and 729-757 bp, respectively. Table 1 shows that the amplification efficiency of ITS2 and rbcL was 100 %, and that of psbA-trnH was 92 %. These results indicated that the three barcodes were applicable for the following analysis. ITS2 presented 17 variable sites among 226 aligned positions, of which 11 were parsimony-informative, whereas psbA-trnH and rbcL showed very low variation, with 6/430 bp and 5/551 bp, respectively (Table 2).
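The counts of variable and parsimony-informative sites reported above can be illustrated with a minimal Python sketch. It is not the procedure used in the study, which relied on dedicated software; the function name `site_summary`, the toy alignment, and the choice to ignore gaps and ambiguous bases are assumptions made for illustration only.

```python
from collections import Counter


def site_summary(alignment):
    """Return (variable_sites, parsimony_informative_sites) for aligned sequences."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all sequences must have the same aligned length")
    variable = 0
    informative = 0
    for column in zip(*alignment):
        # Assumption: gaps and ambiguous calls are ignored; only A, C, G, T count.
        counts = Counter(base for base in column if base in "ACGT")
        if len(counts) > 1:
            variable += 1
            # Informative: at least two states, each present in at least two sequences.
            if sum(1 for n in counts.values() if n >= 2) >= 2:
                informative += 1
    return variable, informative


# Hypothetical toy alignment (not data from the study).
toy = ["ACGTAC", "ACGTAT", "ACCTAT", "ACCTGC"]
summary = site_summary(toy)
print(summary)
```

In the toy alignment, the fifth column varies in only one sequence, so it counts as variable but not as parsimony-informative.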

Pairwise distance analysis
The mean interspecific genetic distances of the evaluated DNA regions are listed in Table 2. In the ITS2 region, the mean interspecific distance in Berchemia was 0.026, whereas the means for the two other candidate barcodes were 0.001 (rbcL) and 0.002 (psbA-trnH). The sequence data were further subjected to pairwise distance analysis, and the ITS2 region proved the most suitable for species differentiation (Table 3).
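For readers unfamiliar with the Kimura two-parameter (K2P) model behind these distances, the standard formula is d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences. The sketch below is a plain Python illustration of that formula, not the MEGA implementation; the function name `k2p_distance` and the pairwise-deletion handling of gaps are assumptions.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same aligned length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        # Assumption: sites with a gap or ambiguous base are skipped (pairwise deletion).
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log / math.sqrt raise ValueError if the sequences are too divergent.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Hypothetical 10-bp sequences differing by a single A<->G transition.
d_example = k2p_distance("ACGTACGTAC", "GCGTACGTAC")
print(round(d_example, 4))
```

Averaging such distances over all pairs of sequences from different species gives a mean interspecific distance of the kind reported in Table 2.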

Clustal analysis
In this study, 55 ITS2, 51 psbA-trnH, and 55 rbcL sequences were obtained from seven selected species. Five additional sequences, namely, two ITS2 sequences (B. hirtella HG004838; B. discolor AY626455), two rbcL sequences (B. hirtella KF181534; B. discolor JF265302), and a psbA-trnH sequence (B. hirtella HG005084), were downloaded from GenBank. To evaluate the feasibility of the three candidate barcodes to differentiate the species, Clustal analysis was conducted using the neighbor-joining (NJ) method, and Ziziphus jujuba, which belongs to Rhamnaceae, was employed as the outgroup. In the phylogenetic analysis, the ITS2 region clearly differentiated all eight species. Overall, 21 selected ITS2 sequences from seven species and an ITS2 sequence belonging to another species obtained from the NCBI database were aligned in the NJ tree. As shown in Fig. 1a, sequences of the same species were grouped together at the species level. Only the variety B. polyphylla var. leioclada clustered with B. lineata. As shown in Fig. 1b (Fig. 2).
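The core of the neighbor-joining method can be sketched in a few lines. The following Python example performs only the first joining step on a distance matrix: it computes the Q criterion for every pair, selects the pair with the smallest value, and derives the two branch lengths to the new node. It is a teaching sketch rather than the software used in the study; the function name `nj_first_join` and the five-taxon matrix (a textbook-style example, not Berchemia data) are assumptions.

```python
def nj_first_join(names, dist):
    """Return the first pair joined by neighbor-joining and its two branch lengths.

    names: list of taxon labels; dist: symmetric square matrix (list of lists).
    """
    n = len(names)
    if n < 3:
        raise ValueError("neighbor-joining needs at least three taxa")
    totals = [sum(row) for row in dist]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            # Q criterion: small values mark pairs close to each other
            # but far from the rest of the taxa.
            q = (n - 2) * dist[i][j] - totals[i] - totals[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    _, i, j = best
    len_i = 0.5 * dist[i][j] + (totals[i] - totals[j]) / (2 * (n - 2))
    len_j = dist[i][j] - len_i
    return (names[i], names[j]), (len_i, len_j)


# Hypothetical five-taxon distance matrix.
example_names = ["a", "b", "c", "d", "e"]
example_dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
first_pair, first_lengths = nj_first_join(example_names, example_dist)
print(first_pair, first_lengths)
```

A full NJ tree repeats this step, replacing the joined pair with a new node and recomputing distances until only two nodes remain.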

Barcoding gap
To determine whether a barcoding gap existed, we assessed the distribution of divergences in Berchemia (Fig. 3). The distribution and mean of intraspecific differences were lower than those of the interspecific divergences, with the highest significance found for ITS2. No obvious barcoding gaps were observed in psbA-trnH and rbcL. Thus, ITS2 can distinguish among Berchemia species.
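One simple way to check for a barcoding gap is to split the pairwise distances into intraspecific and interspecific sets and compare them. The sketch below assumes a precomputed symmetric distance matrix and one species label per sequence; the function name `barcoding_gap`, the toy labels, and the toy distances are hypothetical and do not reproduce the study's data.

```python
from itertools import combinations


def barcoding_gap(species, dist):
    """Summarise intra- vs interspecific distances.

    species: species label for each sequence; dist: symmetric distance matrix.
    A positive 'gap' means every interspecific distance exceeds every
    intraspecific distance.
    """
    intra, inter = [], []
    for i, j in combinations(range(len(species)), 2):
        (intra if species[i] == species[j] else inter).append(dist[i][j])
    if not intra or not inter:
        raise ValueError("need both intraspecific and interspecific pairs")
    return {
        "mean_intra": sum(intra) / len(intra),
        "mean_inter": sum(inter) / len(inter),
        "gap": min(inter) - max(intra),
    }


# Hypothetical example: two sequences from each of two species.
toy_species = ["sp1", "sp1", "sp2", "sp2"]
toy_dist = [
    [0.0, 0.002, 0.030, 0.028],
    [0.002, 0.0, 0.031, 0.029],
    [0.030, 0.031, 0.0, 0.001],
    [0.028, 0.029, 0.001, 0.0],
]
gap_result = barcoding_gap(toy_species, toy_dist)
print(gap_result)
```

A gap of zero or below would correspond to the overlap described for psbA-trnH and rbcL.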
Berchemia is a traditional folk medicinal plant with wide geographic distribution in Southwest China. The roots of B. lineata and other Berchemia species have been used as folk medicines to dispel wind and dampness, invigorate blood circulation, and relieve pain. Additionally, these plants exhibit antitumor, anti-rheumatic, antimicrobial, hepatoprotective, and anti-inflammatory properties (Shen et al. 2010). Currently, B. lineata and B. polyphylla var. leioclada are the two primary varieties in herbal medicine markets, and they are traditionally called "Tiebaojin" in specific areas.
Fig. 1 Evolutionary history inferred using the neighbor-joining method. a ITS2, b psbA-trnH, and c rbcL
CBOL was excluded because of its low amplification rate. Thus, we used the ITS2 region of nuclear ribosomal DNA and the chloroplast psbA-trnH and rbcL regions to examine a total of 56 samples belonging to eight Berchemia species (B. discolor sequences were obtained from the NCBI nucleotide database). Among the candidate DNA barcodes, the rate of successful identification with ITS2 was 100 % at the species level. Our results highlighted the advantages of using the ITS2 region as a DNA barcode; these advantages include universality, small intraspecific variation but high interspecific divergence, and a small fragment length of approximately 200 bp. These advantages lead to easy amplification and sequencing (Sun and Chen 2013). Our study suggested that the ITS2 region is the most suitable for Berchemia species identification. Pairwise distance analysis validated the Berchemia species, irrespective of the morphological similarities among several taxa; nevertheless, the analysis failed to resolve all the varieties. Between B. lineata and B. polyphylla var. leioclada, a distance value of zero showed that the ITS2 region cannot differentiate these two taxa.
Therefore, as indicated by the high degree of sequence variation, the pairwise distance analysis proved useful in Berchemia identification but only up to the species level.
Fig. 3 Relative distribution of inter-specific divergence and intra-specific variation of three barcodes. a ITS2, b psbA-trnH, and c rbcL. Blue: intra-specific; red: inter-specific
The NJ tree is useful in the identification of most of the species through the formation of monophyletic groups; this tool is also helpful in studying the ancestry and taxonomic positions of some species (Zhou et al. 2008). An issue of concern involves plant taxonomy because B. polyphylla var. leioclada is classified as a variety of B. polyphylla, but it groups with B. lineata. We assumed that B. polyphylla var. leioclada may be a variety of B. lineata because they demonstrate a very close phylogenetic relationship. This finding indicates that taxa with a similar morphological appearance may not be closely related phylogenetically. Hence, species identification at the molecular level is more convenient and efficient.
Clustal analysis is an essential tool used in barcoding (Higgins et al. 1992). In this study, deletion and variable site analysis showed that no barcode, not even the ITS2 sequence, was able to differentiate among Berchemia spp. at the variety level. For varieties of Berchemia spp. that also cannot be identified by morphological means, other methods can be attempted, such as phytochemical analysis. As previously reported, the quercetin and rutin levels differ between B. lineata and B. polyphylla var. leioclada. Specifically, B. lineata contains more quercetin and less rutin than B. polyphylla var. leioclada (Guo et al. 2012).
Ideally, barcodes must exhibit a barcoding gap between interspecific and intraspecific divergences (Meyer and Paulay 2005; Newmaster et al. 2006). To determine the existence of a gap, we assessed the distribution of divergences in classes of 0.001 distance units. The distribution and mean of intraspecific differences were lower than those of interspecific divergences; the highest significance levels were found for ITS2, followed by psbA-trnH and rbcL. ITS2 discriminated species more effectively than psbA-trnH and rbcL and was therefore more suitable for the barcode identification of Berchemia spp. Phylogenetic analysis also showed that rbcL and psbA-trnH were not ideal barcodes in this identification process. Both of these markers belong to the chloroplast genome, which indicates that chloroplast genome barcodes may not be suitable for Berchemia species identification. Whether this principle can be applied to the identification of other Rhamnaceae plants should be further determined.
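Grouping divergences into classes of 0.001 distance units amounts to building a histogram with a fixed bin width. The sketch below shows one way to do this in Python; the function name `bin_divergences`, the small tolerance used to guard against floating-point rounding at class boundaries, and the example values are assumptions for illustration.

```python
import math
from collections import Counter


def bin_divergences(distances, width=0.001):
    """Count distances per class; class k covers [k * width, (k + 1) * width)."""
    counts = Counter()
    for d in distances:
        # The 1e-9 tolerance keeps values such as 0.026 in class 26 even when
        # floating-point division yields 25.999999999999996.
        counts[math.floor(d / width + 1e-9)] += 1
    return dict(sorted(counts.items()))


# Hypothetical divergence values (not data from the study).
example_distances = [0.0, 0.0004, 0.0012, 0.0015, 0.026]
example_bins = bin_divergences(example_distances)
print(example_bins)
```

Plotting the intraspecific and interspecific class counts side by side gives the kind of distribution used to judge whether the two sets overlap.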

Conclusions
This study demonstrated that DNA barcoding is an effective and useful tool to identify and track various raw materials of Berchemia medicinal plants in a cost-effective and efficient manner. This finding also elucidates several taxonomic conflicts among morphologically similar species in the Chinese herb market and provides candidate barcodes for further identification of other Chinese medicinal plants.

Sampling of plant materials
A total of 55 samples belonging to seven species (Fig. 4), namely, B. floribunda, B. polyphylla, B. sinica, B. kulingensis, B. polyphylla var. leioclada, B. hirtella, and B. lineata, were sampled from the Guangxi, Guizhou, and Yunnan provinces in China (Table 4). We collected at least three samples for every species. The voucher samples were deposited in the herbarium at the Guangxi Institute of Minority Medicine, Nanning, China. In addition, two Berchemia raw material samples were purchased from a local supermarket and pharmacy. Two additional sequences belonging to B. discolor were downloaded from NCBI GenBank and used for comparative studies; accessions with identical sequence information were omitted. All the samples were identified by Liu Shou-yang, a botanist from Guangxi University of Chinese Medicine.
Fig. 4 Berchemia species. a B. floribunda, b B. polyphylla, c B. sinica, d B. kulingensis, e B. polyphylla var. leioclada

DNA extraction
Total genomic DNA was extracted from approximately 30-40 mg of dried leaves or 60-70 mg of roots by using the Plant Genomic DNA Kit (Tiangen Biotech Co., Beijing, China) in accordance with the manufacturer's protocol. The tissue was homogenized at 30 Hz with two stainless steel ball bearings in a 2.0 mL centrifuge tube. The sample powder was incubated at 65 °C in 750 µL of GP1 buffer. The incubation time was extended from 20 min to 1 h for dried leaves or up to 5 h for roots and rhizomes. The remaining steps followed the manufacturer's instructions.

DNA amplification and sequencing
PCR was performed using the universal barcode forward and reverse primers for the ITS2, psbA-trnH, and rbcL regions (Table 5) (Kress et al. 2005; Lahaye et al. 2008; Sass et al. 2007; Song et al. 2009). General PCR conditions were adopted, as shown in Table 5 (Sui et al. 2011). Individual amplifications were performed in 25 µL of a reaction mixture containing 2 × Taq PCR Master Mix (12.5 µL, Aidlab Biotechnologies Co., Beijing, China), 1 µL of each primer (2.5 µmol/L), and double-distilled water (8.5 µL). Approximately 4 µL of PCR products were examined by 1.0 % agarose gel electrophoresis (Fig. 5) and purified using the TIANgel Midi Purification Kit (Tiangen Biotech Co., Beijing, China). The purified PCR products were sequenced using an ABI3730XL sequencer (Applied Biosystems, Foster City, CA) with the amplification primers. All sequence data were submitted to NCBI, and accession numbers were obtained (Table 6).

Data analysis
Sequence assembly and generation of consensus sequences were completed using CodonCode Aligner v3.7 (CodonCode Corp., Dedham, MA, USA). The traces were assembled into bidirectional contigs, primer sequences were removed, and all ambiguous base calls were checked manually. Contigs were aligned using the MUSCLE multiple sequence alignment algorithm supplied with CodonCode Aligner. Genetic variations were analyzed with a Kimura 2-parameter distance matrix, which was constructed using MEGA5.0 software (Ma et al. 2014) and ClustalW (Sun and Chen 2013). A phylogenetic tree was created using the NJ method. A bootstrap test with 1000 replicates was applied to assess the reliability of the phylogenetic trees (Tamura [...] Kress et al. (2005) and Song et al. (2009). The obtained sequences were also compared with the existing Berchemia species sequences obtained from the NCBI database through a BLASTn search (Ross et al. 2008).
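The bootstrap test mentioned above rests on resampling alignment columns with replacement. The sketch below illustrates only that resampling step in plain Python; it does not rebuild a tree for each replicate, which a full bootstrap analysis would require. The function names, the fixed random seed, and the toy alignment are assumptions, not part of the original workflow.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample the columns of an alignment with replacement (one replicate)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    columns = [rng.randrange(length) for _ in range(length)]
    # The same sampled columns are applied to every sequence, so each
    # resampled site keeps its original pattern across taxa.
    return ["".join(seq[c] for c in columns) for seq in alignment]


def bootstrap_replicates(alignment, n_replicates=1000, seed=0):
    """Generate pseudo-replicate alignments; the seed is fixed for repeatability."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Hypothetical toy alignment; the first two sequences are identical.
toy_alignment = ["ACGTACGT", "ACGTACGT", "ACCTACGA"]
replicates = bootstrap_replicates(toy_alignment)
print(len(replicates), replicates[0])
```

In a complete analysis, a tree would be inferred from each replicate, and the support for a clade would be the fraction of replicate trees in which that clade appears.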