An efficient identification strategy of clonal tea cultivars using long-core motif SSR markers

Microsatellites, or simple sequence repeats (SSRs), especially those with long-core motifs (tri-, tetra-, penta-, and hexa-nucleotide), are an excellent tool for DNA fingerprinting. SSRs with long-core motifs are preferred because neighboring alleles are more easily separated and distinguished from each other, which makes the interpretation of electropherograms and the calling of true alleles more reliable. In the present work, to characterize a set of core long-core motif SSR markers for fingerprinting clonal cultivars of tea (Camellia sinensis), we analyzed 66 elite clonal tea cultivars from China with 33 initially chosen long-core motif SSR markers covering all 15 linkage groups of the tea plant genome. After further selection, a set of 6 SSR markers was chosen as core SSR markers. The polymorphism information content (PIC) of each core SSR marker was >0.5, and each marker had ≤5 alleles and ≤10 genotypes. Phylogenetic analysis revealed that the core SSR markers were not strongly correlated with the trait ‘cultivar processing-property’. The combined probability of identity (PID) between two random cultivars for the whole set of 6 SSR markers was estimated to be 2.22 × 10⁻⁵, a low value that confirmed the usefulness of the proposed SSR markers for fingerprinting analyses in Camellia sinensis. Moreover, to discriminate the clonal tea cultivars quickly, a cultivar identification diagram (CID) was established using these core markers; it fully reflected the identification process and showed immediately which SSR markers were needed to identify a cultivar chosen among the tested ones. The results suggested that the long-core motif SSR markers used in this investigation contributed to the accurate and efficient identification of the clonal tea cultivars and supported the protection of intellectual property.

these areas (Anesini et al. 2008; Alkan et al. 2009; Basu Majumder et al. 2010; Sae-Lee et al. 2012). In 2013, 3.52 million hectares of tea plants were harvested, producing 5.34 million tons of tea (FAO, http://faostat.fao.org/). Clonal tea cultivars are characterized by a regular and uniform period of shoot and leaf development, leading to stable tea quality and improved tea yield (Wachira et al. 1995; Fang et al. 2012; Yao et al. 2011). In recent years, tea acreage and production have increased continuously, partly as a result of the release and extension of clonal tea cultivars (Bandyopadhyay 2011).
The tea plant is a woody perennial with a large diploid genome (~4 Gb, 2n = 30; very few plants are triploid), which has not been sequenced so far. It is self-incompatible and highly heterozygous, and it has a long juvenile phase (more than 20 years); tea cultivar breeding is therefore a very long and expensive process (Chen et al. 2007; Tan et al. 2013). Because the tea plant can be multiplied by vegetative propagation of its shoots, infringement of the rights of clonal tea cultivar breeders is extremely common. Therefore, to protect intellectual property, it is crucial to establish a fast, scientific, and practical method for identifying clonal tea cultivars.
The traditional method of morphological identification has failed to identify several clonal tea cultivars effectively, because environmental factors affect phenotypic traits. By contrast, DNA molecular markers have proved to be a powerful tool for fingerprinting crop cultivars (Patzak et al. 2007; Jian et al. 2010; Divashuk et al. 2011). SSR markers are characterized by codominance, polymorphism, and high stability, and therefore represent a superior choice among the molecular markers developed for crop cultivar identification (Hasnaoui et al. 2012; Karaagac et al. 2014).
Recent advances in SSR development for tea came from deep sequencing of the tea plant transcriptome (Wu et al. 2012; Tan et al. 2013; Wang et al. 2013), which provided an increased number of SSR markers for tea cultivar identification. Several studies have investigated tea cultivars with SSR markers (Kaundun and Matsumoto 2004; Ujihara et al. 2009; Bhardwaj et al. 2013); however, using their fingerprinting data to discriminate tea cultivars is not straightforward and requires additional analysis time. A practical strategy for efficient plant identification relies on a new way of recording the DNA fingerprints of genotyped plants, the cultivar identification diagram (CID), which uses only the necessary SSR markers and allows specific plant cultivars to be identified quickly (Huo et al. 2013). The CID method can therefore serve as a practical way of identifying clonal tea cultivars.
Fingerprinting data are expected to be reproducible, so their accuracy is of great importance. SSRs with long-core motifs (tri-, tetra-, penta-, and hexa-nucleotide) are preferred because neighboring alleles are more easily separated and distinguished from each other. Short-core motifs (di-nucleotide) are not desirable, mainly because of the poorer separation of neighboring alleles and the high degree of stuttering, which make the interpretation of electropherograms and the calling of true alleles less reliable (Cipriani et al. 2008). SSRs with long-core motifs have been adopted in human genetics (Ruitberg et al. 2001; Butler et al. 2004; Butler 2006; Hellmann et al. 2006), but have been used exclusively for genetic analyses in only a few crops (Dettori et al. 2015). In the tea plant, eight core SSRs with larger repeat motifs (3-6 bp) have been reported for fingerprinting 128 Chinese clonal tea cultivars (Tan et al. 2015). However, analyzing the fingerprinting data from these markers to discriminate tea cultivars requires additional time, and these markers may not be sufficient to identify every tea cultivar uniquely, especially as more new cultivars are released in the future.
Therefore, to provide a practical method for identifying clonal tea cultivars and thereby protect intellectual property, we aimed to obtain a new set of long-core motif SSR markers and to establish a cultivar identification diagram (CID) based on the necessary SSR markers and the genotyping data obtained. The CID fully reflects the identification process and shows immediately which SSR markers are needed to identify a cultivar chosen among the tested ones.

Plant material
A total of 66 elite clonal tea cultivars were tested in this study. They were collected from the tea germplasm collection at the Institute of Tea, Fujian Academy of Agricultural Sciences, Fuan, Fujian, China. Young leaves of the 66 clonal tea cultivars were collected twice independently, frozen in liquid nitrogen, and stored at −80 °C. The names of these clonal tea cultivars are listed in Table 1.

DNA extraction
Total genomic DNA of each cultivar was extracted twice from young leaves using the CTAB method (Reitz et al. 1972). The genomic DNA was diluted to a final concentration of 30 ng/μL with TE buffer and stored at −20 °C until use; DNA quality was checked on 0.8 % agarose gels.

PCR and SSR fragment detection
Thirty-three SSR markers with long-core motifs (tri-, tetra-, penta-, and hexa-nucleotide) were initially selected from a tea plant genetic map (Ma et al. 2014), and the forward primer of each pair was labeled at the 5′ end with a fluorescent dye. To the best of our knowledge, this set of markers has not previously been used to identify tea cultivars. The selection criteria were as follows: (1) two or more alleles detected in a preliminary screening with eight cultivars (data not shown); (2) alleles distinct on 10 % silver-stained polyacrylamide gels with no random bands; and (3) even coverage of the 15 linkage groups of the tea plant genome. Details of the markers used in this study are shown in Table 2. PCR amplification was carried out in a volume of 30 μL containing 2 μL of genomic DNA (30 ng/μL), 1.5 μL of each primer (10 μM), 1.0 μL of Taq DNA polymerase (0.5 U/μL), 2 μL of MgCl₂ (25 mM), 3 μL of dNTPs (10 mM), 3 μL of 10× buffer, and 16 μL of double-distilled water. Amplification reactions were performed in a Huayue Biometra Thermal Cycler under the following conditions: initial denaturation for 5 min at 94 °C; 35 cycles of 94 °C for 30 s, Ta °C for 30 s, and 72 °C for 1 min; and a final extension at 72 °C for 20 min, followed by a hold at 4 °C. The Tm (°C) of each primer is reported in Table 2. Then, 1 μL of each PCR product was mixed with 9 μL of the ROX500 molecular weight standard mixture in a 96-well plate, gently vortexed, and centrifuged at 3000 rpm for 2 min. The mixture was denatured at 95 °C for 3 min, left on ice for 5 min, and loaded into an Applied Biosystems (ABI) 3730 sequencer for fragment analysis. Both PCR amplification and SSR fragment detection were performed twice independently.
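As a quick arithmetic check on the reaction mix described above, the following minimal sketch sums the component volumes and derives final concentrations in the 30 μL reaction. It assumes that "1.5 μL of each primer" means 1.5 μL of the forward primer and 1.5 μL of the reverse primer; the dictionary layout and the function names are illustrative and not part of the original protocol.

```python
# Reaction components as stated in the text: (volume in uL, stock concentration).
# Assumption: "each primer" = one forward and one reverse primer, 1.5 uL apiece.
REACTION_MIX = {
    "genomic DNA": (2.0, 30.0),      # ng/uL
    "forward primer": (1.5, 10.0),   # uM
    "reverse primer": (1.5, 10.0),   # uM
    "Taq polymerase": (1.0, 0.5),    # U/uL
    "MgCl2": (2.0, 25.0),            # mM
    "dNTP": (3.0, 10.0),             # mM
    "10x buffer": (3.0, 10.0),       # fold concentration
    "ddH2O": (16.0, None),           # water has no stock concentration
}


def total_volume(mix):
    """Sum of all component volumes in uL."""
    return sum(volume for volume, _ in mix.values())


def final_concentration(mix, name):
    """Concentration of one component after dilution into the full reaction."""
    volume, stock = mix[name]
    if stock is None:
        raise ValueError(f"{name} has no stock concentration")
    return stock * volume / total_volume(mix)


print(total_volume(REACTION_MIX))                           # total in uL
print(final_concentration(REACTION_MIX, "forward primer"))  # uM
print(final_concentration(REACTION_MIX, "MgCl2"))           # mM
```

Under that assumption the volumes sum to 30 μL, with each primer at 0.5 μM, MgCl₂ at about 1.67 mM, dNTPs at 1 mM, and 60 ng of template DNA per reaction.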

Data analysis
The data obtained were analyzed using GeneMapper software version 4.0. PowerMarker (Liu and Muse 2005) was used to calculate the key genetic statistics of the markers, including major allele frequency (MAF), number of alleles (NA), number of genotypes (NG), observed heterozygosity (Ho), polymorphism information content (PIC), and Nei's genetic distances (Nei et al. 1983).
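To make the per-marker statistics listed above concrete, the sketch below recomputes MAF, NA, NG, Ho, and PIC for a single locus. It is an independent illustration and not PowerMarker's code: it assumes diploid genotypes given as pairs of allele sizes, and it assumes the commonly used PIC definition, PIC = 1 − Σ p_i² − Σ_{i<j} 2 p_i² p_j². The function name and the example genotypes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def marker_stats(genotypes):
    """Summary statistics for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid individual.
    """
    n = len(genotypes)
    allele_counts = Counter(allele for g in genotypes for allele in g)
    freqs = [count / (2 * n) for count in allele_counts.values()]

    # Genotypes are unordered, so (100, 104) and (104, 100) count as one.
    distinct_genotypes = {tuple(sorted(g)) for g in genotypes}

    ho = sum(1 for a, b in genotypes if a != b) / n
    sum_sq = sum(p * p for p in freqs)
    cross = sum(2 * (p * q) ** 2 for p, q in combinations(freqs, 2))

    return {
        "MAF": max(freqs),
        "NA": len(allele_counts),
        "NG": len(distinct_genotypes),
        "Ho": ho,
        "PIC": 1 - sum_sq - cross,
    }


# Hypothetical genotypes for four individuals at one locus.
example = [(100, 100), (100, 104), (104, 104), (104, 100)]
print(marker_stats(example))
```

For this toy input both alleles have frequency 0.5, so the sketch gives NA = 2, NG = 3, Ho = 0.5, and PIC = 0.375.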

Power of fingerprinting
To assess the fingerprinting potential of the SSR markers, the probability of identity (PID) was calculated for each marker. PID represents the average probability that two random individuals in a population share the same genotype, and it is calculated for each locus from the allele frequencies p_i, where p_i is the frequency of the ith allele at the locus (Taberlet and Luikart 1999).
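The PID expression referred to above is commonly written as PID = Σ_i p_i⁴ + Σ_i Σ_{j>i} (2 p_i p_j)², where the first term covers shared homozygous genotypes and the second covers shared heterozygous genotypes. The sketch below assumes this is the form used here; the function name and the example frequencies are illustrative only.

```python
from itertools import combinations


def probability_of_identity(allele_freqs):
    """Single-locus PID under the assumed form sum(p_i^4) + sum((2 p_i p_j)^2)."""
    homozygous = sum(p ** 4 for p in allele_freqs)
    heterozygous = sum((2 * p * q) ** 2 for p, q in combinations(allele_freqs, 2))
    return homozygous + heterozygous


# Hypothetical loci: two and four equally frequent alleles.
print(probability_of_identity([0.5, 0.5]))
print(probability_of_identity([0.25, 0.25, 0.25, 0.25]))
```

With two equally frequent alleles the value is 0.375, and with four it drops to about 0.109, which illustrates why more evenly distributed alleles give a lower PID and hence more discriminating power.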

Core SSR marker selection
Additional criteria used for the selection of core SSR markers were: (1) PID < 0.198 (the average of the 33 markers), so that a combination of a few markers provided enough discriminating power; (2) a high degree of polymorphism, with PIC > 0.5 (Hoda et al. 2010; Pan et al. 2010); and (3) number of alleles ≤5 and number of genotypes ≤10. Primers yielding higher numbers of alleles and genotypes were not chosen because they were not considered easily manageable.
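The three criteria above amount to a simple filter over the per-marker statistics. The following sketch applies the stated thresholds; the `MarkerStats` class, the marker names, and all numeric values in the example list are hypothetical and are not taken from Table 3.

```python
from dataclasses import dataclass


@dataclass
class MarkerStats:
    name: str
    pid: float
    pic: float
    n_alleles: int
    n_genotypes: int


def is_core_candidate(m, pid_cutoff=0.198, pic_cutoff=0.5,
                      max_alleles=5, max_genotypes=10):
    """True if a marker passes all three selection criteria from the text."""
    return (
        m.pid < pid_cutoff
        and m.pic > pic_cutoff
        and m.n_alleles <= max_alleles
        and m.n_genotypes <= max_genotypes
    )


# Made-up markers for illustration only.
markers = [
    MarkerStats("marker_A", pid=0.15, pic=0.62, n_alleles=4, n_genotypes=8),
    MarkerStats("marker_B", pid=0.25, pic=0.55, n_alleles=3, n_genotypes=5),    # PID too high
    MarkerStats("marker_C", pid=0.05, pic=0.85, n_alleles=12, n_genotypes=30),  # too many alleles
    MarkerStats("marker_D", pid=0.19, pic=0.45, n_alleles=3, n_genotypes=4),    # PIC too low
]
core = [m.name for m in markers if is_core_candidate(m)]
print(core)
```

Note that the third criterion deliberately excludes the most polymorphic markers, trading some discriminating power for genotypes that are easier to score and manage.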

Phylogenetic analysis
To represent the variability among the tested tea cultivars, a phylogenetic tree of the tested clonal tea cultivars was constructed from the set of selected core SSR markers, based on Nei's genetic distances and the UPGMA method, and viewed with MEGA 4.0 (Tamura et al. 2007).
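As an illustration of the distance step only, the sketch below computes a pairwise distance between two cultivars, assuming that the Nei et al. (1983) distance meant here is the D_A distance, D_A = 1 − (1/r) Σ_loci Σ_alleles √(x·y), with x and y the allele frequencies in the two samples and r the number of loci. For a single diploid plant the frequencies are 1.0 (homozygote) or 0.5 and 0.5 (heterozygote). The function names are hypothetical, and the UPGMA clustering that would consume the resulting matrix is not shown.

```python
from collections import Counter
from math import sqrt


def allele_freqs(genotype):
    """Allele frequencies within one individual's genotype at a locus."""
    counts = Counter(genotype)
    return {allele: c / len(genotype) for allele, c in counts.items()}


def nei_da(ind_x, ind_y):
    """Assumed D_A distance between two individuals.

    ind_x, ind_y: dicts mapping marker name -> tuple of allele sizes.
    Only markers genotyped in both individuals are used.
    """
    loci = ind_x.keys() & ind_y.keys()
    similarity = 0.0
    for locus in loci:
        fx = allele_freqs(ind_x[locus])
        fy = allele_freqs(ind_y[locus])
        similarity += sum(sqrt(fx[a] * fy[a]) for a in fx.keys() & fy.keys())
    return 1 - similarity / len(loci)


# Genotypes of cultivars A and B as given in the identification example.
a = {"TM442": (268, 286), "TM324": (171, 171)}
b = {"TM442": (268, 286), "TM324": (171, 183)}
print(nei_da(a, a))
print(nei_da(a, b))
```

Identical genotypes give a distance of 0, and the two cultivars above, which differ only at TM324, give a distance of about 0.146 over these two loci.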

Construction of CID
The CID was established as reported previously (Liu et al. 2014), with a few modifications. The method was based on classifying cultivars into groups according to the genotypes amplified by each selected core SSR marker: (1) a cultivar with a unique genotype generated by a single primer pair was considered identified and occupied a group by itself; (2) cultivars sharing the same genotype were placed in the same group; (3) additional core SSR primer pairs were used to separate the cultivars sharing a group; and (4) the core markers were used to construct the CID in descending order of PID.
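The grouping rules above can be sketched as a recursive partition. This is a minimal illustration under stated assumptions, not the published CID software: the nested-dictionary tree format and the function name are hypothetical, genotypes are written as tuples of allele sizes, and the four cultivars and their genotypes are the ones quoted in the Results.

```python
from collections import defaultdict


def build_cid(cultivars, genotypes, markers):
    """Recursively split cultivars by genotype, one marker per level.

    cultivars: list of cultivar names still to be separated.
    genotypes: dict cultivar -> dict marker -> genotype tuple.
    markers:   marker names in the order they are applied
               (descending PID, per rule 4).
    """
    if len(cultivars) == 1:
        return cultivars[0]        # rule 1: unique genotype, identified
    if not markers:
        return sorted(cultivars)   # unresolved: more markers would be needed
    marker, remaining = markers[0], markers[1:]
    groups = defaultdict(list)
    for name in cultivars:         # rule 2: same genotype, same group
        groups[genotypes[name][marker]].append(name)
    return {                       # rule 3: next marker splits each group
        "marker": marker,
        "groups": {g: build_cid(members, genotypes, remaining)
                   for g, members in groups.items()},
    }


GENOTYPES = {
    "Bai Ji Guan": {"TM442": (292,)},
    "Tie Guan Yin": {"TM442": (268, 292)},
    "Yin Shuang": {"TM442": (268, 286), "TM324": (171,)},
    "Feng Huang Dan Cong": {"TM442": (268, 286), "TM324": (171, 183)},
}
cid = build_cid(list(GENOTYPES), GENOTYPES, ["TM442", "TM324"])
print(cid)
```

Because a cultivar leaves the recursion as soon as it is alone in its group, markers later in the order are only consulted for cultivars that are still unresolved, which is what keeps the number of primer pairs per identification small.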

Stability of detection and data analysis
The allele sizes were the same in the two independent scorings, which indicated the high quality and stability of DNA fragment amplification and detection. All 33 SSR loci were polymorphic among the tested cultivars. The NA ranged from 2 to 16 (average 6), and the NG ranged from 2 to 33 (average 11.2). The Ho ranged from 0.136 to 0.864 (average 0.583), and the PIC ranged from 0.119 to 0.864 (average 0.553). Details are shown in Table 3. The average values of NA, Ho, and PIC were all lower than those reported by Tan et al. (2015), where NA, Ho, and PIC were 10.4, 0.701, and 0.704, respectively, mainly because the 33 SSR markers used here were all long-core motif markers.

Selection of core SSR markers
According to the above selection criteria, 6 SSR markers with long-core motifs were selected from the 33 SSR markers and used as a set of core primer pairs to identify the tested clonal tea cultivars (Tables 2, 3).
Assuming that all loci segregate independently, the probability of finding two random individuals with identical genotypes at all 33 loci was estimated to be 1.42 × 10⁻³², while that for the set of 6 core SSR markers was 2.22 × 10⁻⁵, which provided enough discriminating power to identify the tested clonal tea cultivars.
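Under the independence assumption stated above, the combined PID is the product of the single-locus values. The sketch below illustrates this with the two per-marker PIDs quoted in the text (TM442 and TM324; the other four are not reproduced here) and then relates the reported six-marker value to the number of cultivar pairs. The function names are illustrative.

```python
from math import comb, prod


def combined_pid(per_locus_pids):
    """Multi-locus PID, assuming loci segregate independently."""
    return prod(per_locus_pids)


def expected_matching_pairs(n_individuals, pid):
    """Expected number of pairs sharing a genotype by chance."""
    return comb(n_individuals, 2) * pid


# The two per-marker PIDs given in the text: TM442 and TM324.
print(combined_pid([0.192, 0.185]))

# 66 cultivars form C(66, 2) = 2145 pairs.
print(expected_matching_pairs(66, 2.22e-5))

# Sanity check: every core marker has PID <= 0.192 (the highest, TM442),
# so the six-marker product cannot exceed 0.192 ** 6.
print(0.192 ** 6 >= 2.22e-5)
```

With 66 cultivars there are 2145 pairs, so a combined PID of 2.22 × 10⁻⁵ corresponds to roughly 0.05 expected chance matches, under the same independence assumption.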

Genotypes of the tested cultivars
The genotypes of the 66 cultivars generated with the set of 6 core SSR markers are shown in Table 4; from these genotypes, the CID can be established to distinguish the tested clonal tea cultivars rapidly.

Phylogenetic analysis
The genetic relationships among the 66 tested tea cultivars are presented in the phylogenetic tree (Fig. 1). All of them were grouped according to their genetic backgrounds. […] (Li Zao Xiang), 65th (Feng Huang Dan Cong), 59th (Qian Nian Xue). The third cluster comprised the largest number of cultivars, containing the remaining 46 of the tested cultivars. In the phylogenetic tree, the clustering based on genetic background (Nei's genetic distances) did not correspond well with that based on the trait 'cultivar processing-property'. This showed that the core SSR markers were not strongly correlated with the trait 'cultivar processing-property'.

Creation of CID
Following the descending order of the PID values of the six core markers, the primer pair with the highest PID, TM442 (PID = 0.192), was used first to genotype all the tested cultivars. It classified the cultivars into nine groups corresponding to the nine genotypes. The first group contained only the 15th cultivar (Bai Ji Guan, genotype '+292'), and the second group contained only the first cultivar (Tie Guan Yin, genotype '+268,292'). The remaining 64 cultivars were distributed into seven groups, each containing two or more cultivars and therefore requiring other markers for separation. The third group included two clonal tea cultivars, the 56th (Yin Shuang) and the 65th (Feng Huang Dan Cong), which were separated by the genotypes generated by the second primer pair, TM324 (PID = 0.185): '+171' and '+171, 183', respectively. Similarly, the other groups of clonal tea cultivars were fully separated by other primer pairs, and the CID was established as shown in Figs. 2, 3 and 4.
Because the CID records which markers were used, it makes the identification of the cultivars it contains more straightforward. In the following example, two random cultivars from the CID, A and B, were screened with TM442, the first primer pair used in CID construction. The genotypes of the two cultivars were both '+268, +286', placing them in the third group of the CID. TM324 then showed that the genotype of A was '+171' and that of B was '+171, +183', thus identifying A as the 56th cultivar (Yin Shuang) and B as the 65th cultivar (Feng Huang Dan Cong). Details are shown in Figs. 5 and 6. Using this procedure, the two cultivars were quickly and successfully identified with a suitable combination of two primer pairs (TM442 and TM324).
Upon the release of new clonal tea cultivars, the set of six core SSR markers could still be used to genotype them and locate them on the CID; furthermore, this could be achieved with fewer than the six primer pairs currently included in the core set. For example, if a new cultivar displayed the TM442 genotype '+268, 292' and the TM324 genotype '+183', the CID construction or the identification process could be completed using only two primer pairs. Conversely, if the six primer pairs failed to fully identify the new cultivars, additional core SSR primer pairs could be added. As new cultivars are identified, a larger clonal tea cultivar CID will be developed.
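The lookup of cultivars A and B described above can be sketched as a walk down the CID, genotyping one marker per level. The nested-dictionary representation, the `identify` function, and the tuple encoding of genotypes are assumptions made for illustration; only the portion of the CID quoted in the text is included, and the sketch presumes the sample is one of the tested cultivars.

```python
# Fragment of the CID, restricted to the groups and genotypes quoted in the text.
CID = {
    "marker": "TM442",
    "groups": {
        (292,): "Bai Ji Guan",
        (268, 292): "Tie Guan Yin",
        (268, 286): {
            "marker": "TM324",
            "groups": {
                (171,): "Yin Shuang",
                (171, 183): "Feng Huang Dan Cong",
            },
        },
    },
}


def identify(cid, sample):
    """Follow the CID for one sample.

    sample: dict marker -> genotype tuple.
    Returns (cultivar name or None, list of markers consulted).
    """
    node, used = cid, []
    while isinstance(node, dict):
        marker = node["marker"]
        used.append(marker)
        genotype = sample.get(marker)
        if genotype not in node["groups"]:
            return None, used      # genotype not present in this CID branch
        node = node["groups"][genotype]
    return node, used


sample_a = {"TM442": (268, 286), "TM324": (171,)}
sample_b = {"TM442": (268, 286), "TM324": (171, 183)}
print(identify(CID, sample_a))
print(identify(CID, sample_b))
```

Both samples are resolved after two markers, matching the TM442 then TM324 sequence in the example; a sample whose genotype is absent from a branch returns `None`, which is the situation in which a new cultivar would have to be added to the CID.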

Discussion
Recently, the single nucleotide polymorphism (SNP) has rapidly become a favored marker choice for genetic studies owing to its low cost, high genotyping efficiency, genome-wide coverage, and analytical simplicity. SNP markers have been used in the tea plant for cultivar identification, genetic diversity analysis (Yang et al. 2016), and genetic map construction (Ma et al. 2015); nevertheless, this does not mean that SSR markers are obsolete. SSRs still serve as excellent markers in tea plant genetic analyses. As genetic markers, SSRs offer several advantages: co-dominance, hypervariability, polymorphism, and ease and reliability of scoring. SSRs have been used extensively for the analysis of genetic diversity, population genetics, linkage mapping, and association analysis (Verma et al. 2012). Furthermore, the high PIC value of SSRs (up to threefold higher than that of SNPs), coupled with high heterozygosity, makes them useful for the assessment of genetic relatedness and for map-based cloning (Yang et al. 2011). The combined PID of only 6 long-core motif SSRs reported here was 2.22 × 10⁻⁵, while that of 60 polymorphic SNPs was about 1.0 × 10⁻⁵. In other words, the discriminating power of only 6 long-core motif SSRs reached approximately half that of 60 SNPs. On a per-marker basis, the advantage of SSRs over SNPs for individual fingerprinting is clear.
The success of DNA fingerprinting relies heavily on the quality of the markers and the accuracy of the genotyping data, which should be reproducible in every laboratory. In the present work, a great deal of attention was paid to the marker selection step. The 33 markers initially selected from a tea plant genetic map were all long-core motif markers, which make the interpretation of electropherograms and the calling of true alleles more reliable. Furthermore, owing to the additional selection criteria, the new set of 6 core SSR markers all had sufficient discriminating power and were highly polymorphic and easily manageable, which helped to improve identification efficiency. Capillary electrophoresis on the ABI 3730 sequencer provides high detection sensitivity for amplified DNA fragments and has proved to be a powerful and efficient technique for automated and accurate estimation of allele sizes (Brunings et al. 2010; Li et al. 2014). There were no differences between the two independent replicates, both detected by capillary electrophoresis, which showed that the detection results were stable and reproducible. The set of 6 core SSR markers in this investigation is a valuable resource of great importance for tea cultivar fingerprinting.
Although the phylogenetic tree represented the variability among the tested cultivars, it could not show which information should be used to identify them. Using the genotypes of the tested cultivars (Table 4), anyone could decide which primers to use to distinguish two cultivars; however, this is not straightforward and requires additional time to analyze the fingerprinting data. The CID directly separates the cultivar samples at each step and displays the whole identification process; it thus differs from the phylogenetic tree and is an extension of the fingerprinting data. For these reasons, the CID method is a useful complement to the phylogenetic tree and the fingerprinting data for quickly identifying tea cultivars.
The phylogenetic tree shows that the core SSR markers are not strongly correlated with the trait 'cultivar processing-property'. In the near future, we will focus on linkage disequilibrium (LD)-based association analysis (Gupta et al. 2005) through SSR scanning of diverse tea cultivars (or germplasms) to detect SSR markers strongly correlated with target traits and thereby support marker-assisted selection in tea-breeding programs.

Conclusions
The 6 core long-core motif SSR markers, selected in this study on the basis of their degree of polymorphism and genotype frequencies and of their easy and stable allele separation and scoring, enabled the full identification of the 66 tested clonal tea cultivars.
The tea plant CID, based on the suggested core SSR markers and the genotyping data obtained, was a useful complement to the phylogenetic tree and the fingerprinting data; it helps to identify clonal tea cultivars quickly and, consequently, to protect plant breeders' rights.