Population genetic structure analysis and forensic evaluation of Xinjiang Uigur ethnic group on genomic deletion and insertion polymorphisms

Background The Uigur ethnic minority is the largest ethnic group in the Xinjiang Uygur Autonomous Region of China and a valuable resource for the study of ethnogeny. The objective of this study was to estimate the genetic diversity and forensic parameters of 30 insertion-deletion loci in the Uigur ethnic group from the Xinjiang Uigur Autonomous Region of China, and to analyze the genetic relationships between the Xinjiang Uigur group and other previously published groups based on population data for these loci. Results All tested loci conformed to Hardy–Weinberg equilibrium after Bonferroni correction. The observed and expected heterozygosity ranged from 0.3750 to 0.5515 and from 0.4057 to 0.5037, respectively. The combined power of discrimination and probability of exclusion in the group were 0.99999999999940 and 0.9963, respectively. We analyzed the DA distance, interpopulation differentiation and population structure, performed principal component analysis and constructed a neighbor-joining tree based on the studied group and 21 reference groups. The results indicated that the studied Xinjiang Uigur group (representing our samples from the whole territory of the Xinjiang Uigur Autonomous Region) had close relationships with the Urumchi Uigur (representing previously reported samples from Urumchi, Xinjiang) and Kazak groups. Conclusions The present study may provide novel biological information for the study of population genetics and can also increase our understanding of the genetic relationships between the Xinjiang Uigur group and other groups. Electronic supplementary material The online version of this article (doi:10.1186/s40064-016-2730-3) contains supplementary material, which is available to authorized users.

of advantageous properties shared with the similar binary variation of SNPs, for example smaller amplicons, lower mutation rates than STRs and a wide distribution in the human genome (Phillips et al. 2007; Fondevila et al. 2012; Shi et al. 2015; Romanini et al. 2012). At present, InDels have been applied in forensic genetics, including individual identification (Pereira et al. 2009), inference of biogeographic ancestry (Yang et al. 2005) and population genetic studies (Zaumsegel et al. 2013).
The Investigator DIPplex® kit (Qiagen, Hilden, Germany) contains the components for the simultaneous amplification of Amelogenin and 30 autosomal InDels (genomic information on the chromosomal localization of the 30 InDel loci is shown in Table 1). The allele length variations of the InDels range from 4 to 22 bp and all amplicons are shorter than 160 bp, which makes them more suitable for highly degraded DNA samples in forensic casework. To date, genetic data for several populations have been published based on this kit, e.g. Japanese, Polish and Korean groups (Nunotani et al. 2015; Pepinski et al. 2013; Kim et al. 2014).
Xinjiang Uigur Autonomous Region is located on the northwest border of China, with a land area of 1.6649 million square kilometers, and accounts for one-sixth of China's total area (Fig. 1). It lies in the heart of the ancient Silk Road, which has historically experienced the migration of many groups of Eastern and Western Eurasians. The Uigur, the main nationality of the Xinjiang Uigur Autonomous Region, had a population of 10.06 million in 2010 (http://www.stats.gov.cn/tjsj/pcsj/rkpc/6rp/indexch.htm). The Uigurs mainly live in Kashi, which is located south of the Tianshan Mountains, and others are scattered in the Ili and Urumchi areas. Uigurs have their own spoken and written language, which belongs to the Turkic branch of the Altaic language family. The religion of the Uigurs is Islam, which has a great influence on Uigur culture and customs (Shan and Deng 2012). In the present study, we obtained population genetic data and calculated the forensic parameters of 30 InDels in the studied Xinjiang Uigur group. We also collected population data from previously reported groups, including Uigurs living in different areas, other groups in China, and Asian, European and Amerindian groups, to analyze their genetic relationships.

Sample collection and DNA extraction
A total of 136 bloodstain samples were collected from the Xinjiang Uigur Autonomous Region. The families of all volunteers had resided in the Xinjiang Uigur Autonomous Region for more than three generations, and all volunteers signed informed consent before being involved in the study. This study was approved by the Institutional Ethics Committee of Xinjiang Medical University, China. Genomic DNA was extracted from the bloodstain samples using the Chelex-100 method according to Walsh et al. (1991).

Amplification and genotyping
Amplification of the 30 InDel loci was performed using the Investigator DIPplex® kit on a GeneAmp PCR System 9700 Thermal Cycler (Applied Biosystems, Foster City, CA, USA) according to the instructions in the Investigator DIPplex handbook. Amplification products were separated by capillary electrophoresis on an ABI3500 Genetic Analyzer (Applied Biosystems, Foster City, CA, USA) according to the manufacturer's instructions. The control DNA 9948 (Promega, Madison, WI, USA) was analyzed as a positive control. Genotyping results were obtained with GeneMapper v3.2 software (Applied Biosystems, Foster City, CA, USA) by comparison with the allelic ladder.

Quality control
We strictly followed the International Society for Forensic Genetics (ISFG) recommendations on the analysis of DNA polymorphisms (Schneider 2007).

Statistical analysis
Allele frequencies and forensic parameters, including observed heterozygosity (Ho), Hardy-Weinberg equilibrium (HWE), match probability (MP), polymorphic information content (PIC), power of exclusion (PE), discrimination power (PD) and typical paternity index (TPI), were estimated with the modified Powerstat v1.2 spreadsheet (Promega, Madison, WI, USA). Expected heterozygosity (He) was calculated according to the formula $H_e = \frac{n}{n-1}\left(1 - \sum_{i=1}^{k} p_i^2\right)$ (Nei 1978), where $p_i$ is the frequency of allele i, k is the number of alleles and n is the number of samples. The pairwise Fst and p-values were estimated by the analysis of molecular variance method. Principal component analysis (PCA) based on allele frequencies was performed in MATLAB 2007a (MathWorks Inc., USA). Linkage disequilibrium (LD) analysis was performed using SNP Analyzer V2.0 (Istech, South Korea) (Yoo et al. 2008). The DA distances were obtained using the DISPAN program (Ota 1993), and the neighbor-joining (NJ) tree was constructed from the DA distances. Population structure analysis was conducted with the STRUCTURE program (version 2.2) using the admixture model with the following parameters: burn-in period, 100,000; run length, 100,000 steps in the Markov chain; K values, 2-7; and 15 iterations (Pritchard et al. 2003; Jakobsson and Rosenberg 2007).
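As an illustration of the expected heterozygosity formula given above, the following minimal Python sketch evaluates it for a single locus. The function name and the example frequencies are hypothetical and are not taken from the study data; n is used as defined in the text (the number of samples).

```python
def expected_heterozygosity(allele_freqs, n):
    """Bias-corrected He = n/(n-1) * (1 - sum(p_i^2)), with n as defined in the text."""
    if n < 2:
        raise ValueError("n must be at least 2")
    if abs(sum(allele_freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    # Sum of squared allele frequencies (expected homozygosity)
    homozygosity = sum(p * p for p in allele_freqs)
    return n / (n - 1) * (1.0 - homozygosity)


# Hypothetical biallelic locus with equal short- and long-allele frequencies, n = 136
he = expected_heterozygosity([0.5, 0.5], 136)
print(round(he, 4))  # 0.5037
```

For a biallelic marker the uncorrected term 1 − Σp² cannot exceed 0.5, so with n = 136 the formula has an upper bound of about 0.5037.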

Forensic parameter analysis
All studied loci were found to be in accordance with HWE in the Xinjiang Uigur group after Bonferroni correction, with the significance level adjusted to 0.0017 (p = 0.05/30). The allele frequencies and forensic parameters of the 30 InDel loci in the Xinjiang Uigur group are shown in Table 1, and the raw genotyping data are shown in Additional file 1: Table S1. The Ho and He ranged from 0.3750 (HLD56 and HLD84) to 0.5515 (HLD83, HLD92 and HLD131) and from 0.4057 (HLD64) to 0.5037 (HLD101), respectively. The PIC, TPI, PD and PE values ranged from 0.3216 to 0.3750, 0.8000 to 1.1148, 0.5563 to 0.6513 and 0.0994 to 0.2366, respectively. The highest and lowest MP were 0.4437 (HLD64) and 0.3487 (HLD125), respectively. The combined power of discrimination (CPD) and combined probability of exclusion (CPE) in the group were 0.99999999999940 and 0.9963, respectively. The high CPD value demonstrates that the panel of 30 InDel loci has potential in forensic individual identification.
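The relationships between these parameters can be sketched with the closed-form expressions commonly used for them. The Python sketch below assumes TPI = 1/(2(1 − Ho)), PE = h²(1 − 2hH²) with h = Ho and H = 1 − Ho, PD = 1 − MP, and combined values obtained by multiplying across independent loci. The text does not spell out the formulas implemented in the Powerstat spreadsheet, so these expressions and all function names are assumptions made for illustration.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    # Per-test significance level after Bonferroni correction
    return alpha / n_tests


def typical_paternity_index(ho):
    # Assumed form: TPI = 1 / (2 * (1 - Ho))
    return 1.0 / (2.0 * (1.0 - ho))


def power_of_exclusion(ho):
    # Assumed form: PE = h^2 * (1 - 2 * h * H^2), h = heterozygosity, H = homozygosity
    h, big_h = ho, 1.0 - ho
    return h * h * (1.0 - 2.0 * h * big_h * big_h)


def power_of_discrimination(mp):
    # PD is the complement of the match probability
    return 1.0 - mp


def combined_pd(match_probs):
    # CPD = 1 - product of per-locus MP, assuming independent loci
    return 1.0 - math.prod(match_probs)


def combined_pe(pe_values):
    # CPE = 1 - product of (1 - PE) over loci, assuming independent loci
    return 1.0 - math.prod(1.0 - pe for pe in pe_values)


print(round(bonferroni_threshold(0.05, 30), 4))    # 0.0017
print(round(typical_paternity_index(0.3750), 4))   # 0.8
print(round(power_of_exclusion(0.3750), 4))        # 0.0994
print(round(power_of_discrimination(0.4437), 4))   # 0.5563
```

Under these assumed expressions, the lowest reported Ho (0.3750) reproduces the lowest reported TPI and PE, and the highest reported MP (0.4437) reproduces the lowest reported PD.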

Linkage disequilibrium analysis
Linkage disequilibrium was tested for all pairwise combinations of loci. The linkage disequilibrium pattern, expressed as r² values between each pair of loci, is shown in Additional file 2: Table S2. No linkage disequilibrium was observed among the loci, with all r² values less than 0.1, which indicated that these genetic markers were relatively independent for the subsequent comparison among the 22 groups.
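For readers unfamiliar with r², the Python sketch below computes a simple genotype-based approximation: the squared Pearson correlation between long-allele counts (0, 1 or 2) at two loci across individuals. This is a swapped-in approximation for unphased data and is not claimed to be the algorithm implemented in SNP Analyzer; the function name and genotype vectors are hypothetical.

```python
def genotype_r2(counts_a, counts_b):
    """Squared Pearson correlation between allele counts at two loci."""
    n = len(counts_a)
    if n == 0 or n != len(counts_b):
        raise ValueError("both loci need the same non-zero number of individuals")
    mean_a = sum(counts_a) / n
    mean_b = sum(counts_b) / n
    # Sums of cross-products and squared deviations (the 1/n factors cancel in r^2)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(counts_a, counts_b))
    var_a = sum((a - mean_a) ** 2 for a in counts_a)
    var_b = sum((b - mean_b) ** 2 for b in counts_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("r^2 is undefined for a monomorphic locus")
    return cov * cov / (var_a * var_b)


# Hypothetical long-allele counts for six individuals at two loci
locus_a = [0, 1, 2, 0, 1, 2]
locus_b = [0, 0, 1, 1, 2, 2]
print(genotype_r2(locus_a, locus_b))  # 0.25
```

Values close to 0 for every pair of loci, as reported above, are what would be expected of markers that segregate approximately independently.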

Clustering analysis
Before conducting the comparison, we re-read the references and confirmed that the loci in all reference populations showed no deviation from HWE or linkage equilibrium. We analyzed the population structure of the Xinjiang Uigur group (representing our samples from the whole territory of the Xinjiang Uigur Autonomous Region) and 21 reference groups; the results are shown in Fig. 2. At K = 2, the Asian groups were separated from both the Amerindian and European groups: the 5 European groups and 6 Amerindian groups consisted almost entirely of the green component, while the 8 Asian groups consisted almost entirely of the red component. The Kazak, Urumchi Uigur and Xinjiang Uigur groups displayed an admixed constitution of both green and red components. At K = 4, the Amerindian groups could be clearly separated from the European groups. Uigurs and Kazaks were much better separated from both Europeans and Asians at K = 6.

Principal component analysis
A PCA was performed to analyze the relationships between the Xinjiang Uigur group and the other 21 groups. The result is shown in Fig. 3. The first and second components accounted for 58.95 and 23.23 % of the total variance, respectively, giving a cumulative contribution of 82.18 %. In the plot, the 5 European groups and 6 Amerindian groups were located in the left part, the 8 Asian groups in the right part and the 3 Eurasian groups (Kazak, Urumchi Uigur and Xinjiang Uigur groups) in the central part. The Xinjiang Uigur group lay a short distance from the Urumchi Uigur and Kazak groups in the PCA plot, which indicated that the Xinjiang Uigur group had closer genetic relationships with those two groups.

Table 1 Allele frequencies and forensic parameters for 30 InDels in Uigur group from Xinjiang Uigur Autonomous Region (n = 136)
HLD human locus deletion/insertion polymorphism, DIP− frequency of short allele, DIP+ frequency of long allele, Ho observed heterozygosity, He expected heterozygosity, MP matching probability, PD power of discrimination, PE probability of exclusion, PIC Polymorphic information contents, TPI typical paternity index, HWE probability value of the exact test for Hardy-Weinberg equilibrium, p the short arm of a chromosome, q the long arm of a chromosome

Interpopulation differentiations
We estimated pairwise Fst and p-values between the Uigur group and previously published groups at the 30 InDel loci using the analysis of molecular variance method; the results are given in Additional file 3: Table S3. The smallest differences were found between the Xinjiang Uigur group and the Urumchi Uigur and Kazak groups, with significant differences at one and three loci, respectively, whereas significant differences between the Xinjiang Uigur group and the other groups were observed at 5-20 loci. The results indicated that allele frequency distributions differed among groups. Therefore, InDels would be a useful tool for studying migration patterns, gene flow, admixture and ancestry as more loci become available (Hefke et al. 2015).
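To give an intuition for what a per-locus Fst measures, the Python sketch below uses the simple heterozygosity-based definition Fst = (HT − HS)/HT for two equally weighted populations at a biallelic locus. This is a deliberately simplified stand-in, not the analysis of molecular variance estimator used in this study, and it provides no p-values; the function name and frequencies are hypothetical.

```python
def simple_fst(p1, p2):
    """Fst = (HT - HS) / HT for two equally weighted populations, biallelic locus.

    p1, p2: frequency of the same allele (e.g. the long allele) in each population.
    """
    # HS: mean within-population expected heterozygosity
    hs = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    # HT: expected heterozygosity of the pooled population
    p_bar = (p1 + p2) / 2.0
    ht = 2.0 * p_bar * (1.0 - p_bar)
    if ht == 0.0:
        return 0.0  # locus fixed for the same allele in both populations
    return (ht - hs) / ht


# Hypothetical long-allele frequencies in two populations
print(round(simple_fst(0.4, 0.6), 4))  # 0.04
```

Identical allele frequencies give Fst = 0, and fixation for alternative alleles gives Fst = 1, so larger values indicate stronger differentiation at that locus.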

DA distance
The DA distance was calculated to quantify the genetic distance between groups. The DA distances between the Xinjiang Uigur group and the reference groups are shown in Table 2.
According to the DA distances, the Xinjiang Uigur group was closest to the Urumchi Uigur group (DA = 0.0012), followed by the Kazak group (DA = 0.0019), both of which belong to the Altaic language family. The greatest distances were detected when comparing the Xinjiang Uigur group with the Yucatan Mexican (DA = 0.0353) and Mexican Amerindian (DA = 0.0473) groups.
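The DA distance of Nei et al. is usually written as DA = 1 − (1/r) Σ_j Σ_i √(x_ij y_ij), where r is the number of loci and x_ij and y_ij are the frequencies of allele i at locus j in the two populations. The Python sketch below implements this expression directly; it is an illustration under that standard definition, not a reproduction of the DISPAN program, and the function name and frequencies are hypothetical.

```python
import math


def nei_da_distance(pop_x, pop_y):
    """DA = 1 - (1/r) * sum over loci and alleles of sqrt(x_ij * y_ij).

    pop_x, pop_y: one list of allele frequencies per locus, in the same order.
    """
    if not pop_x or len(pop_x) != len(pop_y):
        raise ValueError("both populations need the same non-zero number of loci")
    total = 0.0
    for freqs_x, freqs_y in zip(pop_x, pop_y):
        if len(freqs_x) != len(freqs_y):
            raise ValueError("allele lists must match at every locus")
        total += sum(math.sqrt(x * y) for x, y in zip(freqs_x, freqs_y))
    return 1.0 - total / len(pop_x)


# Hypothetical [short allele, long allele] frequencies at two InDel loci
group_a = [[0.5, 0.5], [0.3, 0.7]]
group_b = [[0.8, 0.2], [0.3, 0.7]]
print(round(nei_da_distance(group_a, group_b), 4))
```

The distance is 0 when the two populations have identical allele frequencies at every locus and 1 when they share no alleles, so the small values reported above correspond to very similar allele frequency profiles.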

Phylogenetic analysis
An NJ tree was constructed based on the DA distances and is presented in Fig. 4. The NJ tree showed that the Xinjiang Uigur group first clustered with the Urumchi Uigur and Kazak groups. This result was consistent with the above-mentioned results of the STRUCTURE, DA distance and PCA analyses. According to the relevant historical records, Uigurs are descendants of the ancient Uighur, with a large proportion of Caucasian descent. Uigurs and Kazaks share a common religious belief, which suggests that they may have had the same or a similar origin during their formation and development (Palstra et al. 2015; Xu et al. 2006). Therefore, the genetic distances among them could be relatively close. Yuan et al. (2015) studied the genetic polymorphism of 38 STR loci in a Uigur group from Southern Xinjiang of China; their Fst distance results (21 loci) indicated that the Uigur group was closest to the Kazak group, and our result was similar to theirs.

Conclusions
In summary, the 30 InDel loci showed relatively high forensic efficacy in the Xinjiang Uigur group and could be used in forensic individual identification, as well as a complement to STR loci in forensic paternity testing. The results of the DA distance, STRUCTURE, PCA and NJ tree analyses indicated that the studied Xinjiang Uigur group had a close relationship with the Urumchi Uigur and Kazak groups. This study provides valuable data for the analysis of genetic relationships and for forensic applications.