Genetic polymorphism study at 15 autosomal locus in central Indian population

Fifteen autosomal STR loci (TH01, D3S1358, vWA, D21S11, TPOX, D7S820, D19S433, D5S818, D2S1338, D16S539, CSF1PO, D13S317, FGA, D18S51, D8S1179) were analysed in 582 healthy unrelated individuals (366 males, 216 females) originating from various geographical regions of Madhya Pradesh, India. All loci were in Hardy–Weinberg equilibrium except TPOX. These STR loci were highly informative and discriminating, with a combined power of discrimination (CPD) >0.99999. Locus-wise allele frequencies of the studied population were compared with those of other published populations. The clustering pattern and genetic distances of the studied population relative to various other populations are also presented. The studied population showed genetic proximity to geographically close populations of India and significant genetic variation from distant populations, which is also evident from the clustering pattern of the NJ tree and the PCA plot.


Background
Almost 30 years after the first formal application of DNA technology (Jeffreys et al. 1985), short tandem repeat (STR) based DNA analysis (Edwards et al. 1992) is accepted as a core method in forensics and is still routinely used in simple paternity testing (Zupanic Pajnic et al. 2001), identification of human remains (Zupanic Pajnic et al. 2010) and complicated criminal casework, including rape and mass rape. STRs form approximately 3 % of the total human genome and occur, on average, once in every 10,000 nucleotides (Butler 2005). Because they are easy to use and can be multiplexed, these markers are routinely used in forensic, anthropological and medical studies. With the growing number of laboratories using STR analysis, more and more population STR data have been reported (Tandon et al. 2002; Sarkar and Kashyap 2002; Sahoo and Kashyap 2002; Gaikwad and Kashyap 2002; Rajkumar and Kashyap 2004; Narkuti et al. 2008; Dubey et al. 2009; Ghosh et al. 2011; Chaudhari and Dahiya 2014; Giroti and Talwar 2010; Shrivastava et al. 2015a). India is the largest secular country with a polygenetic population. Various religions are found in India, and the Indian population belongs to various linguistic and ethnic groups of different castes and tribes; it is said to be a melting pot of various ethnic groups. Human diversity in India is defined by 4693 differently documented population groups that include 2205 major communities, 589 segments and 1900 territorial units spread across the country (Singh 1998). Major population migrations, social structure and caste endogamy have influenced the genetic structure of Indian populations.
Madhya Pradesh, a state in central India, is the second largest state in the country by area. Its population is 72,597,565, comprising 37,612,920 males and 34,984,645 females, and contributes 6 percent of India's total population (Census of India 2011). With this rationale, 15 highly polymorphic autosomal microsatellite markers, including the 13 core forensic loci, were analysed, and the distribution of alleles was compared with previously published data on the same markers in order to decipher genetic delineation amongst the populations (Tables 1, 2). Because the genetic data reported here are area specific, they were compared not only with caste-specific data from populations geographically close to Madhya Pradesh and from other parts of India, but also with the available area-specific data, of which only one set is from India and the rest are from other parts of the world.

The population and DNA extraction
The population sample consisted of 582 healthy, unrelated individuals (366 males, 216 females) originating from different geographical regions of Madhya Pradesh. Samples were taken, after written informed consent, from routine casework performed by the first author at the DNA Fingerprinting Unit, State Forensic Science Laboratory, Sagar, Madhya Pradesh, India, during the period 2007 to 2013. From paternity trios only the fathers and mothers were selected, and from complex kinship analyses only unrelated individuals were taken into consideration. DNA was extracted from peripheral blood samples using the automated DNA extraction system 12 GC (Precision System Science Co., Ltd., Matsudo, Japan).

DNA quantitation
The isolated DNA was quantified on a Real Time PCR ABI 7000 instrument (Applied Biosystems, Foster City, CA, USA) using the Quantifiler DNA Quantification Kit (Applied Biosystems, Foster City, CA, USA), following the manufacturer's recommended protocol.

Amplification
1 ng of DNA template was used to simultaneously amplify 15 STR loci, comprising the 13 CODIS loci (D3S1358, TH01, D21S11, D18S51, D5S818, D13S317, D7S820, D16S539, CSF1PO, vWA, D8S1179, TPOX, FGA) and 2 additional loci (D2S1338 and D19S433), as well as the gender-determining locus Amelogenin, using the AmpFlSTR Identifiler or AmpFlSTR Identifiler Plus kit (Applied Biosystems, Foster City, CA, USA). A similar amount of DNA was used in all PCR reactions. Amplification was carried out according to the manufacturer's recommended protocol, with the modification that the total volume of each reaction was decreased to 12.5 μL. PCR amplification was carried out in an AB GeneAmp PCR System 9700 Thermal Cycler (Applied Biosystems, Foster City, CA, USA).

Typing
Multicapillary electrophoresis of the amplification products was performed on an ABI Prism 3100 Avant Genetic Analyzer (Applied Biosystems, Foster City, CA, USA) using the LIZ 500 size standard (Applied Biosystems, Foster City, CA, USA) provided with the kit, and the data were analysed using GeneMapper™ 3.5 software (Applied Biosystems, Foster City, CA, USA). All steps were performed according to the laboratory's internal control standards and the respective kit controls, following the ISFH recommendations (DNA recommendations 1994).

Quality control
The laboratory passed the proficiency testing of the GITAD, Spain (http://gitad.ugr.es/principal.htm). Laboratory internal control standards and kit controls were also used.

Analysis of data
Allele frequencies of the 15 STR loci were calculated with GenAlEx 6.5 software (Peakall and Smouse 2006). Several forensic parameters, i.e., polymorphism information content (PIC), power of discrimination (PD), power of exclusion (PE), matching probability (Pm) and paternity index (PI), were calculated using the PowerStatsV1.2 spreadsheet program (Tereba 1999). Observed heterozygosity (Hobs), expected heterozygosity (Hexp) and Hardy–Weinberg equilibrium (HWE, by exact test) were calculated using Arlequin v3.5 (Excoffier et al. 2005). Allele frequencies of the studied population were compared with those of other published populations using pairwise Fst distances, also computed in Arlequin v3.5 (Excoffier et al. 2005). Nei's genetic distances (Nei 1972) among the compared populations were derived and subsequently used to generate a neighbour-joining (NJ) dendrogram with the POPTREE2 program (Takezaki et al. 2009). The robustness of the phylogenetic relationships established by the NJ tree was assessed by bootstrap analysis with 1000 replications. Genetic distances were also represented graphically in a principal component analysis (PCA) plot produced with PAST 3.02a software (Hammer et al. 2001).
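To make the per-locus forensic parameters concrete, the following minimal Python sketch computes them from a list of genotypes at a single locus. It is not the authors' PowerStats worksheet: the formulas are the ones commonly used for these statistics (matching probability as the sum of squared genotype frequencies, PIC after Botstein et al., PE as h²(1 − 2hH²) and the typical paternity index as 1/(2H), where h and H are the observed heterozygote and homozygote fractions), and the function name and the toy genotypes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def forensic_parameters(genotypes):
    """Per-locus forensic statistics from a list of (allele, allele) pairs.

    Assumed formulas (not taken from the paper itself):
      Pm  = sum of squared observed genotype frequencies
      PD  = 1 - Pm
      PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2
      PE  = h^2 * (1 - 2 * h * H^2)
      PI  = 1 / (2 * H)   (typical paternity index)
    """
    n = len(genotypes)
    # Sort each pair so that (9, 6) and (6, 9) count as the same genotype.
    genotype_counts = Counter(tuple(sorted(g)) for g in genotypes)
    allele_counts = Counter(allele for g in genotypes for allele in g)
    allele_freqs = [count / (2 * n) for count in allele_counts.values()]

    pm = sum((count / n) ** 2 for count in genotype_counts.values())
    homo = sum(c for (a, b), c in genotype_counts.items() if a == b) / n
    het = 1 - homo

    sum_p2 = sum(p ** 2 for p in allele_freqs)
    cross = sum(2 * p ** 2 * q ** 2 for p, q in combinations(allele_freqs, 2))

    return {
        "Pm": pm,
        "PD": 1 - pm,
        "PIC": 1 - sum_p2 - cross,
        "PE": het ** 2 * (1 - 2 * het * homo ** 2),
        "PI": 1 / (2 * homo) if homo > 0 else float("inf"),
        "H_obs": het,
        "H_exp": 1 - sum_p2,
    }


if __name__ == "__main__":
    # Hypothetical genotypes for four individuals at one locus.
    toy = [(6, 7), (6, 6), (7, 9), (9, 6)]
    for name, value in forensic_parameters(toy).items():
        print(f"{name}: {value:.4f}")
```

Alleles are treated as plain numbers, so microvariants such as 9.3 can be passed as floats. The expected heterozygosity here is the uncorrected 1 − Σp²; software such as Arlequin may apply a sample-size correction, so small differences from published values would be expected.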

Results and discussion
The allele-frequency distribution at the 15 STR loci and the forensic parameters for the studied population are shown in Table 3. In total, 158 alleles were observed in the central Indian population, with allele frequencies ranging from 0.001 to 0.381 (Table 3). The highest allele frequency was found at the CSF1PO locus, where allele 12 (0.381) was the most frequent allele in this population. The locus-wise distribution of the most common and least common alleles in the studied population is summarized in Table 4. The peak height threshold was 50 RFU for heterozygous and 200 RFU for homozygous alleles. The combined power of exclusion (CPE) and combined power of discrimination (CPD) for all 15 STR loci were 0.9999 and greater than 0.99999, respectively. The combined matching probability was 1.51 × 10^-18. No significant deviations from Hardy–Weinberg expectations were observed at any locus, even after Bonferroni correction (Bland and Altman 1995), except at TPOX (p < 0.003). At the TPOX locus, all homozygous peaks had a peak height of more than 200 RFU, which rules out the possibility of an undetected heterozygous peak.
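The combined statistics reported above follow from the per-locus values under the usual assumption that the loci are independent. The sketch below shows that arithmetic, together with a Bonferroni-adjusted significance threshold. The function names and input values are hypothetical (the per-locus figures of Table 3 are not reproduced here), and the 0.05 significance level is an assumption, since the text does not state the level used.

```python
import math


def combined_matching_probability(pm_values):
    """Product of per-locus matching probabilities (independent loci assumed)."""
    return math.prod(pm_values)


def combined_power_of_discrimination(pm_values):
    """CPD = 1 - product of per-locus matching probabilities."""
    return 1 - math.prod(pm_values)


def combined_power_of_exclusion(pe_values):
    """CPE = 1 - product of (1 - PE) over all loci."""
    return 1 - math.prod(1 - pe for pe in pe_values)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    # Hypothetical per-locus values, for illustration only.
    pm = [0.10, 0.05, 0.20]
    pe = [0.50, 0.60]
    print("CMP:", combined_matching_probability(pm))
    print("CPD:", combined_power_of_discrimination(pm))
    print("CPE:", combined_power_of_exclusion(pe))
    # 15 loci tested at an assumed overall level of 0.05.
    print("Bonferroni threshold:", bonferroni_threshold(0.05, 15))
```

With 15 loci and an assumed overall level of 0.05, the adjusted threshold is about 0.0033, which is consistent with the TPOX result being reported as p < 0.003. Note also that a combined matching probability as small as 1e-18 lies below double-precision resolution near 1, so `1 - CMP` evaluates to exactly 1.0 in floating point; reporting CPD as "greater than 0.99999" avoids that false precision.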