Genetic diversity and relationship of cattle populations of East India: distinguishing lesser known cattle populations and established breeds based on STR markers

India has 34 recognized breeds of cattle, in addition to many more that have not yet been characterized or accredited. It is imperative to characterize all the cattle germplasm of the country so as to have better breeding and conservation options. The present study was therefore planned to assess genetic diversity and relationships among three local cattle populations (Gangatiri, Shahabadi and Purnea) and two established cattle breeds (Bachaur and Siri) of eastern India using 21 FAO- and ISAG-recommended microsatellite markers. A total of 243 DNA samples from unrelated animals of the five cattle populations were collected from their respective habitats. A total of 304 microsatellite alleles were identified, with the number of alleles per locus ranging from 5 to 29. The average observed heterozygosity lay within the narrow range of 0.681 ± 0.04 in Purnea to 0.721 ± 0.03 in Siri. Mean estimates of observed and expected heterozygosity over all loci and breeds were 0.704 ± 0.02 and 0.720 ± 0.01, respectively. In the overall population, the homozygote excess (FIT) of 0.073 ± 0.02 was partly due to the homozygote excess within breeds (FIS = 0.026 ± 0.02) and to a larger extent due to genetic differentiation among breeds (FST = 0.048 ± 0.01). The genetic distance, STRUCTURE and Principal Component Analyses showed that Siri cattle are the most distinct among the investigated cattle populations. Furthermore, the analysis of genetic structure indicated that the most probable number of clusters is four. All analyses showed that a significant amount of genetic variation is maintained in local cattle populations, of which Shahabadi and Purnea are distinct from the recognized breeds of the area and need recognition as breeds.


Introduction
The large and divergent range of agro-ecological zones in India has helped a number of cattle populations to develop. Cattle are the largest livestock species in India and constitute 37.5% of its total livestock population (BAHS-Basic Animal Husbandry Statistics 2012). However, it is surprising to note that only 11.6% of total cattle (199.08 million) belong to pure indigenous breeds, whereas 69.7% were classified as non-descript indigenous animals (BAHS-Basic Animal Husbandry Statistics 2012). The non-descript population includes cross-bred populations and populations that are mixtures of different breeds. The non-descript population contributes greatly to the total Gross Domestic Product from the livestock sector. In addition, indigenous animal genetic resources are known for heat and drought tolerance, disease resistance and subsistence on poor feed, opening scope for allele mining for these traits. The emphasis should therefore be on describing, characterizing and documenting lesser known populations in the country so that a considerable proportion of the non-descript population is transformed into defined breeds. This will be the first step towards planning organized breeding programs for their genetic improvement, conservation strategies and sustainable utilization.
Gangatiri, Shahabadi and Purnea are three non-descript cattle populations of eastern India. The registered cattle breeds in and around the area include Bachaur and Siri. Bachaur, Shahabadi and Purnea cattle are from Bihar state, whereas Gangatiri belongs to the area of Uttar Pradesh state bordering Bihar, and the habitat of Siri borders Bihar state on the eastern side (Figure 1). Typical animals of these breeds and populations are available in their native tracts in sufficient number (>4000) to constitute viable breeding populations (FAO 2007). Over the last two decades these populations have shown a declining trend and thus need immediate attention, failing which our ancestral efforts of artificial selection, operative since time immemorial, may go in vain and there may be an irreparable loss from the national and world gene pool. Microsatellite markers are one of the most powerful means for studying genetic diversity, calculating genetic distances and detecting bottlenecks and admixture because of their high degree of polymorphism, random distribution across the genome and neutrality with respect to selection (Dodgson et al. 1997). Considering the importance of cattle in Indian agriculture, efforts have been made to evaluate the genetic diversity and relationships of Indian cattle using microsatellite markers. These included native cattle breeds adapted to the north-western arid and semi-arid region of India (Sodhi et al. 2008), north Indian cattle breeds (Sharma et al. 2009), south Indian breeds (Metta et al. 2004) and cattle of Orissa state and hill cattle of Kumaun (Sharma et al. 2012a). However, the genetic relationship between native cattle breeds of eastern India is unknown.
The present study aimed to genetically characterize and distinguish lesser known cattle populations (Shahabadi, Purnea and Gangatiri) and recognized cattle breeds of eastern India (Bachaur and Siri). Twenty-one microsatellites (Short Tandem Repeats, STR) were amplified in five multiplex PCRs. We intended to evaluate not only the current diversity of these populations but also their relationships, for the conservation of genetic diversity in the context of biodiversity management programs.

Biological material
Shahabadi, Purnea and Gangatiri are non-descript cattle from eastern India (Figure 1). Gangatiri and Shahabadi are white or grey colored animals. Gangatiri animals are reared for milk production (4-6 liters per day) as well as for agricultural operations. Shahabadi cattle are mainly distributed in Buxar, Bhojpur, Kaimoor and Rohtas (Sasaram) districts of Bihar. Their milk production varies from 2 to 6 liters per day (Sharma et al. 2012b). Purnea animals are of small size and compact body. The coat colour of Purnea cattle varies from light to deep red, with a few greyish white animals (Sharma et al. 2013). Bachaur cattle are native to an area of Bihar bordering Nepal. The animals are white to grey in colour. The cows are poor milkers, producing an average of 2.2 kg of milk per day, and are managed under extensive management (Sharma et al. 2007). Animals of the Siri breed are small in size and are distributed in the hilly tracts of West Bengal (a state adjoining Bihar) and Sikkim states of India. The color most frequently seen is black and white or extensive solid black. The Siri has a hump that is thoracic, muscular-fatty and slightly forward in position compared with other Zebu breeds of India. The animals carry a thick coat all year round, which protects them from heavy rains and severe cold (Sharma et al. 2008).

Blood sampling and microsatellite analysis
Samples of the populations included in this study represented animals of the original autochthonous phenotype. Blood samples from 243 individuals were collected (Bachaur-50, Gangatiri-50, Shahabadi-48, Purnea-47 and Siri-48) from different villages of the habitat (Figure 1), while avoiding closely related individuals on the basis of detailed interviews with owners. Blood samples were collected in 10 ml vacutainer tubes containing EDTA as anticoagulant and stored at −20°C until DNA extraction. DNA isolation was carried out using a modified phenol-chloroform method (Sambrook et al. 1989). Isolated DNA samples were amplified by PCR for the selected panel of 21 loci. The loci were chosen according to ISAG/FAO recommendations, aiming for highly polymorphic markers spread all over the genome and able to co-amplify in PCR reactions (FAO [Food and Agricultural Organization of the United Nations] 2011). The fluorochrome-labeled (FAM, NED, PET and VIC) primers were synthesized by Applied Biosystems (Table 1). For amplification, 50-100 ng of genomic DNA was added to a reaction mixture containing 50 pmol of primer (forward and reverse), 200 μM of each dNTP, 1.5 mM of MgCl2 and 0.5 U of Taq polymerase in a final volume of 25 μl. All the microsatellites were amplified in an MJ thermal cycler under the following conditions: initial denaturation of 1 min at 95°C; 30 cycles of 1 min at 95°C, 1 min at T°C (the optimum annealing temperature of each primer) and 1 min at 72°C; and a final extension of 5 min at 72°C. Amplified fragments were separated by capillary electrophoresis using an ABI PRISM 310 automatic sequencer (Applied Biosystems, Foster City, CA, USA). Fluorescently labeled fragments were detected and sized using GeneMapper (version 3.7, Applied Biosystems).
Statistical analysis
GENALEX 6.2 software (Peakall and Smouse, 2008) was used to estimate basic population genetic descriptive statistics for each marker and population: gene frequency, observed number of alleles (No), number of private alleles, effective number of alleles (Ne), observed (Ho) and expected heterozygosity (He) and Hardy-Weinberg equilibrium (HWE). Wright's statistics FIS (f), FST (θ) and FIT (F) and Nei's (Nei 1987) standard genetic distances among populations were calculated as implemented in GENALEX software. The pairwise matrix of genetic distances was then used to obtain a Neighbor-joining tree, which was visualized using the software TreeView (Page 1996). Bootstrapping with 1000 replicates was performed to test the robustness of the tree topology using the Phylip software (Felsenstein 1993). Multivariate analysis of microsatellite allele frequencies (Principal Component Analysis, PCA) was applied to reveal the underlying evolutionary history and admixture among populations. An alternative model-based Bayesian clustering analysis, as implemented in STRUCTURE v2.2 (Pritchard et al. 2000), was used to infer how many clusters or subpopulations (K) were most appropriate for interpreting the data without prior information on the number of locations at which the individuals were sampled. Analysis was performed with a burn-in length of 50,000 followed by 30,000 MCMC (Markov Chain Monte Carlo) iterations for each K = 1 to 7, with five replicate runs for each K, using independent allele frequencies and no admixture. The optimal K value was selected after analyzing the result files with STRUCTURE Harvester (Earl and vonHoldt 2012).
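The per-locus descriptive statistics listed above can be illustrated with a minimal sketch. It assumes the usual frequency-based definitions (Ne = 1/Σp², He = 1 − Σp², F = 1 − Ho/He); the function name `locus_diversity` and the toy genotypes are hypothetical and are not data from this study, and the sketch is not a reproduction of GENALEX itself.

```python
from collections import Counter

def locus_diversity(genotypes):
    """Diversity statistics for one locus in one population.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid animal.
    Assumes the standard definitions (not the unbiased small-sample versions).
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("at least one genotype is required")
    # Count every allele copy (two per animal) and convert to frequencies.
    counts = Counter(allele for pair in genotypes for allele in pair)
    freqs = [c / (2 * n) for c in counts.values()]
    sum_p2 = sum(p * p for p in freqs)
    ho = sum(1 for a, b in genotypes if a != b) / n   # observed heterozygosity
    he = 1.0 - sum_p2                                  # expected heterozygosity
    return {
        "No": len(counts),          # observed number of alleles
        "Ne": 1.0 / sum_p2,         # effective number of alleles
        "Ho": ho,
        "He": he,
        "F": 1.0 - ho / he if he > 0 else float("nan"),  # fixation index
    }

# Hypothetical allele sizes (bp) for four animals at one microsatellite locus.
toy = [(100, 102), (100, 100), (102, 104), (100, 102)]
stats = locus_diversity(toy)
print(stats)
```

In this toy example Ne is smaller than No because one allele (104) is rare, which is the same pattern used in the Results to infer the presence of low-frequency alleles.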

Results and discussion
The present study established the genetic status and diversity of lesser known indigenous cattle populations of eastern India and their relationship with established breeds of the same region using microsatellite markers. All microsatellite markers used in this study were successfully amplified in all the populations in five multiplex sets designed with consideration for annealing temperature, product size and specific dye label (Table 1). The genotype data generated in the present study showed that a significant amount of genetic variation is maintained in local cattle populations. All the markers were polymorphic in each of the five populations analyzed. Considering all the populations, the majority of the markers were in HWE. The number of loci deviating from HWE (P < 0.05) was one each for Gangatiri (ETH3) and Shahabadi (MM8), two for Bachaur (TGLA122, MM12), three for Siri (TGLA227, ETH10, CSSM66) and none for Purnea. The level of variation depicted by the number of alleles at each locus serves as a measure of genetic variability having a direct effect on differentiation of breeds within a species (Buchanan et al. 1994). All 21 microsatellite loci showed ample polymorphism for evaluating within-breed genetic variability and exploring genetic differences between breeds. A total of 304 alleles were detected, with ILSTS34 presenting the highest number of alleles (Table 2). Lower values of the effective number of alleles compared with the observed number of alleles in all the populations suggested that there were many low-frequency alleles in the populations. The mean observed number of alleles across all the loci was 9.486 ± 0.327, higher than in other indigenous cattle breeds (Metta et al. 2004; Mukesh et al. 2004; Pandey et al. 2006a, 2006b). Lower allelic diversity than in the studied populations has also been reported in exotic cattle: Burlina, 6.7 (Dalvit et al. 2008); Brown Swiss, 5.4 (Schmid et al. 1999); and Creole cattle, 7.2 (Egito et al. 2007).
Allelic diversity in Indian livestock breeds has previously been observed to be higher than that reported for their European counterparts (Joshi et al. 2012), which has been attributed to the lack of artificial selection pressure. Allelic diversity of similar magnitude has also been reported in Tharparkar, Rathi and Orissa cattle populations of India (Sodhi et al. 2008; Sharma et al. 2012a). Measures of genetic diversity based on allelic richness are considered important in conservation genetics, as marker-assisted methods for maximizing the number of alleles conserved have been shown to be effective (Bataillon et al. 1996). Allelic richness is also relevant in the long-term perspective, as selection limits are determined by the initial allelic composition rather than by heterozygosity (Petit et al. 1998).
Estimates of observed heterozygosity including all loci and population (0.704 ± 0.016) confirmed the remarkable level of diversity in the studied populations. Among populations, observed heterozygosity ranged from 0.681 ± 0.04 to 0.721 ± 0.028 with the lowest value found in Purnea cattle and the highest in Siri cattle. Most of the indigenous breeds including Kherigarh-0.57 (Pandey et al. 2006a), Kenkatha-0.54 (Pandey et al. 2006b), Sahiwal-0.43 (Mukesh et al. 2004) and Deoni-0.59 (Metta et al. 2004) showed lower estimates of observed heterozygosity than local cattle populations and the breeds investigated in present study. Overall heterozygosity estimates were comparable with Tharparkar cattle (0.64, Sodhi et al. 2008 Egito et al. 2007). Purnea cattle represented the lowest observed and effective number of alleles as well as observed and expected heterozygosity. On the other hand Siri cattle presented the highest value of the above said parameters amongst all genetic groups (Table 2). Genetic variation is necessary to allow organisms to adapt to ever changing environments with some of this variation stemming from introduction of new alleles by the random and natural process of mutation. Higher genetic variation in Siri cattle must have contributed to its adaptability and this is reflected in wide-spread distribution of Siri cattle not only in India but also in neighboring country, Bhutan.
Observed heterozygosity was lower than expected heterozygosity in Bachaur, Shahabadi, Purnea and Siri cattle populations, showing departure from HWE, and analysis of FIS also gave evidence of some heterozygote deficiency (Table 3). This disequilibrium was caused by heterozygote deficiency in each population, which was highest in Purnea (0.042 ± 0.041) and lowest in Bachaur (0.017 ± 0.030). In contrast, Gangatiri cattle presented a slight heterozygote excess (−0.010 ± 0.022), which was also expressed in the heterozygosity pattern (Ho = 0.709 ± 0.034, He = 0.702 ± 0.030) (Table 2). The existence of this population in a small geographical region, with free grazing of Gangatiri and non-descript animals in a herd, could be the likely source of the sufficient heterozygotes. Positive FIS estimates for Bachaur, Shahabadi, Purnea and Siri indicated the presence of inbreeding and/or a Wahlund effect (population substructure within a breed). Since blood samples were collected from different villages, the presence of a hidden substructure cannot be ruled out. In fact, animals of the same population but belonging to different villages could derive from genetically different founders. A significant homozygote excess was also observed in other studies on indigenous cattle breeds, such as Sahiwal, FIS = 0.32 (Mukesh et al. 2004). Most likely, inbreeding in these populations arises from unplanned and unsystematic breeding owing to the lack of a sufficient number of breeding males in the breeding region. Moreover, instead of local bull semen, exotic (Jersey and Holstein Friesian) or crossbred semen was available in the habitat. This can be one of the causes of dilution of the populations. Together these two factors were reducing the number of true-to-breed-type animals. The actual situation of Bachaur cattle in their habitat also pointed towards inbreeding in the population.
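As a rough consistency check on the Gangatiri values quoted above, the fixation index can be recomputed from the mean heterozygosities. This is only a sketch: it assumes the standard relation F = 1 − Ho/He and uses the rounded population means, whereas the reported FIS may have been averaged over per-locus values.

```latex
F_{IS} \;=\; 1 - \frac{H_o}{H_e}
       \;=\; 1 - \frac{0.709}{0.702}
       \;\approx\; 1 - 1.010
       \;\approx\; -0.010
```

The negative sign corresponds to the slight heterozygote excess reported for Gangatiri.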
Very few breeding bulls have been reported in the habitat of Bachaur cattle (Singh 2004). Bachaur is a draft-purpose breed, so most of the males are used for carrying loads and for agricultural operations. These males are castrated at the age of one year, leading to their genetic death. Similarly, over the last few years the population of Siri cattle has been declining due to extensive cross-breeding, with the result that Siri animals are now confined only to the remote and inaccessible areas of Sikkim. Thus, few Siri breeding bulls are left in the habitat, which might decrease the effective population size. Results of F-statistics for each of the 21 loci across populations are presented in Table 4. The global deficit of heterozygotes across populations (FIT) amounted to 7.3% (P < 0.001). An overall significant deficit of heterozygotes (FIS) of 2.6% occurred in the analyzed loci because of inbreeding within populations. All loci except six contributed to the heterozygote deficit within populations. The five studied populations showed a low but significant genetic differentiation (FST = 0.048). All loci contributed to the differentiation, with the highest value found for ETH25 (19.6%). Genetic differentiation of similar magnitude has been reported among cattle breeds of Orissa and hill cattle of Kumaun (0.044) from India (Sharma et al. 2012a). Cattle studied by Kantanen et al. (2000) and Swiss cattle, FST = 0.090 (Schmid et al. 1999), also showed higher genetic differentiation than the populations investigated in this study. The low value of genetic differentiation in eastern Indian cattle may be attributed to the lack of high selection pressure compared with established breeds of India or cattle breeds of developed countries. These results show that within-breed genetic variation is greater than between-breed variation, and this variation could be a valuable resource for genetic improvement and conservation of cattle populations of eastern India.
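The three overall fixation indices reported here can be cross-checked against Wright's identity linking them. The following worked equation uses the rounded estimates from this study and is offered only as an arithmetic check, not as the procedure used by the software.

```latex
(1 - F_{IT}) = (1 - F_{IS})(1 - F_{ST})
\quad\Longrightarrow\quad
F_{IT} = 1 - (1 - 0.026)(1 - 0.048)
       = 1 - 0.974 \times 0.952
       \approx 1 - 0.927
       = 0.073
```

The result agrees with the reported global heterozygote deficit of 7.3%.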
Pairwise genetic differentiation quantified by FST estimates identified Bachaur and Gangatiri (0.008) as the closest populations, while the most differentiated were Purnea and Siri (0.044) (Table 5). Similar results were obtained with Nei's genetic distance matrix. The highest genetic distance was found between Purnea and Siri (0.292), while Bachaur and Gangatiri were closest to each other (0.037) (Figure 2). In accordance with these observations, gene flow was highest between the Bachaur and Gangatiri populations (Nm = 32.809) and lowest between Purnea and Siri (Nm = 5.408). The overall Nm value also indicated a high rate of gene flow between the populations (10.230 ± 1.922) (Table 4). Breed relationships were visualized by constructing a Neighbor-joining tree on the basis of Nei's genetic distance. As expected, Siri was the most distinct and separated first, while the remaining populations formed two groups, with Gangatiri and Bachaur clustering on one node and Purnea and Shahabadi on the second, with more than 95% bootstrap support (Figure 2). This grouping pattern was further supported by PCA. Since phylogenetic reconstruction may not take into account the effects of admixture between breeds, PCA was performed to further investigate possible genetic relationships between recognized breeds and local populations. The first three dimensions of the PCA (PC1 = 40.35; PC2 = 30.72; PC3 = 22.74) accounted for 93.81% of total variation. In the multivariate space defined by PCA, Bachaur and Gangatiri were much closer to each other. PCA confirmed the distinctiveness of Siri as well as of the lesser known cattle populations of Purnea and Shahabadi (Figure 3). Among methods not assuming predefined structure, tree-based methods use genetic distances between individuals and a tree construction algorithm such as UPGMA or Neighbour-joining to group them in clusters. Similarly, multivariate analyses can help in defining clusters of individuals.
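The inverse relation between pairwise FST and gene flow described above can be sketched with the commonly used island-model approximation Nm = (1/FST − 1)/4. That this is the formula behind the reported Nm values is an assumption, and `gene_flow_nm` is a hypothetical helper name; applying it to the rounded FST values gives figures close to, but not identical with, those reported (32.809 and 5.408), presumably because the published values were computed from unrounded FST estimates.

```python
def gene_flow_nm(fst):
    """Number of migrants per generation under the island-model approximation.

    Nm = (1 / FST - 1) / 4; valid only for 0 < FST < 1.
    """
    if not 0.0 < fst < 1.0:
        raise ValueError("FST must lie strictly between 0 and 1")
    return (1.0 / fst - 1.0) / 4.0

# Rounded pairwise FST values taken from the text above.
nm_bachaur_gangatiri = gene_flow_nm(0.008)   # closest pair
nm_purnea_siri = gene_flow_nm(0.044)         # most differentiated pair

print(round(nm_bachaur_gangatiri, 2), round(nm_purnea_siri, 2))
```

Because Nm depends on 1/FST, small rounding differences in a low FST value (such as 0.008) change the estimate noticeably, which is why the closest pair shows the larger discrepancy.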
However, these graphical methods are only loosely connected to statistical procedures allowing the identification of homogeneous clusters of individuals. An alternative approach to delineate clusters of individuals on the basis of their genotypes at multiple loci was therefore also applied, using the Bayesian approach implemented in the software STRUCTURE. It works by grouping individuals into clusters (K) such that Hardy-Weinberg equilibrium is maximized within clusters. The most likely value of K, which best captured the variation present in the data, was four, based on the ΔK method of Evanno et al. (2005). Siri, Purnea and Shahabadi were grouped in their own clusters. However, Bachaur and Gangatiri animals were intermingled in one cluster (Figure 4). These results agree with the genetic distances among the populations, as divergence was lowest between Bachaur and Gangatiri. It is important to mention that the habitats of Gangatiri and Shahabadi are geographically much closer than those of Bachaur and Gangatiri. The closeness of Bachaur and Gangatiri, rather than of Gangatiri and Shahabadi, may be due to attempts in the field to upgrade local populations with semen of exotic (Jersey and Holstein Friesian) and indigenous milch (Hariana, Sahiwal, Red Sindhi, Tharparkar) cattle breeds. Secondly, the habitats of Gangatiri and Shahabadi are separated by the river Ganges, which may act as a geographic barrier. Based on the present study, Siri cattle appeared to be distinct from all other neighboring breeds and populations of the eastern region of India, commensurate with the geographical distance of its habitat, which results in reproductive isolation by distance. According to the literature, Bhutan is said to be the real home of this breed (Nivsarkar et al. 2000). It was distributed from that area to the various parts of Sikkim and the Darjeeling region of West Bengal. Small cattle with similar black and white markings have been found in Sikong Province of China, which occupies a portion of the Tibetan highlands northeast of Bhutan.
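Choosing the number of clusters from replicate STRUCTURE runs can be illustrated with a short sketch of a ΔK-style calculation, in which the absolute second-order change in mean ln P(D) at each K is divided by the standard deviation across replicates. This is a simplified illustration under that assumption, not a reproduction of STRUCTURE Harvester; the function name `delta_k` and the log-likelihood values below are invented for demonstration and are not results of this study.

```python
from statistics import mean, stdev

def delta_k(lnp_by_k):
    """Delta-K-style statistic from replicate ln P(D) values.

    lnp_by_k: dict mapping K -> list of ln P(D) values (one per replicate run).
    Returns a dict K -> delta K for every K that has both neighbours K-1 and K+1.
    """
    means = {k: mean(v) for k, v in lnp_by_k.items()}
    sds = {k: stdev(v) for k, v in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if (k - 1) in means and (k + 1) in means and sds[k] > 0:
            # Absolute second-order rate of change of the mean log-likelihood.
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            result[k] = second_diff / sds[k]
    return result

# Invented toy log-likelihoods, three replicate runs for each K = 1..5.
toy_runs = {
    1: [-1000, -1002, -998],
    2: [-900, -901, -899],
    3: [-800, -801, -799],
    4: [-795, -797, -793],
    5: [-792, -796, -788],
}
dk = delta_k(toy_runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

In the toy data the likelihood rises steeply up to K = 3 and then plateaus, so the statistic peaks at K = 3; by construction no value is obtained for the smallest and largest K tested.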

Conclusions
It is important to report that lesser known cattle populations also have high genetic diversity. Unfortunately, the number of animals in these populations is continuously decreasing due to changes in agricultural practices, as mechanization is fast replacing the traditional practices involving these animals, due to procedures performed by breeders to increase efficiency (crossing with available exotic or crossbred semen) and due to the apathy of government agencies, since these populations are not recognized as distinct breeds. The present study established the uniqueness of two such populations, Purnea and Shahabadi, relative to the recognized cattle breeds of eastern India. Conservation of genetic variation in these populations should be considered by breeders in the interest of the long-term future of the populations in their native tract. To begin with, breed societies need to be created that will be responsible for registration of these cattle populations as breeds and for complete maintenance and improvement of each breed, to make it economically sustainable in the transforming agricultural scenario of the country.

Figure 4 Clustering assignment of 243 animals representing five East Indian cattle populations using STRUCTURE at K = 4, arranged by Q values. Each individual animal is represented on the graph by a vertical bar divided into K colored segments corresponding to K genetic clusters. The length of each colored segment is proportional to the individual's membership in the cluster of the corresponding colour. Shahabadi (red), Purnea (blue) and Siri cattle (yellow) form separate clusters. Bachaur and Gangatiri (green) cluster in one group.