| Objective: Microhaplotypes (MHs) are a new type of genetic marker discovered in recent years. A microhaplotype is a combination of 2 to 5 single nucleotide polymorphism (SNP) loci within 200 bp, usually located outside recombination hotspots. MHs combine the advantages of short tandem repeats (STRs) and SNPs, such as short fragments, low mutation rate, high polymorphism and ancestry information, while avoiding their disadvantages, such as the stutter peaks of STRs and the limited usefulness of SNPs for mixed samples. They are therefore a potential forensic marker worth studying. This project aims to screen microhaplotype loci suitable for the Chinese population, construct a detection panel based on next-generation sequencing (NGS), and evaluate the forensic application value of the panel, in order to provide new genetic markers and technical protocols for forensic individual identification and kinship identification. Method: SNP loci with heterozygosity greater than 0.4 were selected from the 1000 Genomes Project database. Microhaplotype loci were then selected as sets of 2 to 5 SNPs within 200 bp that are not located in recombination hotspots. With the selected loci as the core content, we constructed a microhaplotype database website. Twelve autosomal loci with heterozygosity greater than 0.7 and balanced allele frequencies were selected to construct a multiplex NGS panel. DNA was extracted from peripheral blood of Hebei Han individuals, and libraries were constructed with the QIA Targeted DNA Custom Panel in five main steps (genomic DNA fragmentation; end modification; adapter ligation and purification; targeted enrichment and purification; PCR and PCR product cleanup), followed by library quantification. The libraries were pooled in equal volumes, diluted and denatured according to the kit instructions, and sequenced on the MiSeq FGx platform. The raw data were processed with computer programs such as BWA to obtain genotyping results and evaluate the panel. Three libraries constructed from the same sample were used for
repeatability study, and sensitivity was studied on the same sample with DNA template inputs of 20 ng, 10 ng, 5 ng, 2.5 ng and 1.25 ng. Population genetic data for the Han population of Hebei Province were obtained from the genotypes of unrelated individuals, providing a basis for applying the panel. The paternity index was calculated for the tested family samples. Result: 1. Nomenclature of microhaplotypes. We propose a nomenclature for microhaplotypes that consists of five parts. Taking mh02-174285354/400/461 as an example, "mh" is the abbreviation of microhaplotype and distinguishes it from other genetic markers, "02" represents chromosome 2, "174285354" represents the chromosomal position of the first SNP of the locus, and "3" represents the number of SNPs in the locus. 2. Laboratory evaluation of sequencing data. The depth of coverage (DoC) of all 54 tested samples exceeded 500×; the highest was 2198×, the lowest 518×, and the average across all samples 1391×. Before analyzing the sequencing results, we set an analysis threshold: a sequence was considered valid if its reads accounted for more than 20% of the total reads. The effective reads of all loci made up more than 80% of the total reads. On this basis, the quality of the sequencing data for the 12 loci in the 54 samples was analyzed. The average coverage of every locus exceeded 600×; the average coverage across the 12 loci was 1379±1218× (mean±2SD), and the minimum was 628×. The coverage of two loci exceeded the mean±2SD range. The composition of locus sequences (%allele, %noise) showed that %allele ranged from 84.11% to 98.96%, with an average of 93.98% over all loci; %allele exceeded 90% at all but three loci. %noise ranged from 1.04% to 15.89%, with an average of 6.0% over all loci. The heterozygosity ratio ranged from 0.683 to 0.876, with an average of 0.738±0.06 over all loci. Among the 12
loci, two had an average heterozygosity ratio between 0.6 and 0.7 and 10 had a ratio greater than 0.7. 3. Repeatability study. Sequencing of three libraries constructed from the same sample gave consistent allele genotyping results across the three replicates. There was no significant difference among the three replicate libraries in %allele (P=0.43) or ACR (P=0.56). 4. Sensitivity study. With DNA inputs of 20 ng, 10 ng and 5 ng, the sequencing data showed no significant difference. As the DNA input decreased, ACR showed a downward trend and the CV an upward trend, indicating increasing imbalance between alleles. With 1.25 ng of DNA template, allele loss occurred at three loci. 5. Forensic parameters. The panel consists of 12 microhaplotype loci, each containing 2 to 4 SNPs. In 46 unrelated individuals, 3 to 14 alleles were detected at each locus; overall, the number of alleles increased with the number of SNPs. Population genetic parameters were analyzed for the 46 unrelated individuals. After Bonferroni correction, all loci were in Hardy-Weinberg equilibrium and in linkage equilibrium. Ho of the 12 loci ranged from 0.500 to 0.935, with an average of 0.789; He ranged from 0.505 to 0.858, with an average of 0.758. Ho exceeded 0.7 at 10 loci, indicating high genetic polymorphism. PIC ranged from 0.375 to 0.835, with an average of 0.705. DP ranged from 0.525 to 0.964 and exceeded 0.9 at 8 loci; the TDP of the 12 loci was 0.99999999811. PEduo ranged from 0.125 to 0.550, with a CPEduo of 0.995; PEtrio ranged from 0.187 to 0.712, with a CPEtrio of 0.99995. Among the 12 loci, Ae was greater than 3.0 at 10 loci, greater than 5.0 at 5 loci, and greater than 6.0 at 2 loci. When four loci with Ae > 5.0 are used, a mixture of two samples can theoretically be detected in 99.84% of cases. 6. Practical case application. The microhaplotype typing panel was used to test two family samples. The results show that there is at
least one identical allele between father/mother and son, and that no mutation or recombination occurred. The CPE for trios was 0.99995, so the power of the panel meets the paternity identification standard. In 7 trio families, the average CPI was 12741, the maximum 54477.9784 and the minimum 1472.2114. The CPI of two families was greater than 10000, which reaches the identification standard; the CPI of the other families fell between 0.0001 and 10000. The cumulative grandparent index (CGI) was also calculated for grandparent/child and uncle/nephew pairs. Without parental references, CGI ranged from 0.118 to 17.072 for 9 related pairs, with an average of 5.804, and from 0.030 to 0.882 for 9 pairs of unrelated individuals, with an average of 0.415; the CGI of 2 grandparent/child pairs fell within the range of unrelated individuals. However, when the biological mother's microhaplotype results were taken into account, CGI ranged from 0.051 to 84.048 for the 9 grandparent/child or uncle/nephew pairs, with an average of 17.579, and from 0.008 to 0.407 for the 9 pairs of unrelated individuals, with an average of 0.158. Only one related pair fell within the CGI range of the unrelated individuals. 7. Construction of the microhaplotype database. We screened SNPs on the 22 autosomes of the CHB and CHS populations in the 1000 Genomes Project data. According to the preset conditions, 245479 microhaplotype loci were selected. The haplotypes, haplotype frequencies and heterozygosity of the loci were estimated with PHASE. With the selected data as the core content, a microhaplotype database website, MPH Database 1.0 (http://www.ehbio.com/MPH/), was established for searching microhaplotype data. Conclusion: We screened out 245479 microhaplotype loci with high polymorphism and good balance for the Chinese Han population and established a microhaplotype database based on these data. We developed an NGS panel and analysis method for 12 microhaplotype loci, which performed well, with high sensitivity and repeatability. Using this panel we obtained
the population genetic data of the 12 microhaplotype loci in the Hebei Han population, which lays a foundation for forensic application. |
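The per-locus forensic parameters reported in the abstract (He, Ae, PIC, DP) and the cumulative values (TDP, CPE) can be illustrated with a minimal Python sketch. The abstract does not say which formulas or software were used, so everything below is an assumption: the sketch uses common textbook definitions computed from haplotype frequencies, DP is derived from genotype frequencies expected under Hardy-Weinberg equilibrium rather than from observed genotypes, no sample-size correction is applied, and the function names and example frequencies are hypothetical.

```python
from itertools import combinations
from math import prod


def sum_sq(freqs):
    """Sum of squared allele (haplotype) frequencies at one locus."""
    return sum(p * p for p in freqs)


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2); assumed definition, no sample-size correction."""
    return 1.0 - sum_sq(freqs)


def effective_alleles(freqs):
    """Ae = 1 / sum(p_i^2), the effective number of alleles."""
    return 1.0 / sum_sq(freqs)


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2."""
    cross = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - sum_sq(freqs) - cross


def discrimination_power_hwe(freqs):
    """DP = 1 - sum(G_k^2), with genotype frequencies G_k taken from
    Hardy-Weinberg expectations (an assumption of this sketch)."""
    homozygotes = [p * p for p in freqs]
    heterozygotes = [2.0 * p * q for p, q in combinations(freqs, 2)]
    return 1.0 - sum(g * g for g in homozygotes + heterozygotes)


def cumulative(values):
    """Combine independent per-locus powers: 1 - prod(1 - v_i).
    Applies to TDP (from DP values) and CPE (from PE values)."""
    return 1.0 - prod(1.0 - v for v in values)


if __name__ == "__main__":
    # Hypothetical haplotype frequencies for one locus (not from the study).
    locus = [0.4, 0.3, 0.2, 0.1]
    print("He :", expected_heterozygosity(locus))
    print("Ae :", effective_alleles(locus))
    print("PIC:", pic(locus))
    print("DP :", discrimination_power_hwe(locus))
    # Combining two hypothetical loci with the same frequencies.
    print("TDP:", cumulative([discrimination_power_hwe(locus)] * 2))
```

The cumulative formula assumes the loci are independent, which is consistent with the linkage equilibrium reported for the 12 loci; the per-locus PE values themselves are not derived here because the abstract does not give the formula used.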