| Microhaplotypes (MHs) were proposed by Professor Kidd in 2013 and introduced into forensic science. Because they are free of stutter artifacts and highly polymorphic, microhaplotypes are considered ideal genetic markers for forensic mixed DNA analysis. This study focuses on microhaplotypes and uses next-generation sequencing (NGS) as the detection method. By applying defined screening conditions, new (i.e., containing SNP sets not completely identical to those of previously reported microhaplotype loci), high-efficiency (Ae ≥ 4) microhaplotype loci were obtained, and a microhaplotype composite detection system was constructed. Typing a sample of the Han Chinese population yielded the corresponding population genetics and forensic parameters, and validation studies were carried out on the composite detection system. Finally, mixed DNA genotyping was explored in a preliminary way. The main research content is as follows: (ⅰ) The 1000 Genomes Project and the Genome Aggregation Database (gnomAD) r2.1.1 were used as source databases. Single nucleotide polymorphisms (SNPs) were filtered according to the following principles: 1) retain SNPs with allele frequencies between 0.02 and 0.98; 2) remove SNPs located in genome repeat regions; 3) remove SNPs in low-complexity and repetitive sequence regions; 4) remove SNPs located in coding regions. Microhaplotype loci were assembled with a sliding-window approach implemented in two scripts: one sets the window size to 200 bp (or 100 bp or 300 bp) and the sliding distance to 10 bp, and the other removes unnecessary information. Microhaplotype loci were then selected under the following conditions: 1) the locus must lie more than 5 megabases (Mb) from the centromere region; 2) each microhaplotype locus contains at least 3 SNPs; 3) microhaplotypes containing insertion-deletion (InDel) variants are removed; 4) only the marker with the highest effective number of alleles (Ae) is retained among
the microhaplotypes found within 5 Mb (or 1 Mb), while microhaplotypes with overlapping SNPs are removed. Finally, only microhaplotype loci with Ae ≥ 4 were retained. These steps yielded efficient microhaplotype loci. Among them, 301 microhaplotypes obtained under the 200 bp and 5 Mb screening conditions were not completely identical to previously reported loci, with an average Ae of 6.30. (ⅱ) Primers were designed for each locus, and DNA samples from 819 unrelated Han Chinese individuals were genotyped. Using Integrative Genomics Viewer (IGV) software, loci were excluded if the allele coverage ratio (ACR) was unbalanced (ACR < 0.5), the sequencing depth was below 100, more than two alleles were observed, or sequence features interfered with correct genotyping; 202 loci were retained as the final composite detection system. (ⅲ) The 819 unrelated Han Chinese individuals were tested to obtain population genetics and forensic parameters. Allele frequencies ranged from 0.0006 to 0.5849; Ae ranged from 1.9440 to 12.2668, with a mean of 5.0039; expected heterozygosity (He) ranged from 0.4859 to 0.9190, with a mean of 0.7914; discrimination power (DP) ranged from 0.6195 to 0.9871, with a mean of 0.9232; the total discrimination power (TDP) was 1 - 3.2705×10^-232; the probability of exclusion (PE) ranged from 0.1715 to 0.8352, with a mean of 0.5818; and the cumulative probability of exclusion (CPE) was 1 - 6.4155×10^-79. (ⅳ) Validation studies of the composite detection system included the following. a) Repeatability and accuracy. Three parallel experiments on the same sample showed that the sequencing depth of all three replicate libraries was greater than 5000× and the system equilibrium was greater than 0.98; all 202 loci in the three libraries of the same sample yielded complete
genotyping with consistent results. The genotypes of the 903 SNPs covered by the 202 microhaplotype loci were further compared with whole genome sequencing (WGS) typing results for the DNA sample (average sequencing depth > 100×); in this sample, only seven loci with partial SNP inconsistencies were observed. Bidirectional Sanger sequencing of these inconsistent SNPs showed that their genotypes agreed with the results of this study. b) Interlocus and intralocus balance. The average depth of coverage (DOC) of the detection system across samples from 60 unrelated Han Chinese individuals was 8337 ± 3704, with 199 loci having an average sequencing depth greater than 20% of the overall average; the average ACR reached 0.88 ± 0.09. c) Sensitivity. Libraries were prepared from 2.00 ng, 1.00 ng, 500.00 pg, 250.00 pg, 125.00 pg, 62.50 pg, and 31.25 pg of 2800M control DNA. With DNA inputs of 2.00 ng to 125.00 pg, the average sequencing depth of most loci in the library was high (above 6000×), and all loci were correctly typed. When the DNA input was reduced to 31.25 pg, 99.17% of the loci could still be fully detected. d) Detection of simulated degraded DNA. 2800M control DNA was heated for different lengths of time to simulate the degree of degradation of forensic DNA samples. For samples with a degradation index (DI) ≤ 3.07, all microhaplotypes were completely genotyped; when the DI reached 627.38, 47 loci (23.27%) could still be detected. e) PCR inhibitors. When tannic acid was below 150.0 ng/μL, humic acid below 30.0 ng/μL, and heme below 50.0 μM, the locus detection rate was close to 100%. At 250.0 ng/μL tannic acid, 45.0 ng/μL humic acid, and 62.5 μM heme, 3.47%, 35.15%, and 35.64% of loci, respectively, could still be correctly typed. (ⅴ) After establishing a splitting model for mixed microhaplotype samples and optimizing the key parameters of the
model, 2-person mixed samples of different sexes and mixing ratios were split. Statistical analysis showed that, for the first contributor, more than 95% of loci could be split at all mixing ratios except 3:1, 1:1, and 1:3, with a splitting accuracy of 100%; for the second contributor, even at mixing ratios as extreme as 49:1 or 1:49, about 20 or more loci could still be correctly split. For mixtures of 3 to 5 people, splitting efficiency decreased markedly as the number of contributors increased. In summary, this project screened new and efficient microhaplotype loci and, on that basis, constructed a composite detection system of 202 microhaplotypes. The system was used to test samples from 819 unrelated Han Chinese individuals and obtain population genetics and forensic parameters, further consolidating the foundation of genetic polymorphism data. Based on these data, a preliminary splitting model for mixed microhaplotype DNA was constructed, which helps promote the application of microhaplotype markers in mixed DNA genotyping. |
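
The abstract reports Ae, He, ACR, and cumulative values such as TDP and CPE without stating the formulas used. As a minimal sketch, assuming the standard definitions (Ae = 1/Σp², He = 1 − Σp² with no small-sample correction, ACR as the lower read depth over the higher one in a heterozygote, and a cumulative value of 1 − Π(1 − xᵢ) across loci), these quantities could be computed as below. The function names and the example frequencies are hypothetical and are not taken from the study.

```python
import math
from typing import Sequence


def effective_number_of_alleles(freqs: Sequence[float]) -> float:
    """Ae = 1 / sum(p_i^2) for the haplotype frequencies p_i at one locus."""
    return 1.0 / sum(p * p for p in freqs)


def expected_heterozygosity(freqs: Sequence[float]) -> float:
    """He = 1 - sum(p_i^2); assumes no small-sample correction."""
    return 1.0 - sum(p * p for p in freqs)


def allele_coverage_ratio(depth_a: int, depth_b: int) -> float:
    """ACR of a heterozygote: lower read depth divided by higher read depth.

    Assumes both alleles were observed, so neither depth is zero.
    """
    return min(depth_a, depth_b) / max(depth_a, depth_b)


def cumulative_complement(per_locus: Sequence[float]) -> float:
    """Return prod(1 - x_i); the cumulative value is 1 minus this result."""
    return math.prod(1.0 - x for x in per_locus)


# Hypothetical locus with four equally frequent haplotypes.
freqs = [0.25, 0.25, 0.25, 0.25]
print(effective_number_of_alleles(freqs))  # 4.0
print(expected_heterozygosity(freqs))      # 0.75
```

The last function returns the complement rather than the cumulative value itself because a number such as 1 − 3.2705×10^-232 rounds to exactly 1.0 in double precision, which is presumably why the abstract writes TDP and CPE in the "1 − x" form.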