
The Research Of Key Technology In Gang Rape Mixed DNA Analysis And Separation Software Development

Posted on: 2015-01-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: N Hu
GTID: 1264330428474016
Subject: Forensic medicine
Abstract/Summary:
In this study, experimental models of mixed DNA from gang-rape cases were constructed, and the scientifically verified experimental data were used to evaluate the parameters of mixed DNA and to explore its constraints. Separation models for mixed DNA were then constructed from STR genotyping data. The separation models were compared with the mixsep software developed abroad and were then converted into a software package whose efficacy and applicability were verified with the genotyping data of the simulated mixed DNA. This study puts forward a basic expert system for the individual identification of mixed DNA.

Part I: Constructing an Experimental Model of Mixed DNA

Objective: Experimental models of two-male mixed DNA and three-person mixed DNA (two males + one female) were used to simulate the mixed DNA samples in gang-rape cases. An ABI 7500 real-time PCR analyzer was used to construct the simulated mixed DNA, including sample preparation with different contributors and different mixed ratios. The scientifically verified experimental data were used to evaluate the parameters of mixed DNA and to develop the separation model. The deviation (D value) between the measured Mx and the theoretical Mx, together with the Mx value estimated by the mixsep software, served as the scientific verification indexes for the experimental model. The data quality of the experimental model was evaluated through data mining and statistical analysis.

Method: The DNA extracted from 50 whole blood samples was analyzed with the ABI 7500 analyzer.
During the construction of the simulated two-male and three-person mixed DNA, single DNA samples were grouped as contributors on the criterion that their DNA concentrations were very close, so that different mixed ratios could be prepared by adjusting the volume of DNA solution. To avoid the “overfitting” that simple sample types and an insufficient sample size might cause when constructing the separation model, multiple types of mixed DNA from different contributors needed to be constructed, and each mixed DNA needed to cover multiple mixed ratios so that the data could objectively reflect the impact of mixed DNA profiles and mixture proportion (Mx) on the analysis. In addition, the concentration of the original mixed DNA solution needed to be adjusted to the recommended range of 0.5-1.25 ng/μl to meet the DNA template requirement of the PCR amplification kit.

Results: With a DNA concentration difference of no more than 0.5 ng/μl as the classification standard, 22 single DNA samples met the standard for two-male mixed DNA and formed 11 groups, and 12 single DNA samples met the standard for three-person mixed DNA and formed 4 groups. The deviation (D value) of the Mx of mixed DNA before and after PCR amplification was ≤0.1 within the 95% confidence interval, with relatively small fluctuation, which indicated that the data used to construct the experimental model for two-male mixed DNA were of good quality and could provide a favorable data basis for the accurate separation of mixed DNA genotypes.

In the mixed Identifiler profiles, the measured alpha values with relatively large root mean square errors (RMSEs) were scattered among the 11 groups of samples; the RMSE of the 1:1 ratio was >0.02, and the RMSEs of the other 8 ratios were all within the range of 0.01-0.02.
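The verification indexes used in Part I (the deviation D and the RMSE between measured and theoretical mixture proportions) can be illustrated with a minimal Python sketch. It assumes that the theoretical Mx of a nominal a:b mixture is a/(a+b) for the first contributor, and the per-locus values in `measured_alpha` are hypothetical numbers chosen for illustration, not data from the dissertation.

```python
import math

def theoretical_mx(a: float, b: float) -> float:
    """Mixture proportion of the first contributor in a nominal a:b mix (assumed definition)."""
    return a / (a + b)

def deviation_d(measured: float, theoretical: float) -> float:
    """Absolute deviation D between a measured and a theoretical proportion."""
    return abs(measured - theoretical)

def rmse(measured, theoretical: float) -> float:
    """Root mean square error of per-locus measured proportions against one target."""
    return math.sqrt(sum((m - theoretical) ** 2 for m in measured) / len(measured))

# Hypothetical per-locus proportions for a nominal 1:3 two-person mixture.
measured_alpha = [0.26, 0.24, 0.25, 0.27, 0.23]
target = theoretical_mx(1, 3)

print("theoretical Mx:", target)
print("largest D:", max(deviation_d(m, target) for m in measured_alpha))
print("RMSE:", rmse(measured_alpha, target))
```

In this toy case the RMSE falls inside the 0.01-0.02 band reported for most ratios, which is the kind of check the abstract describes.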
In other words, the RMSE differences between the measured and theoretical Mx values were no more than 0.02 in the simulated two-male mixed DNA profiles. This experimental model could provide a favorable data basis for the scientific analysis of mixed DNA.

In the mixed Yfiler profiles, the measured alpha values with relatively large RMSEs were likewise scattered among the 11 groups of samples; the RMSEs of the 1:3 and 1:4 ratios were >0.02 but <0.03, and the RMSEs of the other ratios were all within the range of 0.01-0.02. That is, in the Yfiler profiles of the two-male mixed DNA constructed in this experiment, the RMSE differences between the measured and theoretical Mx values were no more than 0.03.

Conclusion: In this part, using the ABI 7500 analyzer and scientific verification of the experimental model, 297 simulated two-male mixed DNA samples and 264 simulated three-person mixed DNA samples were established to simulate the mixed DNA in gang-rape cases. Besides supporting the construction of the separation model for mixed DNA and the development of the separation software, the Identifiler STR profiles of the 297 simulated two-male mixed DNA samples also provide data support for evaluating the parameters of mixed DNA and mining their regularities (such as the average peak height/area of active alleles, mixture proportion, heterozygote balance ratio, allelic drop-out, and inter-locus balance).

Part II: Parameter Estimation and Mixsep Software Verification for the Simulated Two-male Mixed DNA

Objective: To clarify the constraints in mixed DNA analysis and find their regularities by evaluating the parameters of mixed DNA and analyzing the correlations among them.
By applying the simulated mixed DNA profile data to the mixsep software, we expected to identify its advantages and disadvantages for further improvement, providing a reference and an efficacy comparison for the development of our mixed DNA separation model.

Methods: For the correlation analysis between the peak height (PH) and peak area (PA) of mixed DNA profiles, a generalized additive model was used for curve fitting, and least squares regression was used to compute the regression coefficient, in order to observe whether the two kinds of quantitative information differed in efficacy in mixed DNA analysis.

For the correlation analysis between the average peak height (APH) and the heterozygote balance ratio (Hb), locally weighted regression and the Kruskal-Wallis rank test were adopted for the non-normally distributed data, and the Hb distributions corresponding to the 16 STR loci and the 9 mixed ratios were analyzed separately.

Variation in fluorescence sensitivity was analyzed for the STR loci in each channel of the mixed DNA profiles, using the APH and the inter-locus balance (Ci) of each channel, to determine whether the STR loci differed in efficacy in mixed DNA analysis; multiple comparisons were performed with Tukey's honestly significant difference method.

All statistical charts in this paper were drawn with the ggplot2 package (version 0.9.3) of the R software (version 3.0.1).

Results:

1. Correlation analysis of PH and PA: The distributions of PH and PA for the 16 STR loci showed that, besides the good linear relation at loci D19S433, D3S1358, D5S818, and D8S1179, there was a significant linear relation between the PH and PA at the other 12 loci. This was largely consistent with the conclusion of Tvedebrink, i.e., PH and PA had a good linear relation, and both kinds of quantitative information could be used in mixed DNA analysis with little difference in analytical efficacy.

2. Correlation analysis of APH and Hb: In the Kruskal-Wallis rank-sum test, the p value for the Hb distributions across loci was 0.0063 (<0.05), indicating that the Hb distributions of the loci were statistically different; the p value for the Hb distributions across mixed ratios was 0.02257 (<0.05), indicating that the Hb distributions of the mixed ratios were also statistically different. That is, the Hb distribution was affected by both STR locus and mixed ratio. When APH < 1250 rfu, the Hb value increased significantly (from 0.75 to around 0.87); when APH ≥ 1250 rfu, the Hb value was almost constant, with a mean of 0.878. In the experimental data, 92.74% of the observations with APH ≥ 1250 rfu met the Hb > 0.6 threshold.

At loci CSF1PO, D19S433, D21S11, D2S1338, and vWA, more data showed both high Hb and high APH values than at the other loci; as the mixed ratio became more imbalanced (from 1:5 to 1:9), more data showed lower Hb and APH values.

3. Correlation analysis of APH and drop-out: For relatively balanced ratios (1:1 to 1:3), there were fewer allelic drop-outs (ADOs); for the very imbalanced ratios (1:7 to 1:9), the number of ADOs increased rapidly, suggesting that it was correlated with the Mx; as the number of ADOs increased, the APH of the corresponding samples gradually decreased.

4. Impact of fluorescence sensitivity on APH: To test whether the mean APH values of the four channels differed statistically, multiple comparisons were performed with Tukey's honestly significant difference method. There was no significant difference between the fluorescence sensitivities of the blue and green channels (p = 0.446) or between those of the yellow and red channels (p = 0.530).
For the other 4 pairs (blue and yellow, blue and red, green and yellow, and green and red), the p values were all equal to 3.95E-08, far less than 0.05. The sensitivity to blue and green fluorescence therefore differed significantly from that to yellow and red; that is, the ABI 3130xl Genetic Analyzer was more sensitive to blue and green fluorescence than to the other two.

The median APH of locus D8S1179 was the highest in the blue channel. The median APHs of loci D3S1358, TH01, and D13S317 were higher than those of the other loci in the green channel, and the median APHs of loci D18S51 and FGA were the lowest in the yellow and red channels. That is, the distribution of APH was generally consistent with the molecular size of the STR loci, and the median APH values of loci with small molecular sizes (i.e., D8S1179, D21S11, D3S1358, TH01, D13S317, D19S433, vWA, AMEL, and D5S818) were relatively high.

5. Analysis of parameter Ci: The Pearson correlation coefficients (r) of the mean and median Ci with the ADO count were -0.7179 and -0.7065, respectively. The corresponding p values were 1.736E-3 and 2.215E-3, indicating statistical significance; the mean and median Ci had significant negative correlations with the ADO count.
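As a rough illustration of the correlation analysis above, the following Python sketch computes a Pearson coefficient and the t statistic that underlies its significance test. The sample size of 16 (one point per STR locus) is an assumption made here for the worked number, and the `ci_values` and `ado_counts` lists are hypothetical, not data from the study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r: float, n: int) -> float:
    """t statistic for testing r = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Hypothetical per-locus values: higher Ci paired with fewer drop-outs.
ci_values = [1.30, 1.10, 1.00, 0.90, 0.80, 0.70]
ado_counts = [2, 5, 9, 12, 18, 21]
print("r =", pearson_r(ci_values, ado_counts))

# Reported coefficient for mean Ci vs. ADO count, assuming n = 16 loci.
print("t =", t_statistic(-0.7179, 16))
```

A t statistic of this size on 14 degrees of freedom corresponds to a two-sided p value in the low 10^-3 range, in line with the reported figure.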
The locus with the highest median Ci was D8S1179.

The distribution of Ci values across the 16 STR loci was generally consistent with the fluorescence sensitivity of the ABI 3130xl Genetic Analyzer in the four channels. The ABI 3130xl had higher fluorescence sensitivity in the blue and green channels, and the corresponding 8 loci, D8S1179, D21S11, CSF1PO, D3S1358, TH01, D13S317, D16S539, and D2S1338 (with D7S820 as an exception), all had higher Ci values; it had lower fluorescence sensitivity in the yellow and red channels, and the corresponding 6 loci, D19S433, vWA, TPOX, D18S51, AMEL, and FGA (with D5S818 as an exception), all had lower Ci values.

6. Horizontal analysis of mixsep: Correlation analysis between mixed ratio and locus separation accuracy gave a correlation coefficient r = -0.7121 and a p value of 0.03139, indicating a negative linear correlation. Correlation analysis between mixed ratio and ADO count gave r = -0.4244 and a p value of 0.2549, suggesting no marked correlation. Ratio 1:1 had the lowest accuracy. As the mixed ratio became more imbalanced, the accuracy first rose and then decreased.
Ratios 1:2, 1:3, and 1:4 had higher accuracy, while ratios 1:1 and 1:9 had relatively lower accuracy and greater variation. The locus separation accuracy without ADO was higher than that with ADO, meaning that allelic drop-out impaired the analytical efficacy of mixsep.

7. Vertical analysis of mixsep: Loci D5S818, D8S1179, and FGA had higher accuracy (>88%), while loci D19S433, D2S1338, and D7S820 had lower accuracy (≤80%). Loci AMEL, D5S818, and D8S1179 had the lowest drop-out counts, while loci D18S51, D19S433, FGA, TPOX, and vWA had higher drop-out counts (>15); these 5 loci were all located in the yellow and red channels with lower APHs, consistent with the lower fluorescence sensitivity of the ABI 3130xl Genetic Analyzer to yellow and red.

For ratio 1:1, the accuracies of all loci except AMEL and D3S1358 were ≤70%, and the outliers in the lower area of the box plot were the data of this ratio. For ratios 1:2, 1:3, 1:4, and 1:5, the accuracies of each locus were all higher, and the accuracy at ratio 1:3 was the highest (≥90%); for ratios 1:8 and 1:9, the locus separation accuracies fluctuated more and had lower mean values.

Conclusion: Combining the APH of DNA profiles, mixed ratios, and STR loci with the correlation analysis of the parameters Hb, Ci, and fluorescence sensitivity and the efficacy analysis of the mixsep software, this study suggests the following. During genotype separation of mixed DNA profiles generated with ABI Identifiler, if the APH of the profile is greater than 1250 rfu and the mixed ratio is within 1:1 to 1:5 (excluding 1:1), we prefer the genotype separation results of loci D8S1179, D21S11, and CSF1PO in the blue channel; loci D3S1358, TH01, and D13S317 in the green channel; loci D19S433, vWA, and TPOX in the yellow channel; and loci AMEL and D5S818 in the red channel (11 loci in total). That is, the 16 STR loci differ in separation efficacy and in evidence strength in mixed DNA analysis.
If the APH of the mixed DNA profile is less than 1000 rfu and the mixed ratio is extremely imbalanced (more imbalanced than 1:6), and allelic drop-out is unclear or no known samples are available, hasty software analysis of the mixed DNA is not recommended because it easily leads to misjudgment. Moreover, mixed DNA profiles with a mixed ratio close to 1:1 cannot undergo genotype separation and individual identification. That is, even with a complete mixed DNA separation model and analytical software, manual judgment by forensic investigators is still needed at the time of genotype separation, and an expert conclusion cannot be drawn simply from the software report.

Part III: Separation Model Construction and Efficacy Analysis in Mixed DNA

Objective: To construct a scientific and conservative mixed DNA separation model based on a large number of simulated mixed DNA profiles, to verify the efficacy of the separation model, and to compare it with the mixsep software, so as to demonstrate the robustness of the constructed separation model.

Method:

1. Naive Bayesian model: The peak heights of alleles were assumed to follow a normal distribution. For convenience, the prior distribution of the mixture proportion α was also assumed to be a normal distribution N(m, A); the variance parameter τ remained a free parameter, and α was independent of τ. The marginal distribution of ha was deduced on this basis [equation not reproduced in this abstract]. For the variance hyperparameter A of the prior distribution, when the experimental data were relatively accurate, the difference between the α values of the loci was ≤0.05; taking this as a three-standard-deviation range for interval estimation, the prior variance of α was about A = 0.0167² ≈ 0.00028 (obtained from the empirical data of this lab).
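The arithmetic behind the prior variance can be checked with a short Python sketch. It reads the 0.05 spread as three standard deviations, as the text describes. The `marginal_variance` helper is an assumption: if ha given α is normal with mean Bα + C and variance τ², the standard linear-Gaussian result gives a marginal variance of τ² + B²A, which is consistent with the B²A term the abstract mentions but is not a formula quoted from it. The values of `tau` and `B` below are hypothetical.

```python
def prior_variance(max_spread: float = 0.05, n_sd: float = 3.0) -> float:
    """Prior variance A when max_spread is treated as n_sd standard deviations."""
    sd = max_spread / n_sd
    return sd * sd

def marginal_variance(tau: float, B: float, A: float) -> float:
    """Variance of ha after integrating out alpha ~ N(m, A) (assumed linear-Gaussian form)."""
    return tau ** 2 + B ** 2 * A

A = prior_variance()
print("A =", A)

# Hypothetical scale: peak-height noise tau (rfu) and slope B of the mean in alpha.
tau, B = 100.0, 50.0
print("share of variance from the prior:", (B ** 2 * A) / marginal_variance(tau, B, A))
```

Whether B²A is negligible depends on the sizes of B and τ, so the last line is only a way to inspect that share for a chosen scale.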
Since A was very small, the term B²A could be ignored relative to the original variance, and the marginal distribution of ha could be simplified to ha|τ ~ N(Bm + C, τ²), in which m was obtained from the prior distribution. For all the candidate genotypes of each locus, the optimal match and its parameters were obtained by maximizing the marginal likelihood, i.e., by taking the genotype with the largest likelihood value; suboptimal genotype matches could be assessed through manual judgment. In our experience, suboptimal matches whose likelihood differed from that of the optimal match by more than 1.5 times were not considered.

2. Constrained single-locus analysis model: Given the initial mixture proportion, an empirical constraint was placed on the fluctuation range of the mixture proportion α, and then, for all the genotypes of each locus, α and the variance parameter were solved by maximizing the likelihood function. The normal distribution assumption was retained, and the mean and variance of the allele peak heights were derived accordingly [equations not reproduced in this abstract]. Here, the constraints are α ∈ [a, b] for the mixture proportion and τ ∈ [ε, M] for the variance parameter, with ε close to 0 and M usually large. Under these parameter constraints, the genotypes are traversed and the likelihood function is maximized; if the solved α reaches or exceeds the upper or lower limit of the constraint, the genotype is flagged with a warning or excluded even if its likelihood is the maximum. In addition, according to the formula for the variance parameter τ², the better the peak height fit, the smaller the resulting variance parameter; when τ² approaches the lower limit ε, the peak height fit is close to its best.
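The constrained single-locus idea can be sketched in Python as follows. This is not the sepDNA implementation: the expected-height model (total locus height split by allele copy number and α), the function names, and the example peak heights are all assumptions made for illustration. Under a normal likelihood with a free variance, maximizing over α reduces to least squares, so the sketch solves α in closed form, clips it to [a, b], and flags fits that hit a bound.

```python
import math
from itertools import combinations_with_replacement, product

def fit_locus(heights, g1, g2, a_lo, a_hi, eps=1e-6):
    """Fit alpha in [a_lo, a_hi] for one genotype pair; heights maps allele -> rfu."""
    alleles = sorted(heights)
    total = sum(heights.values())
    # Assumed mean for allele x: total/2 * (alpha*copies_in_g1 + (1-alpha)*copies_in_g2)
    c = [total / 2 * g2.count(x) for x in alleles]
    d = [total / 2 * (g1.count(x) - g2.count(x)) for x in alleles]
    h = [heights[x] for x in alleles]
    denom = sum(di * di for di in d)
    if denom == 0:  # identical genotypes: alpha is not identifiable
        alpha, at_bound = (a_lo + a_hi) / 2, False
    else:
        alpha = sum(di * (hv - ci) for di, hv, ci in zip(d, h, c)) / denom
        at_bound = alpha <= a_lo or alpha >= a_hi
        alpha = min(max(alpha, a_lo), a_hi)  # quadratic objective: clipping is optimal
    rss = sum((hv - (ci + alpha * di)) ** 2 for hv, ci, di in zip(h, c, d))
    n = len(h)
    tau2 = max(rss / n, eps)  # lower limit on the variance parameter
    loglik = -0.5 * n * math.log(2 * math.pi * tau2) - rss / (2 * tau2)
    return {"g1": g1, "g2": g2, "alpha": alpha, "tau2": tau2,
            "loglik": loglik, "at_bound": at_bound}

def separate_locus(heights, a_lo, a_hi):
    """Traverse all genotype pairs that explain every observed allele, best fit first."""
    alleles = sorted(heights)
    genotypes = list(combinations_with_replacement(alleles, 2))
    fits = [fit_locus(heights, g1, g2, a_lo, a_hi)
            for g1, g2 in product(genotypes, repeat=2)
            if set(g1) | set(g2) == set(alleles)]
    return sorted(fits, key=lambda f: f["loglik"], reverse=True)

# Hypothetical four-allele locus; alpha is constrained around an initial value near 0.25.
peaks = {"14": 310.0, "15": 980.0, "16": 290.0, "17": 1020.0}
ranked = separate_locus(peaks, a_lo=0.15, a_hi=0.35)
best = ranked[0]
print(best["g1"], best["g2"], round(best["alpha"], 3), best["at_bound"])
```

Candidates whose α lands on a bound are kept in the ranking with `at_bound` set, so a caller can warn about or exclude them, mirroring the rule described above.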
In the experimental mixed DNA data generated in our lab, the estimated α fluctuated around the prior α solved by the Naive Bayesian model, with a variation range ≤0.08, within which the variation range for two allelic bands was ≤0.05. If the estimated α approached the upper or lower limit of the constraint, the locus was very likely to be abnormal, and its optimal match could be replaced with the suboptimal one according to experience.

Results: The two separation models constructed in this study, the Naive Bayesian model (called Bayer for short) and the constrained single-locus analysis model (called Iter for short), both made a separation error at locus AMEL of the NAN3-1-9-B DNA profile; the misjudged genotype combination was X,X and X,Y (the same result as mixsep). In forensic DNA testing, locus AMEL plays an important role in inferring the sex of suspects. When other factors affecting the mixed DNA profile are not considered, it is not conservative enough to infer directly from the peak heights of this locus whether the mixed DNA came from multiple males or from males and a female; doing so could cause the separation model to misjudge the sex of the suspects and thus misdirect the investigation.

Among the factors influencing peak height degradation, when molecular weight was the major factor, the misjudged locus could be corrected through peak height adjustment; for loci where molecular weight was not the major cause of peak height degradation (such as locus vWA of sample NAN3-1-5-B), the separation result after peak height adjustment remained the same.
In other words, the peak height degradation coefficient was estimated relatively conservatively based on the mixed effect model. Therefore, the peak height adjustment is effective only for some STR loci in mixed DNA profiles.

Conclusion: Starting from the global consistency problem, the Naive Bayesian approach, and single-locus solving, the Bayer and Iter models were constructed and used for genotype separation of 5 mixed DNA profiles for which mixsep had not given ideal analytical results. Provided that separation errors caused by peak height degradation were not considered, the combined use of Bayer and Iter gave better results in the analysis of best-matched genotypes. In addition, the mixed effect model handled peak height degradation conservatively; when molecular weight was the major factor contributing to peak height degradation, the misjudged loci could be corrected through peak height adjustment. Because the adjustment was effective only for some STR loci, it was treated as an optional correction.

Part IV: Development of sepDNA Software in Mixed DNA Analysis and Case Application

Objective: To select STR markers compatible with the DNA database of Chinese forensic science as the input data, to develop the mixed DNA separation software sepDNA by using the two separation models developed in this study together, and to verify the robustness and reliability of sepDNA with experimental data.

Methods: The sepDNA software package was developed in the R language by converting the source code of the mixed DNA separation models constructed in Part III and adding the source code of the sepDNA user interface; the analytical efficacy of the software was then verified in application.

Results: The sepDNA software contains two separation models and several small modules.
In the Bayer model, by exploring the prior mixture proportion, the problem was converted to a normal distribution in which the average peak height was related only to genotype, and the best-matched genotype was found by maximizing the marginal likelihood function. In the Iter model, joint analysis was replaced by separate analysis of each locus, and an empirical constraint was placed on the fluctuation range of the mixture proportion; the mixture proportion and variance parameter of each locus were solved by maximizing the likelihood function, and each single locus was solved by traversal under the parameter constraints.

The two models completed genotype separation of mixed DNA from two different modeling ideas, global optimization and local optimization. Although the separation results of loci D3S1358 and D7S820 impaired the overall separation accuracy of the Iter model, considering the sex estimation at locus AMEL and the robustness of the separation model, it was necessary to use the two models in combination: a separation result was taken as reliable when both models gave the same result, and where the two models differed, further manual judgment was needed to ensure that the separation report of the mixed DNA was robust and reliable.

Conclusion: The sepDNA software created in this study includes two separation models, the Bayer model and the Iter model. The two models should be used together, and a separation result of the mixed DNA is considered reliable only when it appears in both models, in order to ensure robustness and reliability. The software has no module for allelic drop-out. It does report the parameters “average peak height” and “mixture proportion”, and if the average peak height is too low or the mixed ratio is extremely unbalanced, it warns that allelic drop-out may have occurred in the mixed DNA profile. The analysis report of sepDNA needs to be drawn up with reference to this parameter information and manual judgment.
The three-person mixed DNA separation module of this software has a function called “set up fixed genotype”, which can improve the separation accuracy for three-person mixed DNA, but the efficacy of this module still needs to be verified with more three-person mixed DNA data.
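The combined-use rule described in the conclusion (accept a locus only when both models agree, otherwise refer it for manual judgment) can be sketched in Python as below. The function names, the data layout, and the drop-out thresholds are hypothetical: the thresholds simply reuse the 1000 rfu and 1:6 figures from Part II as placeholder defaults and are not documented sepDNA settings.

```python
def combine_calls(bayer_calls, iter_calls):
    """Per-locus consensus of two models; each argument maps locus -> genotype pair."""
    report = {}
    for locus in sorted(set(bayer_calls) | set(iter_calls)):
        b = bayer_calls.get(locus)
        i = iter_calls.get(locus)
        if b is not None and b == i:
            report[locus] = ("reliable", b)
        else:
            report[locus] = ("manual review", (b, i))
    return report

def dropout_warning(aph, minor_proportion, aph_min=1000.0, proportion_min=1 / 7):
    """True when a low average peak height or a very unbalanced mixture suggests drop-out.

    The default thresholds are illustrative assumptions, not sepDNA parameters.
    """
    return aph < aph_min or minor_proportion < proportion_min

# Hypothetical calls from the two models for two loci.
bayer = {"D8S1179": (("12", "13"), ("14", "15")), "AMEL": (("X", "X"), ("X", "Y"))}
iter_ = {"D8S1179": (("12", "13"), ("14", "15")), "AMEL": (("X", "Y"), ("X", "Y"))}

for locus, (status, detail) in combine_calls(bayer, iter_).items():
    print(locus, status, detail)
print("drop-out warning:", dropout_warning(aph=800.0, minor_proportion=0.10))
```

Keeping disagreements in the report, rather than silently choosing one model, matches the abstract's point that the software output supports but does not replace expert judgment.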
Keywords/Search Tags: Mixed DNA, mixture proportion, average peak height/area of active alleles, heterozygote balance ratio, allelic drop-out, inter-locus balance, R, short tandem repeats, forensic genetics