| With the development of high-throughput sequencing technology and the completion of the Human Genome Project, the study of complex disease pathogenesis is gradually shifting toward genome-wide association studies (GWAS), which further reveal the pathogenesis of complex diseases by analyzing the association between single nucleotide polymorphisms (SNPs) and complex disease phenotypes. However, the pathogenic mechanisms in real SNP data are unknown, and the small sample size and high dimensionality of real SNP data hinder the identification and analysis of SNP interactions. Simulated SNP data provide benchmark data for evaluating the effectiveness of SNP interaction detection methods, but how to effectively calculate SNP epistatic models and embed them into the simulated data remains an urgent problem. Therefore, this dissertation proposes a SNP data simulation method that complements existing SNP data simulation tools in both model calculation and data generation, and designs and implements SNP data simulation software to promote in-depth research on SNP interaction detection methods. The specific research contents are as follows. (1) A simulation method for high-order SNP epistatic models is proposed to address the problem of calculating high-order SNP epistatic models with marginal effects. First, the method uses only one of the two parameters, prevalence or heritability, for the model calculation, which reduces the constraints on the calculation. Second, the method uses resampling to generate simulated sample data by randomly segmenting real biological SNP data and stitching the segments together in sequence, and then embeds the calculated SNP epistatic models to generate the sample phenotypes. Each of the three properties of the simulation method is validated, and the experimental results show that the method has good applicability. (2) A simulation method for SNP data based on the solution of an under-determined system of equations is
proposed to address the problem of calculating SNP epistatic models without marginal effects. The method treats the penetrance values of the epistatic model as unknowns and transforms the calculation of a SNP epistatic model without marginal effects into the problem of solving an under-determined system of equations. Under the prevalence constraint alone the system is linear and is solved by the complete orthogonal decomposition method; under the joint prevalence and heritability constraint it is non-linear and is solved by Newton’s method. This method also uses resampling to simulate the sample data and generate sample labels. The experiments show that the method is effective for SNP data simulation and can provide reliable data support for the study of SNP interaction detection methods. (3) Based on the above two simulation methods, SNP data simulation software is designed and implemented. The software not only provides the two methods of calculating SNP epistatic models but also allows users to input their own SNP epistatic models. For generating simulated data, it offers two options: a resampling method and SNP data with randomly specified minor allele frequencies (MAFs). Finally, the software supports multiple output formats for the simulated SNP data files and the pathogenic model files. The model calculation and data output functions of this software enable researchers to analyze SNP data conveniently and comprehensively, facilitating the development of SNP simulation data research. |
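
To make the idea in (2) concrete, the following is a minimal sketch of how a two-locus epistatic model without marginal effects might be posed as an under-determined linear system under a prevalence constraint. It is an illustration under stated assumptions, not the dissertation's implementation: genotype frequencies are assumed to follow Hardy-Weinberg equilibrium, the function names `hwe_freqs` and `no_marginal_model` are hypothetical, and NumPy's SVD-based minimum-norm least squares stands in for the complete orthogonal decomposition named in the abstract.

```python
import numpy as np

def hwe_freqs(maf):
    """Genotype frequencies (0, 1, 2 minor alleles), assuming Hardy-Weinberg equilibrium."""
    return np.array([(1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2])

def no_marginal_model(maf_a, maf_b, prevalence, seed=0):
    """Return a 3x3 penetrance table whose marginal penetrances all equal the prevalence."""
    p, q = hwe_freqs(maf_a), hwe_freqs(maf_b)
    rows = []
    for i in range(3):  # marginal penetrance of genotype i at locus A
        m = np.zeros((3, 3))
        m[i, :] = q
        rows.append(m.ravel())
    for j in range(3):  # marginal penetrance of genotype j at locus B
        m = np.zeros((3, 3))
        m[:, j] = p
        rows.append(m.ravel())
    A = np.array(rows)              # 6 equations, 9 unknown penetrances
    b = np.full(6, prevalence)

    # Minimum-norm particular solution (SVD-based; a stand-in for COD).
    f0, *_ = np.linalg.lstsq(A, b, rcond=None)
    if f0.min() < 0 or f0.max() > 1:
        raise ValueError("particular solution is not a valid penetrance table")

    # The system is under-determined, so any null-space direction can be added.
    _, s, vt = np.linalg.svd(A)
    rank = int(np.sum(s > 1e-10))
    null_basis = vt[rank:]
    rng = np.random.default_rng(seed)
    d = rng.standard_normal(null_basis.shape[0]) @ null_basis

    # Largest step that keeps every penetrance inside [0, 1].
    pos, neg = d > 1e-12, d < -1e-12
    t_max = min(np.min((1 - f0[pos]) / d[pos], initial=np.inf),
                np.min(f0[neg] / -d[neg], initial=np.inf))
    return (f0 + 0.9 * t_max * d).reshape(3, 3)

model = no_marginal_model(maf_a=0.3, maf_b=0.2, prevalence=0.1)
print(np.round(model, 4))
```

Because the prevalence is the frequency-weighted average of the marginal penetrances, fixing every marginal penetrance to the prevalence also fixes the overall prevalence. The random null-space step is only one way of picking a solution among the infinitely many that satisfy the constraints.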