Research in biogenetics and genomics has linked many diseases to gene mutations. An important method for studying complex diseases is the genome-wide association study (GWAS), which compares millions of single nucleotide polymorphisms (SNPs) across the genome to find loci strongly associated with disease. Traditional genetic research has used linear and generalized linear models for gene association analysis. In view of the heterogeneity of the data, some scholars have proposed Gaussian mixture models instead, but such studies do not fully account for the uncertainty of genotype data: in gene sequencing we usually know only the probabilities of the three possible genotypes at a locus, because the genotype at each locus is difficult to measure accurately and its true value cannot be observed. This paper therefore accounts for genotype uncertainty by modeling gene association with a nonparametric Gaussian mixture model. Because the data are large and complex, this paper also uses a penalized likelihood method to select variables in high-dimensional gene data under the Gaussian mixture model, so as to screen for pathogenic loci. Numerical simulations show that the variable selection method based on the Gaussian mixture model performs well. Finally, the method is summarized and directions for future work are discussed.
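To make the notion of genotype uncertainty concrete, the following minimal sketch shows how the three genotype probabilities at one locus might be summarized. It is an illustration only and not the model used in this paper: the function names, the ordering of the probabilities (homozygous reference, heterozygous, homozygous alternate), and the example values are all assumptions introduced here.

```python
def expected_dosage(probs):
    """Expected number of alternate alleles at one locus.

    probs is assumed to be (P(hom. reference), P(heterozygous), P(hom. alternate)).
    """
    p_hom_ref, p_het, p_hom_alt = probs
    total = p_hom_ref + p_het + p_hom_alt
    if abs(total - 1.0) > 1e-6:
        raise ValueError("genotype probabilities must sum to 1")
    # Genotypes carry 0, 1, and 2 alternate alleles respectively.
    return p_het + 2.0 * p_hom_alt


def most_likely_genotype(probs):
    """Index (0, 1, or 2) of the genotype with the highest probability."""
    return max(range(3), key=lambda g: probs[g])


# Hypothetical probabilities for a single locus (illustrative values only).
locus = (0.10, 0.60, 0.30)
print(expected_dosage(locus))
print(most_likely_genotype(locus))
```

A hard call such as `most_likely_genotype` discards the uncertainty, whereas the expected dosage retains part of it; a model that works with the full probability vector, as proposed here, retains all of it.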