| In research on human genetic diseases, the genome-wide association study (GWAS) is a critical analytical method that aims to identify sequence variations, namely single nucleotide polymorphisms (SNPs), across the entire human genome and to select the gene loci associated with specific diseases. Linear models are typically used to screen gene loci, and scholars have further applied nonlinear models to this process. However, with the application of high-throughput sequencing technology, traditional variable selection methods are no longer suitable for ultra-high-dimensional gene loci. Some scholars have applied non-parametric independent screening methods to nonlinear models, but these methods do not take into account the uncertainty of the gene model. First, in practical research it is often impossible to determine in advance the true model through which a locus affects the disease, and specifying an incorrect gene model can reduce statistical power. Second, in ultra-high-dimensional association analysis the gene model is usually constructed as a linear model, whereas this study uses B-spline fitting to model the nonlinear part of the actual model. This study therefore constructs a generalized additive model that accounts for the uncertainty of the gene model, namely a nonlinear logistic model, which avoids the strong constraints that the linear model imposes on parameter estimation. At the same time, exploiting the sparsity of high-dimensional gene data, this study uses a two-step screening method: non-parametric independent screening (NIS) first reduces the ultra-high-dimensional gene loci to a high-dimensional set, and the SCAD variable selection method then screens this set further to select the relevant disease-causing gene loci. Finally, numerical simulations evaluate the effectiveness of the NIS-SCAD two-step screening method based on the nonlinear logistic model and determine whether it can accurately screen disease-causing gene loci. |
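
The second screening step can be illustrated with a minimal sketch of the SCAD penalty and its thresholding rule. This assumes the standard SCAD form with the conventional tuning constant a = 3.7 and an orthonormal design for the thresholding rule. The function names `scad_penalty` and `scad_threshold` are hypothetical and are not taken from the study, which does not state its implementation.

```python
def scad_penalty(theta, lam, a=3.7):
    """Standard SCAD penalty value for one coefficient (assumes a > 2, lam > 0)."""
    t = abs(theta)
    if t <= lam:
        # Small coefficients: same linear penalty as the lasso.
        return lam * t
    if t <= a * lam:
        # Intermediate coefficients: quadratic transition.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Large coefficients: constant penalty, so they are not shrunk further.
    return lam * lam * (a + 1) / 2


def scad_threshold(z, lam, a=3.7):
    """SCAD thresholding rule for an orthonormal design (illustrative assumption)."""
    sign = 1.0 if z >= 0 else -1.0
    t = abs(z)
    if t <= 2 * lam:
        # Soft thresholding: estimates with |z| <= lam become exactly zero.
        return sign * max(t - lam, 0.0)
    if t <= a * lam:
        return ((a - 1) * z - sign * a * lam) / (a - 2)
    # Large estimates are left unchanged.
    return z


# Hypothetical coefficient estimates for loci that survived the first (NIS) step.
estimates = [0.3, -0.8, 1.5, -3.0, 5.0]
selected = [i for i, z in enumerate(estimates) if scad_threshold(z, lam=1.0) != 0.0]
```

In this toy input, only estimates whose magnitude exceeds `lam` are retained, which reflects how SCAD produces a sparse set of candidate loci while leaving large effects unshrunk.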