Research On Identification Methods Of Gene-gene And Gene-environment Interaction Based On Penalty Function

Posted on: 2022-06-08
Degree: Master
Type: Thesis
Country: China
Candidate: W W Xie
GTID: 2480306542486124
Subject: Statistics
Abstract/Summary:
With the development of genome-wide association studies, more and more cancer-related pathogenic genes have been discovered, and how these genes act on diseases has become a hot topic in biomedicine and bioinformatics. Both genes and the environment affect the pathogenesis of complex diseases, so gene-gene and gene-environment interactions have become a focus of interdisciplinary research. This thesis proposes two penalized regression methods for the variable selection problem of interaction effects. The main contents and conclusions are as follows:

(1) For gene-gene interaction variable selection, the Hierarchical Minimax Concave Penalty (Hier MCP) method is proposed. Hier MCP is based on a structured interaction analysis model that can accommodate multiple data types. The MCP penalty produces sparse solutions while reducing the estimation bias on large coefficients. Here the MCP method is modified so that the selected model satisfies the hierarchical structure between main effects and interaction effects, and coordinate descent is used to optimize the objective function. Taking survival data as an example, Hier MCP effectively reduces the dimension of the covariates and identifies gene-gene interactions accurately.

(2) For gene-environment interaction variable selection, the LAD-SCAD method (Least Absolute Deviation loss combined with the Smoothly Clipped Absolute Deviation penalty) is proposed. LAD-SCAD predicts well on data containing outliers and selects a sparse set of covariates. The LAD loss is robust to heavy-tailed errors in contaminated data. The SCAD penalty is unbiased, sparse, and continuous, and the corresponding solution converges to a local minimum. The model selected by LAD-SCAD is simple and interpretable.

(3) SNP data and gene expression data were simulated, and real data were analyzed empirically. The simulation results and the empirical analyses of lung adenocarcinoma data and breast cancer data demonstrate that Hier MCP and LAD-SCAD can effectively select variables that satisfy the hierarchical structure constraints between main effects and interaction effects, for both continuous response variables and subtype response variables. Using true positives (TP), false positives (FP), prediction mean squared error (PMSE), and root sum of squared errors (RSSE) as evaluation indexes, the proposed methods are significantly better than existing methods for selecting interaction variables in high-dimensional data.
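
For readers unfamiliar with the building blocks named in the abstract, the following is a minimal Python sketch of the standard textbook forms of the MCP and SCAD penalties, the univariate MCP update commonly used inside coordinate descent (assuming a standardized covariate), the LAD loss, and TP/FP counting. It is not the thesis's Hier MCP or LAD-SCAD implementation: the hierarchical modification is not reproduced here, all function names are hypothetical, and the defaults gamma = 3.0 and a = 3.7 are conventional choices assumed for illustration.

```python
def mcp_penalty(beta, lam, gamma=3.0):
    """Standard MCP value for one coefficient (assumes gamma > 1)."""
    b = abs(beta)
    if b <= gamma * lam:
        # Lasso-like near zero, with the penalization rate relaxed as b grows.
        return lam * b - b * b / (2.0 * gamma)
    # Constant beyond gamma * lam, so large coefficients are not shrunk further.
    return 0.5 * gamma * lam * lam


def scad_penalty(beta, lam, a=3.7):
    """Standard SCAD value for one coefficient (assumes a > 2)."""
    b = abs(beta)
    if b <= lam:
        return lam * b
    if b <= a * lam:
        return (2.0 * a * lam * b - b * b - lam * lam) / (2.0 * (a - 1.0))
    return lam * lam * (a + 1.0) / 2.0


def mcp_threshold(z, lam, gamma=3.0):
    """Univariate MCP update for a standardized covariate.

    z is the unpenalized univariate solution; this is the per-coordinate
    step that a coordinate descent loop would repeat over all covariates.
    """
    if abs(z) > gamma * lam:
        return z  # no shrinkage for large effects
    soft = max(abs(z) - lam, 0.0)  # soft-thresholding
    sign = 1.0 if z > 0 else -1.0
    return sign * soft / (1.0 - 1.0 / gamma)


def lad_loss(y, y_hat):
    """Mean absolute deviation between observations and predictions."""
    return sum(abs(u - v) for u, v in zip(y, y_hat)) / len(y)


def tp_fp(selected, truth):
    """Count true positives and false positives among selected variable indices."""
    selected, truth = set(selected), set(truth)
    return len(selected & truth), len(selected - truth)
```

Both penalties coincide with the lasso penalty near zero and flatten out for large coefficients, which is why they select sparse models while shrinking large effects less than the lasso does.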
Keywords/Search Tags: gene-gene interaction effect, gene-environment interaction, variable selection, high-dimensional data, MCP penalty, SCAD penalty