Research And Application Of Variable Selection Based On SELO Penalty Function

Posted on: 2021-11-30
Degree: Master
Type: Thesis
Country: China
Candidate: R X Zong
Full Text: PDF
GTID: 2480306110464434
Subject: Statistics
Abstract/Summary:
The rapid development of the internet and of data storage capabilities has led to the generation of massive data. Accurately and efficiently mining important information from high-dimensional data has become the key to processing such data, so variable selection has become one of the statistical problems that statisticians focus on. The Seamless-L0 (SELO) variable selection method starts from the form of the L0 penalty function and constructs a continuous function to replace the L0 penalty. The SELO method therefore retains the advantage of the L0 method, which directly penalizes the number of non-zero elements, while overcoming its discontinuity. The SELO method also performs better in model selection and parameter estimation than classical variable selection methods. This thesis therefore applies SELO to partially linear models and to complex network graph models, constructs new methods for parameter estimation in partially linear models and for network structure analysis, and studies their asymptotic properties and their application to practical problems. The specific innovations and research results are as follows:

(1) In the linear model and the Cox model, the SELO method performs well in model selection and parameter estimation, and the estimator satisfies the oracle property. Building on its application in those models, this thesis combines the method with the partially linear model, proposes a SELO estimator for the parameters of the partially linear model, and discusses the asymptotic properties of the estimator. It is proved that, under certain conditions, the estimator obtained by this method is consistent, sparse and asymptotically normal; that is, it satisfies the oracle property.

(2) In the complex network graph model, a variable selection method can be introduced as a regularization term to make the analysis of the network structure more accurate. Based on the good properties of the SELO variable selection method, this thesis applies it to a complex network graph model, using SELO as the regularizer, and proposes a new regularization model. The model carries out model selection and multivariate covariance estimation simultaneously, which allows complex network structures to be recovered and analyzed. The related asymptotic properties are proved, showing that the covariance estimates under this model are asymptotically normal. The model is compared with the Graphical Lasso method in numerical simulations. The results show that the regularization model based on the SELO method recovers the network structure better than the Graphical Lasso method and is more effective on high-dimensional data problems. Finally, gene expression data from E. coli are analyzed as a worked example. The results show that the regularization model proposed in this thesis finds more of the actual regulatory relationships between genes than Graphical Lasso, which further illustrates that the model performs well in practical applications to gene network structure.
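The abstract describes SELO as a continuous stand-in for the L0 penalty but does not state its formula. The sketch below assumes the form usually given in the SELO literature, p(b) = (lam / log 2) * log(|b| / (|b| + tau) + 1), and compares it with the L0 penalty lam * 1{b != 0}. The function names `selo_penalty` and `l0_penalty`, the parameter names `lam` and `tau`, and the default values are illustrative choices, not taken from the thesis.

```python
import math


def selo_penalty(beta, lam=1.0, tau=0.01):
    """SELO penalty for one coefficient (assumed standard form).

    p(beta) = (lam / log 2) * log(|beta| / (|beta| + tau) + 1)
    It is continuous in beta, equals 0 at beta = 0, and tends to lam
    as |beta| grows; a smaller tau makes it hug the L0 penalty more closely.
    """
    a = abs(beta)
    return (lam / math.log(2)) * math.log(a / (a + tau) + 1.0)


def l0_penalty(beta, lam=1.0):
    """L0 penalty for one coefficient: lam if beta is non-zero, else 0."""
    return lam if beta != 0 else 0.0


if __name__ == "__main__":
    # Print both penalties side by side for a few coefficient values.
    for b in (0.0, 0.001, 0.1, 1.0, 10.0):
        print(f"beta={b:<6} SELO={selo_penalty(b):.4f} L0={l0_penalty(b):.1f}")
```

Under these assumptions the SELO value rises smoothly from 0 toward `lam` instead of jumping at zero, which is the continuity property the abstract refers to.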
Keywords/Search Tags: variable selection, SELO, partial linear model, complex network graph model