Research On Logistic Regression Problem With Regular Penalty Term

Posted on:2022-11-09

Degree:Master

Type:Thesis

Country:China

Candidate:Y H Yang

GTID:2480306764468474

Subject:Insurance

Abstract/Summary:

The logistic regression model, which rests on a rigorous theoretical foundation, is widely applied in many fields. However, the traditional logistic model is prone to overfitting and does not produce sparse solutions, so all or most of the estimated parameters are nonzero. Yet many examples (such as diabetes risk prediction) show that, although there are many candidate risk factors, usually only a few key variables affect the model. To address these problems, this thesis proposes an L_1/2+1-logistic regression model whose regularization penalty is a linear combination of the L_1/2 norm and the L_1 norm, and studies an algorithm for solving the model.

Research on the L_1/2+1-logistic model: first, based on an analysis of the theoretical properties of the L_1/2 and L_1 regularizers, including unbiasedness, sparsity and the oracle property, the thesis explores the basic properties of the L_1/2+1 regularizer, including its gradient and Hessian matrix, and examines its sparsity from the perspectives of geometry (the solution space) and the gradient. Then, taking the linear combination of the L_1 and L_1/2 norms as the penalty term, the L_1/2+1-logistic model is proposed. Further, starting from the traditional logistic model, its iterative form is extended to the L_1/2+1-logistic model, and the iterative scheme of the L_1/2+1-logistic model is given.

Research on the algorithm for the L_1/2+1-logistic model: first, based on the iterative scheme of the L_1/2+1 model, the thesis applies the coordinate descent method to solve the model. Following the idea of coordinate descent, the high-dimensional problem is transformed into a series of univariate extremum problems. Then, for each of these extremum problems, the thesis discusses the value of ?_k case by case and, using Cardano's formula, gives an analytical expression for the parameter estimates of the L_1/2+1-logistic model, which yields the L_1/2+1 regularized logistic regression algorithm.

Experimental analysis of the L_1/2+1-logistic model: first, the thesis simulates six groups of data with different structures by controlling the sample size and the correlation coefficient ρ_ij between explanatory variables, and calculates the evaluation indices of the L_1/2+1 method and of traditional regularization methods: ACC, PPV, Recall and F1-Measure. Comparing the index values shows that, when the explanatory variables are strongly correlated and the sample size is large, the L_1/2+1 method outperforms the traditional regularization methods in both overall performance and accuracy of variable selection. In other words, the L_1/2+1 method is suited to data sets with large samples and highly correlated explanatory variables. In addition, on real data for early diabetes risk, the L_1/2+1 method selects 6 significant variables, namely Polyuria, Polydipsia, Sudden Weight Loss, Irritability, Itching and Gender, and the indices are all above 0.93. A comparison of the L_1/2+1 method with other regularization methods and the existing literature, combined with the statistical report issued by the International Diabetes Federation (IDF) in 2021, shows that the L_1/2+1 method selects fewer variables while still achieving better overall performance and accuracy.
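
As a rough illustration of the two ingredients described above, the following Python sketch evaluates a logistic loss with a combined L_1/2 and L_1 penalty and computes the four evaluation indices from binary labels. It is only a sketch under stated assumptions, not the thesis's implementation: the function names, the separate weights lam_half and lam_one, the averaging of the log-likelihood over samples, and the choice to penalize every coefficient are all hypothetical, since the abstract does not specify them. It does not include the coordinate descent solver or the Cardano-based closed-form update.

```python
import math


def penalized_loss(beta, X, y, lam_half, lam_one):
    """Mean logistic negative log-likelihood plus a combined penalty.

    Assumed penalty form: lam_half * sum(|b|^(1/2)) + lam_one * sum(|b|).
    The weights and the 1/n scaling are illustrative assumptions.
    """
    n = len(y)
    nll = 0.0
    for xi, yi in zip(X, y):
        z = sum(b * x for b, x in zip(beta, xi))
        # log(1 + exp(z)) - y*z, written in a numerically stable form
        nll += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - yi * z
    penalty = sum(lam_half * math.sqrt(abs(b)) + lam_one * abs(b) for b in beta)
    return nll / n + penalty


def evaluation_indices(y_true, y_pred):
    """ACC, PPV, Recall and F1 for 0/1 labels (1 is the positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    # Ratios with an empty denominator are reported as 0.0 by convention here
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * ppv * recall / (ppv + recall) if (ppv + recall) else 0.0
    return {"ACC": acc, "PPV": ppv, "Recall": recall, "F1": f1}
```

The same evaluation_indices helper can be applied either to predicted class labels or to indicators of whether each variable was selected (1) or not (0), depending on which kind of accuracy is being assessed.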

Keywords/Search Tags:

L_1/2+1-Logistic Regression Model, Variable Selection, Coordinate Descent, Sparsity

Related items

1	Variable Selection In Log-Birnbaum-Saunders Regression Models
2	Classification Variables Of Logistic Regression Model And Its Application Research
3	Click Through Rating System Based On Distributed Logistic Regression Model
4	The Variable Selection Problem In Cox Model And Cox Model With Varying Coefficients Based On The Adaptive LASSO Method
5	The Cluster Elastic Net For High-Dimensional Logistic Regression
6	The Research Of Resource Access Model Based On Logistic Regression
7	Variable Selection Problems Using Bayesian Method And Graph-constrained Regularization For Analysis Of High-dimensional Genomic Data
8	Variable Selection Method In Several Regression Models
9	Random Lasso Method In Logistic Regression
10	Research On Auto Insurance Renewal Rate Based On Kernel Logistic Regression