
Research On Logistic Regression Problem With Regular Penalty Term

Posted on: 2022-11-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Yang
GTID: 2480306764468474
Subject: Insurance
Abstract/Summary:
The logistic regression model, which rests on rigorous theory, is widely applied in many fields. However, the traditional logistic model is prone to overfitting and is not sparse, so all or most of the estimated parameters are nonzero. Yet many examples (such as diabetes risk prediction) show that, although there are many candidate risk factors, usually only a few key variables affect the model. To address these problems, this thesis proposes an L1/2+1-logistic regression model whose regularization penalty is a linear combination of the L1/2 norm and the L1 norm, and studies an algorithm for solving the model.

Research on the L1/2+1-logistic model: first, building on an analysis of the theoretical properties of the L1/2 and L1 regularizers, including unbiasedness, sparsity, and the oracle property, the thesis explores the basic properties of the L1/2+1 regularizer, including its gradient and Hessian matrix, and examines its sparsity from the perspectives of geometry (the solution space) and the gradient. Then, taking the linear combination of the L1 and L1/2 norms as the penalty term, the L1/2+1-logistic model is proposed. Further, starting from the traditional logistic model, its iterative form is extended to the L1/2+1-logistic model, and the iterative scheme of the L1/2+1-logistic model is given.

Research on the algorithm for the L1/2+1-logistic model: first, based on the iterative scheme of the L1/2+1 model, the thesis introduces coordinate descent to solve the model. Following the idea of coordinate descent, the high-dimensional problem is transformed into a series of univariate extremum problems. Then, for each such extremum problem, the thesis discusses the value of ?k case by case and, using Cardano's formula, gives an analytical expression for the parameter estimates of the L1/2+1-logistic model, which yields the L1/2+1 regularized logistic regression algorithm.

Experimental analysis of the L1/2+1-logistic model: first, the thesis simulates six groups of data with different structures by controlling the sample size and the correlation coefficient ρij between explanatory variables, and computes the evaluation metrics of the L1/2+1 method and traditional regularization methods: ACC, PPV, Recall, and F1-Measure. Comparing the metrics shows that when the explanatory variables are strongly correlated and the sample size is large, the L1/2+1 method outperforms traditional regularization methods in comprehensive performance and in the accuracy of variable selection. In other words, the L1/2+1 method is suited to data sets with large samples and highly correlated explanatory variables. In addition, on real data on early-stage diabetes risk, the six significant variables selected by the L1/2+1 method are Polyuria, Polydipsia, Sudden Weight Loss, Irritability, Itching, and Gender, and the evaluation metrics are all above 0.93. Comparing the L1/2+1 method with other regularization methods and the existing literature, together with the statistical report issued by the International Diabetes Federation (IDF) in 2021, shows that the L1/2+1 method selects fewer variables while maintaining better comprehensive performance and accuracy.
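To make the quantities named in the abstract concrete, the following is a minimal Python sketch of a logistic loss with a combined L1/2 and L1 penalty, together with the four evaluation metrics (ACC, PPV, Recall, F1-Measure). The abstract does not state how the two norms are weighted, so the parameterization `lam * (alpha * L1/2 term + (1 - alpha) * L1 term)`, the names `lam` and `alpha`, the omission of an intercept, and all function names are assumptions made for illustration only; this is not the thesis's coordinate descent algorithm.

```python
import math


def l_half_plus_one_penalty(beta, lam, alpha):
    """Combined penalty (ASSUMED form, not taken from the thesis):
    lam * (alpha * sum(|b|^(1/2)) + (1 - alpha) * sum(|b|))."""
    half_term = sum(math.sqrt(abs(b)) for b in beta)
    one_term = sum(abs(b) for b in beta)
    return lam * (alpha * half_term + (1.0 - alpha) * one_term)


def neg_log_likelihood(beta, X, y):
    """Average logistic loss for labels in {0, 1}; no intercept, for brevity."""
    total = 0.0
    for row, label in zip(X, y):
        z = sum(b * x for b, x in zip(beta, row))
        # log(1 + exp(z)) written in a numerically stable form
        total += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - label * z
    return total / len(y)


def penalized_objective(beta, X, y, lam, alpha):
    """Logistic loss plus the assumed L1/2+1 penalty."""
    return neg_log_likelihood(beta, X, y) + l_half_plus_one_penalty(beta, lam, alpha)


def binary_metrics(y_true, y_pred):
    """ACC, PPV (precision), Recall and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    ppv = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * ppv * recall / (ppv + recall) if ppv + recall else 0.0
    return {"ACC": acc, "PPV": ppv, "Recall": recall, "F1": f1}
```

The same `binary_metrics` helper can score variable selection as well as classification: pass 0/1 indicators of which coefficients are truly nonzero and which were selected. Whether the thesis computes its metrics this way is not stated in the abstract.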
Keywords/Search Tags: L1/2+1-Logistic Regression Model, Variable Selection, Coordinate Descent, Sparsity