| Microfinance refers to the provision of small amounts of credit by governments or financial institutions to low- and middle-income groups. With the rise of Internet consumer finance, microfinance has gradually penetrated all aspects of our lives, so it is important for commercial banks and small loan companies to assess and manage individual credit risk. For personal credit scoring, the main modeling methods at home and abroad include expert models, machine learning, linear programming, discriminant analysis and logistic regression. Logistic regression makes no distributional assumptions about the variables and transforms the input into a monotonically increasing output bounded by 0 and 1. It is also interpretable and stable, so it is widely used in credit scoring. To reduce subjective limitations, scholars introduced GAMKL (assuming the link function is logistic) into the credit scoring model, in which the component function of each variable can be linear or nonlinear. GAMKL (assuming the link function is logistic) extended the application of logistic regression in credit scoring, but scholars did not justify why the link function should be logistic. To address these problems, the main work of this paper is as follows. First, we systematically introduce variable selection methods, including methods based on penalty functions, the Dantzig selector and its derived methods for cases where the number of variables exceeds the sample size, and SIS and its derived methods for cases where the number of variables far exceeds the sample size. In current credit scoring models, scholars mostly use the Lasso and its derivative Alasso, but these methods were proposed in the setting of generalized linear models. To reduce the restrictions on the model, this paper uses nonparametric independence screening (NIS) for variable selection. Second, this paper introduces random forest, GAMKL (assuming the link function is logistic) and GAMUL in detail. We combine these methods with nonparametric independence screening to build three models: random forest, GAMKL (assuming the link function is logistic) and GAMUL. Third, we analyze a real dataset on loan repayment. Since the logistic function transforms the input into a continuous monotone output bounded by 0 and 1, previous research on credit risk prediction was mostly built around logistic regression. However, there is no sufficient reason to believe that the relationship between the independent variables and the response variable is logistic. In addition to better prediction accuracy, the microfinance companies that provided the data also want to know how each variable affects credit risk. To solve these two problems, this paper first uses nonparametric independence screening to select variables, which fully takes into account the nonlinear relationship between the independent variables and the dependent variable, and then applies GAMUL to fit the data. NIS-GAMUL therefore makes no subjective assumptions in either variable selection or modeling. We compare it with the machine learning algorithm random forest and with NIS-GAMKL (assuming the link function is logistic). Finally, this paper evaluates the three models in terms of interpretability and prediction accuracy. First, in terms of interpretability, although random forest ranks the variables by importance, it is less interpretable than the other two models, because it aggregates the votes of many decision trees and cannot describe the effect of each variable on the response variable. Second, in terms of prediction accuracy, GAMUL performs better than random forest and GAMKL in specificity and in overall prediction accuracy. |
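
As an illustration of the screening step summarized above, the following is a minimal sketch of marginal nonparametric screening, not the implementation used in this paper. It assumes that each candidate variable is scored by the variance of a marginal nonparametric fit of the response on that variable alone, and that the highest-scoring variables are kept. A cubic polynomial basis stands in for the spline basis normally used, and the data are simulated rather than the loan dataset. The names `simulate_loans`, `marginal_utility` and `nis_select` are hypothetical.

```python
import numpy as np


def simulate_loans(n, p, seed=0):
    """Simulated stand-in data (an assumption, not the paper's dataset).

    Only columns 0 and 3 affect the binary default indicator,
    and both effects are nonlinear.
    """
    rng = np.random.default_rng(seed)
    X = rng.uniform(-2.0, 2.0, size=(n, p))
    score = 2.0 * (X[:, 0] ** 2 - 4.0 / 3.0) + 3.0 * np.sin(X[:, 3])
    prob = 1.0 / (1.0 + np.exp(-score))
    y = rng.binomial(1, prob).astype(float)
    return X, y


def marginal_utility(x, y, degree=3):
    """Variance of the fitted values from a marginal least-squares fit of y on a
    polynomial basis of x (a simple stand-in for a spline basis)."""
    z = (x - x.mean()) / x.std()  # standardise for numerical stability
    basis = np.vander(z, degree + 1, increasing=True)  # columns 1, z, z^2, ...
    coef, *_ = np.linalg.lstsq(basis, y, rcond=None)
    fitted = basis @ coef
    return float(np.mean((fitted - y.mean()) ** 2))


def nis_select(X, y, n_keep, degree=3):
    """Rank variables by marginal utility and keep the n_keep largest."""
    utilities = np.array(
        [marginal_utility(X[:, j], y, degree) for j in range(X.shape[1])]
    )
    order = np.argsort(utilities)[::-1]
    return order[:n_keep], utilities


if __name__ == "__main__":
    X, y = simulate_loans(n=2000, p=20, seed=0)
    selected, utilities = nis_select(X, y, n_keep=2)
    print("selected columns:", sorted(selected.tolist()))
```

Because each variable is scored through its own flexible marginal fit, a variable whose effect is purely nonlinear (such as the quadratic effect of column 0 in the simulation) can still be retained, whereas a screening rule based on linear correlation could miss it. The selected columns would then be passed to the model-fitting step.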