| With the rapid development of the global economy, people’s economic level has continued to rise over the past decade, and people’s concepts of consumption and financial management have changed dramatically. The consumption concept has shifted from "living within your means" and "instant consumption" to "advanced consumption", and the financial management concept has shifted from "putting money in the bank" and "buying products" toward the same "advanced consumption" mode. However, financial institutions cannot accurately grasp the relationship between "high return" and "high risk" in credit lending, so while earning considerable profits from lending they also bear a huge risk of default. Although the number of personal credit loans is huge, the amounts involved are very small compared with enterprise lending; financial institutions therefore cannot apply the standards and methods of enterprise lending to personal credit lending. In this context, personal credit risk scoring emerged. At the Fifth Plenary Session of the 19th CPC Central Committee on October 29, 2020, it was once again proposed to "improve the modern financial supervision system, improve the transparency and level of financial supervision, improve the level of rule of law, improve the deposit insurance system, improve the financial risk prevention, early warning, disposal and accountability system, and have zero tolerance for illegal behaviors." This paper aims to establish an effective personal credit scoring model by using big data methods and techniques and by constructing a kernel principal component logistic regression analysis model. The hybrid model is first based on kernel principal component analysis (KPCA) theory, which extracts the principal components of the data set, finds the optimal subset of all attributes, and reduces the dimension of the feature vectors. Then, the optimal subset after dimension reduction is taken as the input variable of the logistic regression analysis model, and the optimal parameters are brought into the model to conduct 
personal consumption loan credit prediction, which shortens the training time. Next, we compare L1 and L2 regularization using the logistic regression analysis model. Finally, the hybrid models (PCA-LR, PCA-SVM, PLS-SVM, PLS-LR and KPCA-LR) and the single LR model are compared using evaluation indicators in order to select the optimal model. Before starting the analysis, we drew the scree plot of the original data and selected four principal components for logistic regression analysis, then compared L1 regularization and L2 regularization using the F1 score as the evaluation index. The F1 score of KPCA-LR remained around 0.835, whereas the F1 score of PCA-LR stayed above 0.830 but did not exceed 0.832. We then compared the five hybrid models (PCA-LR, PCA-SVM, PLS-SVM, PLS-LR and KPCA-LR) with the single LR model; the accuracy (ACC), true positive rate (TPR) and true negative rate (TNR) showed that the KPCA-LR model performed better than the other models. Finally, to compare model accuracy further, we compared the ROC plots and AUC values across models. The difference between LR, PCA-SVM and PCA-LR is small, and PLS-LR and PLS-SVM are slightly more effective than LR, PCA-SVM and PCA-LR, while the KPCA-LR model has a better ROC curve and a higher AUC value than the other hybrid models and the single LR model. Experimental results show that, among PCA-LR, PCA-SVM, PLS-SVM, PLS-LR, KPCA-LR and LR, the KPCA-LR model is the best of the credit scoring models considered. The main innovation of this article is the combination of kernel principal component analysis and logistic regression analysis for personal credit scoring, which largely achieved our expected goal and, to a certain extent, provides a new credit scoring method. However, we must also recognize that the amount of data available while writing the paper was relatively insufficient. If the data set were more abundant, our results could be improved to some extent. |
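
The KPCA-LR procedure summarized above (kernel PCA for dimension reduction, then logistic regression with L1 or L2 regularization, scored by F1 and AUC) might be sketched as follows. This is a minimal illustration and not the paper's actual implementation: the synthetic data from `make_classification`, the RBF kernel, the `liblinear` solver, `C=1.0`, and the helper name `fit_kpca_lr` are all assumptions made here; only the choice of four components follows the abstract.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def fit_kpca_lr(X_train, y_train, penalty="l2", n_components=4):
    """Hypothetical helper: standardize, reduce with kernel PCA, fit LR."""
    model = make_pipeline(
        StandardScaler(),
        # RBF kernel is an assumption; the abstract does not name the kernel.
        KernelPCA(n_components=n_components, kernel="rbf"),
        # liblinear supports both the L1 and the L2 penalty.
        LogisticRegression(penalty=penalty, solver="liblinear", C=1.0),
    )
    return model.fit(X_train, y_train)


# Synthetic stand-in for a credit data set (assumption, not the paper's data).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

results = {}
for penalty in ("l1", "l2"):
    model = fit_kpca_lr(X_train, y_train, penalty=penalty)
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]  # probability of class 1
    results[penalty] = {
        "f1": f1_score(y_test, y_pred),
        "auc": roc_auc_score(y_test, y_prob),
    }
    print(penalty, results[penalty])
```

Replacing `KernelPCA` with `PCA` in the same pipeline would give a PCA-LR baseline for a like-for-like comparison; the scores on synthetic data say nothing about the figures reported in the abstract.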