
Research on Internet Personal Credit Risk Prediction Based on Machine Learning

Posted on: 2020-05-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhao
GTID: 2518306563967289
Subject: Mathematics
Abstract/Summary:
In the rapidly developing Internet finance market, Internet credit information is an important reference for avoiding losses. Predicting Internet personal credit risk with machine learning has become a research hotspot, and it helps to strengthen the construction of the Internet credit reference system. Internet credit data differs from traditional credit data: it is massive, high-dimensional, and unstructured, which makes it difficult for traditional credit models to meet the technical requirements. Machine learning is an effective way to predict Internet personal credit risk. This paper uses machine learning to study two key parts of the problem: feature selection and the construction of the prediction model.

To handle the high-dimensional Internet credit data, this paper first builds a Boruta model to filter out most of the features that are unrelated to the target variable. This approach addresses two limitations: filter methods are independent of the learner, while wrapper methods depend on it. A genetic algorithm (GA) then selects the optimal feature subset, with 10-fold cross-validation introduced as an external performance estimate during the search in order to avoid over-fitting. The results show that the proposed Boruta-GA feature selection method is feasible.

The credit data is severely imbalanced, so classifiers tend to assign samples to the majority class. For financial companies, however, accuracy on the minority class is more important. From this perspective, an improved random forest algorithm (CS-RF) is proposed. Using the data after feature selection, a cost-sensitive function is introduced after the decision trees are generated: different costs are assigned to misclassifying the majority class and the minority class, and the algorithm searches for better decision trees. The improved model is compared with logistic regression, a BP neural network, support vector machines, and RF, with accuracy, sensitivity, specificity, and AUC used as the evaluation metrics. The results show that the CS-RF model gives better predictions and is superior to the other models in both overall accuracy and minority-class accuracy.
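To make the imbalance argument concrete, the following is a minimal sketch of why overall accuracy can hide poor minority-class performance, and of how unequal misclassification costs shift a forest's decision. It is not the CS-RF algorithm from the thesis. The toy labels, the per-sample vote fractions, the cost values (5 for a missed default, 1 for a false alarm), and all function names are assumptions chosen for illustration. The decision rule is the standard minimum-expected-cost rule applied to the share of trees voting for the minority class.

```python
# Illustrative only: toy data and costs are assumptions, not values from the thesis.

def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives; 'positive' is the minority class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def evaluate(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity (minority recall) and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


def cost_sensitive_label(vote_fraction, cost_fn=5.0, cost_fp=1.0):
    """Pick the label with the lower expected cost.

    vote_fraction is the share of trees voting for the minority class.
    Predicting the majority class risks vote_fraction * cost_fn;
    predicting the minority class risks (1 - vote_fraction) * cost_fp.
    """
    return 1 if vote_fraction * cost_fn > (1 - vote_fraction) * cost_fp else 0


# Eight majority-class samples (0) and two minority-class samples (1).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# Hypothetical share of trees voting "minority" for each sample.
vote_fractions = [0.1, 0.05, 0.2, 0.1, 0.3, 0.0, 0.15, 0.1, 0.3, 0.45]

majority_pred = [1 if v > 0.5 else 0 for v in vote_fractions]
cost_pred = [cost_sensitive_label(v) for v in vote_fractions]

majority_metrics = evaluate(y_true, majority_pred)
cost_metrics = evaluate(y_true, cost_pred)

print("majority vote:", majority_metrics)
print("cost-sensitive:", cost_metrics)
```

On this toy data both rules reach the same overall accuracy, but plain majority voting never detects a minority-class sample, while the cost-sensitive rule detects both at the price of two false alarms. This is why the abstract reports sensitivity and specificity alongside accuracy.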
Keywords/Search Tags: Boruta algorithm, Genetic algorithm (GA), Random forest (RF), Unbalanced sample, Internet credit