Research On Individual Credit Evaluation Method Based On Imbalanced Data

Posted on: 2022-03-31
Degree: Master
Type: Thesis
Country: China
Candidate: T H Yun
Full Text: PDF
GTID: 2518306332951759
Subject: Management Science and Engineering
Abstract/Summary:
In today's information age, the Internet economy has driven the rapid development of the online credit industry. A variety of online consumer credit products have entered daily life, such as Ant Huabei and Jingdong Baitiao. These products are easy to operate, have simple procedures and offer high returns, which attracts a growing number of borrowers and investors. How to address the credit rating problems facing the online credit industry has therefore become a research focus for scholars at home and abroad. Building a personal credit risk assessment model with machine learning algorithms is a common approach. However, personal credit data not only involves complex indicator variables but also suffers from class imbalance. Traditional models classify such data poorly, which leads to unsatisfactory predictions. Effectively identifying default customers in data with many complex indicators and imbalanced classes is therefore of great significance.

Against this background, this paper reviews the research of domestic and foreign scholars on personal credit evaluation. To improve prediction accuracy on imbalanced data sets, it uses the Q1 2018 transaction data set from the "Lending Club" platform for empirical analysis. The original data contains 112 features and 107864 samples, and negative samples account for 1.9% of the total. First, data cleaning, data conversion and data standardization are carried out on the original data. Second, to handle the complexity of the indicators, three filter-based feature selection methods are combined with correlation visualization to screen features, eliminate redundant ones, reduce the dimensionality of the data set and improve the classification ability of the model. Third, to handle class imbalance, SMOTE oversampling is used to add 7948 negative samples and balance the data set, after which three machine learning models (decision tree, support vector machine and random forest) are used to evaluate the default risk of borrowers.

According to the comparative analysis of multiple models on the balanced data obtained with SMOTE, the hybrid model reaches prediction accuracies of 96.2% and 97.1%; these represent improvements of varying degrees and meet the goal of this paper. The research finds that the hybrid feature selection method proposed in this paper reduces the correlation between features and the space complexity of the algorithm, and improves its running efficiency. Compared with traditional machine learning models applied to imbalanced data sets, models trained on the balanced data sets produced by SMOTE oversampling, including a logistic regression model combined with multiple probability prediction, fit the data better and predict more accurately, which further improves the stability and generalization ability of the model on class-imbalanced data sets.
Keywords/Search Tags: Individual Credit Evaluation, Class-imbalanced Problem, Feature Selection, Machine Learning
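The SMOTE oversampling step mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `smote_oversample`, its parameters, the toy data and the choice of label 1 for the minority class are all assumptions made for illustration. It shows only the core idea of SMOTE, which is to create each synthetic minority sample by interpolating between a real minority sample and one of its k nearest minority neighbours.

```python
import numpy as np


def smote_oversample(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Hypothetical helper for illustration; a brute-force distance matrix is
    used, so it is only suitable for small minority sets.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)

    # Pairwise Euclidean distances between minority samples.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour

    # Indices of the k nearest minority neighbours of every minority sample.
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a random base sample and one of its neighbours for each new point.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]

    # Interpolate at a random position on the segment base -> neighbour.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])


# Toy data (assumed, not from the thesis): 200 majority vs 10 minority samples.
toy_rng = np.random.default_rng(42)
X_major = toy_rng.normal(0.0, 1.0, size=(200, 3))
X_minor = toy_rng.normal(3.0, 1.0, size=(10, 3))

# Add enough synthetic minority samples to match the majority class size.
X_new = smote_oversample(X_minor, n_new=len(X_major) - len(X_minor))
X_bal = np.vstack([X_major, X_minor, X_new])
y_bal = np.array([0] * len(X_major) + [1] * (len(X_minor) + len(X_new)))
```

Because every synthetic point lies on a segment between two real minority samples, the new samples stay inside the region already occupied by the minority class. In practice, oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into the evaluation data.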