
Research On Credit Evaluation Based On Improved Random Forests Algorithms

Posted on: 2020-04-20
Degree: Master
Type: Thesis
Country: China
Candidate: M Zhang
GTID: 2428330578456093
Subject: Computer software and theory
Abstract/Summary:
In recent years, rapid economic development has gradually changed people's attitudes toward consumption, and the scale of personal credit consumption in China keeps expanding. Personal credit means extending credit to individuals under agreed terms on the basis of mutual trust. For example, a bank can decide whether to lend to a customer based on that customer's personal credit. Reasonable and effective personal credit evaluation helps keep banks operating normally and protects national economic security. It also supports the growth of banking business, consumption, and the national economy. However, fraud and default are becoming more frequent, so before lending it is essential to judge whether a customer is trustworthy and whether credit fraud is likely to occur. The rapid growth of personal credit consumption in China has exposed two problems. First, the current personal credit system is not yet sound. Second, the imbalanced data sets in bank databases reduce the accuracy of credit evaluation. Both problems may cause heavy losses for banks. Improving the accuracy of personal credit evaluation therefore has important and far-reaching practical significance for the development of personal credit consumption in China.

On large data sets, random forests draw subsets of the data by bootstrap sampling. The decision trees built on these subsets together achieve good classification and prediction results. Random forests offer high prediction accuracy and low time complexity, so they are widely used in fields such as network attack detection, medical diagnosis, and image processing. These same advantages make random forests suitable for personal credit evaluation.

This thesis proposes a credit evaluation method based on an improved random forest algorithm. First, the indicators that affect credit evaluation are analyzed and a corresponding indicator system is established. Classification is then performed on the basis of this indicator system. In practice, however, personal credit data are often imbalanced, so undersampling and oversampling are commonly applied during data processing. Undersampling may discard much useful sample information, and oversampling may cause the classifier to overfit the few minority samples. To address this, the thesis proposes a new random forest algorithm based on mixed sampling (BSI). The method first uses the coefficient of variation to separate samples in sparse domains from samples in dense domains, and then treats the two groups differently. An oversampling method (BSMOTE) is proposed for minority samples in sparse domains, and an improved undersampling method (IS) is proposed for minority samples in dense domains. Finally, the balanced data set is fed to a random forest classifier for training. Experiments on personal credit evaluation show that the improved algorithm achieves higher G-mean, F-value, and AUC values than the traditional algorithm. This gives it an advantage in credit evaluation.
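The abstract relies on bootstrap sampling and voting among decision trees but gives no code. The Python below is only a minimal sketch of that idea. It bags depth-1 decision trees (stumps). Each stump is trained on a bootstrap sample and a random feature subset, and the ensemble predicts by majority vote. The helper names (fit_stump, fit_forest, predict) and parameters (n_trees, n_features, seed) are hypothetical. A full random forest grows deep trees and samples features at every split, not once per tree.

```python
import random
from collections import Counter


def fit_stump(X, y, features):
    """Depth-1 tree: choose the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in features:
        for thr in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= thr]
            right = [label for row, label in zip(X, y) if row[f] > thr]
            left_label = Counter(left).most_common(1)[0][0]  # left is never empty
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(lab != left_label for lab in left)
                      + sum(lab != right_label for lab in right))
            if best is None or errors < best[0]:
                best = (errors, f, thr, left_label, right_label)
    return best[1:]  # (feature, threshold, left_label, right_label)


def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: n draws with replacement
        features = rng.sample(range(d), n_features)  # random subspace of size n_features
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return forest


def predict(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(left if row[f] <= thr else right for f, thr, left, right in forest)
    return votes.most_common(1)[0][0]
```

Because each tree sees a different bootstrap sample, the trees make partly independent errors, and the vote averages those errors out. This is the property that the abstract credits for the method's accuracy.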
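The abstract does not state the exact rules of BSI, BSMOTE, or IS. The sketch below only illustrates the general mixed-sampling idea, and every rule in it is an assumption.

- Each minority sample's sparsity is measured by its mean distance to its k nearest minority neighbours.
- The coefficient of variation (standard deviation divided by mean) of those distances sets a threshold. This threshold rule is assumed, not taken from the thesis.
- Sparse-domain samples are oversampled by standard SMOTE interpolation, used here as a stand-in for BSMOTE.

The IS undersampling step is not sketched because the abstract gives no details. All function names are hypothetical.

```python
import math
import random
import statistics


def knn_mean_distances(points, k):
    """For each point, the mean Euclidean distance to its k nearest other points."""
    means = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        means.append(sum(nearest) / len(nearest))
    return means


def split_sparse_dense(minority, k=3):
    """Return (sparse_indices, dense_indices, cv) for the minority samples.

    Assumed rule (not from the thesis): a sample is 'sparse' when its mean
    k-NN distance exceeds mean * (1 + cv), i.e. one standard deviation above
    the average local distance.
    """
    d = knn_mean_distances(minority, k)
    mean_d = statistics.mean(d)
    cv = statistics.pstdev(d) / mean_d if mean_d > 0 else 0.0
    threshold = mean_d * (1 + cv)
    sparse = [i for i, di in enumerate(d) if di > threshold]
    dense = [i for i, di in enumerate(d) if di <= threshold]
    return sparse, dense, cv


def smote_oversample(seed_indices, minority, n_new, k=3, seed=0):
    """Standard SMOTE interpolation, used here as a stand-in for BSMOTE."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(seed_indices)  # raises IndexError if there are no seeds
        x = minority[i]
        # indices of the k nearest minority neighbours of x (x itself excluded)
        neighbours = sorted((j for j in range(len(minority)) if j != i),
                            key=lambda j: math.dist(x, minority[j]))[:k]
        nn = minority[rng.choice(neighbours)]
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, nn)))
    return synthetic
```

Synthetic points are placed only around isolated minority samples. This targets the information loss the abstract attributes to undersampling and the overfitting it attributes to blanket oversampling. How well it does so depends on the actual thresholds, which the abstract does not give.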
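The three evaluation measures named in the abstract are standard for imbalanced classification. This minimal sketch assumes the minority class (for example, defaulters) is labeled 1 and treated as positive.

- G-mean is the geometric mean of the true positive rate and the true negative rate.
- The F-value is the harmonic mean of precision and recall.
- AUC is computed as the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counting one half.

The function names are illustrative.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` as the minority class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def g_mean(y_true, y_pred, positive=1):
    """sqrt(TPR * TNR): stays low unless both classes are recognised."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(tpr * tnr)


def f_value(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall on the positive class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def auc(y_true, scores, positive=1):
    """Pairwise (Mann-Whitney) AUC; needs at least one positive and one negative."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Plain accuracy can look high on imbalanced credit data even when every defaulter is missed. G-mean and F-value avoid this by weighting the minority class explicitly. The pairwise AUC loop is quadratic and is meant only for small illustrations.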
Keywords: Credit Evaluation, Random Forests, Imbalanced Data Sets, Oversampling, Undersampling