
Application Research Of Data Mining Technology In Credit Data

Posted on: 2020-06-15 | Degree: Master | Type: Thesis
Country: China | Candidate: Y J Chen | Full Text: PDF
GTID: 2428330590996028 | Subject: Applied statistics
Abstract/Summary:
Credit data held by credit reporting agencies is highly valuable, and it is worth studying how to use this data to help the agencies decide whether to conduct credit business with a client. The common practice is to apply relevant techniques to analyze and evaluate existing credit data and to extract rules for predicting the credit scores of unknown clients; these scores then serve as the basis for deciding whether to proceed with credit business. This thesis focuses on this issue, covering both improvements to data mining methods and the resolution of practical problems. The main work is as follows.

First, for the client rating classification problem in real credit data, after preprocessing such as filling in missing values, clients are divided into loanable and non-loanable according to the actual situation, which turns the problem into binary classification. An XGBoost model is built to predict clients' potential defaults. The simulation results show that, compared with traditional logistic regression and the GBDT algorithm, XGBoost classifies better, with its AUC value increased by 5.24% and 6.06%, respectively.

Second, the client credit rating problem is treated as a multi-class classification problem: clients' credit is divided into 4 levels, where a higher level indicates better credit. A support vector machine ensemble model is built on the preprocessed credit data, and the distance from each sample point to the hyperplane is calculated. The model reduces the impact of subjective factors and lists the top ten clients by credit for later research.

Third, the thesis proposes an improvement to the Boruta feature selection algorithm: when the shadow features are constructed, the proportion of row data that is shuffled is reduced. Experiments are carried out on four UCI data sets of different sizes. The results show that both the feature selection result and the prediction fit improve, and that the results also improve relative to the traditional impurity-reduction method and random Lasso. Finally, the improved algorithm is applied to the credit data to verify the superiority of this method.
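The shadow-feature step of the Boruta modification can be illustrated with a short sketch. The abstract does not give the exact procedure, so the following is only one plausible reading: for each feature column, a chosen fraction of rows is permuted among themselves and the remaining rows keep their original values. The function name `make_shadow_features` and the parameters `shuffle_fraction` and `seed` are hypothetical and do not come from the thesis.

```python
import random


def make_shadow_features(rows, shuffle_fraction=0.5, seed=0):
    """Build shadow features by permuting only part of each column.

    Assumption (not confirmed by the thesis): "reducing the proportion
    of shuffled row data" means that, per column, only a fraction of
    the rows has its values permuted. Standard Boruta permutes all rows.

    rows: list of equal-length lists (one inner list per sample).
    """
    rng = random.Random(seed)
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    n_shuffled = int(round(shuffle_fraction * n_rows))

    # Work on a copy so the original data is left untouched.
    shadow = [list(r) for r in rows]
    for j in range(n_cols):
        # Pick the rows whose values in column j will be permuted.
        picked = rng.sample(range(n_rows), n_shuffled)
        values = [rows[i][j] for i in picked]
        rng.shuffle(values)
        for i, v in zip(picked, values):
            shadow[i][j] = v
    return shadow


if __name__ == "__main__":
    data = [[i, 10 * i] for i in range(10)]
    for original, shadowed in zip(data, make_shadow_features(data, 0.4)):
        print(original, shadowed)
```

With `shuffle_fraction=1.0` this reduces to the usual full permutation used by standard Boruta. Smaller values keep each shadow column closer to the original feature, which sets a stricter bar for a real feature to beat its shadow.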
Keywords/Search Tags:Credit data, data mining, XGBoost algorithm, support vector machine integration, Boruta algorithm