Research On Credit Evaluation Method Based On Mixed Sampling And Stacking Integration

Posted on: 2021-10-06
Degree: Master
Type: Thesis
Country: China
Candidate: W R Lu
GTID: 2518306200453154
Subject: Electronics and Communications Engineering
Abstract/Summary:
In the internet age, both online and offline credit business has grown rapidly against the backdrop of China's economic prosperity. However, the overdue payments and fraud behind this growth may pose hidden risks to continued, stable development. At present, credit assessment is the main means by which financial institutions control credit transaction risk. Applicant data, past credit records, credit card consumption, online borrowing and consumption, and credit records from other platforms can all serve as data sources for assessment. However, because these data are characterized by large sample sizes, many features, class imbalance, redundancy and more, manual methods and general machine learning methods cannot keep pace with financial institutions' urgent need for accurate risk prediction, and so the value of the data is not fully realized.

Data imbalance is one of the most obvious characteristics of credit data and one of the main factors affecting model performance. To address the limitations of existing methods for handling imbalanced data, a hybrid sampling method considering sparse boundary samples (HSCSBS) is proposed. Boundary points are identified by calculating the coefficient of variation of each sample, and the sample space is divided into boundary and non-boundary domains. The negative samples in the non-boundary domain are under-sampled, and the positive samples in the boundary domain are over-sampled using the SMOTE algorithm, so as to retain as much sample information as possible, eliminate overlapping information, and finally achieve a basic balance between the two classes. Experiments show that the HSCSBS algorithm effectively helps the model improve its recognition rate.

To further improve the model's recognition rate, an improved version of the traditional Stacking integration algorithm was constructed using the Stacking integration framework. First, strong classifiers representing the two ensemble ideas of Bagging and Boosting, together with logistic regression (the most widely used credit evaluation algorithm), are used as the base classifiers in the first layer of Stacking. To avoid the over-fitting caused by cross-learning when training the base classifiers, a 50-fold cross-validation is nested internally. Unlike the traditional Stacking learning strategy, the improved framework no longer uses the predicted category value as the feature value when constructing new features, but uses the predicted probability value instead. At the same time, the meta-classifier is no longer a single logistic regression; instead, logistic regression, improved Gaussian Naive Bayes, and support vector machines are combined as meta-classifiers by weighted averaging. Compared with six other algorithms, the improved Stacking integration algorithm achieves the highest Recall, F1-Score, AUC and KS values, which are improved by 2%, 4%, 2% and 1.3%, respectively. This enhances the identification rate of "common customers" and can assist credit institutions in decision-making.
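The weighted-average combination of meta-classifiers described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the function name `weighted_average_proba`, the three probability lists, the weights, and the 0.5 decision threshold are hypothetical and are not taken from the thesis, which does not state its weights in the abstract.

```python
from typing import List, Sequence


def weighted_average_proba(
    probas: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Combine per-model positive-class probabilities by weighted average.

    probas[m][i] is the probability that model m assigns to sample i.
    """
    if len(probas) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_samples = len(probas[0])
    if any(len(p) != n_samples for p in probas):
        raise ValueError("all models must score the same samples")
    # Normalise by the weight total so the result stays in [0, 1].
    return [
        sum(w * p[i] for w, p in zip(weights, probas)) / total
        for i in range(n_samples)
    ]


# Hypothetical positive-class probabilities from three meta-classifiers
# (logistic regression, Gaussian Naive Bayes, SVM) for four applicants.
lr_proba = [0.90, 0.20, 0.60, 0.10]
nb_proba = [0.80, 0.40, 0.30, 0.20]
svm_proba = [0.70, 0.30, 0.70, 0.10]

combined = weighted_average_proba(
    [lr_proba, nb_proba, svm_proba],
    weights=[2.0, 1.0, 1.0],  # illustrative weights, not from the thesis
)
# Assumed decision threshold of 0.5 on the averaged probability.
labels = [int(p >= 0.5) for p in combined]
```

Averaging probabilities rather than hard class labels keeps each model's confidence in the final score, which matches the abstract's choice of probability values over category values as stacking features.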
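Of the four reported metrics, KS is the least commonly defined outside credit scoring. A minimal sketch follows, assuming the usual definition of KS as the largest gap between the true positive rate and the false positive rate over all score thresholds. The function name `ks_statistic` and the sample data are hypothetical; the abstract does not say how the thesis computes KS.

```python
from typing import Sequence


def ks_statistic(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Return max |TPR - FPR| over all score thresholds (label 1 = positive)."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    # Walk the samples from highest score to lowest.
    pairs = sorted(zip(y_score, y_true), key=lambda t: -t[0])
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # Consume every sample tied at this score before measuring the gap.
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        best = max(best, abs(tp / n_pos - fp / n_neg))
    return best


# Hypothetical labels and model scores for six applicants.
y_true = [1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
ks = ks_statistic(y_true, y_score)
```

A KS of 1.0 means the scores separate the two classes perfectly, and a value near 0 means the score distributions of the two classes are nearly identical.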
Keywords/Search Tags: Stacking, credit evaluation, imbalanced data, coefficient of variation, meta classifier