
A Hybrid Credit Assessment Model Based on the CPSO Optimization Algorithm

Posted on: 2021-03-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y W Tie
Full Text: PDF
GTID: 2428330605964138
Subject: Computer technology
Abstract/Summary:
Nowadays, with the rise of Internet finance, more and more individuals and small and micro enterprises apply for commercial and consumer loans from various types of financial institutions. These loans carry considerable risk for banks and consumer finance companies: the loan amounts are small, there is no collateral, and some borrowers fall into overdue non-repayment. An accurate credit evaluation model for users is therefore particularly important. Such a model must not only analyze the information provided by users automatically but also select features from complex data. At present, most institutions use the information value (IV) of each feature for feature selection and a single logistic regression as the training model. Although this combination is simple, its accuracy is relatively low and it may fail to identify potentially risky users. In addition, in real-world data the proportion of positive and negative samples is extremely unbalanced, which weakens the model's ability to identify bad samples. Moreover, when many machine learning models are introduced during training, the complicated parameter selection also affects model accuracy.

To establish an effective credit evaluation model, three problems need to be solved. 1. Sample imbalance: in practice negative samples account for only about 5% of the data, which reduces prediction accuracy and can render the model useless. 2. Feature selection: credit evaluation data is high-dimensional, and irrelevant or redundant variables harm prediction accuracy, while over-aggressive feature selection may lead to over-fitting. Most banks currently compute the IV value of each feature and select features according to the size of this value; this method is inaccurate and may degrade the training results. 3. Parameter optimization: the algorithms underlying these models have many parameters, and their values directly affect model performance. For example, XGBoost has more than ten hyperparameters, such as the learning rate (learning_rate), the maximum tree depth (max_depth), and the minimum leaf-node weight (min_child_weight). Determining the parameter combination that yields the best training effect is the problem this thesis addresses.

The main work of the thesis is as follows. First, for the high-dimensional data, the thesis performs data analysis, data preprocessing, and data cleaning, and then addresses the imbalance between positive and negative samples. Sample balance is achieved by increasing the number of minority-class samples with the SMOTE oversampling algorithm. The basic idea of SMOTE is to analyze the minority-class samples and add new synthetic samples to the data set until the two classes are balanced. It relies on the k-nearest-neighbour (KNN) technique: for each minority sample it finds the k nearest minority neighbours, randomly selects n of them, performs random linear interpolation between the sample and each selected neighbour, and finally merges the synthetic samples with the original data, as sketched below.
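A minimal Python sketch of the SMOTE interpolation step described above, using NumPy and scikit-learn's NearestNeighbors. The function name, the parameters k and n, and the seed are illustrative assumptions, not the thesis's actual implementation; in practice the imbalanced-learn library's SMOTE class implements the same idea.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_oversample(X_minority, k=5, n=2, seed=0):
    """Generate synthetic minority samples by linear interpolation between
    each minority sample and randomly chosen neighbours (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Find the k nearest minority-class neighbours of every minority sample.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_minority)
    _, idx = nn.kneighbors(X_minority)           # idx[:, 0] is the sample itself
    synthetic = []
    for i, x in enumerate(X_minority):
        # Randomly pick n of the k neighbours (excluding the sample itself).
        chosen = rng.choice(idx[i, 1:], size=n, replace=False)
        for j in chosen:
            gap = rng.random()                   # interpolation factor in [0, 1)
            synthetic.append(x + gap * (X_minority[j] - x))
    return np.vstack(synthetic)

# Usage: balance the training set by appending synthetic minority samples.
# X_min = X_train[y_train == 1]
# X_balanced = np.vstack([X_train, smote_oversample(X_min, k=5, n=2)])
```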
Secondly, for feature selection, the thesis uses random forest (RF) feature selection as a wrapper method and gradient boosting decision tree (GBDT) feature selection as an embedded method. Using two different tree-based models preserves the feature information and adapts better to the hybrid intelligent model.

Finally, the thesis trains a logistic regression model, a support vector machine model, and an XGBoost model and optimizes their parameters; the parameters of the XGBoost model in particular are tuned with chaotic particle swarm optimization (CPSO) to obtain a single optimal model. A voting ensemble is then used to aggregate these models into the hybrid intelligent model. With these methods, the thesis trains and tests the model on a personal credit data set and an enterprise credit data set, and the experiments show that the model combining multiple hybrid techniques achieves higher accuracy and better overall performance.
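As an illustration of the two feature-selection routes described above, wrapper-style selection driven by a random forest and embedded selection driven by GBDT importances, here is a minimal scikit-learn sketch. The thresholds, estimator settings, and target dimensionality are assumptions, not the thesis's exact configuration.

```python
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.feature_selection import RFE, SelectFromModel

# Wrapper-style selection: recursive feature elimination driven by a random forest.
rf_selector = RFE(
    estimator=RandomForestClassifier(n_estimators=200, random_state=0),
    n_features_to_select=20,   # assumed target dimensionality
    step=5,
)

# Embedded selection: keep features whose GBDT importance exceeds the mean importance.
gbdt_selector = SelectFromModel(
    GradientBoostingClassifier(n_estimators=200, random_state=0),
    threshold="mean",
)

# X_rf   = rf_selector.fit_transform(X_balanced, y_balanced)
# X_gbdt = gbdt_selector.fit_transform(X_balanced, y_balanced)
```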
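The sketch below illustrates one common way to realise the remaining two steps: a chaotic particle swarm search (a logistic map replaces the uniform random factors in the velocity update) over three XGBoost hyperparameters, followed by a soft-voting ensemble of the three base models. The bounds, swarm size, scoring metric, and update formula are assumptions for illustration; the thesis's actual CPSO formulation may differ.

```python
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from xgboost import XGBClassifier

BOUNDS = np.array([[0.01, 0.3],   # learning_rate
                   [2.0, 10.0],   # max_depth
                   [1.0, 10.0]])  # min_child_weight

def cv_auc(params, X, y):
    """Cross-validated AUC of an XGBoost model for one particle position."""
    lr, depth, mcw = params
    model = XGBClassifier(learning_rate=lr, max_depth=int(round(depth)),
                          min_child_weight=mcw, n_estimators=200)
    return cross_val_score(model, X, y, cv=3, scoring="roc_auc").mean()

def cpso_search(X, y, n_particles=10, n_iter=20, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Chaotic PSO: a logistic map drives the random factors in the velocity update."""
    rng = np.random.default_rng(seed)
    dim = len(BOUNDS)
    pos = rng.uniform(BOUNDS[:, 0], BOUNDS[:, 1], size=(n_particles, dim))
    vel = np.zeros_like(pos)
    chaos = rng.uniform(0.1, 0.9, size=(n_particles, dim))   # logistic-map state
    pbest, pbest_val = pos.copy(), np.array([cv_auc(p, X, y) for p in pos])
    gbest = pbest[pbest_val.argmax()].copy()
    for _ in range(n_iter):
        chaos = 4.0 * chaos * (1.0 - chaos)                   # logistic map x <- 4x(1-x)
        vel = (w * vel + c1 * chaos * (pbest - pos)
                       + c2 * (1.0 - chaos) * (gbest - pos))
        pos = np.clip(pos + vel, BOUNDS[:, 0], BOUNDS[:, 1])
        vals = np.array([cv_auc(p, X, y) for p in pos])
        improved = vals > pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[pbest_val.argmax()].copy()
    return gbest

# lr, depth, mcw = cpso_search(X_train, y_train)
# ensemble = VotingClassifier(
#     estimators=[("lr", LogisticRegression(max_iter=1000)),
#                 ("svm", SVC(probability=True)),
#                 ("xgb", XGBClassifier(learning_rate=lr, max_depth=int(round(depth)),
#                                       min_child_weight=mcw, n_estimators=200))],
#     voting="soft",
# ).fit(X_train, y_train)
```

Soft voting averages the predicted class probabilities of the three base models, which matches the voting-ensemble aggregation named in the abstract; hard (majority) voting would be the alternative design choice.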
Keywords/Search Tags: SMOTE oversampling, extreme gradient boosting tree (XGBoost), chaotic particle swarm optimization, voting ensemble model, feature selection