Research On The Risk Control Model Based On Machine Learning Algorithms

Posted on: 2021-03-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Hu
GTID: 2518306107459514
Subject: Statistics
Abstract/Summary:
In the era of big data, the sources of credit data are highly complex. In addition to bank records, credit data now includes social data, e-commerce data, telecom operator data, and credit data from other financial institutions. Not only are there many types of data, but problems such as severe missing values and abnormal values are also common. How to handle severe data sparsity has therefore become a problem in risk control for consumer credit big data, and mining hidden information from massive credit data to evaluate consumers' credit status has become a challenging task. Against this background, this thesis studies how to use machine learning algorithms to build an effective risk control model that predicts the default probability of customers.

Because building a risk control model requires effective dimensionality reduction of high-dimensional, sparse credit data, the thesis first examines a classic filter-type feature selection algorithm, the Relief algorithm. Relief has obvious shortcomings when applied to imbalanced data, so the thesis improves the algorithm's sampling strategy and uses adjusted cosine similarity to measure the correlation between features and remove redundant ones. On this basis it proposes the tsRelief algorithm for imbalanced data and verifies the effectiveness of the improved algorithm through experiments.

The selected credit data set is then preprocessed, and features are selected with the proposed tsRelief algorithm. Risk control models based on logistic regression, XGBoost, and random forest are built, and their parameters are tuned; the Stacking method is then used to fuse the three models. Finally, AUC is used as the evaluation metric to compare the performance of the risk control models based on different algorithms from multiple perspectives and to analyze their advantages and disadvantages. This work broadens the choice of risk control models available to the financial industry. For comparison, the models were also built on the data without feature selection; after applying tsRelief, the AUC of each model on the test set increased to varying degrees, which further supports the feasibility of the algorithm in the field of risk control.
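
As a rough illustration of the classic Relief weighting that the abstract takes as its starting point, the sketch below scores each feature by how much it differs between a sample and its nearest neighbour of the other class (the "miss") versus its nearest neighbour of the same class (the "hit"). It is not the thesis's tsRelief: the improved sampling strategy and the adjusted-cosine redundancy step are not specified in the abstract and are not reproduced here. The function name `relief_weights`, the toy data, and the choice to visit every sample instead of drawing a random subset are assumptions made for this example.

```python
def relief_weights(X, y):
    """Classic binary Relief feature weights (illustrative sketch).

    X: list of numeric feature rows, y: list of class labels (two classes).
    Assumes every class has at least two samples, so each sample has a hit.
    Visits every sample once instead of drawing a random subset.
    """
    n, d = len(X), len(X[0])
    # Per-feature range, used to scale differences into [0, 1].
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    span = [(hi[j] - lo[j]) or 1.0 for j in range(d)]  # avoid division by zero

    def diff(j, a, b):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(d))

    w = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        hit = X[min(hits, key=lambda k: dist(X[i], X[k]))]
        miss = X[min(misses, key=lambda k: dist(X[i], X[k]))]
        for j in range(d):
            # Reward features that separate the classes,
            # penalise features that vary within a class.
            w[j] += (diff(j, X[i], miss) - diff(j, X[i], hit)) / n
    return w


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise,
# feature 2 duplicates feature 0 and is therefore redundant.
X = [
    [0.0, 0.3, 0.0],
    [0.1, 0.9, 0.1],
    [0.2, 0.5, 0.2],
    [0.8, 0.4, 0.8],
    [0.9, 0.8, 0.9],
    [1.0, 0.2, 1.0],
]
y = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    print(relief_weights(X, y))
```

On this toy data the informative feature receives a positive weight and the noise feature a negative one, while the duplicated feature gets the same weight as the feature it copies. This shows why plain Relief alone does not remove redundant features and why a separate similarity-based step, such as the one the thesis describes, is needed.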
Keywords/Search Tags: Risk control model, Machine learning, Feature selection, Relief, Stacking