Research On The Risk Control Model Based On Machine Learning Algorithms

Posted on:2021-03-12

Degree:Master

Type:Thesis

Country:China

Candidate:Y L Hu

GTID:2518306107459514

Subject:Statistics

Abstract/Summary:

In the era of big data, credit data comes from highly complex sources. Beyond a bank's own records, it includes social data, e-commerce data, telecom operator data, and credit data from other financial institutions. The data is not only varied in type; severe missingness and abnormal values are also common. Handling this severe sparsity has therefore become a central problem in risk control for consumer credit big data, and mining the hidden information in such massive credit data to evaluate a consumer's credit status is a challenging task. Against this background, this thesis studies how to use machine learning algorithms to build an effective risk control model that predicts a customer's probability of default.

Building such a model requires effective dimensionality reduction of high-dimensional, sparse credit data, so the thesis first examines in depth a classic filter-type feature selection algorithm, Relief. Relief has obvious shortcomings when applied to imbalanced data. The thesis therefore improves the algorithm's sampling strategy and uses adjusted cosine similarity to measure the correlation between features so that redundant features can be removed. On this basis it proposes tsRelief, an algorithm for imbalanced data, and verifies the effectiveness of the improved algorithm experimentally.

The selected credit data set is then preprocessed, and tsRelief is used to select features. Risk control models based on logistic regression, XGBoost, and random forest are built and their parameters tuned, and the three models are then fused with the Stacking method. Finally, with AUC as the evaluation metric, the performance of the models built on the different algorithms is compared from multiple perspectives, and their advantages and disadvantages are analyzed. This work broadens the choice of risk control models available to the financial industry. In addition, the models were also built on the data without feature selection; after applying tsRelief, the AUC of every model on the test set increased to varying degrees, which further supports the feasibility of the algorithm in the field of risk control.
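For readers unfamiliar with the baseline the abstract starts from, the following is a minimal sketch of the classic two-class Relief algorithm. It is not the tsRelief method proposed in the thesis: the abstract does not give the improved sampling strategy or the redundancy-removal step, so neither is reproduced here. The function name `relief_weights`, its parameters, and the toy data in the usage note are illustrative assumptions.

```python
import random


def relief_weights(X, y, n_samples=None, seed=0):
    """Classic two-class Relief (baseline only, not tsRelief).

    X: list of numeric feature rows, y: list of class labels.
    Returns one weight per feature; larger means more relevant.
    """
    n, d = len(X), len(X[0])

    # Range of each feature, used to scale differences into [0, 1].
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    span = [(hi[j] - lo[j]) or 1.0 for j in range(d)]

    def diff(a, b, j):
        return abs(a[j] - b[j]) / span[j]

    def distance(a, b):
        return sum(diff(a, b, j) for j in range(d))

    # Assumption: with n_samples=None every instance is visited once;
    # otherwise instances are drawn uniformly at random with replacement.
    if n_samples is None:
        picks = list(range(n))
    else:
        rng = random.Random(seed)
        picks = [rng.randrange(n) for _ in range(n_samples)]

    weights = [0.0] * d
    for i in picks:
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue  # no same-class or other-class neighbour available
        near_hit = min(hits, key=lambda k: distance(X[i], X[k]))
        near_miss = min(misses, key=lambda k: distance(X[i], X[k]))
        for j in range(d):
            # Reward features that differ on the nearest miss,
            # penalise features that differ on the nearest hit.
            weights[j] += (
                diff(X[i], X[near_miss], j) - diff(X[i], X[near_hit], j)
            ) / len(picks)
    return weights
```

On a toy data set such as `X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.9, 0.8], [1.0, 0.2], [0.8, 0.5]]` with `y = [0, 0, 0, 1, 1, 1]`, the first feature separates the classes and receives the larger weight. Because instances are drawn uniformly, the minority class is rarely sampled when the data is imbalanced, which is the kind of shortcoming the abstract says the improved sampling strategy targets.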

Keywords/Search Tags:

The risk control model, Machine learning, Feature selection, Relief, Stacking

Related items

1	Research On Internet Financial Risk Control Model Based On Machine Learning Algorithms
2	Relief-based Feature Selection Algorithms
3	Research On Fraud Identification Of Vehicle Insurance Based On Machine Learning
4	The Design And Application Of Early Warning Model Of Enterprise Loan Default Risk Based On Stacking Fusion Algorithm
5	The Study And Application Of Feature Selection Algorithms Based On Relief
6	Research On Optimization Algorithms Of Stacking Classifiers
7	P2P Online Loan Default Risk Early Warning Research
8	Stock Price Prediction Research Based On Feature Selection And Improved Stacking Algorithm
9	Research On E-commerce Purchase Behavior Prediction Based On Feature Selection And Stacking Integrated Algorithm
10	Research On Detection Algorithm Of Extortion Software Based On Machine Learning