Research On Pre-loan Risk Control Of Consumer Finance Based On Machine Learning

Posted on: 2022-07-22
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Yuan
Full Text: PDF
GTID: 2518306536467624
Subject: Engineering
Abstract/Summary:
Consumer finance, an infrastructure industry that supports household consumption and the real economy in China, has developed rapidly in recent years. With the “14th Five-Year Plan” and the “dual circulation” new development pattern, the development priority of consumer finance has been raised further. Consumer finance is in essence microfinance: its main business is providing consumer loans that cover consumers' daily spending on non-durable goods. As the business expands, however, consumer finance faces growing risk. Because of information asymmetry, financial institutions cannot fully grasp user information when granting consumer loans, and the non-performing loan rate keeps rising, which has brought huge losses to both the country and financial institutions. Only a strong credit risk control system can therefore keep the industry developing healthily.

Fraud detection and default prediction are the two key steps of the pre-loan review in credit risk control, and their results strongly affect credit quality. Fraud detection checks whether users intend to repay, and default prediction checks whether users are able to repay. For these two core business scenarios, this thesis proposes a semi-supervised fraud detection model, which self-trains a cost-sensitive random forest in combination with an isolation forest, and a supervised default prediction model based on an optimized Boruta algorithm and XGBoost, in order to make the pre-loan review more accurate and effective. The main research results of this thesis are as follows:

(1) A semi-supervised fraud detection model is proposed that self-trains a cost-sensitive random forest in combination with an isolation forest. When a large number of unlabeled samples is available, the self-training algorithm from semi-supervised learning assigns pseudo-labels to unlabeled samples, which are then used to expand the training set of the base classifier, a cost-sensitive random forest. To make sure that the pseudo-labels added to the training set are correct, the detection results of the isolation forest are combined with the classifier's output to decide whether a sample qualifies for the training set. Unlabeled samples that pass this screening are added to the base classifier's training set together with their pseudo-labels and sample weights, and the process iterates until no more unlabeled samples pass the screening. Experiments show that this model classifies fraud better than other commonly used models.

(2) A supervised default prediction model based on an optimized Boruta algorithm and XGBoost is proposed. This model addresses default prediction on high-dimensional labeled data sets. The model first selects the optimal feature subset with the optimized Boruta feature selection algorithm, and then uses a genetic algorithm to search for the parameter combination of XGBoost that gives the best classification performance. Comparative experiments demonstrate the feasibility of the model and its improvement in classification performance.
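The self-training procedure in result (1) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not give the screening condition or the weighting scheme, so the agreement rule (a confident fraud prediction must coincide with an isolation-forest outlier flag, and a confident normal prediction with an inlier flag), the `threshold` value, and the use of the predicted probability as the sample weight are all assumptions. The names `screen_candidates`, `self_train`, `fit`, and `predict_fraud_prob` are hypothetical.

```python
from typing import Any, Callable, List, Sequence, Tuple

Sample = Sequence[float]


def screen_candidates(
    fraud_probs: Sequence[float],
    outlier_flags: Sequence[bool],
    threshold: float = 0.9,
) -> List[Tuple[int, int, float]]:
    """Return (position, pseudo_label, weight) for samples that pass screening.

    Assumed rule: accept a sample only when the base classifier is confident
    AND the isolation-forest flag agrees (outlier -> fraud, inlier -> normal).
    The weight is the classifier's confidence in the pseudo-label (assumption).
    """
    accepted = []
    for pos, (p, is_outlier) in enumerate(zip(fraud_probs, outlier_flags)):
        if p >= threshold and is_outlier:
            accepted.append((pos, 1, p))
        elif (1.0 - p) >= threshold and not is_outlier:
            accepted.append((pos, 0, 1.0 - p))
    return accepted


def self_train(
    fit: Callable[[List[Sample], List[int], List[float]], Any],
    predict_fraud_prob: Callable[[Any, Sample], float],
    X_lab: Sequence[Sample],
    y_lab: Sequence[int],
    X_unlab: Sequence[Sample],
    outlier_flags: Sequence[bool],
    threshold: float = 0.9,
    max_iter: int = 20,
):
    """Iteratively grow the training set with screened pseudo-labels.

    `fit` trains the base classifier on (X, y, sample_weight) and
    `predict_fraud_prob` returns P(fraud) for one sample; both are supplied
    by the caller. `outlier_flags[i]` is the precomputed isolation-forest
    verdict for X_unlab[i].
    """
    X_train, y_train = list(X_lab), list(y_lab)
    w_train = [1.0] * len(y_train)      # labeled samples get full weight
    pool = list(range(len(X_unlab)))    # indices still without a label
    model = fit(X_train, y_train, w_train)

    for _ in range(max_iter):
        if not pool:
            break
        probs = [predict_fraud_prob(model, X_unlab[i]) for i in pool]
        flags = [outlier_flags[i] for i in pool]
        accepted = screen_candidates(probs, flags, threshold)
        if not accepted:                # nothing passes screening: stop
            break
        for pos, label, weight in accepted:
            X_train.append(X_unlab[pool[pos]])
            y_train.append(label)
            w_train.append(weight)
        taken = {pos for pos, _, _ in accepted}
        pool = [idx for pos, idx in enumerate(pool) if pos not in taken]
        model = fit(X_train, y_train, w_train)  # retrain on the expanded set

    return model, y_train, w_train
```

With scikit-learn, `fit` could wrap `RandomForestClassifier(class_weight=...)` and pass the weights through the `sample_weight` argument of its `fit` method, and `outlier_flags` could come from `IsolationForest.predict(...) == -1`. Keeping both as caller-supplied inputs means the loop itself does not depend on any particular library.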
Keywords/Search Tags: Risk Control, Isolation Forest, Cost-Sensitive Random Forest, Genetic Algorithm, Optimized Boruta Algorithm