
Research On Credit Anti-fraud Model Prediction Based On Machine Learning

Posted on: 2022-06-17  Degree: Master  Type: Thesis
Country: China  Candidate: P Y He  Full Text: PDF
GTID: 2506306479451354  Subject: Applied Statistics
Abstract/Summary:
In the past, applying for loans from traditional financial institutions involved many obstacles, which made financing difficult for many enterprises and individuals. The rapid development of Internet finance has effectively reduced credit-review risk and borrowing costs, and the internal operation of lending has become increasingly open and transparent, which has made small loans possible. Online lending refers to direct lending between the suppliers and demanders of funds through an Internet platform, and it is an important part of Internet finance. Internet platforms have grown rapidly within China's financial industry, but these advantages come with many hidden dangers. Because online lending lowers the threshold for borrowing, it has given rise to a series of problems, including illegal fund-raising, absconding with funds, and telecom fraud. Both lenders and platforms face large risks, and information asymmetry exists on both sides. For lending platforms, the continuous expansion of credit business has made borrower quality uneven, and the accompanying risks, such as user fraud and credit risk, push their bad-debt rates far above those of bank lending. How to process high-dimensional, complex user credit data to provide all-round early warning of fraud risk is a problem that we have been committed to solving in recent years.

To evaluate credit risk effectively, this thesis uses 518,112 records of 2019 LendingClub credit user data with 150 variables, including the time and amount of each loan, borrower information, the interest rate, the number of transactions over a period of time, and all credit transactions, and constructs modeling variables such as the debt ratio. The data are then preprocessed by handling invalid values, missing values, and outliers and by correcting class imbalance, which yields a high-quality data set. Variables are selected by chi-square binning, after which logistic regression, GBDT, XGBoost, LightGBM, random forest, and a Stacking ensemble are built and compared. The AUC and KS value of each model are calculated, and the models are evaluated comprehensively in terms of both predictive performance and interpretability. The results show that the ensemble learning models generally outperform logistic regression, with LightGBM performing best. From the perspective of interpretability, however, logistic regression is still chosen. A scorecard model is therefore built on logistic regression, and the relationship between borrowers' credit scores and their default rates is analyzed comparatively. Based on this analysis, several suggestions are put forward.
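The models in this study are compared by AUC and KS. The following is a minimal, dependency-free sketch of both metrics, assuming label 1 marks a defaulted (bad) loan and a higher score means a higher predicted default risk. AUC is computed from the Mann-Whitney rank statistic, with tied scores sharing their average rank; KS is the largest gap between the cumulative share of bad loans and the cumulative share of good loans as the cut-off moves down the sorted scores. The function names `auc` and `ks` are hypothetical; in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used for AUC.

```python
from typing import List, Sequence, Tuple


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC via the Mann-Whitney U statistic; tied scores share their average rank."""
    pairs: List[Tuple[float, int]] = sorted(zip(y_score, y_true))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # 1-based average rank of the tie group
        i = j + 1
    n_pos = sum(y for _, y in pairs)
    n_neg = n - n_pos
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def ks(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """KS = max |cumulative bad share - cumulative good share| over all score cut-offs."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    data = sorted(zip(y_score, y_true), key=lambda t: t[0], reverse=True)
    cum_bad = cum_good = 0
    best = 0.0
    i = 0
    while i < len(data):
        cutoff = data[i][0]
        # Consume every sample with this score so that ties cannot inflate KS.
        while i < len(data) and data[i][0] == cutoff:
            if data[i][1] == 1:
                cum_bad += 1
            else:
                cum_good += 1
            i += 1
        best = max(best, abs(cum_bad / n_pos - cum_good / n_neg))
    return best


y = [0, 0, 1, 1]
s = [0.1, 0.4, 0.35, 0.8]
print(auc(y, s), ks(y, s))  # 0.75 0.5
```

AUC summarizes ranking quality over all thresholds, while KS reports separation at the single best cut-off, which is why credit-risk studies often report both.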
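The final scorecard is built on logistic regression. The thesis does not state its scaling parameters, so the sketch below assumes a common convention: score = A - B·ln(odds), where odds = p / (1 - p) for default probability p, B = PDO / ln 2 (PDO is the number of points that doubles the odds), and A is fixed so that a chosen base odds maps to a base score. Because the logistic model is linear in its WOE inputs, the score splits into base points plus per-bin points, which is what makes a scorecard readable. The base score of 600 at odds 1:50, the PDO of 20, the intercept and coefficient, and the variable `debt_ratio` with its bin labels and WOE values are all hypothetical.

```python
import math
from typing import Dict, Tuple


def scaling_constants(base_score: float, base_odds: float, pdo: float) -> Tuple[float, float]:
    """Return (A, B) so that score = A - B * ln(odds), with odds = p / (1 - p)."""
    b = pdo / math.log(2)  # points gained each time the odds of default halve
    a = base_score + b * math.log(base_odds)
    return a, b


def bin_points(a: float, b: float, intercept: float, coefs: Dict[str, float],
               woe_tables: Dict[str, Dict[str, float]]) -> Tuple[float, Dict[str, Dict[str, float]]]:
    """Split the score into base points and per-bin points for each WOE-encoded variable."""
    base = a - b * intercept
    points = {
        var: {label: -b * coefs[var] * woe for label, woe in table.items()}
        for var, table in woe_tables.items()
    }
    return base, points


def score(base: float, points: Dict[str, Dict[str, float]], applicant_bins: Dict[str, str]) -> float:
    """Total score = base points + the points of the applicant's bin for each variable."""
    return base + sum(points[var][label] for var, label in applicant_bins.items())


def prob_to_score(a: float, b: float, p_default: float) -> float:
    return a - b * math.log(p_default / (1 - p_default))


# Hypothetical scaling and model: 600 points at odds 1:50, PDO = 20, one WOE variable.
a, b = scaling_constants(600, 1 / 50, 20)
intercept = -3.0
coefs = {"debt_ratio": 0.9}
woe_tables = {"debt_ratio": {"low": -0.6, "mid": 0.1, "high": 0.8}}
base, points = bin_points(a, b, intercept, coefs, woe_tables)
s = score(base, points, {"debt_ratio": "high"})

# The same applicant scored directly from the model's predicted default probability.
logit = intercept + coefs["debt_ratio"] * woe_tables["debt_ratio"]["high"]
p = 1 / (1 + math.exp(-logit))
print(round(s, 2), round(prob_to_score(a, b, p), 2))  # the two values agree
```

Under this convention, a higher score means lower predicted default risk, so grouping applicants by score band and comparing observed default rates across bands is one way to check the score against default behavior.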
Keywords/Search Tags: Credit score, Logistic regression, Ensemble model, Machine learning, WOE points