Research On P2P Network Credit Default Prediction Model Based On Hierarchical Sorting Weighted Fusion

Posted on: 2019-11-21
Degree: Master
Type: Thesis
Country: China
Candidate: B Yu
Full Text: PDF
GTID: 2428330572963955
Subject: Computer application technology
Abstract/Summary:
As an important part of Internet finance, P2P online lending extends the range of services offered by the traditional financial industry. Although emerging Internet finance platforms offer a low entry barrier and fast, simple operation, investors' capacity for risk prevention and control differs markedly from that under the traditional financial model. Credit risk assessment and default prediction for online loan users therefore become more important. In online lending, individual loan amounts are generally lower than those applied for at banks, but because the user base is large, the total loan volume is very large. Methods that rely solely on traditional manual review or on screening individual users' information are therefore no longer sufficient. An online lending platform operates within the wider Internet environment and naturally has certain data advantages. Making full use of the platform's existing data and integrating users' payment, shopping, social, and other Internet data to forecast default rates is a major direction for future development.

In the big data setting, the core approach to P2P lending risk control is to build a data-driven risk control model through data cleaning and feature engineering and to apply it in the risk control approval process to guide approval decisions. This is also the research direction and goal of this paper. In China, companies including Renrendai, Rong360, Paipaidai (PPDai), and Ant Financial are actively building data-driven risk control models. These companies not only conduct in-depth research on anti-fraud models but also actively support research by students and data scientists to advance anti-fraud work in Internet finance.

This paper takes as its research object the real historical transaction data, customer login logs, and customer information update logs published by Paipaidai, and studies an online loan default probability prediction model based on these three sources of information. First, the data are observed, analyzed, and processed from multiple angles: computing basic statistics of the raw data, examining missing values from multiple angles, adjusting variable types, deleting constant variables, and standardizing the format of the raw records. Feature engineering is then performed on the cleaned data. It covers the following aspects: finer-grained time features are derived from the user login log, and features such as the number of fields a user modified and the number of modifications are constructed from the user update log. In addition, to reduce the influence of outliers in numerical data and make the model more robust, rank features and statistical features are built for the numerical variables. Finally, features whose pairwise correlation coefficient exceeds 0.99 are removed.

Using the cleaned data and engineered features, three models are built: the linear model logistic regression (LR), the tree-based model CatBoost, and a nonlinear neural network. For hyperparameter selection, the Hyperopt library is used to find better-performing models more efficiently. To ensure the stability and generalization ability of the models, the 70,000-record "training set" is split into training and validation data at a ratio of 9:1. Because the data in this scenario are imbalanced, the split is stratified by the proportion of the target variable. The resulting training and validation sets have almost identical distributions, so the results obtained are more credible.

After the hyperparameters are fixed and the three base models obtained, the paper analyzes and compares the model results from multiple angles. First, measured by the evaluation metric AUC, CatBoost outperforms the neural network and LR in the P2P online loan default prediction scenario studied here. Then, based on the feature weights learned by the models, it is found that in today's era of big data and mobile Internet, user profiles built by mining third-party information more deeply contribute strongly to predicting whether a user will default.

To further improve performance after obtaining the three base models, this paper first applies simple linear weighted fusion, whose result improves on each single model. However, because CatBoost performs better than the other two models on the validation set, it receives a disproportionately high weight in simple linear weighted fusion, so the differences among the models cannot be fully exploited to reach the best fusion result. This paper therefore improves on linear weighted fusion and proposes hierarchical weighted fusion and hierarchical sorting weighted fusion, which give the best model results. Although this paper studies P2P online loan default prediction data in Internet finance, the overall research approach and methods are of practical value to the field of machine learning more broadly.
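The preprocessing steps named in the abstract (rank features for numerical variables, removal of features correlated above 0.99, and a 9:1 split stratified by the target) could be sketched roughly as follows. This is a minimal illustration using only NumPy, not the thesis's actual code: the function names `rank_feature`, `drop_correlated`, and `stratified_split` are hypothetical, the greedy "keep the first of each correlated pair" rule is an assumption, and the sketch assumes missing values and constant columns have already been handled.

```python
import numpy as np


def rank_feature(x):
    """Map values to their average rank scaled into (0, 1].

    Ranks depend only on ordering, so an extreme outlier moves the
    feature no further than any other largest value would.
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="stable")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    # Tied values share the mean of the ranks they occupy.
    _, inverse = np.unique(x, return_inverse=True)
    mean_rank = np.bincount(inverse, weights=ranks) / np.bincount(inverse)
    return mean_rank[inverse] / len(x)


def drop_correlated(X, threshold=0.99):
    """Return the column indices kept after a greedy correlation filter.

    Assumption: a column is dropped when its absolute Pearson correlation
    with any already-kept column exceeds `threshold`. Constant columns
    are assumed to have been removed beforehand.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return keep


def stratified_split(y, valid_fraction=0.1, seed=0):
    """Split row indices so each class keeps the same share in both parts."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    train_idx, valid_idx = [], []
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        n_valid = int(round(len(idx) * valid_fraction))
        valid_idx.extend(idx[:n_valid])
        train_idx.extend(idx[n_valid:])
    return np.array(train_idx), np.array(valid_idx)
```

Splitting class by class is what keeps the default rate nearly identical in the training and validation parts, which matters when defaults are rare.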
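To make the contrast between fusing raw probabilities and fusing sorted positions concrete, the sketch below compares simple linear weighted fusion with a generic weighted rank average and scores both with AUC. The abstract does not specify how the hierarchical sorting weighted fusion is built, so this is not the author's method: `rank_fusion` is ordinary rank averaging used as a stand-in, and all function names and weights are hypothetical.

```python
import numpy as np


def average_ranks(scores):
    """1-based ranks of `scores`; tied values share their mean rank."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores, kind="stable")
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    _, inverse = np.unique(scores, return_inverse=True)
    mean_rank = np.bincount(inverse, weights=ranks) / np.bincount(inverse)
    return mean_rank[inverse]


def auc(y_true, scores):
    """AUC from the rank-sum (Mann-Whitney U) identity."""
    y_true = np.asarray(y_true)
    ranks = average_ranks(scores)
    n_pos = int((y_true == 1).sum())
    n_neg = len(y_true) - n_pos
    u = ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def linear_fusion(preds, weights):
    """Weighted average of raw predicted probabilities (one row per model)."""
    preds = np.asarray(preds, dtype=float)
    w = np.asarray(weights, dtype=float)
    return (w / w.sum()) @ preds


def rank_fusion(preds, weights):
    """Weighted average of each model's rank-normalised scores.

    Assumption: generic rank averaging, shown only to illustrate fusing
    on sorted position instead of raw probability.
    """
    preds = np.asarray(preds, dtype=float)
    w = np.asarray(weights, dtype=float)
    ranked = np.vstack([average_ranks(p) / preds.shape[1] for p in preds])
    return (w / w.sum()) @ ranked


if __name__ == "__main__":
    # Toy validation labels and scores from three hypothetical models.
    y = np.array([0, 0, 1, 0, 1, 1])
    preds = np.array([
        [0.10, 0.20, 0.15, 0.30, 0.60, 0.90],  # stand-in for CatBoost
        [0.40, 0.45, 0.55, 0.50, 0.52, 0.58],  # stand-in for the neural network
        [0.05, 0.30, 0.35, 0.20, 0.25, 0.70],  # stand-in for LR
    ])
    weights = [0.6, 0.2, 0.2]  # hypothetical weights
    print("linear fusion AUC:", auc(y, linear_fusion(preds, weights)))
    print("rank fusion AUC:  ", auc(y, rank_fusion(preds, weights)))
```

Because AUC depends only on how samples are ordered, averaging ranks puts models with differently scaled outputs on a common footing before weighting, which is one plausible reason a sorting-based fusion can use model differences that a raw probability average dominated by one model does not.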
Keywords/Search Tags: Internet finance, P2P, feature engineering, machine learning, model fusion