
Research on Credit Overdue Prediction Under Internet Finance

Posted on: 2020-03-28 | Degree: Master | Type: Thesis
Country: China | Candidate: A E Wang
GTID: 2437330578454490 | Subject: Applied Statistics
Abstract/Summary:
With the rapid development of Internet finance, credit risk prediction is becoming increasingly important, and risk management is currently a hot topic. As information systems keep improving, massive amounts of data have accumulated, and big data techniques are needed to analyze them and extract valuable information. Such analysis can provide theoretical support for automation, saving costs and improving working efficiency. By analyzing the loan information of nearly 60,000 users, we extracted valuable information about the users and built several credit overdue prediction models, which we then optimized and compared to provide theoretical support for automatic overdue prediction.

The process had three steps. The first was data preprocessing, which mainly dealt with missing and duplicate values. The second was feature engineering, which included deriving, selecting, and encoding features. Feature engineering is particularly important because it directly determines the quality of the prediction model. Before building the models, discrete features must be one-hot encoded, because some models cannot recognize discrete features and will treat them as continuous values, which leads to errors; other models, such as random forest, can handle discrete variables directly. The third step was modeling. This thesis mainly used Logistic Regression, Decision Tree, and Xgboost models. Each model was tuned and optimized several times to obtain the most accurate predictions. The models were evaluated with five metrics: AUC, precision, accuracy, recall, and the KS value.

On the whole, the Logistic Regression and Decision Tree models performed comparably: Logistic Regression predicted better, while the Decision Tree trained more efficiently. Comparing the Xgboost model before and after tuning, tuning did not significantly improve prediction performance, although the tuned model was slightly better. In this thesis, L2 regularization was added to the Logistic Regression models to prevent overfitting. Missing values were treated as a feature, and the feature recording the proportion of missing values was of great importance to the model. In conclusion, comparing the evaluation metrics of all models shows that Xgboost has the best prediction performance, followed by Logistic Regression and then the Decision Tree.

This research offers the following references for building credit overdue risk prediction models. First, feature engineering is the key to the whole modeling process, and feature derivation is its most important part. Extracting and analyzing the data yields an accurate user profile and uncovers valuable features, so good feature engineering effectively improves the accuracy and performance of the prediction model. Second, when building a credit overdue prediction model, missing values should be treated as a feature rather than deleted outright. Third, because missing values strongly affect model accuracy, Internet finance platforms should continually improve their user information systems to reduce missing user information. Fourth, Xgboost can be considered first when building a credit overdue prediction model in practice; a more accurate model can then be built on top of it through model fusion.
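The preprocessing and missing-value handling described in the abstract can be sketched in plain Python. This is a minimal illustration, not the thesis's code: records are assumed to be dicts with `None` marking a missing field, and the column names (`income`, `age`), the function names, and the `-1.0` fill sentinel are all hypothetical. Exact duplicate records are dropped, each missing field is kept as a 0/1 indicator, and a `missing_ratio` feature records the share of missing fields in each record.

```python
from typing import Optional

Row = dict[str, Optional[float]]


def drop_duplicates(rows: list[Row]) -> list[Row]:
    """Remove exact duplicate records, keeping the first occurrence."""
    seen: set[tuple] = set()
    unique: list[Row] = []
    for row in rows:
        key = tuple(sorted(row.items()))  # order-independent fingerprint of the record
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def add_missing_features(rows: list[Row], columns: list[str]) -> list[dict[str, float]]:
    """Turn missingness into features instead of deleting incomplete records."""
    out = []
    for row in rows:
        new: dict[str, float] = {}
        n_missing = 0
        for col in columns:
            value = row.get(col)
            is_missing = value is None
            n_missing += int(is_missing)
            # Hypothetical sentinel so models that need numbers can still use the column.
            new[col] = -1.0 if is_missing else float(value)
            new[col + "_missing"] = float(is_missing)  # 1.0 if the field was missing
        new["missing_ratio"] = n_missing / len(columns)  # share of missing fields in this record
        out.append(new)
    return out


records = [
    {"income": 5000.0, "age": None},
    {"income": 5000.0, "age": None},  # exact duplicate of the first record
    {"income": None, "age": 30.0},
]
features = add_missing_features(drop_duplicates(records), ["income", "age"])
```

Tree boosters such as Xgboost can also consume missing values directly, so the sentinel fill matters mainly for models like Logistic Regression.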
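One-hot encoding, which the abstract says must be applied to discrete features before modeling, replaces one categorical column with one 0/1 column per category, so a model such as Logistic Regression does not read category codes as ordered numbers. The sketch below assumes a hypothetical `education` feature; mapping categories unseen during fitting to an all-zero vector is one common convention, not a choice documented in the thesis.

```python
def fit_categories(values: list[str]) -> list[str]:
    """Learn a fixed, sorted category order from the training data."""
    return sorted(set(values))


def one_hot(value: str, categories: list[str]) -> list[int]:
    """Encode one value as a 0/1 vector; unseen categories give all zeros."""
    return [1 if value == c else 0 for c in categories]


# Hypothetical discrete feature from a loan application.
education_train = ["master", "bachelor", "master", "high_school"]
categories = fit_categories(education_train)
encoded = [one_hot(v, categories) for v in education_train]
```

Label-encoding the same column as 0, 1, 2 would instead impose an artificial order (bachelor < high_school < master) on a linear model, which is the kind of error the abstract warns about.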
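The abstract states that L2 regularization was added to the Logistic Regression models to prevent overfitting. To show what that penalty does, the sketch below fits logistic regression by batch gradient descent on the mean log-loss plus (lam / 2) * ||w||^2, leaving the bias unpenalized (a common convention). It is an illustration under assumptions, not the thesis's implementation: the learning rate, epoch count, `lam` values, and toy data are all hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg_l2(X, y, lam=0.1, lr=0.1, epochs=1000):
    """Minimize mean log-loss + (lam / 2) * ||w||^2 by batch gradient descent.

    The bias b is not penalized. Returns (weights, bias).
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss with respect to the logit is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            # The L2 term adds lam * w_j to the gradient, shrinking weights toward 0.
            w[j] -= lr * (grad_w[j] / n + lam * w[j])
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    """Predicted probability of label 1 (overdue) for each row."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


# Toy one-feature data; label 1 = overdue (hypothetical).
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
w_weak, b_weak = train_logreg_l2(X, y, lam=0.01)
w_strong, b_strong = train_logreg_l2(X, y, lam=1.0)
```

On this separable toy set an unregularized fit would let the weight grow without bound; the stronger penalty (`lam=1.0`) keeps the weight noticeably smaller than the weak one.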
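Precision, accuracy, and recall, three of the evaluation metrics listed in the abstract, all come from the confusion matrix obtained by cutting the predicted overdue probabilities at a threshold. In this sketch the 0.5 cut-off, the function names, and the toy labels and scores are assumptions; the abstract does not state which threshold was used.

```python
def confusion_counts(y_true, y_score, threshold=0.5):
    """Count TP, FP, TN, FN, treating label 1 (overdue) as the positive class."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, y_score):
        pred = 1 if score >= threshold else 0
        if pred == 1 and label == 1:
            tp += 1
        elif pred == 1 and label == 0:
            fp += 1
        elif pred == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def threshold_metrics(y_true, y_score, threshold=0.5):
    """Precision, recall, and accuracy at a single probability cut-off."""
    tp, fp, tn, fn = confusion_counts(y_true, y_score, threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of flagged users, share truly overdue
    recall = tp / (tp + fn) if tp + fn else 0.0  # of overdue users, share flagged
    accuracy = (tp + tn) / len(y_true)
    return {"precision": precision, "recall": recall, "accuracy": accuracy}


y_true = [1, 1, 0, 0, 1]
y_score = [0.9, 0.4, 0.6, 0.2, 0.7]
metrics = threshold_metrics(y_true, y_score)  # precision = recall = 2/3, accuracy = 0.6
```

Since overdue borrowers are typically a minority, accuracy by itself can be high even for a model that flags no one; precision and recall expose that failure.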
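AUC and the KS value, the other two metrics in the abstract, score how well the predicted probabilities rank overdue users above the rest, without fixing a threshold. In this sketch AUC is the probability that a randomly chosen overdue user receives a higher score than a randomly chosen non-overdue user (ties count one half), and KS is the largest gap between the cumulative true-positive rate and false-positive rate over all thresholds. The quadratic pairwise AUC loop is for clarity only, and the function names and toy data are hypothetical.

```python
def auc(y_true, y_score):
    """Pairwise (Mann-Whitney) AUC; needs at least one positive and one negative."""
    pos = [s for label, s in zip(y_true, y_score) if label == 1]
    neg = [s for label, s in zip(y_true, y_score) if label == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


def ks_value(y_true, y_score):
    """Max |TPR - FPR| over all score thresholds; needs both classes present."""
    n_pos = sum(1 for label in y_true if label == 1)
    n_neg = len(y_true) - n_pos
    pairs = sorted(zip(y_score, y_true), reverse=True)  # highest scores first
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        # Consume every sample sharing this score so tied samples move together.
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        best = max(best, abs(tp / n_pos - fp / n_neg))
    return best


y_true = [1, 1, 0, 0, 1]
y_score = [0.9, 0.4, 0.6, 0.2, 0.7]
auc_toy = auc(y_true, y_score)  # 5/6
ks_toy = ks_value(y_true, y_score)  # 2/3
```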
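The abstract suggests building on the Xgboost model through model fusion but does not say which fusion method to use. One of the simplest forms is weighted averaging (blending) of the models' predicted probabilities, sketched below; the weights, function name, and scores are hypothetical, and stacking, where a second-level model learns how to combine the base models, is a common alternative.

```python
def blend_probabilities(model_probs, weights=None):
    """Weighted average of several models' overdue probabilities (simple blending).

    model_probs: one list of probabilities per model, all the same length.
    weights: one non-negative weight per model; equal weights if omitted.
    """
    k = len(model_probs)
    if weights is None:
        weights = [1.0 / k] * k
    total = sum(weights)
    n = len(model_probs[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, model_probs)) / total
        for i in range(n)
    ]


# Hypothetical scores from an Xgboost model and a logistic regression model.
xgb_scores = [0.9, 0.3, 0.7, 0.1]
lr_scores = [0.7, 0.5, 0.6, 0.2]
blended = blend_probabilities([xgb_scores, lr_scores], weights=[0.7, 0.3])
```

In practice the weights would be chosen on a validation set, for example by comparing the AUC of the blended scores.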
Keywords/Search Tags: Feature engineering, Evaluation index, Logistic regression, Decision tree, Xgboost