| With the rapid development of the financial industry, various forms of credit consumption have become part of everyday life. As the population of credit consumers expands rapidly, major financial institutions face serious credit risk, which has become one of the important factors affecting the stable development of banks. Research on and analysis of credit risk can therefore help banks identify potentially fraudulent users and reduce losses. This thesis uses real user data from the credit lending scenario of Xiamen International Bank as its research object. The feature variables cover basic personal information, historical borrowing and lending behavior, and related attributes. In the preparation stage, the data set is preprocessed through outlier detection, missing value imputation, categorical feature encoding, and similar steps. In the feature engineering stage, the feature variables are analyzed visually with Python visualization tools, potential fraud signals are extracted from features such as the user area code and education code, and related feature variables are constructed. Based on the correlation coefficient method and feature importance ranking, 40 features are then selected as inputs to the subsequent models, and the data is balanced with the SMOTE-Tomek Links method. Support Vector Machine, Random Forest, XGBoost, and LightGBM are used to construct user risk assessment models, and grid search is used to tune their parameters. The results show that, under AUC, F1-score, and other evaluation metrics, LightGBM performs best, with an AUC of 0.836 and an F1-score of 0.723. To further improve the user credit risk assessment model, this thesis finally adopts the Stacking model fusion method. In the first stage, Random Forest, XGBoost, and LightGBM, the better-performing models, are selected as base classifiers. In the second stage, Logistic Regression is trained on the outputs of the first stage. The final results show that the Stacking model outperforms every single model on most evaluation metrics, with an AUC of 0.842. Through analysis of and experiments on bank credit lending data, this thesis constructs four machine learning models in turn and builds a user credit evaluation model based on the Stacking method. The final model performs well and can serve as a reference for user credit risk assessment. |