| Machine learning models now perform well on many tasks and provide effective solutions across industries. In finance, their strong predictive performance has attracted wide scholarly attention. However, models with overly complex structures yield only predictions and cannot explain the logic behind their decisions; this black-box character seriously hinders the application of machine learning in the financial field. Against this background, this paper takes banking as an example and builds an early-warning model for bank customer churn, aiming to identify potential churners effectively and to provide data-driven reference and advice for preventing churn. First, the data are cleaned according to the characteristics of the raw data, mainly by handling missing values and duplicate samples. Next, a coarse feature screening removes features that serve only as identifiers and carry no statistical meaning, and LassoCV and the Pearson correlation coefficient are then used for feature selection to further improve data quality. Because the bank data are severely imbalanced, the Borderline-SMOTE algorithm is used to balance the training set. Exploratory data analysis is then carried out before modeling, including visualizing the data distributions, the patterns of individual features, and the relationships between the features and the research target. The modeling and model interpretation work is divided into two parts. In the first part, four self-interpretable models are built: logistic regression (LR), random forest, XGBoost, and LightGBM. The four models are compared on accuracy (ACC), F1 score, precision, and recall, and the feature importance rankings they produce are analyzed. However, these four models reach only about 60% recall on churned customers, the metric of greatest concern in this paper. Therefore, in the second part, two black-box models are built: a support vector machine and a deep neural network (DNN). By comparison, the DNN performs best, with recall on churned customers reaching 86%. Three model-agnostic interpretation methods are then applied: first, Shapley values explain how each feature and each instance affects the model output; next, partial dependence plots (PDP) explain the marginal effect of several important features on the model output; finally, a local surrogate model (LIME) locally approximates the black-box model to explain how it reaches a decision for a single sample. Building on accurate identification of churned customers, this paper extracts useful information from the black-box model, providing bank staff with the data they need to carry out customer retention work, retain customers in time, and increase the bank’s income. |
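
The abstract compares models on accuracy, precision, recall, and F1, with recall on churned customers as the main criterion. The following is a minimal sketch of how these four metrics can be computed for the churn class. The function name `churn_metrics` and the toy labels are hypothetical illustrations; they are not the paper's code or data.

```python
def churn_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for the `positive` (churn) class.

    Hypothetical helper for illustration; labels are assumed to be
    1 = churned and 0 = retained.
    """
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with the churn class treated as positive.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up, imbalanced toy labels: 3 churned customers out of 10.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
metrics = churn_metrics(y_true, y_pred)
print(metrics)
```

On this toy input the accuracy is 0.8 while the recall on churned customers is only 2/3. This shows why, on imbalanced data, recall on the minority class is a more informative criterion than accuracy alone.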