| With the continuous growth of the economy and rising levels of consumption, attitudes toward personal consumption have shifted, and people are increasingly inclined to borrow in order to consume rather than follow the conservative consumption habits of the past. As a result, personal credit business has expanded rapidly, generating considerable economic benefits for lending institutions while also creating potential risks. Personal loan default risk, one of the most important and complex risks faced by credit institutions, is critical to their survival and development. Furthermore, it is a core issue for national financial security and has attracted extensive attention from market regulators and scholars. Predicting individual loan default risk in a scientific and effective manner plays a fundamental role in maintaining the long-term stable development of the lending market and of the entire financial system. In recent years, both domestic and international scholars have made considerable progress in researching methods and tools for loan default prediction. However, because personal loan data are high-dimensional, complex, and uncertain, traditional prediction methods still have several problems. First, previous studies have prioritized prediction accuracy while neglecting to integrate the lender’s core goal of maximizing revenue into model construction and evaluation, and therefore fail to help lenders increase revenue effectively. Second, the studies that do investigate the default risk of loan applicants from a revenue perspective often use a single prediction model that fails to capture the complex characteristics of credit data. Finally, most studies have not explored the interpretability of prediction results, which limits their practical value. How to build reliable prediction models that identify the risk of personal loan default and enhance the lender’s revenue has become a critical issue in the statistical prediction
field. In light of the problems in prior studies, this paper aims to help lenders maximize their revenue by proposing hybrid, weighted ensemble, and fusion ensemble strategies based on various machine learning algorithms, hyperparameter optimization methods, and interpretability analysis methods. The constructed strategies are applied to default prediction for personal online loans, personal vehicle loans, and personal bank loans, respectively, providing support for improving lenders' revenue and promoting the steady and healthy development of the credit sector. This paper is divided into seven chapters. Chapter 1 introduces the background of the topic, the research significance, ideas, and content, and the innovations and limitations of the research. Chapter 2 summarizes the current state of research on personal loan default prediction. Chapter 3 builds the performance evaluation system for the personal loan default prediction models. In Chapters 4, 5, and 6, from the perspective of maximizing the lender’s revenue, the hybrid model, the weighted ensemble model, and the fusion ensemble model are respectively used to predict personal loan default under different scenarios. The experimental results verify that the proposed models can effectively help lenders maximize revenue. Chapter 7 summarizes the research content, presents the conclusions, puts forward policy suggestions, and outlines future research directions. The main research contents and conclusions are summarized as follows: (1) Taking a revenue-driven perspective, this study addresses the neglect of lender revenue maximization in existing prediction models, which focus mainly on prediction accuracy. A hybrid model is constructed to predict individual loan default risk by using the Bayesian optimization algorithm to optimize the categorical boosting algorithm. First, a revenue-related metric is constructed and used as the model training objective and evaluation
criterion. Specifically, the revenue metric is used as the objective function of the Bayesian optimization algorithm to select the optimal combination of hyperparameters for the prediction model, thereby incorporating the goal of maximizing the lender’s revenue into the model training process. Next, the performance of the proposed hybrid model is evaluated by combining revenue metrics, accuracy metrics, and statistical significance tests. Finally, an interpretability analysis is conducted to reveal the significant factors that influence the hybrid model’s predictions, providing guidance and reference for market participants to make efficient decisions. Both the performance evaluation metrics and the statistical test results show that the proposed hybrid prediction model achieves higher revenue metric values than all the comparison models, which verifies the effectiveness of combining the categorical boosting algorithm, the Bayesian optimization strategy, and the revenue-driven framework for default prediction. Shapley additive explanations values further reveal the key features that have an important impact on the prediction results and provide technical support for decision-makers to make profitable decisions in the future. (2) To address the insufficiently deep analysis and mining of complex credit data features in existing research and the poor interpretability of model predictions, the Salp swarm algorithm is used to optimize the weight coefficients of the ensemble members, and an interpretable weighted ensemble model is constructed for revenue-driven personal loan default prediction. The model consists of three modules: revenue-driven sub-prediction model construction, revenue-driven weight coefficient optimization, and interpretability analysis. In the revenue-driven sub-prediction model construction module, eight different machine learning models are selected as sub-prediction models, and, with profit maximization as the search target, the optimal hyperparameters of
the sub-prediction models are determined using grid search. In the revenue-driven weight coefficient optimization module, the revenue function is used as the objective function of the Salp swarm algorithm to adaptively select the optimal weight coefficients and calculate the final prediction results. In the interpretability analysis module, the Shapley additive explanations values of each sub-prediction model are calculated first; the optimal weight coefficients obtained by the Salp swarm algorithm are then used to calculate the Shapley additive explanations values of the proposed model, and the input features that have a significant impact on the weighted ensemble predictions are identified to improve the transparency of the prediction results. The experimental results show that, with the support of the Salp swarm algorithm, the proposed weighted ensemble prediction model successfully integrates the advantages of various machine learning models, effectively describes the characteristics of complex credit data, and helps lenders obtain higher revenue. The interpretability analysis based on Shapley additive explanations values gives decision-makers more reference information about the factors that potentially influence the prediction results, which helps them make scientific and reasonable decisions. (3) Existing ensemble models tend to focus on integrating weak learners while ignoring the use of strong learners as base learners, which limits the improvement in prediction performance. To address this issue, the boosting ensemble algorithm is used as the base learner of the bagging ensemble algorithm, and a fusion ensemble model is constructed for revenue-driven personal loan default prediction. The proposed fusion ensemble model integrates a classic boosting ensemble model into the construction process of the bagging ensemble model and effectively combines the advantages of the bagging and the
boosting ensemble algorithms, reducing both prediction variance and bias. In addition, the revenue metric is used as the search objective of grid search to determine the hyperparameters of the fusion ensemble model and to improve its ability to help lenders maximize revenue. Finally, based on Shapley additive explanations values, the prediction results of the fusion ensemble model are analyzed to verify the effectiveness of the proposed model in revealing the factors that affect loan default. The experimental results show that the proposed fusion ensemble prediction model obtains the highest revenue metric values both when identifying defaulting borrowers overall and when identifying the subset of borrowers with the highest default probability. The proposed model is therefore more applicable in practical scenarios: it can help lenders control the cost of identifying potential defaulting customers and of incentivizing customers, and thus improve their economic benefits. Meanwhile, the stability of the proposed fusion ensemble model is verified by a sensitivity analysis of the model hyperparameters. In addition, Shapley additive explanations values further improve the interpretability of the model's predictions. The main innovations of this paper are as follows. First, a hybrid model, a weighted ensemble model, and a fusion ensemble model are constructed based on various machine learning models, hyperparameter optimization algorithms, revenue measures, and interpretability analysis methods, and good personal loan default prediction performance is achieved under various scenarios. Second, taking revenue maximization as the starting point of the research, the revenue metric is set as the basis for optimizing parameters and evaluating prediction performance during model construction, so as to provide strong support for realizing the lender’s core goal of revenue maximization. Third, Shapley additive explanations values are used as an interpretability analysis tool, which overcomes the defect
of traditional prediction models, which sacrifice interpretability to improve prediction performance, and provides a valuable reference for decision-makers at financial lending institutions to make scientific and reasonable decisions. Fourth, the performance of the prediction models is evaluated comprehensively and systematically using accuracy and revenue metrics together with non-parametric statistical tests, so that the effectiveness and reliability of the prediction models can be verified from multiple perspectives. The limitations of this study are as follows: (1) The proposed personal loan default prediction models are trained only on the numerical and categorical features of historical credit datasets, and the loan default scenarios considered are limited. Future research should consider fusing multi-source data and should verify the effectiveness of the proposed models in more business scenarios. (2) Introducing optimization algorithms to determine model parameters or model weight coefficients increases the computation time of the proposed models. Future research should explore more efficient methods to improve the computational efficiency of the proposed models. |
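
The revenue-driven weighted ensemble summarized in point (2) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the `revenue` definition (a fixed `gain` per approved non-defaulter and a fixed `loss` per approved defaulter), the 0.5 approval threshold, the toy data, and all function names. A coarse grid search over the weights stands in for the Salp swarm algorithm, and the per-member attributions passed to `combine_attributions` would in practice come from a Shapley additive explanations computation on each sub-model.

```python
from itertools import product


def revenue(y_true, y_prob, threshold=0.5, gain=1.0, loss=5.0):
    """Hypothetical revenue metric (an assumption, not the thesis's definition).

    A loan is approved when the predicted default probability is below
    `threshold`; an approved non-defaulter earns `gain`, an approved
    defaulter costs `loss`, and a rejected applicant contributes nothing.
    """
    total = 0.0
    for y, p in zip(y_true, y_prob):
        if p < threshold:  # predicted non-default, so the loan is approved
            total += -loss if y == 1 else gain
    return total


def weighted_ensemble(member_probs, weights):
    """Weighted average of the members' predicted default probabilities."""
    n_samples = len(member_probs[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, member_probs))
        for i in range(n_samples)
    ]


def search_weights(y_true, member_probs, steps=10):
    """Pick non-negative weights summing to 1 that maximize revenue.

    Exhaustive search on a coarse grid; a simple stand-in for the
    Salp swarm algorithm, which is not implemented here.
    """
    k = len(member_probs)
    best_weights, best_revenue = None, float("-inf")
    for head in product(range(steps + 1), repeat=k - 1):
        if sum(head) > steps:
            continue  # the last weight would be negative
        weights = [h / steps for h in head] + [(steps - sum(head)) / steps]
        score = revenue(y_true, weighted_ensemble(member_probs, weights))
        if score > best_revenue:
            best_weights, best_revenue = weights, score
    return best_weights, best_revenue


def combine_attributions(member_attr, weights):
    """Weighted sum of per-member feature attributions for one sample."""
    n_features = len(member_attr[0])
    return [
        sum(w * attr[j] for w, attr in zip(weights, member_attr))
        for j in range(n_features)
    ]


# Toy data: 1 marks a defaulter; each inner list holds one member's
# predicted default probabilities for the same six applicants.
y_true = [0, 0, 1, 1, 0, 1]
member_probs = [
    [0.2, 0.6, 0.7, 0.4, 0.1, 0.9],
    [0.4, 0.3, 0.4, 0.8, 0.6, 0.7],
    [0.1, 0.2, 0.6, 0.6, 0.7, 0.3],
]
best_weights, best_revenue = search_weights(y_true, member_probs)
print(best_weights, best_revenue)
```

Because the grid contains the weight vectors that select a single member, the ensemble found this way never earns less on the data it was tuned on than the best individual member under the same revenue definition. The weighted sum in `combine_attributions` relies on the linearity of Shapley values and assumes that all members are explained on the same output scale and against the same background data.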