In recent years, with China’s rapid economic development, the country has entered a golden age of consumer finance and microfinance. Traditional financial institutions and Internet companies are competing to deepen their involvement in the consumer finance industry. As a result, it has become increasingly important to use modern technology to respond quickly to the market, speed up the iteration of financial products, and strengthen risk control. However, traditional risk control methods suffer from low efficiency, information asymmetry, high cost, and poor timeliness. This makes it difficult for them to meet two kinds of demand: the growing demand for personal consumption loans driven by strong consumer spending, and the long-tail loan demand that traditional financial institutions have long ignored.

Machine learning has developed vigorously over the past decade, and the artificial intelligence industry has matured. Consequently, a virtuous cycle in which academia drives the development of industry has gradually taken shape. Many machine learning algorithms have been applied to credit evaluation and have achieved high scores on various performance metrics. However, most of these algorithms are black-box models whose internal workings and decision-making mechanisms are opaque. This makes them difficult to apply in high-stakes decisions such as credit risk evaluation. Interpretable machine learning has therefore received much attention in recent years. Nevertheless, the commonly used post-hoc explanation methods (i.e., building a black-box model and then explaining its behavior) often provide inaccurate or even erroneous explanations. Such explanations deepen, rather than reduce, the trust barrier between customers and modelers.

This thesis takes the interpretable-modeling perspective of interpretable machine learning and develops self-explanatory models for credit evaluation. The aim is to improve algorithm performance while keeping the models intrinsically interpretable, striking a balance between the two. The main work is summarized as follows:

(1) As a simple generalized linear model, the scorecard is naturally interpretable and is widely used in banks and other financial institutions. The traditional scorecard modeling process encodes the dataset with Weight of Evidence (WoE) and trains a logistic regression on the encoded data. It then applies a linear transformation to obtain a standard scorecard. In practice, however, the results of automatic feature binning sometimes violate a feature’s prior monotonicity. For example, binning may yield a non-monotonic relationship between a customer’s default history and their credit risk, which causes misunderstandings and communication difficulties with customers. Modelers therefore usually have to adjust the binning results repeatedly to satisfy such constraints. Chapter 3 proposes an optimized scorecard modeling process based on derived features that solves the monotonicity problem. The monotonicity constraint on the score is enforced in advance, during feature engineering, so the automatic binning results no longer need repeated adjustment. Within the trade-off between performance and interpretability, this optimization of traditional scorecard modeling leans toward interpretability. Furthermore, a "bad sample" explanation approach based on the Hamming distance is proposed on top of the derived dataset. It can give customers whose loan applications are rejected guidance on how to change their behavior.

(2) Standard scorecard modeling is essentially logistic regression, and the limitations of that model make it hard to fit non-linear patterns in complex datasets. Chapter 4 places performance ahead of interpretability and proposes a credit evaluation network based on the Rule-based Representation Learner (RRL). Unlike traditional neural networks, this network uses activation functions that simulate conjunction operations together with discrete binary weights. The trained model can therefore be equivalently transformed into a set of conjunctive rules similar to those of a decision tree. The Gradient Grafting technique is adopted to train the network with discrete weights more effectively. Experimental results show that the model performs significantly better than a decision tree and is comparable to complex black-box models such as XGBoost and LightGBM. At the same time, its overall interpretability is equivalent to that of a traditional decision tree. The rule set extracted from the network can be given to credit practitioners to help them mine business strategies and debug and optimize models more efficiently. It can also be provided directly to customers as accurate explanations for rejected loan applications, without relying on any post-hoc explanation method. In addition, the thesis examines the correctness of post-hoc explanation algorithms, represented by Anchor, through empirical analysis. This analysis shows that post-hoc explanation methods are not always reliable.

(3) This thesis develops a credit scoring modeling platform that implements the optimized scorecard modeling process of Chapter 3 and the Hamming-distance-based sample explanation method. The platform also embeds standard scorecard modeling and commonly used statistical machine learning algorithms to cover the entire credit risk modeling workflow. This workflow runs from dataset management and feature binning (supporting automatic binning and manual adjustment) to real-time model training and performance evaluation and comparison. To make the system more complete, commonly used post-hoc explanation methods such as LIME and Anchor are also embedded for users’ reference. Overall, the platform covers most of the daily work of credit practitioners, effectively lowering the barrier to credit evaluation modeling and improving the efficiency of professionals.
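The WoE encoding and linear transformation described in (1) follow the usual scorecard recipe. The sketch below assumes several common conventions that the abstract does not specify:

- WoE is defined as ln(%good / %bad).
- Label 1 marks a bad (defaulting) customer.
- Scores use points-to-double-the-odds (PDO) scaling, with hypothetical defaults of 600 points at good:bad odds of 50:1 and PDO = 20.

If the logistic regression gives logit_bad = β0 + Σ βj·WoE_j, the score A − B·logit_bad splits into base points plus one additive term per feature bin. That additive split is what makes a scorecard readable.

```python
import math
from typing import Dict, Sequence, Tuple

def woe_table(bins: Sequence[str], labels: Sequence[int]) -> Dict[str, float]:
    """WoE per bin with the convention WoE = ln(%good / %bad); label 1 means "bad"."""
    total_bad = sum(labels)
    total_good = len(labels) - total_bad
    table = {}
    for b in sorted(set(bins)):
        bad = sum(1 for x, y in zip(bins, labels) if x == b and y == 1)
        good = sum(1 for x, y in zip(bins, labels) if x == b and y == 0)
        # Add 0.5 to each count so empty cells do not give log(0) (assumed smoothing choice).
        table[b] = math.log(((good + 0.5) / total_good) / ((bad + 0.5) / total_bad))
    return table

def scaling_params(base_score: float = 600.0, base_odds: float = 50.0,
                   pdo: float = 20.0) -> Tuple[float, float]:
    """Offset A and factor B such that score = A + B * ln(good:bad odds)."""
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return offset, factor

def score_from_logit(logit_bad: float, offset: float, factor: float) -> float:
    """logit_bad = ln(p_bad / (1 - p_bad)) = -ln(good:bad odds)."""
    return offset - factor * logit_bad

def bin_points(beta0: float, betas: Dict[str, float], woe: Dict[str, Dict[str, float]],
               offset: float, factor: float) -> Tuple[float, Dict[str, Dict[str, float]]]:
    """Split the score into base points plus additive points for each (feature, bin)."""
    base_points = offset - factor * beta0
    points = {
        feat: {b: -factor * betas[feat] * w for b, w in bins.items()}
        for feat, bins in woe.items()
    }
    return base_points, points
```

Each bin's points equal −B·βj·WoE. A monotone WoE trend across ordered bins, which is the constraint discussed in (1), therefore directly yields monotone points for that feature.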
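The abstract does not detail the Hamming-distance "bad sample" explanation in (1). One plausible reading is sketched here as an assumption, not as the thesis’s exact method:

1. Encode each applicant as a binary vector over the derived features.
2. For a rejected applicant, find the approved ("good") sample closest to them in Hamming distance.
3. Report the features where the two differ as candidate behavior changes.

The feature names in the usage below are hypothetical.

```python
from typing import List, Sequence, Tuple

def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length binary vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    return sum(x != y for x, y in zip(a, b))

def explain_rejection(rejected: Sequence[int], good_samples: Sequence[Sequence[int]],
                      feature_names: Sequence[str]) -> Tuple[int, int, List[Tuple[str, int, int]]]:
    """Return (index of nearest good sample, Hamming distance, differing features).

    Each differing feature is reported as (name, rejected value, good value).
    Ties are broken by taking the first good sample with the minimum distance.
    """
    if not good_samples:
        raise ValueError("need at least one good sample")
    best_idx = min(range(len(good_samples)), key=lambda i: hamming(rejected, good_samples[i]))
    nearest = good_samples[best_idx]
    diffs = [(name, r, g) for name, r, g in zip(feature_names, rejected, nearest) if r != g]
    return best_idx, len(diffs), diffs
```

With hypothetical features `["overdue_gt_30d", "debt_ratio_gt_half", "has_mortgage"]`, take a rejected vector `[1, 1, 0]` and good samples `[[0, 0, 0], [0, 1, 0], [0, 0, 1]]`. The nearest good sample is index 1 at distance 1, and the single suggested change is the overdue indicator. In practice one would likely restrict the comparison to features a customer can actually change.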
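The conjunction activation in (2) can be illustrated with the soft AND used in the original RRL paper, as we understand it. A conjunction node computes ∏_j (1 − w_j·(1 − h_j)) over binary inputs h_j, with weights w_j in [0, 1] during training. Once the weights are binarized, the node is exactly the logical AND of the inputs whose weight is 1, which is why the trained network reads as a set of rules.

The sketch makes three assumptions:

- Inputs are already binary features, such as bin indicators.
- Weights are binarized at a threshold of 0.5.
- The feature names and weights in the usage are hypothetical.

It omits training entirely. Gradient Grafting, roughly, evaluates the loss gradient at the output of the discretized network and back-propagates it through the continuous network; that step is not implemented here.

```python
from itertools import product
from typing import List, Sequence

def conj_soft(h: Sequence[float], w: Sequence[float]) -> float:
    """Soft conjunction prod_j (1 - w_j * (1 - h_j)); w_j = 0 makes input j irrelevant."""
    out = 1.0
    for hj, wj in zip(h, w):
        out *= 1.0 - wj * (1.0 - hj)
    return out

def binarize(w: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Discretize continuous weights to {0, 1}."""
    return [1 if wj > threshold else 0 for wj in w]

def conj_hard(h: Sequence[int], w_bin: Sequence[int]) -> int:
    """Logical AND over the inputs selected by w_bin (an empty selection gives 1)."""
    return int(all(hj == 1 for hj, wj in zip(h, w_bin) if wj == 1))

def extract_rule(w: Sequence[float], names: Sequence[str], threshold: float = 0.5) -> str:
    """Read a conjunction node as a human-readable rule."""
    chosen = [n for n, wj in zip(names, binarize(w, threshold)) if wj == 1]
    return " AND ".join(chosen) if chosen else "TRUE"

if __name__ == "__main__":
    names = ["no_overdue_12m", "income_bin_high", "debt_ratio_bin_low"]  # hypothetical
    w = [0.92, 0.08, 0.71]  # hypothetical trained weights
    print(extract_rule(w, names))  # no_overdue_12m AND debt_ratio_bin_low
    # On binary inputs, the soft node with binarized weights matches the hard AND.
    wb = binarize(w)
    assert all(conj_soft(h, wb) == conj_hard(h, wb) for h in product([0, 1], repeat=3))
```

Keeping a continuous relaxation for training and a binarized copy for inference is what allows the rule set to be read off exactly, rather than approximated after the fact.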