| Credit assessment is an important part of the finance industry and helps drive its development. It helps financial institutions reduce the risk of loan default, improve lending efficiency, and promote the stability of the financial market, and it also improves the quality of financial services, thereby promoting the healthy development of the industry. Credit assessment is a method for assessing the credit status of borrowers: by evaluating factors such as a borrower’s credit history, income, and assets, financial institutions can identify the borrower’s credit risk and decide whether to grant a loan. With the advancement of big data and machine learning, credit assessment based on machine learning has become a popular area of applied research. However, real-world credit assessment problems usually involve an imbalanced class distribution at the sample level and, at the feature level, too many features, many of them redundant. In addition, actual credit assessment scenarios in financial institutions involve problems outside the algorithmic model, such as the weak interpretability of machine learning models, which leads to low adoption rates, and false or fraudulent information provided by applicants. To address these problems, this study starts from a literature review and conducts field research in the credit departments of banks to identify typical problems that financial institutions face in real credit assessment scenarios. It proposes corresponding improvements for the imbalanced class distribution of credit data samples and for the selection and removal of redundant features, and analyzes the application value of these methods. Specifically, this study includes the following parts. First, to understand the main problems and challenges faced 
by financial institutions in credit assessment in real-world application scenarios, this study uses interviews and data collection to conduct field research in the credit departments of banks. The findings help this study focus on key research questions and connect them with practical applications. Building on this work, at the method level, this study proposes four improved credit assessment models for the high-dimensional, imbalanced data encountered in credit assessment. Second, for the imbalanced class distribution problem in credit data, this study builds on existing research to propose an improved imbalanced data learning method based on clustering and distance measurement. On this basis, it also constructs an imbalanced ensemble learning framework based on multi-class resampling schemes, and compares and verifies different combination methods experimentally. Third, for the scenario where minority class samples are too sparse for undersampling methods to be used, this study proposes an imbalanced learning algorithm that combines a data augmentation method based on a generative neural network with an undersampling method based on clustering and distance measurement. Combining data augmentation and undersampling in this way can better handle imbalanced learning when minority class samples are sparse. Fourth, for the feature redundancy and selection problem in credit assessment, an improved two-stage feature selection model is proposed. By combining a filter method with a heuristic optimization algorithm, the two-stage model can improve the efficiency and accuracy of feature selection in credit assessment problems. Finally, considering the imbalanced class distribution at the sample level and the feature redundancy at the feature level that often occur in credit assessment problems, this paper proposes an improved comprehensive credit assessment model that uses 
both sample-level and feature-level processing methods. This model can be used to solve credit assessment problems in high-dimensional, imbalanced settings. In addition, based on the current state of credit assessment and the main problems and challenges identified in the field research at banks, this paper offers suggestions and countermeasures for improving actual credit assessment practice at banks and other participants in the credit system. At the theoretical level, this study proposes four improved models for imbalanced learning and feature selection in machine learning; at the practical level, the proposed methods can be used to improve the performance of credit assessment models, as demonstrated by test results on credit assessment datasets. The suggestions and countermeasures for actual credit assessment scenarios draw on first-hand information from field research in banks. |