| In recent years, China’s economy has developed rapidly and consumer awareness has become widespread. Commercial banks and other financial institutions continue to introduce financial products and loan services to meet the capital demands of enterprises and individuals. With continuous innovation in the Internet and artificial intelligence, financial instruments are constantly changing. In the age of big data, commercial banks have more diversified channels for collecting customer information, and the dimensionality of credit data keeps rising. Complex, high-dimensional credit data has become a major challenge for commercial banks in credit risk assessment. Existing credit risk assessment models cannot handle high-dimensional features well, so commercial banks need more flexible and effective risk assessment models to support their healthy development. This paper studies risk assessment models for high-dimensional credit data, aiming to help commercial banks improve their credit risk assessment capabilities, accurately identify defaulting customers, and protect their own interests when facing high-dimensional credit data. First, building on the basic theories and methods of credit risk assessment, the data mining methods applied are briefly introduced; following a data-characteristics-driven approach, the paper then explains the key issues raised by high-dimensional data and the existing research methods for addressing them. According to how the data sets behave, high-dimensional data is subdivided into general high-dimensional data and high-dimensional sparse data, and a corresponding model construction scheme is proposed for each. Second, for high-dimensional credit data, a model is proposed that combines filter-based feature selection with a differential evolution-artificial bee colony wrapper feature selection method. The model uses three filter feature selection methods to initially select 
feature sets, reducing the time cost of the subsequent wrapper method. The differential evolution algorithm and the artificial bee colony algorithm are fused in this paper; the fused algorithm has stronger optimization ability and greatly improves the selection of the optimal feature subset. The optimal classifier is selected through classifier encoding to improve the classification accuracy of the model. To illustrate and verify the model's performance, two high-dimensional credit data sets are used for an empirical study, and a variety of benchmark models are used for comparison. The experimental results demonstrate the effectiveness and flexibility of the hybrid evolutionary filtering method. Third, for high-dimensional sparse credit data, a model that fuses feature selection and feature extraction based on feature clustering is proposed. In this model, the K-means method is first improved so that clustering can better identify similarity relations between sparse features. Each cluster is then treated as a feature group, and PCA is used to reduce the dimensionality of the features within each group. Finally, the embedded feature selection method Group Lasso is used to select feature groups at scale, so that the optimal feature subset is selected while the model is trained. Two real credit data sets are used to verify the effectiveness of the model experimentally. Comparison with different benchmark models shows that the model achieves higher classification accuracy on high-dimensional sparse data sets, and that the proposed feature grouping-extraction-selection framework offers greater flexibility. Finally, the above experimental results verify the validity of the proposed data-characteristics-driven model construction scheme, and corresponding management suggestions are put forward for credit risk assessment in commercial banks. |
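
The group-level selection attributed to Group Lasso above can be illustrated with the group soft-thresholding step that many Group Lasso solvers apply. The sketch below is only an illustration under assumptions: the abstract does not say which solver the paper uses, and the function names, the example weights, the grouping, and the threshold `lam` are all hypothetical.

```python
import math

def group_soft_threshold(weights, groups, lam):
    """Apply the Group Lasso proximal step to a weight vector.

    weights: list of floats, one per (extracted) feature
    groups:  list of index lists, one list per feature group
    lam:     non-negative threshold (regularisation strength times step size)
    """
    shrunk = list(weights)
    for idx in groups:
        norm = math.sqrt(sum(weights[i] ** 2 for i in idx))
        # The scale factor is 0 when the group's norm is at most lam,
        # which removes every feature in that group together.
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for i in idx:
            shrunk[i] = weights[i] * scale
    return shrunk

def selected_groups(weights, groups):
    """Return the positions of groups that still hold a non-zero weight."""
    return [g for g, idx in enumerate(groups)
            if any(weights[i] != 0.0 for i in idx)]

# Hypothetical example: three feature groups, e.g. three clusters after PCA.
weights = [3.0, 4.0, 0.1, -0.2, 0.0, 2.0]
groups = [[0, 1], [2, 3], [4, 5]]
shrunk = group_soft_threshold(weights, groups, lam=1.0)
kept = selected_groups(shrunk, groups)
```

With these assumed values, the second group's norm falls below the threshold, so both of its weights are set to zero and `kept` is `[0, 2]`. This is the sense in which whole feature groups, rather than individual features, are selected during training.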