Research On Credit Assessment Of Micro And Small Enterprises For Unbalanced Data

Posted on: 2022-01-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y Shi
GTID: 2518306539453474
Subject: Mathematics
Abstract/Summary:
In the post-epidemic era, alleviating the financing difficulties of micro and small enterprises has become a top priority for governments and financial sectors at all levels. In recent years, the inadequacy of the credit assessment system has made the financing conflicts of micro and small enterprises prominent, which has seriously restricted their long-term development. Solving the credit assessment problem for micro and small enterprises has therefore become an important research direction at home and abroad. Credit data are usually high-dimensional and have an imbalanced class distribution. When traditional classifiers handle such data, the majority class dominates and the classification boundary is biased toward it, so the misclassification rate of the minority class is high. This thesis focuses on algorithms for high-dimensional imbalanced data, proposes improved algorithms, and applies them to the credit evaluation of micro and small enterprises. The main work is as follows:

(1) To address the problem of high-dimensional data, the mRMR-RF feature selection algorithm is proposed by combining the efficiency and simplicity of filter methods with the superior classification performance of embedded methods. Firstly, the mRMR score and the importance score of each feature are calculated using the mRMR and RF algorithms, respectively. Secondly, the two scores of each feature are normalized and summed to obtain its final score, and the features are ranked by that score. Thirdly, the optimal feature subset is selected based on the experimentally derived optimal number of features, and irrelevant and redundant features are excluded. Finally, experiments are conducted on the credit dataset, and the results show that the method reduces dimensionality reliably on high-dimensional data and improves the classification accuracy of minority-class samples.

(2) To address the problem of imbalanced data, the DP-SMOTE oversampling method is proposed by exploiting the advantages of density peak clustering: it handles various types of datasets, has simple parameter settings, and clusters efficiently. Firstly, density peak clustering is applied to the minority-class samples, and the sampling weights are determined according to the reciprocal of the number of class clusters. Secondly, the interpolation formula is improved to generate a balanced sample set by linearly interpolating between cluster cores and intra-cluster samples. Thirdly, experiments are conducted on artificial and public imbalanced datasets, and the results show that the algorithm is broadly applicable and improves the classification accuracy on imbalanced data. Finally, the effects of different parameter values on the DP-SMOTE oversampling algorithm are analyzed and discussed, and optimal values are suggested.

(3) To address the high-dimensional imbalance caused by the scarcity of default samples and the large number of assessment factors in micro and small enterprise data in the era of big data, this thesis constructs a credit assessment model for micro and small enterprises. Firstly, the mRMR-RF feature selection method and the DP-SMOTE oversampling method are applied to the processing of micro and small enterprise data. Secondly, Focal Loss is introduced to improve XGBoost so that it focuses on hard-to-classify samples, and a credit evaluation model for micro and small enterprises based on FL-XGBoost is constructed. Finally, a pipeline of data analysis, data cleaning, feature selection, and imbalance processing is used to classify and predict the credit status of micro and small enterprises. The experimental results confirm that the model is effective in improving the classification accuracy of default samples.
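
The score fusion in (1) and the core-based interpolation in (2) can be illustrated with a minimal Python sketch. It rests on several assumptions not stated in the abstract: min-max scaling is used as the normalization, the per-feature mRMR scores and RF importances are taken as precomputed inputs, and plain linear interpolation stands in for the thesis's improved interpolation formula, which the abstract does not give. The function names `min_max`, `fuse_and_select`, and `interpolate_from_core` are hypothetical.

```python
import random


def min_max(scores):
    """Scale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_and_select(mrmr_scores, rf_importances, k):
    """Sum the normalized scores and return the indices of the top-k features.

    Assumption: min-max scaling; the abstract only says the two scores
    are "normalized and summed".
    """
    combined = [a + b for a, b in zip(min_max(mrmr_scores),
                                      min_max(rf_importances))]
    # sorted() is stable, so tied features keep their original order.
    ranked = sorted(range(len(combined)), key=lambda i: -combined[i])
    return ranked[:k], combined


def interpolate_from_core(core, sample, rng):
    """Generate one synthetic point on the segment from a cluster core
    to an intra-cluster sample (plain linear interpolation, an assumption)."""
    gap = rng.random()  # uniform in [0, 1)
    return [c + gap * (s - c) for c, s in zip(core, sample)]


# Illustrative inputs only; these numbers are not from the thesis.
selected, combined = fuse_and_select(
    [0.2, 0.8, 0.5, 0.1],      # hypothetical mRMR scores
    [0.10, 0.30, 0.40, 0.20],  # hypothetical RF importances
    k=2,
)
print(selected)  # [1, 2]

rng = random.Random(0)
synthetic = interpolate_from_core([1.0, 1.0], [3.0, 5.0], rng)
print(synthetic)
```

In this sketch, `k` plays the role of the experimentally derived optimal number of features, and `core` would be a density peak found by the clustering step, which is not implemented here.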
Keywords/Search Tags: MSEs, Credit assessment, Feature selection, Imbalanced data sets, Focal loss