Research On The Risk Of Illegal Fund-Raising Of Enterprises Based On RFE-Tomeklinks-CatBoost

Posted on: 2024-09-04
Degree: Master
Type: Thesis
Country: China
Candidate: F Yuan
Full Text: PDF
GTID: 2568307160979729
Subject: Applied Statistics
Abstract/Summary:
The financial industry faces many risks as it develops, and illegal fund-raising by enterprises is among the most prominent. Predicting these risks and taking preventive measures would support the healthy development of the financial industry. In recent years, with the development of Internet information technology, the forms of illegal fund-raising have become more diverse and the methods used are constantly updated, and the crime has attracted wide attention from all walks of life. Existing research focuses mainly on the identification of illegal fund-raising, the improvement of relevant laws and regulations, and countermeasures for supervision and management; quantitative analysis of illegal fund-raising by enterprises is comparatively rare. It is therefore of great significance to use enterprise data to identify illegal fund-raising risk faster and more accurately, and to strengthen its supervision and management.

Based on the illegal fund-raising risk prediction data from the DataFountain platform, this thesis uses machine learning algorithms to search for the best prediction model. For 14,865 enterprise records, a series of preprocessing steps such as new feature construction and missing-value handling was carried out first, and the model results before and after feature selection were then compared. Three feature selection methods were used: correlation coefficient, Lasso, and recursive feature elimination (RFE). RFE gives higher accuracy, precision and F1 value in classification than the other two methods and extracts effective features to the greatest extent. The data set is also class-imbalanced, with a ratio of risky to risk-free enterprises of about 1:14. To address the imbalance, the thesis applies six classical resampling methods to balance the data and combines them with the Random Forest, AdaBoost, XGBoost, CatBoost and LightGBM algorithms to predict the risk of illegal fund-raising. The TPE algorithm in the Hyperopt optimizer is then used to search the hyperparameters and tune the models.

Finally, the classification performance of the models is compared on the evaluation metrics. The results show that most models perform best with the Tomek Links undersampling method. Among them, "RFE + Tomek Links + CatBoost" is the optimal combined model, with an accuracy of 0.9795, a recall of 0.8952 and an F1 value of 0.8604, which allows accurate prediction of the risk of illegal fund-raising by enterprises. Based on the feature importance output by each model, the main factors determining whether an enterprise engages in illegal fund-raising are enttypeitem (enterprise type subcategory), industryco (industry category code), regcap (registered capital), account_risk_yw (risk business proportion) and other features. The results can serve as a reference for regulatory platforms in identifying enterprises at risk of illegal fund-raising.
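The Tomek Links undersampling step named in the abstract can be illustrated with a minimal pure-Python sketch. The abstract does not say which implementation the thesis used, so the function names (`nearest_neighbor`, `tomek_links`, `tomek_undersample`) and the toy data below are hypothetical and only illustrate the idea. A Tomek link is a pair of samples from different classes that are each other's nearest neighbour; undersampling removes the majority-class member of each such pair.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest_neighbor(i, X):
    """Index of the point closest to X[i], excluding i itself."""
    return min((j for j in range(len(X)) if j != i),
               key=lambda j: dist(X[i], X[j]))


def tomek_links(X, y):
    """Pairs (i, j), i < j, that are mutual nearest neighbours with different labels."""
    nn = [nearest_neighbor(i, X) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]


def tomek_undersample(X, y, majority_label):
    """Drop the majority-class member of every Tomek link."""
    drop = {k for pair in tomek_links(X, y) for k in pair
            if y[k] == majority_label}
    keep = [k for k in range(len(X)) if k not in drop]
    return [X[k] for k in keep], [y[k] for k in keep]


# Toy data (assumed for illustration): 1 = risky (minority), 0 = risk-free (majority).
X = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.2, 1.0), (3.0, 3.0), (3.1, 3.0)]
y = [0, 0, 0, 1, 0, 0]

X_res, y_res = tomek_undersample(X, y, majority_label=0)
```

In this toy set only the pair at indices 2 and 3 forms a Tomek link, so the risk-free point `(1.0, 1.0)` is removed and the risky point is kept. The method cleans the class boundary and does not equalize class counts, so a data set with a ratio near 1:14 stays strongly imbalanced after this step.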
Keywords/Search Tags:Risk of illegal fund-raising by enterprises, Feature selection, Resampling, Bayesian optimization, CatBoost