
Research on Enterprise Credit Risk Assessment Based on Random Forest

Posted on: 2013-01-09
Degree: Master
Type: Thesis
Country: China
Candidate: L Li
GTID: 2249330377454258
Subject: Technical Economics and Management
Abstract/Summary:
As China's market economy system has improved, enterprise credit has gradually become an important foundation of social and economic development. Large and medium-sized firms, the lifeline of the national economy, are key participants in both direct and indirect financing in the capital market, and they are also the main borrowers of bank loans and the main targets of institutional investment. Research on, and measurement of, corporate credit and credit risk therefore has great practical significance for China's economic development. This thesis focuses on how to use data mining and statistical analysis techniques to build a reliable credit risk assessment model that identifies and predicts a company's credit rating. The random forest (RF) algorithm is a non-linear modeling tool with good adaptive capacity that performs classification or regression by repeatedly refining the sample data. It is well suited to applications that lack clear prior knowledge, have inadequate sample data, or involve multiple constraints without explicit rules. The RF algorithm has three advantages. First, it constructs different training sets to increase the diversity among the classification models, which significantly improves the extrapolation and predictive capability of the combined model and overcomes the tendency of a single classification model to overfit. Second, the algorithm is fast and convenient, remedying the problems of traditional methods: slow, indirect data access and low efficiency. This lays a solid foundation for making classification forecasts practical. Third, the RF algorithm can be used to screen and select variables, yielding an importance measure for each evaluation feature.
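The first advantage rests on Bagging: each tree is trained on a bootstrap resample of the rows, so every tree sees a different training set. The abstract gives no code for this step (the thesis itself works in R), so the following is only a minimal Python sketch under that assumption; `bootstrap_indices`, `out_of_bag` and `make_training_sets` are hypothetical helper names, not part of any package.

```python
import math
import random


def bootstrap_indices(n, rng):
    """Draw n row indices with replacement: one Bagging training set."""
    return [rng.randrange(n) for _ in range(n)]


def out_of_bag(n, in_bag):
    """Rows never drawn into the bootstrap sample are out-of-bag (OOB)."""
    drawn = set(in_bag)
    return [i for i in range(n) if i not in drawn]


def make_training_sets(n, n_trees, seed=0):
    """Return one (in-bag, OOB) pair per tree; each tree sees a different resample."""
    rng = random.Random(seed)
    sets = []
    for _ in range(n_trees):
        bag = bootstrap_indices(n, rng)
        sets.append((bag, out_of_bag(n, bag)))
    return sets


if __name__ == "__main__":
    n = 1000
    sets = make_training_sets(n, n_trees=200)
    mean_oob = sum(len(oob) for _, oob in sets) / (len(sets) * n)
    print(f"mean OOB fraction: {mean_oob:.3f}")
    # Each row is missed with probability (1 - 1/n)^n, which tends to e^-1.
    print(f"(1 - 1/n)^n = {(1 - 1 / n) ** n:.3f}, limit e^-1 = {math.exp(-1):.3f}")
```

Because each row is left out of a given resample with probability (1 - 1/n)^n, roughly 37% of the rows are out-of-bag for any one tree; those rows are what OOB error and OOB importance estimates are computed on.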
The selected features can be used to build an efficient evaluation index system that improves the model's predictive ability. Through empirical research, the thesis reaches three main conclusions. First, for large and medium-sized enterprises in the electric power production industry, revenue growth, total asset growth rate and EBITDA margin are more important indicators than free cash flow, current asset turnover and the other indicators. Second, the RF algorithm tolerates noisy data well because of two features: random selection of features at the internal nodes of the decision trees, and Bagging sampling. Third, the credit risk assessment model based on the RF algorithm outperforms the benchmark Logistic and CART models in extrapolation and prediction. The thesis combines normative analysis with empirical research. Chapter 2 reviews, in chronological order, the domestic and international literature on corporate credit risk evaluation models and on the RF algorithm. Chapters 3 and 4 define the content and scope of the thesis and build a complete theoretical framework layer by layer. Chapter 3 discusses enterprise credit risk assessment and the definition and status of the electric power production industry, and outlines why credit risk assessment of power production enterprises matters for China's economic development. Chapter 4 introduces the basic principles of the random forest algorithm. It first covers two components closely tied to the definition of the random forest, the CART algorithm and the Bagging method, and then, in the chapter's fourth section, describes the RF method in detail, including its definition, its basic idea and the criteria for judging its quality, such as the generalization error.
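The two sources of randomness credited above for noise tolerance, Bagging and random feature selection at internal nodes, can be combined with CART-style Gini splits in a small pure-Python sketch. This is neither the randomForest package the thesis uses nor the author's code; the depth limit (`max_depth=6`), the toy data and all function names are assumptions for illustration, and the default `mtry` of floor(sqrt(p)) mirrors the default the randomForest package uses for classification. The out-of-bag error it returns is the usual internal estimate of the generalization error.

```python
import math
import random
from collections import Counter


def gini(labels):
    """CART Gini impurity: 1 - sum over classes of p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, rows, features):
    """Among the candidate features only, find the threshold with lowest weighted Gini."""
    best = None  # (weighted impurity, feature, threshold)
    for f in features:
        for t in sorted({X[i][f] for i in rows}):
            left = [y[i] for i in rows if X[i][f] <= t]
            right = [y[i] for i in rows if X[i][f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X, y, rows, mtry, rng, depth=0, max_depth=6):
    """Grow a CART-style tree; a leaf is a class label, an internal node a tuple."""
    labels = [y[i] for i in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or gini(labels) == 0.0:
        return majority
    # Random feature selection at each internal node: only mtry features are searched.
    candidates = rng.sample(range(len(X[0])), mtry)
    split = best_split(X, y, rows, candidates)
    if split is None:
        return majority
    _, f, t = split
    left = [i for i in rows if X[i][f] <= t]
    right = [i for i in rows if X[i][f] > t]
    return (f, t,
            build_tree(X, y, left, mtry, rng, depth + 1, max_depth),
            build_tree(X, y, right, mtry, rng, depth + 1, max_depth))


def predict_tree(node, x):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def predict_forest(trees, x):
    """Majority vote over all trees."""
    return Counter(predict_tree(tree, x) for tree in trees).most_common(1)[0][0]


def random_forest(X, y, n_trees=50, mtry=None, seed=0):
    """Bagging plus random feature subsets; returns (trees, OOB error estimate)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    mtry = mtry or max(1, int(math.sqrt(p)))  # floor(sqrt(p))
    trees = []
    oob_votes = [Counter() for _ in range(n)]
    for _ in range(n_trees):
        bag = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
        tree = build_tree(X, y, bag, mtry, rng)
        trees.append(tree)
        in_bag = set(bag)
        for i in range(n):
            if i not in in_bag:  # each tree votes only on rows it never saw
                oob_votes[i][predict_tree(tree, X[i])] += 1
    scored = [i for i in range(n) if oob_votes[i]]
    wrong = sum(oob_votes[i].most_common(1)[0][0] != y[i] for i in scored)
    return trees, (wrong / len(scored) if scored else float("nan"))


if __name__ == "__main__":
    # Hypothetical toy data: only feature 0 drives the label; features 1-3 are noise.
    data_rng = random.Random(1)
    X = [[data_rng.random() for _ in range(4)] for _ in range(60)]
    y = [int(row[0] > 0.5) for row in X]
    trees, oob_error = random_forest(X, y, n_trees=20)
    print(f"OOB error estimate: {oob_error:.3f}")
    print(predict_forest(trees, [0.95, 0.5, 0.5, 0.5]))
```

Training each tree on a resample and restricting each split to `mtry` random features decorrelates the trees, so a noisy row or a spurious split in one tree tends to be outvoted by the others.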
Chapter 4 in particular provides the theoretical tools for the empirical research in the subsequent chapters. Although Chapters 3 and 4 are independent, they support each other and together form the theoretical basis of the thesis. Because the change in OOB (out-of-bag) estimation accuracy measures the importance of each indicator in the evaluation model, Chapter 5 discusses how to use the RF algorithm to build a more reasonable and efficient evaluation index system that improves the model's operating efficiency. Chapter 6 first verifies the strong tolerance of the RF algorithm to data noise, which provides the basis for processing the experimental data in this thesis. A series of experiments then determines the best choice of model parameters. Finally, using the evaluation index system, the processed experimental data and the model parameters obtained above, the thesis builds a corporate credit risk assessment model and, by comparison with the CART and Logistic models, shows that the RF model has good stability, good extrapolation and excellent predictive ability. As a machine learning method, the RF algorithm must be implemented on a computer. This thesis uses the R language, with the varSelRF and randomForest packages, to build the evaluation index system and the credit risk evaluation model. The main contributions of this thesis are twofold: it combines theory with practice rather than remaining a purely theoretical analysis, and it introduces comparative research into the empirical process. In the empirical study, the random forest model performed excellently.
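The OOB-based importance measure described above can be sketched as follows: for every tree, record its accuracy on its own out-of-bag rows, shuffle one feature's values among those rows, and average the drop in accuracy across trees. To keep the example short, each tree is reduced to a one-split Gini stump, an assumption that departs from full random forest trees; the helper names and toy data are hypothetical, and this is not the varSelRF or randomForest implementation the thesis relies on.

```python
import math
import random
from collections import Counter


def gini(labels):
    """CART Gini impurity."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def fit_stump(X, y, rows, features):
    """One-split CART-style stump: (feature, threshold, left label, right label)."""
    best = None
    for f in features:
        for t in sorted({X[i][f] for i in rows}):
            left = [y[i] for i in rows if X[i][f] <= t]
            right = [y[i] for i in rows if X[i][f] > t]
            if not left or not right:
                continue
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # no usable split: predict the majority class everywhere
        label = Counter(y[i] for i in rows).most_common(1)[0][0]
        return (0, math.inf, label, label)
    return best[1:]


def predict_stump(stump, x):
    f, t, left_label, right_label = stump
    return left_label if x[f] <= t else right_label


def oob_permutation_importance(X, y, n_trees=100, mtry=None, seed=0):
    """Average drop in OOB accuracy when one feature is shuffled among the OOB rows."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    mtry = mtry or max(1, int(math.sqrt(p)))
    drops = [0.0] * p
    used = 0
    for _ in range(n_trees):
        bag = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
        oob = sorted(set(range(n)) - set(bag))
        if not oob:
            continue
        stump = fit_stump(X, y, bag, rng.sample(range(p), mtry))
        base = sum(predict_stump(stump, X[i]) == y[i] for i in oob) / len(oob)
        for j in range(p):
            shuffled = [X[i][j] for i in oob]
            rng.shuffle(shuffled)
            hits = 0
            for i, v in zip(oob, shuffled):
                x = list(X[i])
                x[j] = v  # break the link between feature j and the label
                hits += predict_stump(stump, x) == y[i]
            drops[j] += base - hits / len(oob)
        used += 1
    return [d / used for d in drops] if used else drops


if __name__ == "__main__":
    # Hypothetical toy data: feature 0 drives the label; features 1 and 2 are noise.
    data_rng = random.Random(1)
    X = [[data_rng.random() for _ in range(3)] for _ in range(80)]
    y = [int(row[0] > 0.5) for row in X]
    for j, score in enumerate(oob_permutation_importance(X, y)):
        print(f"feature {j}: mean OOB accuracy drop = {score:.3f}")
```

A feature the trees never rely on leaves predictions unchanged when shuffled, so its score stays near zero; in the toy data only feature 0 should stand out, and an indicator-screening step could then discard the lowest-scoring indicators.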
The thesis concludes that the RF algorithm will be applied more widely in corporate credit risk assessment because of its advantages in data processing and model performance. The limitations of this study include not accounting for outlier data and lacking sufficient theoretical justification for the choice of the candidate evaluation index set; these issues require further work.
Keywords/Search Tags: random forest algorithm, credit risk evaluation, power production enterprises