Two-stage Credit Scoring Based On Random Forest And APSOLSSVM

Posted on: 2017-05-18 | Degree: Master | Type: Thesis
Country: China | Candidate: B Y Zhang
GTID: 2349330512956808 | Subject: Finance
Abstract/Summary:
In the information era, the Internet has developed rapidly and brought dramatic changes to many aspects of daily life. In particular, the combination of the mobile Internet and finance is quietly changing people's consumption habits and stimulating a wave of domestic consumption. Spurred by Internet finance, people increasingly depend on credit transactions as a way of consuming. The scale of consumer credit is growing rapidly, but while credit consumption can stimulate the economy, it also causes many problems. China lacks an impartial third-party credit scoring system, and commercial banks do not share their customers' credit information, so the country's personal credit system is incomplete and the credit information needed to measure each individual's default risk is deficient. Risk control and management are also clearly insufficient, so the credit scoring system is unscientific, and the criteria for judging credit are not well defined. As the scale of credit transactions grows, the accuracy required of personal credit scoring models becomes higher and higher, and improving evaluation methods becomes very important. Establishing a nationally shared personal credit information database is imperative. The civil credit system, led by the People's Bank of China, is constructed to constrain people to abide by the law and to strengthen awareness of personal honesty and trustworthiness.
Reasonable prediction and evaluation of personal credit benefits financial institutions. On the one hand, it serves as a reference for commercial banks' credit risk management systems and promotes the healthy development of the credit business; on the other hand, it helps to some extent to guard against financial risk, balancing income and security as far as possible. In academic research, many scholars have studied personal credit scoring from both qualitative and quantitative perspectives, drawing on fields such as econometrics, statistics, artificial intelligence, and machine learning. However, a stable, reliable, and general personal credit scoring model and system has not yet been built. In practice, personal credit scoring at China's commercial banks still lags behind that of banks abroad. The scoring means and technology are backward: scoring relies mainly on subjective factors, and manual operation lowers efficiency. This thesis centers on evaluation methods for personal credit. Drawing on domestic and international research and on practical data, it attempts to construct a new evaluation method. The proposed model combines Random Forest (RF), Adaptive Particle Swarm Optimization (APSO), and the Least Squares Support Vector Machine (LSSVM). The thesis studies this new combined credit scoring model, which selects features using the feature importance estimated by the random forest algorithm, and examines how removing feature variables of low importance affects classification performance.
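The importance-based feature selection described above can be illustrated with a minimal sketch. It assumes the random forest has already produced one importance score per feature. The function names, the `keep_ratio` parameter, and the example feature names are hypothetical and not taken from the thesis, which does not state its cutoff rule here.

```python
import math
from typing import Dict, List, Sequence


def select_features(importances: Dict[str, float], keep_ratio: float = 0.7) -> List[str]:
    """Rank features by importance and keep the top share (assumed rule)."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    # Sort feature names from most to least important.
    ranked = sorted(importances, key=lambda name: importances[name], reverse=True)
    # Always keep at least one feature.
    n_keep = max(1, math.ceil(len(ranked) * keep_ratio))
    return ranked[:n_keep]


def project(rows: Sequence[Sequence[float]],
            feature_names: Sequence[str],
            kept: Sequence[str]) -> List[List[float]]:
    """Reduce each sample to the kept features, in the order given by `kept`."""
    idx = [list(feature_names).index(name) for name in kept]
    return [[row[i] for i in idx] for row in rows]
```

The reduced samples returned by `project` would then be passed to the second-stage classifier, so that the low-importance variables no longer add to its training cost.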
This thesis deepens research on personal credit scoring and provides commercial banks with a reference credit scoring method. It studies personal credit scoring methods and introduces the random forest algorithm, the adaptive particle swarm optimization algorithm, and the least squares support vector machine algorithm. Personal credit scoring can in essence be regarded as a classification problem that divides customers into good customers (those who do not default) and bad customers (those who default), so it falls within the scope of pattern recognition. After sample data selection and data preprocessing, the thesis builds a two-stage credit scoring model using RF and APSOLSSVM, and applies comparative analysis to the built model in the empirical study. This thesis is organized as follows: Chapter one is the introduction. It introduces the background and significance of the topic, the research methods and ideas, and the chapter arrangement. It describes the development trend of credit consumption in China, points out the significance of studying personal credit scoring, and summarizes the research methods and content framework used in the thesis. Chapter two covers personal credit scoring methods and their current state. The theoretical part covers the theoretical basis of personal credit scoring and the credit scoring model algorithms applied in domestic and foreign research, which form the theoretical basis of this thesis. Based on a review of the basic theory of personal credit and of the state and results of domestic and foreign research on personal credit scoring algorithms, this chapter divides the algorithms into three categories. Chapter three constructs the LSSVM-RF algorithm.
This chapter explains the basic principles of the RF and LSSVM algorithms, analyzes the characteristics of each, and then elaborates how the RF-LSSVM algorithm is applied to the credit classification problem, covering the feasibility of the algorithm, its basic principle, and the concrete steps for building the combined method. Chapter four introduces the credit data sets and the processing work done before the experiments. It presents the sample information of the four credit data sets used in the experiments, the data preprocessing performed beforehand, the parameter settings of the algorithm, and the model evaluation criteria. Chapter five applies the RF-APSOLSSVM algorithm to personal credit scoring. The empirical part applies the new model to the credit data sets and, by comparing the models' empirical results on these data sets, tests the applicability of the new method. Chapter six draws the conclusions. It summarizes the conclusions and outlook for personal credit consumption and the rating model. This thesis proposes a new combined model, called RF-APSOLSSVM, that builds on domestic and foreign personal credit scoring models. For the empirical test, the thesis uses the public German credit data set from the UCI Machine Learning Repository, the Australian real credit data set, and the UK and Polish credit data sets provided by studies abroad. Before the experiments, these data sets went through a series of preprocessing steps, including filling in missing values, index assignment, and normalization. The thesis also selects eight representative credit scoring models, fits each on the four credit data sets, and compares and analyzes their predictions against those of the proposed combined classifier. The applicability and classification effectiveness of the combined model RF-APSOLSSVM are thereby more fully verified.
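Two of the preprocessing steps named above, filling in missing values and normalization, can be sketched as follows. The abstract does not say which imputation or scaling rule the thesis uses, so mean imputation and min-max scaling are assumptions made for illustration, and all function names are hypothetical. Index assignment (encoding categorical attributes) is not shown.

```python
from typing import List, Optional, Sequence


def fill_missing_with_mean(column: Sequence[Optional[float]]) -> List[float]:
    """Replace None entries with the mean of the observed values (assumed rule)."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else float(v) for v in column]


def min_max_normalize(column: Sequence[float]) -> List[float]:
    """Scale a column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def preprocess(rows: Sequence[Sequence[Optional[float]]]) -> List[List[float]]:
    """Impute and normalize every numeric column of a row-major data set."""
    columns = [list(col) for col in zip(*rows)]
    cleaned = [min_max_normalize(fill_missing_with_mean(col)) for col in columns]
    # Transpose back to one list per sample.
    return [list(row) for row in zip(*cleaned)]
```

Scaling matters here mainly for the LSSVM stage, since kernel-based classifiers are sensitive to attributes measured on very different ranges.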
The experimental results show that the classification performance of the RF-APSOLSSVM combined model constructed in this thesis is better than that of most personal credit scoring models. The conclusions can be summarized as follows: (1) The combined model obtains better prediction results. A random forest with out-of-bag (OOB) estimation can estimate the importance of feature variables, but the prediction accuracy of the RF model is slightly lower than that of the APSOLSSVM model. Combining RF with APSOLSSVM therefore both makes it possible to select features by importance and provides better prediction accuracy. On some of the credit data sets, the combined model even outperforms the APSOLSSVM model. (2) The choice of features changes the prediction results. Compared with the long computing time of the APSOLSSVM model, the RF-APSOLSSVM combined model removes variables of low relative importance, so it runs much faster than the APSOLSSVM model, and its correct classification rate is superior to that of both the RF and APSOLSSVM models. (3) This thesis presents a good way to combine the models. In the region where the vote rate is near 0.5, the error rate is high, so the samples in that region are separated out and used to build the second-stage APSOLSSVM model. The predictions of the combined method are greatly improved compared with those of the RF model. In summary, the empirical research shows that the two-stage combined model of RF and APSOLSSVM is not only feasible in theory but also achieves better classification in the empirical study.
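The two-stage idea in conclusion (3) can be sketched roughly as follows. The sketch assumes the vote rate is the fraction of trees voting for the "bad customer" class (label 1), that the ambiguous region is a symmetric band around 0.5, and that a second-stage classifier is supplied as a plain callable standing in for APSOLSSVM. The band width of 0.1 and all function names are hypothetical; the abstract does not give the actual interval.

```python
from typing import Callable, List, Sequence, Tuple


def split_by_vote_rate(vote_rates: Sequence[float],
                       band: float = 0.1) -> Tuple[List[int], List[int]]:
    """Return (confident, ambiguous) sample indices; ambiguous means near 0.5."""
    confident: List[int] = []
    ambiguous: List[int] = []
    for i, rate in enumerate(vote_rates):
        if abs(rate - 0.5) <= band:
            ambiguous.append(i)
        else:
            confident.append(i)
    return confident, ambiguous


def two_stage_predict(rows: Sequence[Sequence[float]],
                      vote_rates: Sequence[float],
                      second_stage: Callable[[Sequence[float]], int],
                      band: float = 0.1) -> List[int]:
    """Use the forest's majority vote when it is clear, else defer to stage two."""
    confident, ambiguous = split_by_vote_rate(vote_rates, band)
    labels = [0] * len(rows)
    for i in confident:
        # Clear majority: accept the first-stage decision.
        labels[i] = 1 if vote_rates[i] > 0.5 else 0
    for i in ambiguous:
        # Near-tie among the trees: ask the second-stage classifier.
        labels[i] = second_stage(rows[i])
    return labels
```

The same `split_by_vote_rate` helper could be applied to the training set to pick out the samples on which the second-stage model is fitted, which is the use the abstract describes.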
The empirical study shows that the combined credit scoring model RF-APSOLSSVM can be used for personal credit scoring and is an effective and novel evaluation method in practice.
Keywords/Search Tags: Credit risk, Credit scoring, Random Forests, Classification