Research On Logistic Credit Risk Evaluation Model Based On Sample Information

Posted on: 2021-04-15  Degree: Master  Type: Thesis
Country: China  Candidate: S Xu  Full Text: PDF
GTID: 2428330614955448  Subject: Mathematics
Abstract/Summary:
As the information generated by social life grows more complex, the demand for data collection, storage, analysis, and application grows with it, and extracting valuable information from complex data becomes increasingly important. For example, when banks forecast customer risk, defaulting customers make up a much smaller share of the data than normal customers, so the classes are imbalanced, and effective data processing can lead to a better training result. Although defaulting accounts are few relative to normal ones, misjudging a default sample as normal exposes the bank to an unexpected loss. Similarly, when a normal sample is judged to be a default, the bank loses a customer with good credit. On this basis, the thesis studies the distribution of the samples and proposes a comprehensive-score under-sampling method based on the imbalance in the amount of information carried by the sample data. First, the information content of the majority-class samples is extracted by Principal Component Analysis (PCA), Kernel Principal Component Analysis (KPCA), and Information Entropy, and majority samples are then selected in the same number as the minority-class samples. The balanced samples are used to build a Logistic Regression classifier, and the best-performing method is identified. The thesis also carries out an empirical analysis on 26234 records of competition data from Kaggle. The results show that the Recall rate rises from 47.1% on the unprocessed data to 93.3%, 92.1%, and 94.7% with the three methods, and that Information Entropy performs best on this data. This indicates that the under-sampling method is effective: it improves the convergence speed of the classification algorithm to a certain extent, and it also improves the goodness of fit and the Recall rate of the model. The study has limitations: the data set used for the empirical analysis is limited, and the KPCA code calls a toolkit with its default parameters, so the applicability of those parameters and the choice of kernel function are restricted. Nevertheless, the conclusions still offer some guidance for follow-up research. Figure 16; Table 16; Reference 49...
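The under-sampling step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's code: it assumes the entropy-weight method as the way Information Entropy is turned into a comprehensive score, and it assumes that the highest-scoring majority samples are the ones kept, since the abstract specifies neither detail. The function names (`min_max_scale`, `entropy_weights`, `undersample_majority`, `recall`) are hypothetical, and only the Python standard library is used to keep the example dependency-free.

```python
import math


def min_max_scale(rows):
    """Scale every feature column to [0, 1]; constant columns become all zeros."""
    scaled_cols = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0  # avoid division by zero for constant columns
        scaled_cols.append([(v - lo) / span for v in col])
    return [list(r) for r in zip(*scaled_cols)]


def entropy_weights(scaled):
    """Entropy-weight method (an assumption, not confirmed by the abstract).

    A feature whose values are spread unevenly across samples has low entropy
    and therefore receives a large weight. Requires at least two rows.
    """
    n = len(scaled)
    raw = []
    for col in zip(*scaled):
        total = sum(col)
        if total == 0:
            raw.append(0.0)  # a constant feature carries no information
            continue
        ent = -sum((v / total) * math.log(v / total) for v in col if v > 0)
        raw.append(1.0 - ent / math.log(n))
    norm = sum(raw) or 1.0
    return [w / norm for w in raw]


def undersample_majority(majority, k):
    """Keep the k majority samples with the highest comprehensive score."""
    scaled = min_max_scale(majority)
    weights = entropy_weights(scaled)
    scores = [sum(w * x for w, x in zip(weights, row)) for row in scaled]
    ranked = sorted(range(len(majority)), key=lambda i: scores[i], reverse=True)
    return [majority[i] for i in sorted(ranked[:k])]  # preserve original order


def recall(y_true, y_pred, positive=1):
    """Recall = TP / (TP + FN) for the positive (default) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy data: four "normal" customers, two "default" customers (made-up values).
normal = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0], [10.0, 5.0]]
default = [[8.0, 1.0], [9.0, 2.0]]
balanced_normal = undersample_majority(normal, k=len(default))
```

The balanced set `balanced_normal + default` would then be passed to a logistic regression classifier, and `recall` evaluated on held-out data. Replacing `entropy_weights` with a score built from principal components would give the PCA variant of the same selection scheme.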
Keywords/Search Tags:data mining, unbalanced samples, data processing, logistic regression