Research On Logistic Credit Risk Evaluation Model Based On Sample Information

Posted on:2021-04-15

Degree:Master

Type:Thesis

Country:China

Candidate:S Xu

GTID:2428330614955448

Subject:Mathematics

Abstract/Summary:

As the information generated by social life grows more complex, demand for collecting, storing, analyzing, and applying data grows with it, and extracting valuable information from complex data becomes increasingly important. For example, when banks forecast customer risk, defaulting customers make up a far smaller share of the data than normal customers, so effective data processing can produce a better training result. Although defaulting accounts are few relative to normal ones, misclassifying a default sample as normal still exposes the bank to an unexpected loss. Similarly, when a normal sample is judged to be a default sample, the bank loses customers with good credit.

On this basis, the paper studies the distribution of the samples and proposes a comprehensive-score under-sampling method based on the information content of imbalanced sample data. First, the information content of the majority-class samples is extracted by Principal Component Analysis (PCA), Kernel Principal Component Analysis (KPCA), and Information Entropy, and majority-class samples equal in number to the minority class are then selected. The balanced samples are used to build a Logistic Regression classifier, and the best-performing method is identified.

The paper also carries out an empirical analysis of 26,234 records of competition data from Kaggle. Recall rises from 47.1% on the unprocessed data to 93.3%, 92.1%, and 94.7% under the three methods, with Information Entropy performing best on this data set. The results indicate that the under-sampling method can improve the convergence speed of the classification algorithm to a certain extent, as well as the goodness of fit and Recall of the model. The study has limitations: the empirical data set is limited, and the KPCA code calls a toolkit with its default parameters, so the choice of kernel function and the applicability of those defaults are constrained. However, the conclusions still have some guiding significance for follow-up research. Figures: 16; Tables: 16; References: 49...
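The score-based under-sampling step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so everything here is an assumption: the sketch uses the common entropy weight method to score majority-class samples and keeps the highest-scoring ones, and the function names `entropy_weights`, `undersample_majority`, and `recall` are hypothetical. It is not the thesis's actual procedure, and it stops before the Logistic Regression step.

```python
import math

def entropy_weights(rows):
    """Entropy weight method (an assumed scoring scheme, not the thesis's).

    Returns per-feature weights and the min-max normalized columns.
    Expects at least two rows.
    """
    n, d = len(rows), len(rows[0])
    norm_cols = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = hi - lo
        # Min-max normalize; a constant column becomes all zeros.
        norm_cols.append([(v - lo) / span if span > 0 else 0.0 for v in col])

    weights = []
    for col in norm_cols:
        total = sum(col)
        if total == 0:
            entropy = 1.0  # constant column: carries no information
        else:
            ps = [v / total for v in col]
            entropy = -sum(p * math.log(p) for p in ps if p > 0) / math.log(n)
        weights.append(1.0 - entropy)

    s = sum(weights)
    weights = [w / s for w in weights] if s > 0 else [1.0 / d] * d
    return weights, norm_cols

def undersample_majority(majority_rows, n_keep):
    """Return indices of the n_keep majority samples with the highest score."""
    weights, norm_cols = entropy_weights(majority_rows)
    n = len(majority_rows)
    # Comprehensive score: weighted sum of normalized feature values.
    scores = [sum(w * col[i] for w, col in zip(weights, norm_cols))
              for i in range(n)]
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return ranked[:n_keep]

def recall(y_true, y_pred):
    """Recall = TP / (TP + FN), with label 1 marking a default."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0

# Toy data (made up): four majority samples, to be cut down to two so that
# the majority class matches a minority class of size two.
majority = [[1, 10, 5], [2, 10, 5], [9, 10, 5], [3, 10, 5]]
kept = undersample_majority(majority, n_keep=2)
```

In this toy data only the first feature varies, so it receives all of the weight. The kept majority samples would then be combined with the full minority class to train a classifier. Recall is the natural metric to watch because it measures how many true defaults the model catches.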

Keywords/Search Tags:

data mining, unbalanced samples, data processing, logistic regression

Related items

1	Research On The Application Of Math Grade Prediction System Based On Logistic Regression
2	Design And Realization Of Logistic Regression Analysis In 3-tiers Calculating Architecture
3	Research On The Prediction Of Insurance Payment Based On Logistic Regression Model
4	Personal Credit Risk Assessment Under Unbalanced Data Sets
5	The Applied Research Of Data Mining On Calculator Audit
6	Research On Traditional Classification Model Based On Unbalanced Data
7	The Application Of Data Mining On Marketing Of The Credit Card Customer
8	The Research On Transductive Transfer Learning With The Logistic Regression Model
9	Default Prediction Of P2P Online Loan Users Based On Unbalanced Data
10	Logistic regression for data mining and high-dimensional classification