Research On Credit Scoring Model Based On Operator Data

Posted on: 2021-06-17
Degree: Master
Type: Thesis
Country: China
Candidate: S Guo
Full Text: PDF
GTID: 2510306455981829
Subject: Applied Statistics
Abstract/Summary:
As China's economy becomes more open, credit plays an increasingly important role in economic development, and the quality of a person's credit affects their vital interests in many fields. How to evaluate personal credit has therefore become an urgent problem across industries. To evaluate personal credit accurately and to score it in advance, it is necessary to model and analyze personal historical behavior data. The banking industry has a long history of research on personal credit rating, but the historical data used by traditional bank evaluation systems is generally personal loan data, and such data is recorded only when a user has actually taken out a loan. More and more enterprises are now using their own operational data for user credit evaluation. This data offers wide coverage, large volume and a rich set of indicators.

The data set used in this paper is user data from the China Mobile Fuzhou branch for one month in 2018. It contains 50,000 records, each with 30 variables covering the user's communication expenditure, travel, application usage preferences, social contacts and other dimensions, with the credit score serving as the dependent variable. First, the user information is preprocessed: credit is divided into three levels by discretizing the continuous variable, and missing values are filled according to credit level. Next, the distribution of each variable is analyzed through descriptive statistics and visualization, categorical variables are encoded with One-Hot encoding, and differences in scale among variables are removed by standardization. Stepwise regression is used for feature selection.

For model selection, the paper considers both statistical and algorithmic models, as well as the difference between a single learner and ensemble learning. Ridge regression, Random Forest and LightGBM models are considered in turn. Parameters are chosen mainly by cross-validation and grid search, and the optimal parameters are then used in each model for regression prediction. The models are compared using mean absolute error and goodness of fit. LightGBM is found to predict better than both Random Forest and Ridge regression.

Using real China Mobile data, and based on users' basic information and historical records, the paper successfully predicts users' credit scores. This not only establishes a personal credit rating system for enterprises but also offers ideas for the government to build a citizen credit rating system in the future. Cooperation between government and enterprises could yield massive, multi-dimensional, real-time data with effective coverage across fields, which could improve the accuracy of credit rating prediction.
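The preprocessing and evaluation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis code: the function names and toy numbers are hypothetical, standardization is assumed to mean z-scoring, and "goodness of fit" is assumed to mean the coefficient of determination (R²), which the abstract does not state explicitly.

```python
from statistics import mean, pstdev


def one_hot(values):
    """One-Hot encode category labels; columns follow sorted label order."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


def standardize(values):
    """Z-score standardization (assumed): subtract the mean, divide by the std."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column carries no scale
    return [(v - mu) / sigma for v in values]


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot (assumed meaning of 'goodness of fit')."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy credit scores and two hypothetical models' predictions (not thesis data).
y_true = [600, 650, 700, 550]
pred_a = [610, 640, 690, 560]   # tracks the true scores closely
pred_b = [650, 650, 650, 650]   # predicts one constant for everyone

for name, pred in [("model A", pred_a), ("model B", pred_b)]:
    print(name, mean_absolute_error(y_true, pred), round(r_squared(y_true, pred), 3))
```

A lower mean absolute error together with a higher R² indicates the better model, which is the kind of comparison the abstract describes among Ridge regression, Random Forest and LightGBM.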
Keywords/Search Tags: Data mining, Credit scoring, Regression prediction, LightGBM algorithm