
Research On The Users' Credit Forecasting Method Based On Big Data

Posted on: 2019-06-30
Degree: Master
Type: Thesis
Country: China
Candidate: S H Xiong
Full Text: PDF
GTID: 2428330596960862
Subject: Control engineering
Abstract/Summary:
Peer-to-peer (P2P) lending is a new financial service model that combines unsecured small-loan lending with emerging internet technologies. P2P has developed rapidly in recent years because of its covenant-lite terms and high efficiency. However, a lack of honesty and an imperfect legal system have caused a personal-credit crisis in China, which seriously hinders the development of online lending.

Most traditional methods for forecasting borrowers' credit rely on data for three factors that affect the borrower default rate, and build the prediction model with algorithms such as BP neural networks, logistic regression, and random forests. Unfortunately, these traditional datasets do not fully reflect a borrower's real credit and can be falsified, which in turn degrades the performance of the prediction methods. To address these drawbacks, this paper uses more comprehensive datasets that include the basic, financial, and social information of borrowers. On these datasets we train more robust models, such as SVM and GDBT, to predict users' personal credit. The final forecasting model is then established with an ensemble learning strategy.

The main contents of this paper are as follows. Feature selection is very important in machine learning. Based on the characteristics of the dataset and its operating environment, this paper first fills in or deletes the missing values in the dataset. An improved one-hot encoding is then used to map the categorical features into Euclidean space. Next, four representative kinds of features (ranking features, counting features, discrete features, and missing-value discrete features) are derived to characterize the original datasets. Because these four feature groups are high-dimensional, feature selection is applied to reduce complexity, and the selected features are fused by a simple linear combination. Finally, nonlinear SVM and GDBT algorithms are trained on these features, and a weighted method is used for ensemble learning to obtain a fusion model in which each single model compensates for the deficiencies of the other.

The characteristics of this forecasting method are: (1) instead of a traditional single dataset, it uses big data and data mining to build a prediction method for a specific lending platform, which reduces the impact of falsified borrower information on the forecast results; (2) combining the features of the datasets with the characteristics of the platform, it proposes new features such as ranking features and missing-value discrete features; (3) because the data dimension is large and there are many categorical features, it uses an improved one-hot encoding that reduces the sparsity of the categorical features once they are mapped into Euclidean space; (4) besides building single predictive models with machine learning algorithms, it introduces ensemble learning so that the single models complement each other and model performance improves further.

Finally, a Hadoop pseudo-distributed cluster is built and a series of simulation experiments is run on it, which verifies the feasibility and practicality of the credit forecasting method for micro-credit users based on the fusion of nonlinear SVM and GDBT.
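The abstract does not define its derived features or its fusion weights, so the following is only a minimal sketch of what ranking features, missing-value discrete features, and weighted model fusion could look like. All function names, the bucket edges, and the weight `w` are hypothetical choices for illustration, not the author's implementation. The probability lists stand in for the outputs of two already-trained models (for example an SVM and a gradient-boosted tree model, which "GDBT" presumably denotes).

```python
from bisect import bisect_left


def rank_feature(values):
    """Ranking feature: replace each value by its dense rank (1 = smallest).

    Missing entries (None) stay None; equal values share the same rank.
    """
    distinct = sorted({v for v in values if v is not None})
    rank_of = {v: i + 1 for i, v in enumerate(distinct)}
    return [rank_of[v] if v is not None else None for v in values]


def missing_count(rows):
    """Count the missing (None) fields in each record."""
    return [sum(field is None for field in row) for row in rows]


def discretize(counts, edges):
    """Missing-value discrete feature: map each count to a bucket index.

    `edges` is an ascending list of upper bounds (an assumed choice);
    a count c falls in the first bucket whose upper bound is >= c,
    and counts above the last edge go to the final bucket.
    """
    return [bisect_left(edges, c) for c in counts]


def weighted_fusion(p_first, p_second, w):
    """Weighted ensemble: blend two models' default probabilities.

    `w` is the weight of the first model and (1 - w) that of the second;
    in practice it would be tuned on held-out validation data.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return [w * a + (1.0 - w) * b for a, b in zip(p_first, p_second)]


# Toy borrower records: (loan amount, number of past loans, credit grade).
rows = [
    [5000, None, "A"],
    [3000, 2, None],
    [5000, None, None],
]

amount_rank = rank_feature([row[0] for row in rows])
n_missing = missing_count(rows)
missing_bucket = discretize(n_missing, edges=[0, 1])

# Hypothetical default probabilities from two single models.
fused = weighted_fusion([0.8, 0.2, 0.6], [0.6, 0.4, 0.9], w=0.5)
```

A blend of this kind lets a confident prediction from one model offset a weak one from the other, which is the complementarity the abstract attributes to its fusion model.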
Keywords/Search Tags:Personal Credit Forecasting, SVM, GDBT, Ensemble Learning, Hadoop