
Research On Personal Credit Model Based On Mobile Telecom Data

Posted on: 2018-01-29  Degree: Master  Type: Thesis
Country: China  Candidate: Y P Tang
GTID: 2359330518496558  Subject: Information and Communication Engineering
Abstract/Summary:
With economic development, credit evaluation helps individuals obtain more convenient services and reduces the operating costs of society. Personal credit evaluation started late in China: current personal credit coverage is low, but demand is strong. Telecom data contain personal identity information and consumption records, and these data are highly relevant to credit rating. In addition, telecom data cover many users and have rich dimensions, which makes them suitable for personal credit evaluation.

In this paper we first analyze and compare several commonly used credit models. These models, based on statistical and intelligent methods, include logistic regression, support vector machines, decision trees, and neural networks. We derive these models theoretically and offer some practical solutions for over-fitting.

We then explore the telecom data and select the features needed for modeling, based on FICO (Fair Isaac & Company) and correlation analysis. The data are then cleaned and preprocessed, which includes filling missing values, eliminating outliers, discretization, and normalization. Analysis of the data shows multicollinearity among some features, which is inconsistent with the assumptions of the logistic regression model. Principal component analysis is therefore used to address the collinearity problem. After preprocessing, we analyze the telecom data further to gain a deeper understanding of the feature distributions and their relationship with credit.

A single model sometimes does not give good credit evaluation results, so this paper uses the idea of ensemble learning to build a personal credit model based on random forest. Because the original telecom data contain redundant information and suffer from class imbalance, the model extends a single decision tree with repeated sampling and random selection of feature subsets: multiple decision trees are trained and their outputs are combined to produce the prediction. After training, the results show that the prediction accuracy, precision, recall, and F1 score are all superior to those of the common credit models.
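The ensemble idea described in the abstract (repeated sampling, random feature subsets, many trees, combined output) can be illustrated with a minimal pure-Python sketch. This is not the thesis's implementation: the function names, the tree depth, the number of trees, and the toy data are all assumptions made for illustration. It also does not handle class imbalance beyond plain bootstrap sampling.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(X, y, feature_ids):
    """Return the (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, gini(y)
    for f in feature_ids:
        values = sorted({row[f] for row in X})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score - 1e-12:
                best, best_score = (f, t), score
    return best

def build_tree(X, y, n_sub, depth, rng):
    """Grow a tree; every node considers only a random subset of n_sub features."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == 0 or len(set(y)) == 1:
        return majority
    split = best_split(X, y, rng.sample(range(len(X[0])), n_sub))
    if split is None:
        return majority
    f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], n_sub, depth - 1, rng),
            build_tree([X[i] for i in right], [y[i] for i in right], n_sub, depth - 1, rng))

def predict_tree(node, row):
    """Walk from the root to a leaf; leaves are plain class labels."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(X, y, n_trees=25, n_sub=2, depth=4, seed=0):
    """Train n_trees trees, each on a bootstrap sample (sampling with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in y]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], n_sub, depth, rng))
    return forest

def predict_forest(forest, row):
    """Combine the trees by majority vote."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]

def scores(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for a, b in zip(y_true, y_pred) if a == 1 and b == 1)
    fp = sum(1 for a, b in zip(y_true, y_pred) if a == 0 and b == 1)
    fn = sum(1 for a, b in zip(y_true, y_pred) if a == 1 and b == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(1 for a, b in zip(y_true, y_pred) if a == b) / len(y_true)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Made-up toy data (not telecom data): feature 1 is redundant with feature 0,
# feature 2 is noise, and the positive class is the minority (10 of 40 rows).
X = [[i, 2 * i, i % 2] for i in range(40)]
y = [1 if i >= 30 else 0 for i in range(40)]
forest = fit_forest(X, y)
print(scores(y, [predict_forest(forest, row) for row in X]))
```

The sketch scores the forest on its own training rows only to keep the example short; a real comparison against other credit models would use held-out data.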
Keywords/Search Tags: credit model, telecom data, random forest, data mining