Credit Risk Prediction Of Telecom Users Based On Improved Stacking Algorithm

Posted on: 2024-03-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y Sun
Full Text: PDF
GTID: 2569306938998049
Subject: Applied statistics
Abstract/Summary:
The credit market is an integral part of the socialist market economic system. In recent years, with the rapid development of big data technology, China’s personal credit evaluation industry has entered a new stage, but problems such as incomplete data sources, narrow population coverage, and low prediction accuracy have also emerged. Telecommunication operators hold a large amount of user information, and their data are large in scale, accurate, and diverse, so they can offer new ideas for building and improving the credit system. At present, however, operators have shortcomings in managing their user credit platforms and fail to make full use of the data available to them, so the credit evaluation system urgently needs improvement. This thesis evaluates personal credit using data from a domestic telecom operator. Building on the traditional logistic regression model, it applies the Stacking fusion algorithm to construct a telecom credit model with better rating performance. The research content of this thesis includes:

(1) Data preprocessing and feature screening of the original telecom user data. To address class imbalance, this thesis adopts the Borderline-SMOTE algorithm. The telecom user data also contain many features. To weaken the influence of the data distribution on feature screening, this thesis considers both model-based and statistical perspectives, selecting feature variables with feature importance ranking and feature relevance methods.

(2) GBDT (Gradient Boosting Decision Tree), XGBoost (Extreme Gradient Boosting), and Random Forest are used as base classifiers, and logistic regression is used as the meta-classifier to construct the Stacking fusion model. To suit the multi-class credit assessment problem, this thesis modifies the evaluation metrics used for binary classification models. In addition, considering the influence of the differences between the base classifiers on the performance of the meta-classifier, a feature-weighted Stacking fusion model is constructed.

(3) Considering characteristics of the telecom user data such as their positively skewed distribution, this thesis selects two feature discretization methods, equal-frequency discretization and cluster discretization, and applies each to the weighted Stacking model.

The results show that the model constructed in this thesis is more applicable than the binary classification models commonly used in other studies. The weighted Stacking model also has stronger credit rating ability and is better suited to large-scale credit assessment. After the data were processed by equal-frequency discretization, the prediction accuracy of the model improved by about 2.21%, indicating that equal-frequency feature discretization is effective.
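As an illustration of the equal-frequency discretization mentioned in the abstract, the following is a minimal sketch, not the thesis's implementation. It places cut points at sample quantiles so that each bin holds roughly the same number of observations, which keeps a long right tail from dominating the bin boundaries. The function names `equal_frequency_edges` and `discretize`, the feature name `monthly_charge`, and the data values are all hypothetical.

```python
from bisect import bisect_right


def equal_frequency_edges(values, n_bins):
    """Return interior cut points so each bin holds roughly the same number of values."""
    if n_bins < 2:
        raise ValueError("n_bins must be at least 2")
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("values must not be empty")
    edges = []
    for k in range(1, n_bins):
        # Position of the k-th quantile boundary in the sorted sample.
        idx = min(n - 1, (k * n) // n_bins)
        cut = ordered[idx]
        # Skip duplicate cut points, which occur when many values are tied.
        if not edges or cut > edges[-1]:
            edges.append(cut)
    return edges


def discretize(values, edges):
    """Map each value to a bin index: the number of cut points that are <= the value."""
    return [bisect_right(edges, v) for v in values]


# Hypothetical positively skewed feature; the numbers are made up for illustration.
monthly_charge = [5, 6, 7, 8, 9, 10, 12, 15, 20, 30, 80, 400]
edges = equal_frequency_edges(monthly_charge, n_bins=4)
bins = discretize(monthly_charge, edges)
print(edges)
print(bins)
```

With equal-width bins, the single value 400 would stretch the range and push nearly all observations into the first bin. Quantile-based cut points avoid that, which is one plausible reason this method suits positively skewed data.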
Keywords/Search Tags: telecommunication credit evaluation, Stacking algorithm, ensemble learning, logistic regression, Borderline-SMOTE