
Research on the Application of Telecommunication Customer Credit Evaluation Based on the Class Imbalance Method

Posted on: 2022-01-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Wu
GTID: 2518306569481994
Subject: Software engineering

Abstract/Summary:
With the advent of 5G and cloud computing, telecom operators face both challenges and opportunities. In the context of telecommunications big data and personal credit reporting, operators no longer simply provide communication and Internet services, and they urgently need comprehensive and effective user profiles. A sound credit evaluation model can analyze user data, identify potential customers, flag low-credit customers, avoid bad debts, and increase corporate profits. This thesis focuses on building a credit evaluation model for imbalanced telecom data sets. The main research contents and results are as follows.

First, the original telecom data set is preprocessed and feature selection is performed, and the classification performance of different algorithms on the unprocessed (still imbalanced) telecom data is compared. The results show that the LightGBM model performs most evenly across all metrics. Comparing accuracy with G-means shows that, when the data are imbalanced, classification algorithms have a clear limitation: they fail to attend to and identify minority-class samples.

Second, the data imbalance problem is addressed from three directions: sampling, cost-sensitive training, and anomaly detection. For the data set studied, KMeans SMOTE oversampling achieves the best performance, with a G-means score of 81.6%. Among the cost-sensitive models, AdaCost is the most balanced across metrics: its accuracy is less than 2% lower than that of the benchmark algorithm, while its F1-score is 6% higher. Among the anomaly detection algorithms, Isolation Forest is more accurate than One-Class SVM. In addition, the three best-performing algorithms are combined by stacking. The experiments show that weighting the stacked model improves on the plain stacking result, but it still scores slightly below the AdaCost base classifier alone.

Finally, to address the blindness of weight updates in ensemble sampling algorithms, the CM-KSLGM (Cost Matrix-KMeans SMOTE-LightGBMBoost) model is proposed, drawing on models such as EasyEnsemble. The model introduces a cost matrix to update the sample weights. To construct each training set, minority samples are drawn with replacement according to the sampling ratio, majority samples are drawn without replacement, and KMeans SMOTE is then applied to the sampled data. Comparative experiments show that CM-KSLGM improves specificity and balanced accuracy while keeping G-means and F1 relatively stable, and it also improves the model's risk-discrimination score (KS) to some extent.
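The abstract does not give CM-KSLGM's exact weight-update rule or sampling ratio, so the following is only a minimal Python sketch of the bag-construction step it describes, under loudly stated assumptions. The cost matrix values, the AdaBoost-style exponential update, the names `update_weights`, `build_bag`, `n_majority`, and `minority_ratio`, and the Efraimidis-Spirakis keys used for weighted sampling without replacement are all hypothetical choices, not the thesis's method. The KMeans SMOTE and LightGBM stages that would follow are omitted.

```python
import math
import random

# Hypothetical cost matrix COST[actual][predicted]; label 1 = low-credit (minority).
# Assumption: missing a low-credit customer costs 5x as much as a false alarm.
COST = [[0.0, 1.0],
        [5.0, 0.0]]


def update_weights(weights, y_true, y_pred, alpha, cost=COST):
    """Assumed AdaBoost-style update: scale each weight by exp(alpha * cost).

    Correct predictions have zero cost, so only misclassified samples grow
    before renormalization, and the costliest errors grow the most.
    """
    new = [w * math.exp(alpha * cost[t][p]) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(new)
    return [w / total for w in new]


def build_bag(y, weights, n_majority, minority_ratio, rng):
    """Return sample indices for one training bag (hypothetical parameters)."""
    minority = [i for i, t in enumerate(y) if t == 1]
    majority = [i for i, t in enumerate(y) if t == 0]
    if n_majority > len(majority):
        raise ValueError("n_majority exceeds the number of majority samples")

    # Minority: weighted draw WITH replacement, sized by the sampling ratio.
    n_minority = max(1, round(minority_ratio * n_majority))
    picked_min = rng.choices(minority, weights=[weights[i] for i in minority], k=n_minority)

    # Majority: weighted draw WITHOUT replacement via Efraimidis-Spirakis keys.
    # log(u) / w is the log of u ** (1 / w), same ordering but no underflow.
    ranked = sorted(majority,
                    key=lambda i: math.log(1.0 - rng.random()) / weights[i],
                    reverse=True)
    picked_maj = ranked[:n_majority]
    return picked_min + picked_maj


# Usage: 90 majority vs 10 minority samples, uniform starting weights.
rng = random.Random(0)
y = [0] * 90 + [1] * 10
w = [1.0 / len(y)] * len(y)
# Suppose the previous round predicted every sample as 0, missing all minority samples.
w = update_weights(w, y, [0] * len(y), alpha=0.5)
bag = build_bag(y, w, n_majority=30, minority_ratio=1.0, rng=rng)
# KMeans SMOTE resampling and LightGBM training on `bag` would follow here (omitted).
```

Drawing the minority class with replacement lets a small number of hard, costly samples appear several times in each bag, while drawing the majority class without replacement keeps each bag's majority part free of duplicates, in the spirit of EasyEnsemble-style undersampling.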
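The evaluation metrics used in the abstract can be computed as in the minimal pure-Python sketch below. It assumes binary labels with 1 marking the minority (low-credit) class, and the function names are hypothetical. G-means is taken as the geometric mean of recall and specificity, balanced accuracy as their arithmetic mean, and KS as the largest gap between the empirical score distributions of the two classes.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn), treating `positive` as the minority class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def imbalance_metrics(y_true, y_pred, positive=1):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    recall = tp / (tp + fn) if tp + fn else 0.0       # hit rate on the minority class
    specificity = tn / (tn + fp) if tn + fp else 0.0  # hit rate on the majority class
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "recall": recall,
        "specificity": specificity,
        "balanced_accuracy": (recall + specificity) / 2,
        "g_mean": math.sqrt(recall * specificity),
        "f1": f1,
    }


def ks_statistic(y_true, scores, positive=1):
    """Largest gap between the empirical CDFs of the two classes' scores."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    ks = 0.0
    for thr in sorted(set(scores)):
        cdf_pos = sum(s <= thr for s in pos) / len(pos)
        cdf_neg = sum(s <= thr for s in neg) / len(neg)
        ks = max(ks, abs(cdf_pos - cdf_neg))
    return ks


# A classifier that always predicts the majority class on a 9:1 split.
y_true = [0] * 9 + [1]
m = imbalance_metrics(y_true, [0] * 10)
print(m["accuracy"], m["g_mean"])  # 0.9 0.0
print(ks_statistic([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))  # 1.0
```

The usage lines illustrate why accuracy alone misleads on imbalanced data: always predicting the majority class reaches 90% accuracy on a 9:1 split, yet its G-means is zero because not a single minority sample is identified.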
Keywords/Search Tags: telecommunications industry, credit evaluation, unbalanced data, cost matrix, data sampling