
Research On Abnormal User Identification Of China Unicom Based On Ensemble Learning

Posted on: 2022-04-18
Degree: Master
Type: Thesis
Country: China
Candidate: K L Wu
Full Text: PDF
GTID: 2518306458497804
Subject: Statistics
Abstract/Summary:
In recent years, with the continuous development of mobile communication services, the number of mobile network terminal users has kept increasing. According to data from the Ministry of Industry and Information Technology, as of June 2020 China Mobile had 947 million 4G users, while China Telecom and China Unicom had 343 million and 310 million 4G users respectively, for a total of 1.6 billion 4G users. The rapid development of mobile networks brings convenience to people’s lives, but it also introduces hidden dangers to information security. Some users use the mobile communication network to spread malicious, junk or false information; others conduct telephone or SMS marketing for commercial purposes, or commit fraud for illegal purposes. These users are collectively referred to as abnormal users. Abnormal users not only affect the daily life of normal users but also hinder the normal operation of mobile operators. Therefore, to create a healthy and orderly mobile communication network environment, accurate and efficient identification of abnormal users is of great practical significance.

After summarizing user identification methods from China and abroad, this paper first explains the principle of recursive feature elimination (RFE) and introduces two kinds of ensemble learning methods: Bagging and Boosting. On this basis, the paper studies improvements to LightGBM and designs the RFE-LightGBM, RF-LightGBM and XGB-LightGBM algorithms. Then, based on the call behavior data, SMS behavior data and App or website access behavior data of Unicom users, the paper carries out data preprocessing, including data deduplication, classification statistics, feature merging, time period division, region division and feature derivation, and uses a correlation test to eliminate highly correlated features. To explore the differences between normal users and abnormal users, the paper then makes a comparative analysis of their feature statistics and distributions. After that, the paper uses LightGBM, RFE-LightGBM, RF-LightGBM and XGB-LightGBM to build abnormal user identification models for China Unicom, and evaluates the predictive performance of the four models by accuracy, F1 value and AUC.

The results show obvious differences in feature means and feature standard deviations between normal users and abnormal users of China Unicom. The Boosting-based LightGBM method predicts abnormal users of China Unicom well, and the accuracy of all four models is above 80%. Among the four algorithms, the XGB-LightGBM algorithm designed in this paper effectively improves the prediction results: it achieves an accuracy of 90% and an AUC of 0.77. The research results of this paper therefore help to improve the accuracy of abnormal user identification and help China Unicom to create a healthy communication network environment.
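The abstract names accuracy, F1 value and AUC as evaluation criteria but does not define them. The following is a minimal sketch of their standard definitions for a binary task, not the thesis's own code. The label convention (1 = abnormal user), the 0.5 decision threshold, the function names and the toy data are all assumptions made for illustration.

```python
# Illustrative sketch only: toy labels and scores, not data from the thesis.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores, positive=1):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win. This pairwise form is O(n^2) and is meant
    for small illustrative inputs only.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = abnormal user, 0 = normal user.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.15, 0.3, 0.05, 0.6, 0.25, 0.35, 0.4, 0.9]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print("accuracy:", accuracy(y_true, y_pred))
print("F1:", f1_score(y_true, y_pred))
print("AUC:", auc(y_true, scores))
```

On this toy data the accuracy is 0.8 while the F1 value is only 0.5, because one of the two abnormal users is missed and one normal user is flagged. When abnormal users are a minority, accuracy alone can look better than the model's ability to find them, which is one reason to report F1 and AUC alongside it.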
Keywords/Search Tags: ensemble learning, abnormal user, RF-LightGBM, XGB-LightGBM