The Abnormal User Identification In Communication Operators Based On Unbalanced Data

Posted on:2023-08-15

Degree:Master

Type:Thesis

Country:China

Candidate:X J Zhou

GTID:2558307100977469

Subject:Applied statistics

Abstract/Summary:

In the "Internet Plus" environment, communication operators attract consumers by issuing benefits and coupons. At the same time, a group of econnoisseurs (users who systematically hunt for such promotions) has emerged, causing heavy system load and loss of profits for operators. It is therefore necessary to analyze the consumption behavior of these abnormal users. Analyzing mobile user consumption data yields both the behavioral characteristics of abnormal users and a user identification model, which helps operators take blocking measures in time, avoid capital loss and resource occupation, and build an effective early warning system.

The data in this thesis come from the China Mobile Big Data Application Innovation Competition and cover 433,413 users and 18 variables. First, to deal with missing values, the categorical variables are filled and the continuous variables are discretized. On this basis, a new feature is introduced and all categorical features are encoded. Then the behavioral characteristics of the econnoisseurs are analyzed. After the mRMR method and the random forest algorithm are used for feature selection, the threshold-shifting method is used to reduce the impact of class imbalance in the data. Subsequently, a logistic regression model is established. Through this model, the important behavioral characteristics of the econnoisseurs are identified and identification rules are summarized. Using these rules, the users in the data are preliminarily filtered.

Finally, this thesis establishes an identification model with higher accuracy and precision based on the filtered data. The class-imbalance problem is studied at both the data level and the algorithm level. At the data level, a variety of resampling methods are used, but the results show that model performance does not improve significantly compared with the model trained without resampling. At the algorithm level, the EasyEnsemble algorithm is used to generate 10 class-balanced subsets of the data. After a model is trained on each subset, a combination model is built through the Bagging ensemble algorithm. The classification performance of decision tree, random forest, and LightGBM models and of their combination models is compared and analyzed. The results show that the LightGBM model and its combination model perform best, with F1 values of 94.86% and 94.62% respectively, and that LightGBM requires less training time, so it can effectively meet actual business needs.
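The two imbalance-handling ideas named in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the toy labels, and the random seed are hypothetical, binary 0/1 labels are assumed, and only the sampling and voting steps are shown (no decision tree, random forest, or LightGBM model is trained here). The threshold rule assumes the common form of threshold-moving, in which the cut-off equals the positive-class share of the training data; the abstract does not state which variant the thesis uses.

```python
import random
from collections import Counter


def balanced_subsets(labels, n_subsets=10, seed=0):
    """Return n_subsets index lists, each holding every minority-class
    index plus an equally sized random sample of majority-class indices
    (the undersampling step of an EasyEnsemble-style scheme)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    minority_label = min(counts, key=counts.get)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    subsets = []
    for _ in range(n_subsets):
        # Undersample the majority class without replacement.
        sampled = rng.sample(majority, len(minority))
        subsets.append(sorted(minority + sampled))
    return subsets


def majority_vote(predictions):
    """Combine 0/1 predictions (one list per model) by strict majority;
    a tie is resolved as class 0."""
    n_models = len(predictions)
    return [1 if 2 * sum(votes) > n_models else 0 for votes in zip(*predictions)]


def shifted_threshold(labels, positive=1):
    """Assumed threshold-moving cut-off: the positive-class share of the
    training labels, used in place of the default 0.5."""
    return sum(1 for y in labels if y == positive) / len(labels)


# Toy labels (illustrative only): 5 abnormal users among 100.
labels = [1] * 5 + [0] * 95
subsets = balanced_subsets(labels, n_subsets=10, seed=42)
```

In such a scheme, one base model would be fitted on each index list in `subsets`, and the per-model predictions would be passed to `majority_vote` to form the combination model.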

Keywords/Search Tags:

Abnormal user, Imbalanced data, LightGBM, Ensemble learning

Related items

1	Research On Abnormal User Identification Of China Unicom Based On Ensemble Learning
2	The Used Cars’ Price Forecast Based On LightGBM
3	Research On Prediction Of Top-quality Tourism Service Formation Based On Ensemble Learning
4	Prediction Of User Offline Purchase Behavior Based On Ensemble Learning
5	Research On Fake Voice Detection Methods Based On Ensemble Learning
6	Ensemble Learning And The Application Of Predicting User Churn On E-commerce
7	Research On User Behavior Prediction Model And Its Application Based On Deep Walk And Ensemble Learning
8	Research On Loan Default Prediction Based On Convolutional Neural Network And Ensemble Learning
9	Research On Classification Method For Imbalanced Data Sets And Its Application
10	A Novel Approach To Product Precision Marketing In Industry Based On Ensemble Learning