5G Potential User Identification Based On The Integrated Learning

Posted on: 2022-04-07   Degree: Master   Type: Thesis
Country: China   Candidate: B F Liu   Full Text: PDF
GTID: 2518306509989129   Subject: Applied Statistics
Abstract/Summary:
With the development of 5G technology, operators in China and abroad have successively launched 5G commercial services. The outline of the 13th Five-Year Plan calls for actively promoting the commercial use of 5G. Accordingly, local governments in many cities have issued development policies, and the three major domestic operators are actively deploying networks and promoting 5G services in many cities. This thesis uses user data from a telecom operator for model training and, building on domestic and foreign research, aims to establish a model for identifying potential 5G users. The research covers two main aspects.

First, we train on a data set that has a large number of categorical features and imbalanced classes. After comparing many methods for handling imbalanced data, this thesis proposes the RUSCatboost model, an ensemble learning model that embeds random undersampling into the CatBoost algorithm. Random undersampling addresses the class imbalance, while CatBoost automatically handles categorical features and feature combinations. Evaluated by AUC, F1 score and the confusion matrix, the RUSCatboost model outperforms most of the compared methods for imbalanced data while ensuring that the model does not overfit. The RUSCatboost model can therefore obtain stable and better results when training on data sets with these two characteristics.

Second, we establish an identification model for potential 5G users. Correlation analysis first shows that the dependent variable is strongly correlated with indicators such as usage behavior and consumption behavior, from which we conclude that user behavior, an internal factor, is the key to whether users choose 5G. Before building the model, we discretize continuous variables to improve their interpretability; divide 38 features into 4 groups and construct new features that replace low-correlation variables, reducing model complexity and training time; and establish user value indicators based on the RFM model. Seven of the new indicators rank among the top 30 by feature importance. We train random forest, XGBoost and CatBoost models separately on the samples and choose the best imbalanced-data processing method to build the final model. Compared with known results, the classification accuracy of this model is significantly improved.
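
The abstract does not specify how random undersampling is embedded into the boosting procedure, so the following is only a minimal sketch of the two generic building blocks it mentions: random undersampling of the majority class, and the confusion matrix and F1 score used for evaluation. The function names, the `ratio` parameter and the binary 0/1 labels are assumptions made for illustration and are not taken from the thesis. In a RUSBoost-style scheme a fresh subsample of this kind would typically be drawn before each boosting round; whether RUSCatboost does exactly this is not stated here.

```python
import random
from collections import Counter


def random_undersample(rows, labels, ratio=1.0, seed=0):
    """Keep every minority-class row and a random subset of majority rows.

    `ratio` (hypothetical parameter) is the desired majority/minority
    count after sampling; 1.0 yields balanced classes.
    Assumes a binary label vector.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    # Never ask for more majority rows than actually exist.
    n_keep = min(len(majority_idx), int(round(ratio * len(minority_idx))))
    kept = rng.sample(majority_idx, n_keep)
    idx = sorted(minority_idx + kept)
    return [rows[i] for i in idx], [labels[i] for i in idx]


def confusion_and_f1(y_true, y_pred, positive=1):
    """Return the binary confusion-matrix counts and the F1 score."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}, f1


if __name__ == "__main__":
    # Toy data: 10 positive users (label 1) among 100, purely illustrative.
    rows = list(range(100))
    labels = [1] * 10 + [0] * 90
    sampled_rows, sampled_labels = random_undersample(rows, labels, ratio=1.0)
    print(Counter(sampled_labels))
    print(confusion_and_f1([1, 1, 0, 0], [1, 0, 0, 1]))
```

Undersampling discards majority-class rows, which is why it is usually combined with an ensemble: each round sees a different subsample, so less information is lost overall than with a single balanced subset.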
Keywords/Search Tags: Potential User Identification, Integrated Learning, CatBoost, Random Forest