Nowadays, 5G business is growing steadily. Faced with the huge potential of the 5G market in China, major operators have begun to compete for market share nationwide and to promote the conversion of potential 5G users. In the current big-data context, operators that want to improve marketing efficiency and reduce operating costs need to focus on using the information of 5G users to mine underlying patterns in the data and to conduct targeted marketing. Against this background, this thesis uses one operator's data on users who migrated from 4G to 5G to build a variety of machine learning models that predict potential 5G package customers. The main research work of this thesis is as follows:

1. Analyzing and preprocessing the data from a business perspective. Descriptive statistics and visual analysis of the predictor variables reveal the distribution of the features in the data set. On this basis, data preprocessing provides effective input variables for model training. It consists of data cleaning, feature encoding, feature construction and feature selection.

2. Building prediction models for potential 5G package customers. First, two single machine learning models, logistic regression and the naive Bayes classifier, are built to predict potential 5G customers. Then, three ensemble learning models, Random Forest, GBDT and LightGBM, are built for the same task. The five models are evaluated and compared on four metrics: F1_score, AUC, precision and recall. The ensemble learning models outperform the single machine learning models in predicting potential 5G package customers, and the LightGBM model has the best overall predictive performance.

3. Optimizing the model to further improve prediction accuracy. The empirical results above show that LightGBM has the best overall predictive performance, while Random Forest has the highest precision. We therefore propose an improved two-layer LightGBM model that incorporates the idea of Random Forest. The leaf-node outputs of the first-layer LightGBM model are taken as new sample features, and the resulting data set is fed into the second-layer LightGBM for prediction. In the second layer, several LightGBM models are built following the idea of Random Forest: random sample selection and random feature selection are performed before each model is trained, and the final result is obtained by voting. The empirical results show that the improved two-layer LightGBM model achieves a higher F1_score and AUC than the conventional LightGBM model. This indicates that the proposed model effectively improves both the predictive performance and the generalization ability of the model, so it can be applied more effectively to predicting potential customers for 5G packages.
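The four evaluation metrics used to compare the five models (F1_score, AUC, precision and recall) have standard definitions for a binary task. The minimal pure-Python sketch below assumes label 1 means "potential 5G package customer" and that each model outputs both a hard 0/1 prediction and a score per user. The function names are hypothetical; in practice scikit-learn's `precision_score`, `recall_score`, `f1_score` and `roc_auc_score` compute the same quantities.

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This O(n_pos * n_neg) pairwise form is for
    illustration only; rank-based formulas scale better on real data.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 0, 1, 1, 0, 0]
    y_pred = [1, 0, 0, 1, 1, 0]
    scores = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1]
    print(precision_recall_f1(y_true, y_pred))
    print(roc_auc(y_true, scores))
```

In this toy example there are 2 true positives, 1 false positive and 1 false negative, so precision, recall and F1 all equal 2/3. Eight of the nine positive/negative score pairs are ordered correctly, which gives an AUC of 8/9.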