| With global economic growth slowing and pandemic-related turbulence continuing, competition in all industries is becoming fiercer. On the one hand, major Internet companies and operators have hit bottlenecks in product innovation, and serious product homogeneity means that new products no longer provide a source of user growth. At the same time, after a decade of explosive growth in the mobile Internet, the dividend from Internet population and traffic has leveled off. On the other hand, a series of shocks, such as the China-US trade war in 2018, the COVID-19 pandemic in 2020 and the world turmoil in 2022, have led to a global economic downturn, market contraction in various sectors and increasingly fierce competition. Faced with these internal and external difficulties, major Internet companies and operators have inevitably shifted their focus to user operations, churn prediction and win-back of lost users. In fields such as credit risk identification and anomaly monitoring, where traditional imbalance models are applied, the class imbalance of the samples is relatively stable, and traditional imbalance models have achieved good results. However, by analyzing user churn datasets from Internet companies and operators on the Kaggle data platform, this paper finds that churn datasets are class-imbalanced and differ widely in both imbalance ratio and data volume. Therefore, user churn identification needs a classification model that adapts to imbalanced datasets of all kinds. On this basis, this paper proposes an imbalance model built on the Boosting framework, which avoids the high cost of testing many imbalance models to select the best one for each dataset and further improves both the identification of user churn and the stability of the model across imbalanced datasets. In this paper, adaptation to the class imbalance ratio is taken as a 
breakthrough point and combined with the Boosting ensemble model, the current mainstream approach to building imbalance models, as the basic framework. A sample skewness adjustment function is introduced, following Bartosz's 2016 idea of weighting minority classes in classification models for imbalanced datasets and of predicting or rebuilding the underlying class structure. In this way, the ImBoosting-Weight model and the improved ImBoosting-ADASYN-Weight model are constructed. The improved Boosting model in this paper differs from current mainstream imbalanced Boosting models, which nest an imbalance-handling method inside the Boosting procedure. Instead, ImBoosting-Weight modifies the sample weight update function used in the Boosting iterations according to the class skewness among the misclassified samples. This adaptively strengthens the next-round weights of misclassified minority-class samples and ensures that the weight adjustment pays more attention to the imbalanced distribution of hard samples, yielding the imbalanced Boosting algorithm ImBoosting-Weight. However, the imbalanced sample weight distribution that arises during the ImBoosting-Weight iterations tends to make subsequent base models repeatedly learn the original minority-class samples, which causes the base models to overfit. To address this problem, this paper proposes ADASYN-Weight, an improved synthetic sampling algorithm that uses the adjusted weight distribution from training to adaptively determine how many synthetic samples each sample generates in the ADASYN synthesis stage. Thus, the ImBoosting-ADASYN-Weight model is constructed. This model reduces the risk of overfitting the base models by introducing random factors in the sample synthesis stage. In the empirical analysis stage, based on various imbalanced datasets, the ImBoosting-Weight and ImBoosting-ADASYN-Weight models 
are compared with mainstream imbalance treatment methods and imbalance classification models, such as ADASYN, RUSBoost and Balanced Bagging. AUC, recall and precision are used to show the differences among the models. The ImBoosting-Weight algorithm ranks first in AUC and recall on 80% of the user churn datasets selected in this paper, which demonstrates its ability to recognize minority classes and its stability across different imbalance ratios. The ImBoosting-ADASYN-Weight algorithm also achieves excellent precision and accuracy on all user churn datasets, which shows that the improved algorithm helps to some extent to raise the precision of minority-class identification. The improvement in precision also suggests that introducing ADASYN-Weight mitigates overfitting to some degree. Finally, dynamic threshold analysis shows that the ImBoosting-ADASYN-Weight model has a more uniform distribution of predicted probabilities, and that the range of effective thresholds widens from [0.4, 0.7] to [0.25, 0.75]. This makes it easier to adjust the threshold on demand to obtain a model with high recall or high precision, which in turn supports the analysis of operational strategies. |
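
The dynamic threshold analysis mentioned above can be illustrated with a minimal sketch. It is not the paper's code: the function names `precision_recall_at` and `sweep_thresholds`, the toy labels and the toy probabilities are all assumptions made for illustration. The sketch only shows how moving the decision threshold over predicted churn probabilities trades precision against recall for the minority (churn) class.

```python
from typing import List, Sequence, Tuple


def precision_recall_at(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Precision and recall of the positive (churn) class at one threshold."""
    tp = fp = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def sweep_thresholds(
    y_true: Sequence[int], y_prob: Sequence[float], thresholds: Sequence[float]
) -> List[Tuple[float, float, float]]:
    """Return (threshold, precision, recall) for each candidate threshold."""
    return [(t, *precision_recall_at(y_true, y_prob, t)) for t in thresholds]


# Hypothetical toy data: 1 = churned user (minority class), 0 = retained user.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.45, 0.1, 0.3, 0.7, 0.05]

for t, p, r in sweep_thresholds(y_true, y_prob, [0.25, 0.5, 0.75]):
    print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f}")
```

On this toy data a low threshold favors recall and a high threshold favors precision. A model whose predicted probabilities are spread more evenly leaves a wider band of thresholds over which this trade-off changes gradually, which is the property the abstract attributes to ImBoosting-ADASYN-Weight.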