Research On User Churn Prediction For Imbalanced Big Data

Posted on: 2021-04-16
Degree: Master
Type: Thesis
Country: China
Candidate: W Q Chen
GTID: 2428330647457067
Subject: Statistics
Abstract/Summary:
With the rapid development of information science and technology, access to information has become broader and more convenient. Users have more control over their choice of products and services, so many enterprises face the risk of user churn. Before the Internet and big data technology were widely used, the data that enterprises used to build user churn prediction models was low-dimensional and simply structured, and a single model could perform well. For today's Internet-based online platforms, the difficulty of building a user churn prediction model lies mainly in the high dimensionality and complex structure of user data, the many noisy samples, and the large gap between the number of churned users and the number of retained users.

Based on real data from a large number of users of an online platform, this paper first explores the factors that affect user churn through data visualization, and then carries out data cleaning, feature construction, and embedded feature selection to build the data set for training the churn prediction model.

To address the data imbalance, this paper works from two aspects: data resampling and ensemble learning classification algorithms. At the data resampling level, logistic regression and LightGBM are used as classifiers to compare the effect of several resampling algorithms on the imbalanced user data. The resampling algorithms compared are SMOTE, Borderline SMOTE, ADASYN, ENN, Tomek Links, SMOTE+ENN, and SMOTE+Tomek Links; ENN performs best and is selected. At the ensemble learning level, a stacking strategy is used to build the user churn prediction model, and random forest, adaptive boosting, light gradient boosting, and extreme gradient boosting algorithms are used to build the prediction models. Compared with traditional logistic regression, the ensemble learning algorithms have better prediction performance on the imbalanced data set. On this basis, the ENN resampling algorithm is integrated into the stacking framework to further improve the F1 score and AUC of the user churn prediction model on the imbalanced test data set. The final user churn prediction model reaches an F1 score of 0.8172 and an AUC of 0.9197.
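The ENN (Edited Nearest Neighbours) resampling step named in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation. It assumes a majority-vote variant of ENN that only removes majority-class samples, Euclidean distance, k = 3, and a tiny made-up data set; the function name `edited_nearest_neighbours` and the toy points are hypothetical.

```python
from collections import Counter
from math import dist


def edited_nearest_neighbours(X, y, majority_label, k=3):
    """Drop majority-class points whose k nearest neighbours mostly
    belong to another class; return the kept points and labels."""
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != majority_label:
            keep.append(i)  # assumption: minority samples are never removed
            continue
        # Indices of the k closest other points (Euclidean distance).
        neighbours = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: dist(xi, X[j]),
        )[:k]
        votes = Counter(y[j] for j in neighbours)
        if votes.most_common(1)[0][0] == yi:
            keep.append(i)  # neighbourhood agrees with the label
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (hypothetical): label 0 = retained user, label 1 = churned user.
X = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.2, 0.0),  # retained
    (1.0, 1.0), (1.1, 1.0), (1.0, 1.1),                          # churned
    (1.05, 1.05),  # retained-labelled noise inside the churned cluster
]
y = [0, 0, 0, 0, 0, 1, 1, 1, 0]

X_res, y_res = edited_nearest_neighbours(X, y, majority_label=0, k=3)
print(Counter(y_res))  # the noisy retained-labelled point has been dropped
```

In this sketch the cleaning removes only the noisy majority sample that sits among churned users, which sharpens the class boundary without synthesising any new minority samples. A cleaned set like this would then be passed to the classifiers for training.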
Keywords/Search Tags: User Churn Prediction, Imbalanced Data, Resampling, Stacking