Research On User Churn Prediction For Imbalanced Big Data

Posted on:2021-04-16

Degree:Master

Type:Thesis

Country:China

Candidate:W Q Chen

Full Text:PDF

GTID:2428330647457067

Subject:statistics

Abstract/Summary:

With the rapid development of information science and technology, access to information has become broader and more convenient. Users have more control over their choice of products and services in the market, so many enterprises face the risk of user churn. Before Internet and big data technology were widely used, the data that enterprises used to build user churn prediction models had low dimensionality and a simple structure, and a single model could achieve good performance. For current online platforms that rely on the Internet, the difficulty of building a user churn prediction model lies mainly in the high dimensionality of user data, its complex structure, the many noisy samples, and the large imbalance between the number of churned users and the number of retained users. Based on real data from a large number of users of an online platform, this thesis first explores the factors that affect user churn through data visualization, and then performs data cleaning, feature construction, and embedded feature selection to build the data set used to train the churn prediction model. To address the problem of imbalanced data, the thesis works from two directions: data resampling and ensemble learning classification algorithms. At the data resampling level, logistic regression and LightGBM are used as classifiers to compare the impact of several resampling algorithms on the imbalanced user data. The resampling algorithms considered are SMOTE, Borderline-SMOTE, ADASYN, ENN, Tomek Links, SMOTE+ENN, and SMOTE+Tomek Links; of these, ENN performs best and is selected. At the ensemble learning level, a stacking strategy is used to build a user churn prediction model. Random forest, adaptive boosting (AdaBoost), light gradient boosting (LightGBM), and extreme gradient boosting (XGBoost) are also used to build user churn prediction models. Compared with traditional logistic regression, the ensemble learning algorithms achieve better prediction performance on the imbalanced data set. On this basis, the ENN resampling algorithm is integrated into the stacking framework to further improve the F1 score and AUC of the user churn prediction model on the imbalanced test data set. The final user churn prediction model reaches an F1 score of 0.8172 and an AUC of 0.9197.
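To make the resampling step more concrete, the following is a minimal sketch of Edited Nearest Neighbours (ENN) undersampling in plain Python. It is not the thesis's implementation: the function name `enn_undersample`, the toy data, the choice of k = 3, Euclidean distance, and the majority-vote removal rule applied only to the majority class are all assumptions made for illustration.

```python
from collections import Counter
import math


def enn_undersample(X, y, majority_label, k=3):
    """Drop majority-class samples whose k nearest neighbours mostly disagree.

    Hypothetical helper for illustration: minority samples are always kept,
    and a majority sample is removed when the most common label among its
    k nearest neighbours (Euclidean distance) differs from its own label.
    """
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != majority_label:
            keep.append(i)  # never remove minority (churned) samples
            continue
        # Indices of the k closest other samples.
        neighbours = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(xi, X[j]),
        )[:k]
        votes = Counter(y[j] for j in neighbours)
        if votes.most_common(1)[0][0] == yi:
            keep.append(i)  # neighbourhood agrees, keep the sample
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (assumed): label 0 = retained (majority), label 1 = churned (minority).
# The last point is a majority sample sitting inside the minority cluster.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
     (1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (1.05, 1.05)]
y = [0, 0, 0, 0, 1, 1, 1, 0]

X_res, y_res = enn_undersample(X, y, majority_label=0, k=3)
print(len(X), "->", len(X_res), "samples; labels:", y_res)
```

In this toy case only the majority sample embedded in the minority cluster is removed, which is the cleaning effect ENN is meant to have near the class boundary. In practice a library implementation would normally be used instead of this quadratic-time loop, and the cleaned training set would then be passed to the stacking model while the test set is left imbalanced.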

Keywords/Search Tags:

User Churn Prediction, Imbalanced data, Resample, Stacking

Related items

1	Prediction Of User Churn By Machine Learning Algorithm Based On Unbalanced Data Sets
2	Research On User Analysis And Behavior Prediction Driven By Big Data
3	User Churn Prediction Of Financial News APP
4	The Application Of Ensemble Learning In The Early Warning Model Of Operator User Churn
5	Research And Application Of Telecommunication Operator User Churn Prediction Based On Data Mining
6	Online Shopping Customer Churn Prediction Based On Stacking Ensemble Learning
7	A Research On Bagging Of XGBoost Classifiers For Prediction Churn In Telecom
8	Imbalanced Data Mixed Sampling Algorithm And Its Application In Customer Churn Prediction
9	Research On UFIDA User Portrait And Churn Prediction Model Based On Data Mining Method
10	Research On The Prediction Method For Imbalance Data Set