Prediction Of User Churn By Machine Learning Algorithm Based On Unbalanced Data Sets

Posted on: 2022-10-25
Degree: Master
Type: Thesis
Country: China
Candidate: R Z Hou
GTID: 2518306527452294
Subject: Applied Statistics

Abstract/Summary:
With the rapid development of the Internet, music streaming has grown quickly, and the major music platforms compete for market share in a variety of ways. In a highly saturated market, retaining existing users is generally easier than attracting new ones, so predicting user churn has become a key task in customer relationship management. At the same time, the collection and storage of massive amounts of data make machine learning practical, and machine learning algorithms have become the mainstream approach to customer churn prediction. This thesis uses machine learning algorithms to predict user churn on the music platform KKBOX. The complexity of the data and the imbalance between classes are the main difficulties in churn prediction, so the thesis focuses on selecting appropriate features and models from a large volume of data. The main work is as follows:

(1) After an initial exploration of the data, new features are constructed by applying time windows to the original variables, and the features are then filtered with an embedded feature-selection method. Twelve variables that are most important for predicting churn are retained. One example is is_auto_renew_60, the number of automatic renewals in the last 60 days, which combines a 60-day time window with the original variable is_auto_renew.

(2) Three ensemble learning algorithms (XGBoost, LightGBM and Random Forest) are combined with three sampling methods (random undersampling, EasyEnsemble and SMOTE oversampling) to predict user churn. Among the nine combinations, LightGBM with EasyEnsemble performs best, with the highest AUC and Recall.

(3) Building on these nine combinations, the thesis proposes a sampling method that combines Borderline-SMOTE, ENN and EasyEnsemble, and uses LightGBM for prediction. Compared with LightGBM based on EasyEnsemble alone, this method keeps the AUC at a similar level while improving the Recall of the model. It is therefore a feasible choice when the goal is to identify as many churned users as possible without increasing cost.
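The time-window feature construction in (1) can be sketched as follows. This is a minimal illustration under stated assumptions: each user's records are (transaction date, is_auto_renew flag) pairs, the window ends at a chosen cutoff date, and the feature counts auto-renewed transactions inside that window. The helper names, the extra 30- and 90-day windows and the sample data are hypothetical; the abstract does not describe the exact aggregation used.

```python
from datetime import date, timedelta


def count_auto_renew_in_window(transactions, as_of, days):
    """Count auto-renewed transactions in the `days`-day window ending at `as_of`.

    `transactions` is a list of (transaction_date, is_auto_renew) pairs for one
    user (an assumed layout); the window is the half-open interval
    (as_of - days, as_of].
    """
    start = as_of - timedelta(days=days)
    return sum(
        1
        for tx_date, is_auto_renew in transactions
        if start < tx_date <= as_of and is_auto_renew == 1
    )


def add_window_features(transactions, as_of, windows=(30, 60, 90)):
    """Build one count feature per window, named like is_auto_renew_60.

    The 30- and 90-day windows are illustrative assumptions; only the 60-day
    window is named in the abstract.
    """
    return {
        f"is_auto_renew_{d}": count_auto_renew_in_window(transactions, as_of, d)
        for d in windows
    }


# Hypothetical transaction history for one user: (date, is_auto_renew flag).
user_tx = [
    (date(2017, 1, 15), 1),
    (date(2017, 2, 10), 1),
    (date(2017, 3, 5), 0),
    (date(2017, 3, 20), 1),
]
features = add_window_features(user_tx, as_of=date(2017, 3, 31))
print(features)
# {'is_auto_renew_30': 1, 'is_auto_renew_60': 2, 'is_auto_renew_90': 3}
```

Each window turns one raw variable into a family of candidate features; a feature-selection step such as the embedded method mentioned in (1) then decides which windows are worth keeping.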
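EasyEnsemble, used in (2) and (3), draws several balanced subsets from the majority class, trains one learner per subset and combines their outputs. The sketch below shows only that structure. A toy nearest-centroid scorer stands in for LightGBM so that it runs with the standard library alone, and averaging the member scores is an assumed combination rule. All helper names and data are hypothetical. In practice, imbalanced-learn's `EasyEnsembleClassifier` or a loop over LightGBM models would replace the toy learner.

```python
import math
import random


def easy_ensemble_subsets(labels, n_subsets, seed=0):
    """Build n_subsets balanced index sets: every minority row (label 1) plus an
    equally sized random sample, without replacement, of majority rows (label 0)."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    size = min(len(minority), len(majority))
    return [sorted(minority + rng.sample(majority, size)) for _ in range(n_subsets)]


def fit_centroid_model(X, y):
    """Toy stand-in for LightGBM (hypothetical): the score is the relative
    closeness of a row to the churn-class centroid, a value in [0, 1]."""
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    c0 = centroid([x for x, t in zip(X, y) if t == 0])
    c1 = centroid([x for x, t in zip(X, y) if t == 1])

    def predict(x):
        d0, d1 = math.dist(x, c0), math.dist(x, c1)
        return 0.5 if d0 + d1 == 0 else d0 / (d0 + d1)

    return predict


def ensemble_proba(models, x):
    """Combine the member models by averaging their churn scores."""
    return sum(model(x) for model in models) / len(models)


# Hypothetical one-feature data: six retained users (0), two churned users (1).
X = [[0.0], [0.1], [0.2], [0.3], [0.4], [0.5], [0.9], [1.0]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
subsets = easy_ensemble_subsets(y, n_subsets=5, seed=42)
models = [fit_centroid_model([X[i] for i in s], [y[i] for i in s]) for s in subsets]
print(ensemble_proba(models, [0.95]), ensemble_proba(models, [0.0]))
```

Unlike a single random undersample, each member sees a different slice of the majority class, so the ensemble as a whole discards far less majority-class information.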
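The combined sampler in (3) can be read as Borderline-SMOTE oversampling followed by ENN cleaning, with EasyEnsemble then applied to the cleaned set. The abstract does not state the order, the neighbourhood sizes or the sampling ratios. The choices below (m = 5, k = 3, two synthetic rows per "danger" sample) and all function names are therefore assumptions. The functions are plain-Python versions of the Borderline-SMOTE1 variant and Wilson's edited nearest neighbours rule. imbalanced-learn provides production implementations as `BorderlineSMOTE` and `EditedNearestNeighbours`.

```python
import math
import random


def k_nearest(X, i, pool, k):
    """Indices in `pool` (excluding i) of the k rows nearest to X[i] (Euclidean)."""
    others = [j for j in pool if j != i]
    others.sort(key=lambda j: math.dist(X[i], X[j]))
    return others[:k]


def borderline_smote(X, y, m=5, k=3, n_per_danger=1, seed=0):
    """Borderline-SMOTE1 sketch: synthesize minority rows (label 1) only around
    'danger' minority rows that sit near the class boundary."""
    rng = random.Random(seed)
    all_idx = list(range(len(X)))
    minority = [i for i in all_idx if y[i] == 1]
    new_X, new_y = [list(row) for row in X], list(y)
    for i in minority:
        n_major = sum(1 for j in k_nearest(X, i, all_idx, m) if y[j] == 0)
        # Danger: at least half, but not all, of the m neighbours are majority.
        # (All m majority = noise; fewer than half = safe. Both are skipped.)
        if m / 2 <= n_major < m:
            minority_nn = k_nearest(X, i, minority, k)
            for _ in range(n_per_danger):
                j = rng.choice(minority_nn)
                gap = rng.random()  # random point on the segment X[i] -> X[j]
                new_X.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
                new_y.append(1)
    return new_X, new_y


def edited_nearest_neighbours(X, y, k=3):
    """Wilson's ENN: drop every row whose label differs from the majority label
    of its k nearest neighbours (applied here to both classes)."""
    all_idx = list(range(len(X)))
    keep = []
    for i in all_idx:
        votes = [y[j] for j in k_nearest(X, i, all_idx, k)]
        if (1 if 2 * sum(votes) > len(votes) else 0) == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical 2-D data on a line: majority at x = 0..5, minority at 4.6, 6.5, 7.5, 8.5.
X = [[float(x), 0.0] for x in range(6)] + [[4.6, 0.0], [6.5, 0.0], [7.5, 0.0], [8.5, 0.0]]
y = [0] * 6 + [1] * 4
X_os, y_os = borderline_smote(X, y, n_per_danger=2, seed=1)
X_clean, y_clean = edited_nearest_neighbours(X_os, y_os)
print(len(X), len(X_os), len(X_clean))
```

In this toy data only the minority row at x = 4.6 is a danger sample, so the new points are concentrated near the class boundary rather than spread over the whole minority class. ENN then removes rows whose neighbourhoods disagree with their labels.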
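The comparisons in (2) and (3) rank models by AUC and Recall. Recall is the fraction of churned users that the model catches, which is why it is the metric of interest when the goal is to find as many churners as possible. AUC measures how well the scores rank churners above non-churners, independently of any threshold. A minimal sketch of both follows, with a hypothetical 0.5 decision threshold and toy data. scikit-learn's `roc_auc_score` and `recall_score` compute the same quantities.

```python
def recall(y_true, y_pred):
    """Share of actual churners (label 1) that the model flags as churners."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    positives = sum(1 for t in y_true if t == 1)
    return tp / positives


def roc_auc(y_true, scores):
    """AUC as the probability that a random churner scores higher than a random
    non-churner (ties count one half); O(n_pos * n_neg), fine for a sketch."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
print(roc_auc(y_true, scores), recall(y_true, y_pred))  # 0.75 0.5
```

The example shows why the two metrics can diverge. The churner scored 0.35 is ranked below one non-churner, which costs AUC. Under the 0.5 threshold, the same churner is also missed entirely, so Recall falls to 0.5.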
Keywords: user churn prediction, imbalanced data sets, machine learning, data sampling