
Research On E-commerce Customer Churn Prediction Algorithm Based On Multi-algorithm Fusion

Posted on: 2020-01-22 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Liu
GTID: 2438330596997566 | Subject: Software engineering
Abstract/Summary:
E-commerce customer churn prediction faces three difficulties. Customer characteristics are diverse, churned customers make up only a small, imbalanced share of the data, and a single prediction algorithm is prone to over-fitting. To address these difficulties, this thesis constructs a churn prediction algorithm model for online-application customers based on multi-algorithm fusion.

First, the customer samples differ in their feature-space distribution. To handle this, the thesis proposes an improved algorithm based on K-means clustering. It targets several weaknesses of traditional K-means: unstable initial centers, a tendency to fall into local optima, and difficulty in choosing the optimal number of clusters. The algorithm first uses K-means++ to select K+n initial cluster centers that are as far apart from one another as possible. It then uses K-medoids, which moves each center to the medoid of its cluster, to further stabilize the clustering. Finally, it merges the K+n cluster centers into K clusters using two-step clustering. In simulation experiments on four commonly used UCI benchmark data sets, the algorithm improved prediction accuracy by 6.88%, 1.34%, 0.57% and 5.18%, respectively. These results show that the algorithm copes effectively with differences in feature-space distribution.

Second, data imbalance reduces classification accuracy. Combining oversampling and undersampling, the thesis proposes an EasyEnsemble-Smote algorithm. The algorithm first analyzes the minority-class samples, synthesizes new samples from them with the SMOTE algorithm, and adds these samples to the original data set. It then uses the EasyEnsemble algorithm to divide the majority class into several subsets according to the sampling ratio. Finally, the minority samples from the first step are added to each of these subsets, each subset is classified separately, and the results are averaged. In simulation experiments on three commonly used UCI benchmark data sets, the C4.5 and KNN classifiers were applied to three versions of the data: the original data, the SMOTE-processed data and the EasyEnsemble-Smote-processed data. Compared with the former two, the average G-mean increased by 6.36% and 3.65%, 3.80% and 1.70%, and 5.65% and 2.90%, respectively. The average F-measure increased by 5.45% and 2.25%, 4.25% and 2.15%, and 7.40% and 3.10%, respectively. These results show that the improved algorithm effectively mitigates the data imbalance problem.

Third, a single algorithm tends to over-fit on small, non-standard data sets. To address this, the thesis constructs a combined prediction model from C4.5, logistic regression, SVM and a BP neural network. The entropy method first determines the weight of each single model. A linear regression equation then combines the predictions of the single models into the final prediction. Experiments verify that the fusion model improves generalization and general applicability.

Finally, building on the preceding chapters, the thesis demonstrates the rationality and effectiveness of the approach. It uses customer data from a music website as the experimental data set to construct the online-application churn prediction model based on multi-algorithm fusion. The improved K-means-based algorithm first segments the e-commerce customers into four groups of differing value. The EasyEnsemble-Smote algorithm then balances the imbalanced customer data. Finally, the combined forecasting model predicts customer churn. The results show that the fused model predicts churn better than single-algorithm churn prediction models. The proposed model does more than study and verify the feasibility and effectiveness of multi-algorithm fusion. It can also visually display customers' basic characteristics and accurately predict customer status, which gives it strong practical value.
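The improved clustering pipeline (K-means++ seeding of K+n centers, K-medoid refinement, merge down to K) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. Points are tuples of floats, and `n_extra` (the n in K+n) is a hypothetical parameter name. The abstract does not describe the two-step clustering stage, so `merge_to_k` is a plainly swapped-in stand-in: it repeatedly merges the two closest centers into their size-weighted mean.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points (tuples of floats)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def kmeanspp_seeds(points, m, rng):
    """K-means++ seeding: each new center is drawn with probability
    proportional to its squared distance to the nearest chosen center."""
    centers = [rng.choice(points)]
    while len(centers) < m:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:  # fewer than m distinct points
            break
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def kmedoids_refine(points, centers, max_iter=20):
    """K-medoids update: move each center to the medoid of its cluster,
    i.e. the member with the smallest total distance to the other members."""
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                c = min(members, key=lambda q: sum(math.dist(q, r) for r in members))
            new_centers.append(c)
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers


def merge_to_k(points, centers, k):
    """Hypothetical stand-in for the two-step stage: repeatedly merge the
    two closest centers into their cluster-size-weighted mean."""
    labels = assign(points, centers)
    groups = [[list(c), labels.count(j)] for j, c in enumerate(centers)]
    while len(groups) > k:
        pairs = [(a, b) for a in range(len(groups)) for b in range(a + 1, len(groups))]
        i, j = min(pairs, key=lambda ab: sq_dist(groups[ab[0]][0], groups[ab[1]][0]))
        (ci, ni), (cj, nj) = groups[i], groups[j]
        n = ni + nj
        merged = [(x * ni + y * nj) / n if n else (x + y) / 2 for x, y in zip(ci, cj)]
        groups[i] = [merged, n]
        del groups[j]  # j > i, so index i is unaffected
    return [tuple(c) for c, _ in groups]


def improved_kmeans(points, k, n_extra=2, seed=0):
    """K-means++ seeds (K + n_extra) -> K-medoids refinement -> merge to K."""
    rng = random.Random(seed)
    centers = kmeanspp_seeds(points, k + n_extra, rng)
    centers = kmedoids_refine(points, centers)
    centers = merge_to_k(points, centers, k)
    return centers, assign(points, centers)


pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centers, labels = improved_kmeans(pts, k=2)
print(centers, labels)
```

Seeding more centers than needed and merging afterwards makes the result less dependent on any single unlucky initial center. The medoid update keeps every center on an actual sample, which reduces sensitivity to outliers compared with a mean update.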
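The EasyEnsemble-Smote procedure (oversample the minority class, split the majority class into subsets, classify each balanced subset, average) can be illustrated with the Python sketch below. Several choices here are assumptions rather than details from the thesis:

- A k-NN classifier (one of the two classifiers named in the experiments) serves as the base learner.
- `n_subsets` is a hypothetical name for the sampling ratio.
- SMOTE tops up the minority class until it roughly matches the size of one majority subset.
- The ensemble output is the mean of the per-subset minority-class probabilities.

```python
import math
import random


def smote(minority, n_new, k=3, rng=None):
    """SMOTE: create n_new synthetic minority samples, each on the segment
    between a random minority sample and one of its k nearest minority
    neighbours. Requires at least two minority samples."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbour = rng.choice(sorted(others, key=lambda p: math.dist(x, p))[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbour)))
    return synthetic


def knn_proba(train, x, k=3):
    """Share of the k nearest labelled samples that belong to the minority (1)."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], x))[:k]
    return sum(label for _, label in nearest) / len(nearest)


def easyensemble_smote(majority, minority, n_subsets, k=3, seed=0):
    """Return a function mapping a sample to an averaged churn probability."""
    rng = random.Random(seed)
    # Step 1: oversample the minority class with SMOTE.
    n_new = max(0, len(majority) // n_subsets - len(minority))
    minority_all = minority + smote(minority, n_new, k, rng)
    # Step 2: split the shuffled majority class into n_subsets disjoint parts.
    shuffled = majority[:]
    rng.shuffle(shuffled)
    parts = [shuffled[i::n_subsets] for i in range(n_subsets)]
    # Step 3: pair every part with the enlarged minority set (label 1 = churn).
    train_sets = [[(p, 0) for p in part] + [(p, 1) for p in minority_all]
                  for part in parts]

    def predict(x):
        # Classify with each balanced subset and average the results.
        return sum(knn_proba(t, x, k) for t in train_sets) / len(train_sets)

    return predict


majority = [(float(i % 8), float(i // 8)) for i in range(40)]
minority = [(10.0, 10.0), (10.5, 9.5), (9.5, 10.5), (10.2, 10.4)]
predict = easyensemble_smote(majority, minority, n_subsets=5)
print(predict((10.1, 10.0)), predict((2.0, 2.0)))
```

Splitting the majority class means no majority sample is discarded, unlike plain random undersampling. Each base classifier still trains on a roughly balanced set.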
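G-mean and F-measure, the two metrics reported for the imbalance experiments, follow their standard definitions. G-mean is the geometric mean of the true-positive rate (recall) and the true-negative rate. F-measure is the harmonic mean of precision and recall. Below is a small Python helper that treats churners as the positive class, which is an assumption about how the labels are encoded.

```python
import math


def imbalance_metrics(y_true, y_pred, positive=1):
    """Return (G-mean, F-measure) for binary labels; `positive` is the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    recall = tp / (tp + fn) if tp + fn else 0.0        # true-positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0   # true-negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    g_mean = math.sqrt(recall * specificity)
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return g_mean, f_measure


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
g, f = imbalance_metrics(y_true, y_pred)
print(round(g, 4), round(f, 4))  # prints 0.7906 0.75
```

Unlike plain accuracy, both metrics drop sharply when the minority class is mostly misclassified. This is why they are the usual choice for imbalanced churn data.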
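The entropy-method weighting of the single models can be sketched as follows. The abstract does not give the formulas, so this sketch uses one common formulation from the combined-forecasting literature, which is an assumption. For model i with errors e_it on n validation samples:

1. Normalize the errors: p_it = e_it / sum_t e_it.
2. Compute the entropy: E_i = -(1/ln n) * sum_t p_it ln p_it.
3. Let d_i = 1 - E_i and set w_i = (1 - d_i / sum_j d_j) / (m - 1), where m is the number of models.

Under this formulation, models whose errors fluctuate more receive smaller weights. The combination here is a plain weighted linear sum of the single-model outputs, a simplification of the regression step named in the abstract. The example error and prediction values are invented for illustration.

```python
import math


def entropy_weights(errors):
    """errors[i][t]: absolute error of model i on validation sample t.
    Needs at least two models and two samples; the weights sum to 1."""
    m, n = len(errors), len(errors[0])
    d = []
    for e in errors:
        total = sum(e)
        p = [x / total for x in e] if total else [1.0 / n] * n
        entropy = -sum(q * math.log(q) for q in p if q > 0) / math.log(n)
        d.append(1.0 - entropy)  # degree of variation of this model's errors
    dsum = sum(d)
    if dsum == 0:  # all models equally stable: fall back to equal weights
        return [1.0 / m] * m
    return [(1.0 - di / dsum) / (m - 1) for di in d]


def combine(predictions, weights):
    """Weighted linear combination of per-model predictions, sample by sample."""
    return [sum(w * pred[t] for w, pred in zip(weights, predictions))
            for t in range(len(predictions[0]))]


errors = [
    [0.1, 0.1, 0.1, 0.1],  # steady errors -> entropy 1 -> largest weight
    [0.4, 0.0, 0.0, 0.0],  # concentrated errors -> entropy 0 -> smallest weight
    [0.2, 0.2, 0.0, 0.0],
]
weights = entropy_weights(errors)
predictions = [[0.9, 0.2], [0.7, 0.4], [0.8, 0.1]]  # churn scores per model
print(weights, combine(predictions, weights))
```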
Keywords/Search Tags:K-means clustering, Unbalanced data, Combined forecasting, Customer churn