Research On E-commerce Purchase Behavior Prediction Based On Feature Selection And Stacking Integrated Algorithm

Posted on: 2022-05-29
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhai
GTID: 2518306494467854
Subject: Control Engineering
Abstract/Summary:
With the continued growth of online shopping, the number of online shoppers in China has reached 750 million, and China has been the world's largest online retail market for 7 consecutive years. This scale, however, also brings the problem of "information explosion". Users face higher purchase-decision costs when confronted with a vast array of products, and merchants who want to promote their high-quality products must spend ever larger marketing budgets. How an e-commerce platform can use the massive historical user behavior data on its servers to build, through machine learning, a personalized recommendation system that serves both users and merchants has become a research hotspot. This thesis studies user purchase behavior prediction models from the perspective of feature selection and integrated (ensemble) machine learning algorithms. The main work is as follows:

(1) Features for user purchase behavior prediction are constructed. First, visual data analysis is used to identify "crawler" users and missing values, which are then removed and filled in, respectively. Statistical methods are then used to construct 216-dimensional features of 7 types, grounded in actual business scenarios. An optimal time interval is selected to build the user candidate set, which addresses the fact that the correlation between user behaviors weakens over time. When the training set and the test set are divided, the label is completely isolated from the prediction interval, which avoids the sample "crossing" (leakage across time) caused by timing problems. Finally, a random under-sampling algorithm based on DBSCAN clustering is proposed, which mitigates the data loss caused by plain random under-sampling and reduces the data skew caused by sample imbalance.

(2) A recursive feature selection algorithm based on LightGBM (RFE-LightGBM) is proposed. After each round of LightGBM feature ranking, the feature with the lowest contribution is removed, which overcomes the inability of the LightGBM feature selection algorithm on its own to determine the optimal feature subset. Feeding the constructed feature set into the RFE-LightGBM algorithm yields an optimal 51-dimensional feature subset, a dimensionality reduction rate of 76.4%, which improves the interpretability of the model and at the same time increases its AUC value.

(3) To address the poor generalization of a single model and the sample imbalance introduced by random sampling in cross-validation, a Stacking integration algorithm based on hierarchical (stratified) cross-validation is proposed. LightGBM, XGBoost, SVM, and LSTM serve as the first layer of the Stacking algorithm, and logistic regression serves as the second layer. Training yields the final purchase prediction model. The experimental results show that the improved Stacking algorithm outperforms the single models on both the F1 value and the AUC value, which reflects the effectiveness of the algorithm. In addition, an ablation experiment is designed and carried out: each of the four first-layer algorithms is removed in turn and the model is retrained, which gives the contribution of each algorithm to the Stacking model. Finally, a comparative experiment shows that the F1 values of these algorithms change by different margins after feature selection, which indicates that the RFE-LightGBM feature selection algorithm generalizes well.
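The recursive elimination loop described in item (2) can be illustrated with a minimal, standard-library-only sketch. It is not the thesis implementation: the ranking and scoring steps are passed in as callables (in the thesis these roles are played by LightGBM feature ranking and the AUC value), and the feature names, importances, and scoring rule below are hypothetical toy stand-ins chosen only to make the loop runnable.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def recursive_feature_elimination(
    features: Sequence[str],
    rank: Callable[[List[str]], Dict[str, float]],
    score: Callable[[List[str]], float],
    min_features: int = 1,
) -> Tuple[List[str], float]:
    """Drop the lowest-ranked feature each round; return the best-scoring subset."""
    current = list(features)
    best_subset, best_score = list(current), score(current)
    while len(current) > min_features:
        importances = rank(current)              # re-rank the surviving features
        weakest = min(current, key=lambda f: importances[f])
        current.remove(weakest)                  # eliminate the lowest contributor
        candidate_score = score(current)
        if candidate_score >= best_score:        # on ties, prefer the smaller subset
            best_subset, best_score = list(current), candidate_score
    return best_subset, best_score


def reduction_rate(n_before: int, n_after: int) -> float:
    """Fraction of feature dimensions removed by the selection."""
    return 1 - n_after / n_before


# --- Hypothetical toy stand-ins (assumptions, not taken from the thesis) ---
TOY_IMPORTANCE = {
    "clicks_7d": 0.40,
    "cart_adds_3d": 0.35,
    "favorites_7d": 0.15,
    "noise_a": 0.02,
    "noise_b": 0.01,
}
FEATURES = list(TOY_IMPORTANCE)


def toy_rank(subset: List[str]) -> Dict[str, float]:
    # Stand-in for a model-based ranking such as LightGBM feature importance.
    return {f: TOY_IMPORTANCE[f] for f in subset}


def toy_score(subset: List[str]) -> float:
    # Stand-in for a validation metric such as AUC: reward informative
    # features, charge a small fixed cost per feature kept.
    return sum(TOY_IMPORTANCE[f] for f in subset) - 0.05 * len(subset)


if __name__ == "__main__":
    subset, value = recursive_feature_elimination(FEATURES, toy_rank, toy_score)
    print(subset, value)
    # Reduction rate for the dimensions reported in the abstract (216 -> 51).
    print(round(reduction_rate(216, 51) * 100, 1))  # 76.4
```

Because the best-scoring subset is tracked across all rounds, the loop returns a concrete subset rather than only a ranking, which is the gap item (2) says plain importance ranking leaves open. The last line reproduces the reported reduction rate: (216 - 51) / 216 ≈ 76.4%.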
Keywords/Search Tags: Purchase behavior prediction, RFE-LightGBM, feature selection, Stacking integration model