
An Empirical Study Of User Purchasing Behavior Based On Machine Learning

Posted on: 2022-06-16
Degree: Master
Type: Thesis
Country: China
Candidate: L J Chu
Full Text: PDF
GTID: 2518306311968889
Subject: Applied Statistics
Abstract/Summary:
In the last few years, the rapid development of the Internet and big data has greatly changed the way people live. In e-commerce, the major platforms are actively looking for ways to understand users' preferences and needs, analyzing the large volume of user behavior data generated on the platform every day in order to improve user experience and satisfaction. By analyzing user behavior data, a platform can recommend more suitable products to its users, and at the same time reduce operating costs and support the sound development of the business.

This paper is based on a desensitized user behavior data set provided by JD. Data analysis and machine learning algorithms are used to analyze users' historical behavior, predict their future purchase behavior at the level of commodity category, and explore ways to improve the accuracy of the prediction model.

Before the models are built, a series of data preparation steps is carried out: data integration, exploratory data analysis, data cleaning and feature engineering. Exploratory data analysis gives a general understanding of the data set and shows that user purchase behavior is strongly correlated with time. Data cleaning deletes or fills in missing values and removes noise so that the data can be used for model training. Finally, taking the practical setting into account, 30 features are constructed along three dimensions: user, commodity category, and user-commodity category. They reflect the characteristics of the users themselves, the characteristics of the commodity categories, and the interaction between users and commodity categories, respectively.

Two single machine learning algorithms, logistic regression and support vector machine (SVM), are first used to predict user behavior on the preprocessed data set. According to the experimental results, the F1 scores of both models are around 0.33 and their AUC values are around 0.67, so the models are usable but leave room for improvement. Two ensemble learning algorithms, Random Forest and Adaboost, are therefore applied to the test set in order to improve prediction accuracy. The experimental results show that both ensemble learning algorithms predict better than the single machine learning algorithms. Of the two, Adaboost performs best, with an F1 score of 0.4182 and an AUC of 0.7168.

To further improve the prediction accuracy of the model, Section 4.5 proposes using random forest, an improved bagging algorithm, to improve Adaboost. Feature selection is carried out before Adaboost is trained, and the improved Adaboost fusion model is then constructed. The experimental results show that both the F1 score and the AUC of the improved Adaboost fusion model are higher than those of the traditional Adaboost algorithm. The F1 score of the improved model is 0.4324, about 3% higher than that of the traditional Adaboost algorithm, and its AUC is 0.7322, about 2% higher. This indicates that the improved Adaboost fusion model can effectively improve the performance of the learner, enhance the generalization ability of the model, reduce the risk of over-fitting, and is feasible for predicting and optimizing users' purchase behavior.
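The abstract does not spell out how the fusion model is built, so the following is only a minimal sketch of one plausible reading: random forest feature importances drive a feature selection step, and Adaboost is then trained on the selected features. The synthetic data set, the median importance threshold, and all parameter values are assumptions for illustration and are not taken from the thesis; the scores it prints have no relation to the reported results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Assumption: a synthetic, imbalanced stand-in for the JD data with 30 features.
X, y = make_classification(
    n_samples=2000, n_features=30, n_informative=8,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)


def evaluate(model):
    """Fit on the training split and return (F1, AUC) on the test split."""
    model.fit(X_train, y_train)
    predicted = model.predict(X_test)
    scores = model.predict_proba(X_test)[:, 1]
    return f1_score(y_test, predicted), roc_auc_score(y_test, scores)


# Plain Adaboost trained on all 30 features.
baseline = AdaBoostClassifier(n_estimators=100, random_state=0)

# Hypothetical fusion: keep features whose random forest importance is at
# least the median importance, then train Adaboost on the reduced set.
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=200, random_state=0),
    threshold="median",
)
fused = make_pipeline(
    selector, AdaBoostClassifier(n_estimators=100, random_state=0)
)

results = {
    "adaboost": evaluate(baseline),
    "rf_select_adaboost": evaluate(fused),
}
for name, (f1, auc) in results.items():
    print(f"{name}: F1={f1:.4f} AUC={auc:.4f}")
```

F1 and AUC are reported together because purchase labels are typically imbalanced, which makes plain accuracy uninformative; whether the selection step helps on a given data set has to be checked empirically.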
Keywords/Search Tags:user behavior, machine learning, random forest, Adaboost algorithm