
Online Purchase Behavior Prediction Based On User's Behavior Sequence

Posted on: 2020-08-23, Degree: Master, Type: Thesis
Country: China, Candidate: W Q Duan, Full Text: PDF
GTID: 2428330572481775, Subject: Engineering
Abstract/Summary:
Online shopping has developed rapidly in recent years and has become an indispensable part of daily life. Each e-commerce platform has accumulated hundreds of millions of loyal users and massive amounts of real data. How to find patterns in historical data, understand what users want, solve their practical problems, and improve their shopping experience are key issues in applying big data to precision marketing. This paper therefore uses machine learning algorithms to learn purchase behavior patterns from historical user, commodity, and user-commodity data, and uses the resulting model to predict online purchasing behavior.

The work is set against a Chinese big data algorithm competition ("As Expected: Prediction of User Purchase Time"). Real purchase behavior data from users of the Jingdong e-commerce platform serves as the research data, and machine learning algorithms are used for modeling. Before the model is built, basic preparations are made: data preprocessing, data analysis, and feature engineering are carried out; the final modeling target and the split into offline training and offline validation sets are determined; and an original feature group is built for the original model, whose initial experimental results are recorded.

The research is data-driven. To predict online purchase behavior, a user's behavior toward commodities is recorded as a user behavior sequence, and a utility function over this sequence is proposed. The utility function takes behavior frequency and recency into account and is used to mine the user's commodity preferences from the behavior sequence. The resulting preference data is then fed into the model for training. Experiments were conducted mainly on the Logistic Regression and GBDT models, which had the best initial results. On the Logistic Regression model, the user evaluation score S1 improved by 0.014 and the final result reached 0.663, which shows that the proposed method is effective.

This paper also proposes forming a document from each user behavior sequence and vectorizing it with CountVectorizer and TfidfVectorizer respectively (the word embedding step). CountVectorizer considers only how often a word occurs in a text. TfidfVectorizer considers both that frequency and the number of texts that contain the word, which reduces the influence of frequent but uninformative words and brings out more meaningful features. The resulting word embeddings are then used to train an LDA topic model, which yields a topic probability distribution. Trying different numbers of topics gave an optimal value of n_topics=15. When the topic probability distribution data is fed into model training, the user evaluation score S1 on the GBDT model improves by 0.059 and the final result reaches 0.703, which is better than the result obtained with the utility function.

Finally, the utility function and the topic probability distribution were used together, which improved the Logistic Regression model: its final result reached 0.672, confirming that the proposed method is feasible and effective. The prediction model in this paper improves the prediction results. In particular, the feature generation method based on user behavior sequences can be applied to time-series-like data in general: over a given period, a user's behavior records on an entity can be turned into a behavior sequence, and features can then be generated with the two methods above.
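The abstract does not give the form of the utility function, only that it combines behavior frequency and recency. As a minimal sketch under that description, the example below scores each user-commodity pair by summing one exponentially decayed weight per behavior event. The function name `sequence_utility`, the `(user, item, day)` event layout, and the `half_life_days` parameter are all assumptions made for illustration and are not taken from the thesis.

```python
import math
from collections import defaultdict


def sequence_utility(events, now, half_life_days=7.0):
    """Score (user, item) pairs from a behavior sequence.

    events: iterable of (user_id, item_id, day) tuples, where day is a number.
    now: the reference day at which the prediction is made.
    half_life_days: hypothetical decay parameter; an event this many days old
        counts half as much as an event that happened today.
    """
    decay = math.log(2) / half_life_days
    scores = defaultdict(float)
    for user, item, day in events:
        age = now - day
        # Recency: older events get exponentially smaller weights.
        # Frequency: every event adds its weight, so repeated behavior
        # toward the same item accumulates a higher score.
        scores[(user, item)] += math.exp(-decay * age)
    return dict(scores)


# Toy behavior sequence: user u1 touched item A twice and item B once.
events = [
    ("u1", "A", 3),
    ("u1", "A", 10),
    ("u1", "B", 10),
]
scores = sequence_utility(events, now=10, half_life_days=7.0)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 3))
```

The resulting scores could be joined onto the feature table as one extra column per user-commodity pair. Other decay shapes, or separate weights per behavior type, would fit the same description equally well.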
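To make the difference between the count-based and TF-IDF weightings concrete, the sketch below hand-rolls both on a few toy "documents" built from behavior sequences. It is an illustration only, not the thesis code. The token names such as `view_c1` are a hypothetical encoding of behavior type plus category. The IDF formula is the smoothed form ln((1 + n) / (1 + df)) + 1, and the row normalization that a library vectorizer would typically apply is left out. The LDA step that follows in the abstract is not shown.

```python
import math
from collections import Counter


def count_vectors(docs):
    """Raw term counts per document: only within-document frequency matters."""
    return [Counter(doc.split()) for doc in docs]


def tfidf_vectors(docs):
    """Term counts reweighted by a smoothed inverse document frequency."""
    counts = count_vectors(docs)
    n_docs = len(docs)
    # Document frequency: in how many documents each token appears.
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())
    idf = {
        token: math.log((1 + n_docs) / (1 + df)) + 1.0
        for token, df in doc_freq.items()
    }
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


# Each string is one user's behavior sequence written as a document
# (hypothetical tokens: behavior type + category id).
docs = [
    "view_c1 view_c1 cart_c1 view_c2",
    "view_c1 view_c3 buy_c3",
    "view_c1 view_c1 follow_c2",
]

counts = count_vectors(docs)
tfidf = tfidf_vectors(docs)
for token in sorted(counts[0]):
    print(token, counts[0][token], round(tfidf[0][token], 3))
```

In this toy data `view_c1` appears in every document, so its IDF factor stays at 1 and its weight is unchanged, while the rarer `cart_c1` is boosted. This is the effect the abstract describes as reducing the influence of high-frequency, uninformative words.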
Keywords/Search Tags: online purchase behavior, Logistic Regression, GBDT, user behavior sequence, word embedding, LDA topic model