
Prediction Of User Repeat Purchase Behavior Based On Data Mining

Posted on: 2022-06-23, Degree: Master, Type: Thesis
Country: China, Candidate: Z H Xia, Full Text: PDF
GTID: 2518306509489144, Subject: Applied Statistics
Abstract/Summary:
The repeat purchase prediction task in the Alibaba Cloud Competition is to predict whether a buyer who is new to a store during Double Eleven will purchase from the same merchant again within six months. This thesis analyzes the task and formulates it as a binary classification problem. The solution has four steps. First, we analyze the characteristics of the data and design basic analysis strategies accordingly. Second, we perform feature extraction and feature selection from three aspects: user-related features, merchant-related features, and user-merchant interaction features. Third, we use the training set to train four different classifiers. Finally, we integrate the models and feature groups to further improve performance. We compare the AUC values obtained by several methods; the best solution reaches an AUC of 0.69. The main research work of this thesis is as follows:
(1) Data preprocessing. After a basic visual analysis of the data provided by the contest, we preprocess the data to make it more suitable for the subsequent data mining work. We integrate the data, expand it, and then perform a series of operations such as filling in missing values and converting data.
(2) Feature engineering. We extract a total of 150 available features from three aspects: merchants, users, and merchant-user pairs. To improve the efficiency of model training, we use a feature selection method to retain 83 useful features for training.
(3) Model training. A variety of common machine learning models are trained on the data set, and those with the better results are retained. In the end, four classification models are selected: GBDT, RF, Logistic Regression, and AdaBoost.
(4) Model integration. To further improve the results, the four classification models and the three types of feature combinations are integrated. Experimental results on the validation set show that the integrated framework effectively improves on the result of any single model.
Keywords/Search Tags: Repeated purchase behavior prediction, data mining, model integration
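The abstract evaluates every method by AUC and reports that integrating several classifiers beats a single model, but it does not say how the integration is done. The sketch below is a minimal illustration only, assuming the simplest form of integration (an unweighted average of each model's predicted scores). The function names `auc` and `blend` and all of the data are hypothetical and are not taken from the thesis.

```python
from typing import List, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def blend(model_scores: Sequence[Sequence[float]]) -> List[float]:
    """Assumed integration rule: unweighted mean of the models' scores per sample."""
    n_models = len(model_scores)
    return [sum(col) / n_models for col in zip(*model_scores)]


# Made-up toy data: 1 = repeat buyer, 0 = one-time buyer.
labels = [1, 0, 1, 0, 0, 1]
scores_a = [0.9, 0.8, 0.4, 0.3, 0.2, 0.7]  # hypothetical classifier A
scores_b = [0.6, 0.2, 0.8, 0.7, 0.1, 0.5]  # hypothetical classifier B

print("AUC A      :", auc(labels, scores_a))
print("AUC B      :", auc(labels, scores_b))
print("AUC blended:", auc(labels, blend([scores_a, scores_b])))
```

The toy scores were chosen so that the two classifiers make different ranking mistakes, which is the situation where averaging tends to help. On real data the gain depends on how correlated the models' errors are, and the pairwise loop is quadratic, so a rank-based AUC would be preferable for large validation sets.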