
Gstore Consumption Forecast Based On Machine Learning

Posted on: 2020-11-04
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Guo
Full Text: PDF
GTID: 2428330575452044
Subject: Applied statistics
Abstract/Summary:
Consumption forecasting is the starting point and foundation of business operation forecasting and is closely tied to the business activities of enterprises. Accurate consumption forecasts make business decisions more scientific, and they bear directly on an enterprise's economic benefits and on its survival and development. On the one hand, forecasting helps an enterprise grasp the basic dynamics of the demand side of the product market and the general pattern of changes in product sales. On the other hand, the consumption forecast is the main basis for business decisions. Under the Pareto rule (the 80/20 law), a company that finds 80% of its profits come from 20% of its customers should try to make that 20% willing to expand their cooperation. Many companies therefore shift the focus of their consumption forecasts to the small number of customers who generate most of their revenue. How to accurately predict each consumer's spending over a future period, and to discover potential large customers from those predictions, has thus become a focus for most enterprises.

Previous consumption predictions have generally relied on time series or BP neural network methods. Time series methods mainly capture time trends and seasonality, and because historical consumption records for a single user are usually sparse, they can only predict total sales volume. They are of little help in predicting an individual user's spending, so the Pareto rule cannot be applied to find the 20% of users with the highest consumption. BP neural networks, for their part, suffer from redundant input data, slow learning convergence, and errors that get trapped in local minima. This paper applies feature engineering and machine learning methods represented by LightGBM, XGBoost and CatBoost to features such as web page clicks, total number of pages viewed and average click rate. Accuracy is compared from single models through to combined models, and the better combined model is selected to predict consumption according to the customer's future browsing behavior, providing a reference for enterprise decision-making and operation planning.

The study uses 903,653 sales records from the well-known Google Merchandise Store (GStore for short) to predict the future spending of each GStore customer. First, exploratory data analysis is used to examine the distributions and missing data. Second, feature engineering splits columns that contain compound information (such as purchase time) into year, month, week, day and so on, and visualization is then used to find the features that most influence the consumption forecast. Next, the XGBoost, LightGBM and CatBoost algorithms are used to predict each consumer's GStore spending in the next year, and feature importance is plotted to make the important features more intuitive. The performance of the different single models in customer consumption forecasting is compared side by side, and finally the models are linearly combined to select the linear combination with the highest prediction accuracy. After the data are processed and modeled, RMSE (root-mean-square error) serves as the evaluation standard for the four basic models. The results show that single-model prediction accuracy ranks GBM > CatBoost > XGBoost, and that among combined models the ternary combination GBM+XGB+CatBoost is more accurate than any binary combination, which in turn is more accurate than any single model, thus yielding an optimized machine-learning-based GStore consumption forecasting model.
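The abstract does not give the thesis's actual code, so the following is only a minimal sketch of three steps it describes: splitting a date column into calendar features, scoring predictions with RMSE, and choosing a linear combination of model outputs by RMSE. The function names (`split_date`, `rmse`, `best_linear_blend`), the grid search over weights and the toy numbers are all assumptions made for illustration. The prediction lists stand in for the outputs of trained LightGBM, XGBoost and CatBoost models and are not results from the thesis.

```python
import math
from datetime import date
from itertools import product


def split_date(d: date) -> dict:
    """Expand one date into year, month, ISO week, day and weekday features."""
    return {
        "year": d.year,
        "month": d.month,
        "week": d.isocalendar()[1],  # ISO week number, 1..53
        "day": d.day,
        "weekday": d.weekday(),      # Monday = 0
    }


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def best_linear_blend(y_true, preds, step=0.1):
    """Grid-search non-negative weights summing to 1 that minimise RMSE.

    `preds` maps a model name to its list of predictions. The grid search
    is an assumed, simple way to pick the weights; the thesis may have
    chosen its linear combination differently.
    """
    names = list(preds)
    n_steps = round(1 / step)
    best_score, best_weights = float("inf"), None
    for combo in product(range(n_steps + 1), repeat=len(names)):
        if sum(combo) != n_steps:
            continue  # keep only weight vectors that sum to exactly 1
        weights = [c / n_steps for c in combo]
        blended = [
            sum(w * preds[name][i] for w, name in zip(weights, names))
            for i in range(len(y_true))
        ]
        score = rmse(y_true, blended)
        if score < best_score:
            best_score, best_weights = score, dict(zip(names, weights))
    return best_score, best_weights


# Toy data only: hypothetical targets and hypothetical model outputs.
y_true = [0.0, 1.0, 2.0, 3.0]
preds = {
    "lgb": [0.2, 1.1, 1.8, 3.3],
    "xgb": [-0.2, 0.9, 2.2, 2.7],
    "cat": [0.1, 1.2, 2.1, 2.9],
}
single_scores = {name: rmse(y_true, p) for name, p in preds.items()}
blend_score, blend_weights = best_linear_blend(y_true, preds)
date_features = split_date(date(2017, 8, 1))
```

Because the weight grid includes the cases where one model gets weight 1, the blended RMSE on the data used to pick the weights can never be worse than the best single model. This matches the direction of the ranking reported above, but in practice the weights should be chosen on a validation set held out from the final evaluation.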
Keywords/Search Tags:LightGBM, XGBoost, CatBoost, Consumption forecast, Combination model