A Research On CTR Prediction Based On Ensemble Of RF, XGBoost And FFM

Posted on: 2019-09-06  Degree: Master  Type: Thesis
Country: China  Candidate: X P Wang  Full Text: PDF
GTID: 2417330572454090  Subject: Statistics
Abstract/Summary:
In the era of Internet big data, accurate click-through rate (CTR) prediction has high commercial value for companies, so building a CTR prediction model has both theoretical research value and practical commercial value. The model most commonly used for CTR prediction is logistic regression (LR). In advertising, however, the data are high-dimensional and very large in volume, and this study also finds that features carry related information about one another. On the one hand, the data must be processed quickly, yet feature selection has so far relied on human experience, which consumes effort without necessarily producing good results. On the other hand, the related information between features is valid information, and adding it can improve prediction accuracy. Therefore, how to discover valid features quickly and automatically, and how to construct effective feature combinations, are the key issues in CTR estimation. Building on the existing GBDT + LR model, we replace the traditional GBDT with the parallelizable XGBoost algorithm and add the related information between features through the interaction term ⟨w_{i,f_j}, w_{j,f_i}⟩ x_i x_j; the output of the FFM model is then passed through the sigmoid function to obtain a probability. We also study ensemble learning: the predicted value of the FFM model is used as a new feature and fed, together with the existing features, into the RF and XGBoost models. In this way we construct the XGBoost + FFM, FFM + RF, and FFM + XGBoost ensemble models. In the empirical study, we use logloss to evaluate the models and compare them in terms of accuracy, computing speed, and stability. The main advantages of the models are reflected in three aspects. First, this paper takes the related information between features into account and, given the sparseness of the data, introduces factorization to construct the FFM model, which has better nonlinear fitting ability than LR. Second, the approximate algorithm of XGBoost enables parallel computing, so it runs noticeably faster than GBDT while keeping accuracy high. Third, the ensemble substantially improves generalization ability while maintaining accuracy.
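As a rough illustration of the quantities named in the abstract, the following Python sketch computes the FFM pairwise interaction term, maps it to a probability with the sigmoid function, evaluates logloss, and appends the FFM prediction to existing features as a new column. It is a minimal sketch under stated assumptions, not the thesis's implementation: the function names, the dictionary-based data layout, and the toy weights are hypothetical, bias and linear terms are omitted, and the downstream RF or XGBoost training step is not shown.

```python
import math
from itertools import combinations


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def ffm_score(x, field_of, w):
    """Sum <w[i, f_j], w[j, f_i]> * x_i * x_j over all feature pairs i < j.

    x        : dict, feature index -> value (non-zero entries only)
    field_of : dict, feature index -> field name
    w        : dict, (feature index, field name) -> latent vector (list)
    All three layouts are assumptions made for this illustration.
    """
    score = 0.0
    for i, j in combinations(sorted(x), 2):
        w_i = w[(i, field_of[j])]  # latent vector of feature i for j's field
        w_j = w[(j, field_of[i])]  # latent vector of feature j for i's field
        inner = sum(a * b for a, b in zip(w_i, w_j))
        score += inner * x[i] * x[j]
    return score


def logloss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def add_ffm_feature(rows, ffm_probs):
    """Append each sample's FFM probability to its feature row (stacking input)."""
    return [list(row) + [p] for row, p in zip(rows, ffm_probs)]


# Toy example with two features in two hypothetical fields.
field_of = {0: "user", 1: "ad"}
w = {(0, "ad"): [1.0, 2.0], (1, "user"): [0.5, -1.0]}
x = {0: 1.0, 1: 2.0}

p = sigmoid(ffm_score(x, field_of, w))   # FFM click probability
augmented = add_ffm_feature([[1.0, 2.0]], [p])  # row passed on to RF / XGBoost
```

The augmented rows would then be the training input for a tree-based learner, which is one way to read the FFM + RF and FFM + XGBoost constructions described above.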
Keywords/Search Tags: CTR, XGBoost, Field-aware Factorization Machine, Random Forests, Ensemble Learning