Advertising Anti-Fraud Research Based on CatBoost Model

Posted on: 2021-06-26 | Degree: Master | Type: Thesis
Country: China | Candidate: D Xiao
GTID: 2518306455981899 | Subject: Applied Statistics
Abstract/Summary:
With the rapid development of 4G technology and the arrival of the 5G era, people's lives, work, entertainment and learning have become increasingly dependent on mobile smart devices, and mobile Internet advertising has become increasingly important within the advertising industry. The rapid growth of the mobile advertising business has also exposed more and more fraud risks. As the underlying technology matures, advertising fraud is increasingly carried out at scale by organized groups. It wastes advertising budgets, may damage the reputation of the Internet advertising market, and erodes the marketing ecosystem of the advertising industry. Mobile advertising anti-fraud is therefore an important problem to solve. An effective classification model for advertising anti-fraud can help advertisers and marketing platforms mine important information from massive click data, predict whether a given click is fraudulent traffic, and identify the IP addresses or device types involved in large-scale fraud. The marketing platform can then adopt effective strategies to prevent fraud and reduce the loss of advertising budget.

In this thesis, 1 million traffic samples from iFLYTEK's AI marketing cloud are used as the data set. First, data analysis and preprocessing are carried out to understand the overall data distribution and the feature types. Next, feature engineering on the original features constructs a large number of time, statistical and cross features, 139 features in total, and the features are then filtered according to the feature importance reported by machine learning models. Using the processed data and features, advertising fraud classification models are built with XGBoost, LightGBM and CatBoost, and the parameters of each model are tuned with grid search and Bayesian optimization (SMBO). After the optimal parameter settings are selected, the models are integrated by stacking. The first level consists of the three tuned models, and the second level is a random forest model. The models are then evaluated and compared using standard evaluation metrics. Of the four models, the CatBoost model performs best, with an F1 score of 0.9392 and an AUC of 0.9865. The resulting model not only identifies fraudulent traffic but also avoids misclassifying legitimate traffic as fraudulent. The fraud-related features identified through feature importance extraction also provide a reference for the advertising industry when formulating prevention strategies, which supports the healthy development of the advertising marketing ecosystem.
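The abstract names three classes of engineered features (time, statistical and cross features) and reports F1 and AUC as evaluation metrics. Below is a minimal, standard-library-only Python sketch of both ideas. It is not the thesis's actual code. The field names `ip`, `model` and `ts`, the function names, and the specific features are hypothetical illustrations, and model training (XGBoost, LightGBM, CatBoost, stacking) is not shown. The metric functions follow the standard definitions of F1 and ROC AUC.

```python
from collections import Counter
from datetime import datetime, timezone


def add_features(clicks):
    """Return copies of the click records with example engineered features.

    Each click is assumed to be a dict with hypothetical keys "ip",
    "model" (device model) and "ts" (Unix timestamp in seconds).
    """
    # Statistical features: overall frequency of each IP and device model.
    ip_count = Counter(c["ip"] for c in clicks)
    model_count = Counter(c["model"] for c in clicks)
    # Cross feature: frequency of each (IP, device model) combination.
    pair_count = Counter((c["ip"], c["model"]) for c in clicks)

    rows = []
    for c in clicks:
        row = dict(c)  # copy so the input records are left unchanged
        # Time-class feature: hour of day (UTC) of the click.
        row["hour"] = datetime.fromtimestamp(c["ts"], tz=timezone.utc).hour
        row["ip_count"] = ip_count[c["ip"]]
        row["model_count"] = model_count[c["model"]]
        row["ip_model_count"] = pair_count[(c["ip"], c["model"])]
        rows.append(row)
    return rows


def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall, with 1 = fraudulent."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: treat F1 as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC = probability that a random fraudulent click scores higher than
    a random normal one (ties count 1/2). O(n_pos * n_neg): toy data only."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    clicks = [
        {"ip": "1.2.3.4", "model": "A", "ts": 0},
        {"ip": "1.2.3.4", "model": "A", "ts": 3600},
        {"ip": "5.6.7.8", "model": "B", "ts": 7200},
    ]
    for row in add_features(clicks):
        print(row)
    print(round(f1_score([1, 0, 1, 1], [1, 0, 0, 1]), 4))  # 0.8
    print(roc_auc([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.7]))      # 0.75
```

The pairwise AUC is quadratic in the number of samples. For a data set of 1 million clicks, a library implementation such as scikit-learn's `sklearn.metrics.roc_auc_score` and `sklearn.metrics.f1_score` would normally be used instead.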
Keywords/Search Tags: advertising fraud, CatBoost, feature engineering, LightGBM, stacking model integration