Research On Recognition Model Of Mobile Advertising Click Fraud Based On Ensemble Learning

Posted on: 2022-06-30
Degree: Master
Type: Thesis
Country: China
Candidate: R S Zeng
GTID: 2518306458498064
Subject: Trade Economy
Abstract/Summary:
In recent years, with the development of mobile internet technology, people's everyday internet access has gradually shifted from traditional PCs to mobile smart devices. The rapid growth in the number of mobile internet users has driven the vigorous development of mobile advertising. Under the cost-per-click (CPC) billing model, however, mobile advertising click fraud occurs frequently. It not only damages advertisers' economic interests but also harms the reputation of advertising platforms and degrades users' online experience. It is therefore important to build an efficient and accurate model for detecting mobile advertising click fraud.

Using mobile advertising click data, this thesis first preprocesses the data set and carries out an exploratory analysis. Because the data set is imbalanced, it is resampled with the Borderline-SMOTE method. From both a business and a statistical perspective, 56 features are constructed, covering media information, time, IP information, device information, and derived features. The features are preliminarily screened by information value (IV) and correlation coefficients, and the final set of features fed into the models is determined by GBDT feature importance. Logistic regression, AdaBoost, Random Forest, XGBoost, and Stacking are then used to make predictions on the data set, with accuracy, recall, F1 score, and AUC as evaluation metrics for a comprehensive assessment of each model and sampling method.

The results show that the fraud rate of some client IP addresses and server IP addresses is much higher than the average fraud rate of the sample. The fraud rate of Huawei phones is as high as 0.64, and those of Xiaomi and Samsung exceed 0.4. Users' daily clicks are concentrated between 12:00 and 23:00, whereas fraudulent clicks are more likely than non-fraudulent ones to occur between 1:00 and 15:00. XGBoost achieves the highest F1 score and recall, its AUC exceeds 0.85 both before and after sampling, and it is the best of the four single models. After Borderline-SMOTE sampling, the AUC of each single model is significantly higher than without sampling; the AUC values of all models exceed 0.88 with little fluctuation, indicating high robustness. The Stacking ensemble models all achieve accuracy above 0.83, recall above 0.92, and AUC above 0.88. Overall, their recall is better than that of the single models, so stacking further improves predictive performance.
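The brand, IP, and time-of-day findings are per-group fraud rates: fraudulent clicks divided by all clicks in a group, compared against the overall sample rate. The following is a minimal sketch, assuming each click record is a dict with a 0/1 `label` field (1 = fraud). The function names, the grouping key, and the `factor` cut-off for "much higher than average" are hypothetical illustrations, not taken from the thesis.

```python
from collections import Counter, defaultdict


def fraud_rate_by(records, key):
    """Share of fraudulent clicks for each value of key(record), e.g. brand or hour."""
    clicks = defaultdict(int)
    frauds = defaultdict(int)
    for r in records:
        group = key(r)
        clicks[group] += 1
        frauds[group] += r["label"]  # label is assumed to be 0 or 1
    return {g: frauds[g] / clicks[g] for g in clicks}


def high_risk_groups(records, key, factor=2.0, min_clicks=1):
    """Groups whose fraud rate is at least `factor` times the overall rate.

    `factor` and `min_clicks` are hypothetical thresholds chosen for illustration.
    """
    overall = sum(r["label"] for r in records) / len(records)
    counts = Counter(key(r) for r in records)
    rates = fraud_rate_by(records, key)
    return sorted(
        g for g, rate in rates.items()
        if counts[g] >= min_clicks and rate >= factor * overall
    )
```

For example, `fraud_rate_by(records, lambda r: r["brand"])` gives per-brand rates, and `high_risk_groups(records, lambda r: r["hour"])` lists the click hours whose fraud rate is at least twice the sample average. In real data a `min_clicks` floor helps keep tiny groups (a single IP with one fraudulent click) from dominating the list.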
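The abstract names Borderline-SMOTE but not its variant or parameters. The pure-Python sketch below implements the Borderline-1 variant. Each minority (fraud) sample is classified by its `m` nearest neighbours over the whole data set. Only "danger" samples, those with at least half but not all neighbours in the majority class, are used as seeds. Synthetic points are then interpolated between a danger sample and one of its `k` nearest minority neighbours. The function name, the defaults for `m`, `k`, and `n_new`, and the choice to cycle evenly through the danger set are assumptions made for illustration.

```python
import math
import random


def borderline_smote(X, y, minority=1, m=5, k=5, n_new=None, seed=0):
    """Borderline-SMOTE (Borderline-1) for numeric feature tuples.

    Returns (X_resampled, y_resampled, danger_indices).
    """
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    if n_new is None:
        # Assumed default: add enough synthetic samples to balance the classes.
        n_new = max(0, len(y) - 2 * len(minority_idx))
    if len(minority_idx) < 2:
        return list(X), list(y), []

    def nearest(i, pool, count):
        # Indices in `pool` closest to X[i] by Euclidean distance, excluding i.
        ranked = sorted((j for j in pool if j != i), key=lambda j: math.dist(X[i], X[j]))
        return ranked[:count]

    # Step 1: a minority sample is "danger" if at least half, but not all, of its
    # m nearest neighbours (over all classes) are majority; all-majority = noise.
    danger = []
    for i in minority_idx:
        n_major = sum(1 for j in nearest(i, range(len(X)), m) if y[j] != minority)
        if m / 2 <= n_major < m:
            danger.append(i)

    # Step 2: interpolate between a danger sample and a random one of its k nearest
    # minority neighbours, cycling through the danger set until n_new points exist.
    synthetic = []
    for t in range(n_new if danger else 0):
        i = danger[t % len(danger)]
        j = rng.choice(nearest(i, minority_idx, k))
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(X[i], X[j])))

    return list(X) + synthetic, list(y) + [minority] * len(synthetic), danger
```

Because the seeds come only from the class boundary, minority points deep inside their own cluster are not oversampled, which distinguishes Borderline-SMOTE from plain SMOTE. For real data, imbalanced-learn's `BorderlineSMOTE` (with `kind="borderline-1"` and its `fit_resample` method) is a tested alternative. Resampling is usually applied to the training split only, so that the evaluation data keep the true class ratio.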
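The preliminary screening step combines information value with a correlation check. The standard IV of a binned feature is the sum over bins of (share of non-fraud minus share of fraud) times the weight of evidence, ln(share of non-fraud / share of fraud). The sketch below assumes features are already binned into numeric codes and that labels are 0/1 with 1 = fraud. The thesis does not state its binning, smoothing, or cut-offs, so the `eps` smoothing and the `min_iv` and `max_corr` thresholds are hypothetical.

```python
import math
from collections import Counter


def information_value(values, labels, eps=0.5):
    """IV of one pre-binned feature against a 0/1 label (1 = fraud)."""
    bins = set(values)
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    bad = Counter(v for v, t in zip(values, labels) if t == 1)
    good = Counter(v for v, t in zip(values, labels) if t == 0)
    iv = 0.0
    for b in bins:
        # Additive smoothing keeps log() finite when a bin holds only one class.
        p_bad = (bad[b] + eps) / (n_bad + eps * len(bins))
        p_good = (good[b] + eps) / (n_good + eps * len(bins))
        iv += (p_good - p_bad) * math.log(p_good / p_bad)  # share gap * WoE
    return iv


def pearson(a, b):
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((z - mb) ** 2 for z in b))
    return cov / (sa * sb) if sa and sb else 0.0


def preselect(features, labels, min_iv=0.02, max_corr=0.8):
    """features: dict name -> list of numeric bin codes. Thresholds are hypothetical."""
    ivs = {name: information_value(col, labels) for name, col in features.items()}
    # Strongest features first, so the weaker member of a correlated pair is dropped.
    ranked = sorted((n for n in ivs if ivs[n] >= min_iv), key=ivs.get, reverse=True)
    kept = []
    for name in ranked:
        if all(abs(pearson(features[name], features[k])) < max_corr for k in kept):
            kept.append(name)
    return kept, ivs
```

Dropping the lower-IV member of each highly correlated pair keeps the more predictive feature while removing redundancy. The surviving set would then be narrowed further by a model-based ranking such as GBDT feature importance, as the abstract describes.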
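The abstract does not say which base learners, meta-learner, or number of folds its Stacking models use. A common construction trains each base model with K-fold cross-validation and records its predicted fraud probability on each held-out fold. These out-of-fold scores become the input features of a second-level model, and the base models are then refit on all the data for scoring new clicks. The out-of-fold step matters because in-sample scores from an overfitted base model would look perfect, and the meta-learner would learn to overtrust them. This pure-Python sketch assumes each base learner is a function `fit(X, y)` that returns a `predict(x)` callable giving P(fraud), and it uses logistic regression as the meta-learner. All names here are hypothetical.

```python
import math
import random


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)


def kfold_indices(n, k, seed=0):
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def oof_meta_features(X, y, base_learners, k=5, seed=0):
    """Row i, column j: score of base learner j from a model that never saw sample i."""
    meta = [[0.0] * len(base_learners) for _ in X]
    for fold in kfold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        for j, fit in enumerate(base_learners):
            predict = fit([X[i] for i in train], [y[i] for i in train])
            for i in fold:
                meta[i][j] = predict(X[i])
    return meta


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Meta-learner: logistic regression trained by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)  # last weight is the bias
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, t in zip(X, y):
            err = _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1]) - t
            for d, xi in enumerate(x):
                grad[d] += err * xi
            grad[-1] += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return lambda x: _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])


def fit_stacking(X, y, base_learners, k=5):
    meta_model = fit_logistic(oof_meta_features(X, y, base_learners, k), y)
    # Base learners are refit on the full data for scoring new samples.
    full = [fit(X, y) for fit in base_learners]
    return lambda x: meta_model([predict(x) for predict in full])
```

In the thesis setting the base learners would be models such as logistic regression, AdaBoost, Random Forest, and XGBoost. scikit-learn's `StackingClassifier` (with `estimators`, `final_estimator`, and `cv` parameters) implements the same cross-validated scheme for library models.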
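The evaluation metrics can be computed from a confusion matrix at a score threshold, while AUC is threshold-free. The abstract's "accuracy" could mean either overall accuracy or precision, so the sketch below reports both. The 0.5 threshold and the function name are assumptions for illustration. AUC is computed in its Mann-Whitney form: the probability that a randomly chosen fraudulent click receives a higher score than a randomly chosen normal click.

```python
def classification_metrics(y_true, scores, threshold=0.5):
    """Accuracy, precision, recall, F1 at `threshold`, plus AUC (1 = fraud)."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = len(y_true) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # AUC: share of (fraud, normal) pairs ranked correctly; ties count half.
    # O(n_pos * n_neg) pairs, which is fine for a sketch.
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg)) if pos and neg else float("nan")

    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1, "auc": auc}
```

For example, `classification_metrics([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])` gives accuracy, precision, recall, and F1 of 0.5 each and an AUC of 0.75. Three of the four fraud/normal pairs are ordered correctly, even though the 0.5 threshold misclassifies two clicks.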
Keywords: mobile advertising, click fraud, ensemble learning, unbalanced data, stacking method