Click Fraud Prediction Based On Data Mining

Posted on: 2022-02-12
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Li
GTID: 2518306509489044
Subject: Applied Statistics
Abstract/Summary:
Over the past decade, the rapid development of the Internet and the spread of smartphones have brought huge traffic to major Internet companies, and this traffic is driving more and more advertising. Internet products have gradually displaced traditional media in the advertising market and now occupy a pivotal position. Unlike traditional media, Internet advertising is billed mainly by cost-per-mille (CPM) and cost-per-click (CPC), and cost-per-click is the most widely used payment method in Internet advertising products. With the rapid growth of cost-per-click advertising, click fraud has become increasingly serious and is now an important research topic. Current research on click fraud follows three main directions: preventing click fraud, detecting click fraud, and improving the advertising payment method.

This thesis proposes a machine learning method based on LightGBM for click fraud prediction. By predicting whether a given click is fraudulent, the click fraud problem is cast as a binary classification problem. Fraudulent clicks make up only a small share of the huge number of clicks, so the binary classifier faces class imbalance. In addition, the data have simple features with weak cross-feature interactions. Using a data set published by a certain platform, this thesis carried out data cleaning, feature analysis, feature extraction, and feature engineering to build a baseline model, which achieved good results. Analysis of the data confirmed the imbalance between positive and negative samples. To account for it, the model's scale_pos_weight parameter was set and the model was retrained, which improved performance, although the features themselves still left room for improvement. More suitable features were therefore designed through feature importance analysis, and the final model reached an AUC of 0.974.
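The two quantities the abstract relies on, the positive-class weight and the AUC, can be illustrated with a minimal sketch. The thesis does not say how scale_pos_weight was chosen; the ratio of negative to positive samples used here is a common heuristic and is an assumption, as are the helper names `positive_class_weight` and `roc_auc` and the toy labels and scores, which are made up for illustration and are not the thesis data.

```python
from collections import Counter


def positive_class_weight(labels):
    """Return n_negative / n_positive (assumed heuristic for scale_pos_weight)."""
    counts = Counter(labels)
    if counts[1] == 0:
        raise ValueError("no positive (fraudulent) samples in labels")
    return counts[0] / counts[1]


def roc_auc(labels, scores):
    """AUC: probability that a random positive outscores a random negative.

    Computed from average ranks (Mann-Whitney U); tied scores count as half.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Extend j to cover the whole group of tied scores starting at i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs both classes")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy data (hypothetical): 1 = fraudulent click, 0 = legitimate click.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.15, 0.3, 0.05, 0.4, 0.35, 0.8, 0.7, 0.9]

# LightGBM-style parameter dictionary; only the weight is derived from data.
params = {
    "objective": "binary",
    "metric": "auc",
    "scale_pos_weight": positive_class_weight(labels),
}

print(params["scale_pos_weight"])  # 4.0
print(roc_auc(labels, scores))     # 0.9375
```

A dictionary like `params` could then be passed to `lightgbm.train` together with a `lightgbm.Dataset`; the sketch stops short of training so that it runs with the standard library alone.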
Keywords/Search Tags: Click Fraud, Machine Learning, Gradient-Boosted Tree