Research On Mobile Advertising Click Fraud Detection Based On KMS Mixed Sampling

Posted on: 2023-07-06
Degree: Master
Type: Thesis
Country: China
Candidate: G P Liang
Full Text: PDF
GTID: 2558307151483744
Subject: Applied statistics
Abstract/Summary:
With the rapid development of mobile Internet technology and smart terminal devices, mobile advertising has become an essential form of advertising. However, while mobile advertising has greatly increased the industry's revenue, it has also fuelled the growth of click fraud under the cost-per-click (CPC) billing model. Click fraud not only forces advertisers to spend more on advertising but also raises the level of risk faced by the advertising industry as a whole. Exploring methods for detecting advertising click fraud is therefore of practical value to the industry. Scholars in China and abroad have done a great deal of work on click fraud detection, but some emerging machine learning algorithms and outlier mining algorithms have received little attention. To enrich research in this field and address the shortcomings of existing methods, this thesis studies new methods for advertising click fraud detection from the perspectives of classification algorithms and outlier mining algorithms. The main contributions are as follows:

1) To address the class imbalance of advertising click fraud data, a KMS hybrid sampling method is proposed, which combines down-sampling based on k-means clustering with SMOTE up-sampling. A logistic regression model is then used to compare KMS with various other sampling methods, and the effectiveness of KMS hybrid sampling is demonstrated in terms of AUC.

2) To address the limited diversity of base classifiers in the deep forest algorithm and its sensitivity to imbalanced data, the cascade structure of deep forest is improved in two ways: (1) a base classifier pool is established for the cascade structure to increase the diversity of base classifiers; (2) AUC, rather than accuracy, is used as the criterion for deciding whether the cascade continues to grow, which reduces the sensitivity of the cascade structure to imbalanced data. The resulting model is called CCForest. After a better combination of base classifiers is selected through grouped comparison experiments, the CCForest model is combined with KMS hybrid sampling to detect advertising click fraud. The experimental results show that the model performs well on the click fraud detection task.

3) To address the poor performance of existing density-based outlier mining methods in advertising click fraud detection, an outlier mining algorithm called GLOF is proposed. It first defines the main neighbors of a data point through its k-nearest neighbors, reverse k-nearest neighbors and shared nearest neighbors, and takes the main neighbors as the point's neighborhood. Kernel density estimation with a Gaussian kernel function is then used to describe the neighborhood density of each data point, and the evaluation idea of the LOF algorithm is used to judge whether a data point is an outlier. The experimental results show that, compared with the LOF algorithm, GLOF improves the accuracy and effectiveness of ad click fraud detection.
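The hybrid sampling idea in contribution 1 can be illustrated with a minimal, standard-library-only sketch. The abstract does not say how many samples each step keeps or creates, so the function names, the proportional per-cluster draw, and the choice to bring both classes to their midpoint size are all assumptions made for illustration, not the thesis's actual KMS procedure.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns one list of member points per cluster."""
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute each centroid; an emptied cluster keeps its old centroid.
        for c, members in enumerate(clusters):
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return clusters


def kmeans_downsample(majority, target, k, rng):
    """Keep roughly `target` majority points, drawn proportionally per cluster."""
    kept = []
    for members in kmeans(majority, k, rng):
        quota = round(target * len(members) / len(majority))
        kept.extend(rng.sample(members, min(quota, len(members))))
    return kept


def smote(minority, n_new, k, rng):
    """SMOTE-style up-sampling: interpolate between a point and a near neighbour."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: sq_dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


def kms_resample(majority, minority, k_clusters=3, k_neighbours=3, seed=0):
    """Hypothetical KMS sketch: move both classes toward their midpoint size."""
    rng = random.Random(seed)
    target = (len(majority) + len(minority)) // 2  # assumed balancing target
    new_majority = kmeans_downsample(majority, target, k_clusters, rng)
    new_minority = minority + smote(minority, target - len(minority), k_neighbours, rng)
    return new_majority, new_minority


# Toy data: 200 "normal" clicks and 20 "fraud" clicks in two dimensions.
data_rng = random.Random(1)
majority = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
minority = [(data_rng.gauss(3, 0.5), data_rng.gauss(3, 0.5)) for _ in range(20)]
maj, mino = kms_resample(majority, minority)
print(len(maj), len(mino))
```

Because every synthetic point lies on a segment between two real minority points, up-sampling never leaves the region the minority class already occupies, while the cluster-wise draw keeps each region of the majority class represented after down-sampling.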
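The reason for replacing accuracy with AUC in contribution 2 can be seen in a small example. The sketch below computes both metrics for a degenerate scorer on data with 5% positives; the `should_grow` rule and its `min_gain` threshold are hypothetical stand-ins for a cascade growth check, since the abstract does not specify the exact criterion used in CCForest.

```python
def auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of correct predictions after thresholding the scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def should_grow(previous_auc, current_auc, min_gain=1e-3):
    """Hypothetical rule: add another cascade level only if AUC improved enough."""
    return current_auc - previous_auc > min_gain


# 5 fraudulent clicks among 100; the scorer gives every click the same score.
labels = [1] * 5 + [0] * 95
constant_scores = [0.0] * 100
print(accuracy(labels, constant_scores), auc(labels, constant_scores))  # 0.95 0.5
```

A scorer that never flags fraud reaches 95% accuracy here yet has an AUC of 0.5, the same as random ranking, which is why an accuracy-based growth check can be misled by imbalanced data.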
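The steps described in contribution 3 can be sketched as follows. This is not the thesis's GLOF algorithm: the abstract does not say how the three neighbor types are combined, so taking their union as the "main neighbors" is an assumption, as are the bandwidth, the function names, and the exact form of the LOF-style ratio.

```python
import math


def knn(points, i, k):
    """Indices of the k nearest neighbours of points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    return sorted(others, key=lambda j: math.dist(points[i], points[j]))[:k]


def main_neighbours(points, k):
    """Assumed neighbourhood: union of kNN, reverse kNN and shared-NN indices."""
    n = len(points)
    nn = [set(knn(points, i, k)) for i in range(n)]
    hoods = []
    for i in range(n):
        reverse = {j for j in range(n) if i in nn[j]}  # points that list i as a kNN
        shared = {j for j in range(n) if j != i and nn[i] & nn[j]}  # share a kNN with i
        hoods.append(nn[i] | reverse | shared)
    return hoods


def gaussian_density(points, i, hood, bandwidth):
    """Unnormalised Gaussian kernel density of points[i] over its neighbourhood."""
    total = sum(math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * bandwidth ** 2))
                for j in hood)
    return max(total / len(hood), 1e-300)  # floor avoids division by zero later


def outlier_scores(points, k=3, bandwidth=1.0):
    """LOF-style ratio: mean neighbour density divided by the point's own density."""
    hoods = main_neighbours(points, k)
    dens = [gaussian_density(points, i, hoods[i], bandwidth) for i in range(len(points))]
    return [sum(dens[j] for j in hoods[i]) / len(hoods[i]) / dens[i]
            for i in range(len(points))]


# Seven clustered points plus one isolated point at index 7.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8), (0.8, 0.2), (6, 6)]
scores = outlier_scores(points)
print(scores.index(max(scores)))
```

As in LOF, a score near 1 means a point is about as dense as its neighbors, while a score far above 1 marks a point that sits in a much sparser region than the points around it.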
Keywords/Search Tags:Advertising Click Fraud Detection, Category Imbalance, Deep Forest, Gaussian Kernel Function