Data Analysis Technology in Mobile Ad Traffic Anti-Fraud

Posted on: 2021-01-04
Degree: Master
Type: Thesis
Country: China
Candidate: E L Huang
GTID: 2518306311496044
Subject: Applied Statistics
Abstract/Summary:
With the rapid development of the mobile Internet in recent years, the number of mobile Internet users has exploded. Precision, immediacy, interactivity, and diffusion are the advantages of mobile advertising, and they make it popular with a growing number of advertisers, who place mobile ads in mobile apps or on computer websites. Driven by profit, advertising traffic fraud aimed at collecting advertisers' commissions has grown unchecked and harms the healthy development of the mobile advertising industry, so identifying fraudulent mobile advertising traffic is of considerable practical significance.

First, using network traffic data from June 2019 provided by S company, this paper applies data analysis technology to mobile advertising traffic anti-fraud. This requires preparatory work: data understanding, data exploration, data cleaning, and data partitioning. The data cleaning stage handles outliers, missing values, and anomalous values in categorical features, among other issues. It takes more than 70% of the time in the whole analysis process and is the most time-consuming step, yet it cannot be skipped.

Second, feature engineering is applied to the cleaned data to uncover more valuable features. The main work covers constructing and splitting features, feature crossing, statistical features, and feature filtering, the last of which includes feature selection by recursive feature elimination combined with the random forest algorithm and cross-validation. The aim is to provide data support for modeling and to select an optimal feature subset.

Third, a variety of models are tried on the feature-engineered data: the traditional machine learning decision tree, the random forest model that improves on it, and several boosting algorithms that improve on it further (GBDT, LightGBM, XGBoost, and CatBoost). These algorithms offer strong generalization and high accuracy on large data sets whose categorical variables take many values, and mobile Internet advertising data has exactly these characteristics: categorical variables with many values, many categorical variables, and a large data volume. This paper therefore tries these newer methods systematically in this new field, and finally combines them with a logistic regression model in an ensemble (stacking) model to further improve performance in recognizing fraudulent mobile advertising traffic.

Finally, this paper designs four groups of experiments and uses accuracy, precision, recall, F1, and AUC as evaluation metrics, with the F1 value as the main measure of classification performance. By comparing performance across models, experiments, and evaluation metrics, the paper quantifies how the data analysis steps affect test-set performance on the fraud traffic classification problem. In Experiments 1 and 2, on the uncleaned raw data and on the cleaned data respectively, the best model is LightGBM. On data that has been both cleaned and feature-engineered, CatBoost performs better. The two best single models in this paper are therefore LightGBM and CatBoost. The logistic regression plus stacking attempt in Experiment 4 also brings some benefit and improves the overall result to a certain extent, with the F1 value reaching 0.9683. In addition, the models were built repeatedly and the performance of each feature was compared, and suggestions are offered from the feature perspective.

Overall, this paper systematically applies data preprocessing and feature engineering to mobile advertising fraud detection and achieves good classification results. It also tries a wide variety of machine learning methods, and the final fused model achieves a very good classification result, which provides a reference for other classification tasks. The paper further gives a dedicated analysis of such problems from the feature perspective and, using the feature importance rankings produced by the models, offers feature-level recommendations. Internet data scenarios vary in scale and cleanliness, but as long as the actual Internet data is analyzed with the specific problem in mind, the fraud traffic identification model used in this paper can be transferred to other Internet data classification scenarios.
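As an illustration of the evaluation metrics named in the abstract (accuracy, precision, recall, F1, and AUC), the following is a minimal sketch in plain Python. It is not the thesis's code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for the example, with label 1 standing for fraudulent traffic.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN, treating label 1 (fraud) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 as a dict."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample gets the higher score; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: true labels and model scores for 8 requests.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold
    print(classification_metrics(y_true, y_pred))
    print("AUC:", auc(y_true, scores))
```

The pairwise AUC computation is quadratic in the number of samples, which is fine for a small illustration; on data of the scale the abstract describes, a rank-based implementation from an established library would be the more practical choice.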
Keywords/Search Tags: mobile advertising, traffic cheating, data analysis, machine learning, ensemble learning