
A Comparative Study On Related Models Of Advertising Click Rate Forecasting

Posted on: 2021-06-29 | Degree: Master | Type: Thesis
Country: China | Candidate: J J Guo
GTID: 2518306107479884 | Subject: Applied Statistics
Abstract/Summary:
With the development of the Internet and the spread of mobile devices, traditional advertising no longer keeps up with the times. Online advertising keeps growing, and computational advertising has emerged. Click-through rate (CTR) prediction is one of the key research directions of computational advertising. Accurate CTR prediction brings substantial benefits to advertising platforms and advertisers and also gives most users a better experience, which is why it matters so much. This thesis uses machine learning models to predict ad click-through rates.

The analysis is based mainly on the Avazu dataset. Its training set contains 10 days of ad click data, more than 40 million records in total, with 23 features excluding the id column. One million records are selected for analysis; discrete features are one-hot encoded and continuous features are normalized.

The data analysis part first studies how clicks relate to time and to ad position. Clicks peak at roughly 13:00 to 14:00 and reach their lowest point around 00:00 (midnight). Positions 0 and 1 account for the largest share of the data and have the most impressions and clicks, yet their click-through rate is not the highest. Analyzing each feature reveals the following pattern: the number of impressions is proportional to the number of clicks, but high impressions and clicks do not necessarily mean a high click-through rate.

The dataset has a positive-to-negative ratio of 2:8, so the classes are imbalanced. Undersampling or oversampling is often used for this problem; this thesis instead uses the class_weight parameter to reweight the classes so that the model is not biased toward the majority (non-click) class.

Next, two single models, logistic regression and a decision tree, and two ensemble models, random forest and GBDT, are used to predict click-through rate. Among the single models the decision tree predicts better, and among the ensemble models the random forest predicts better; the random forest is the best of all four models. GBDT gives the worst predictions because it is not well suited to data with high-dimensional sparse features. In practice, GBDT is often combined with logistic regression for CTR prediction.
Keywords/Search Tags: advertising click-through rate, display advertising, ensemble learning, feature engineering
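The abstract says discrete features are one-hot encoded and continuous features normalized, but not how. The sketch below is one plausible reading, assuming min-max scaling to [0, 1] (the thesis does not name its normalization) and storing each one-hot row as the list of its active column indices, which keeps high-dimensional sparse data small. The column names `banner_pos` and `device_type`, the toy rows and the helper names are illustrative only.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical categorical columns chosen for illustration.
CAT_COLS = ["banner_pos", "device_type"]


def build_vocab(rows: Sequence[Dict[str, str]], cat_cols: Sequence[str]) -> Dict[Tuple[str, str], int]:
    """Give every (column, value) pair seen in the data its own one-hot column index."""
    vocab: Dict[Tuple[str, str], int] = {}
    for row in rows:
        for col in cat_cols:
            # len(vocab) is evaluated before insertion, so new pairs get the next free index
            vocab.setdefault((col, row[col]), len(vocab))
    return vocab


def one_hot(row: Dict[str, str], cat_cols: Sequence[str], vocab: Dict[Tuple[str, str], int]) -> List[int]:
    """Return the indices of the columns set to 1; values unseen in build_vocab are dropped."""
    return [vocab[(c, row[c])] for c in cat_cols if (c, row[c]) in vocab]


def min_max(values: Sequence[float]) -> List[float]:
    """Scale a continuous feature to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


rows = [
    {"banner_pos": "0", "device_type": "1"},
    {"banner_pos": "1", "device_type": "1"},
    {"banner_pos": "0", "device_type": "4"},
]
vocab = build_vocab(rows, CAT_COLS)
encoded = [one_hot(r, CAT_COLS, vocab) for r in rows]   # [[0, 1], [2, 1], [0, 3]]
scaled = min_max([0.0, 5.0, 10.0])                      # [0.0, 0.5, 1.0]
```

In a real pipeline the vocabulary and the min/max values would be fitted on the training split only and reused unchanged on validation and test data.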
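The hourly and per-position findings rest on simple group-by counts of impressions, clicks and their ratio. A minimal sketch, assuming records of (hour, banner_pos, click) and Avazu's `hour` format YYMMDDHH; the toy records are invented, not the thesis data, and only show how one position can lead in impressions and clicks while another has the higher click-through rate.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def hour_of_day(hour_field: str) -> str:
    """Avazu's `hour` column is YYMMDDHH; keep only HH."""
    return hour_field[-2:]


def ctr_by(pairs: Iterable[Tuple[str, int]]) -> Dict[str, Tuple[int, int, float]]:
    """Group (key, click) pairs into {key: (impressions, clicks, click-through rate)}."""
    shown: Dict[str, int] = defaultdict(int)
    clicked: Dict[str, int] = defaultdict(int)
    for key, click in pairs:
        shown[key] += 1
        clicked[key] += click
    return {k: (shown[k], clicked[k], clicked[k] / shown[k]) for k in shown}


# Toy records (hour, banner_pos, click); made up for illustration.
log = [
    ("14102113", "0", 1),
    ("14102113", "0", 1),
    ("14102113", "0", 0),
    ("14102200", "0", 0),
    ("14102113", "1", 0),
    ("14102200", "1", 0),
    ("14102214", "7", 1),
]
by_position = ctr_by((pos, click) for _, pos, click in log)
by_hour = ctr_by((hour_of_day(h), click) for h, _, click in log)
```

Here position "0" has the most impressions and clicks (4 and 2, a rate of 0.5), yet position "7" has the higher rate (1.0), the same kind of gap the abstract reports for positions 0 and 1.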
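The abstract names undersampling and oversampling as the usual remedies for the 2:8 imbalance before choosing class weights instead. For comparison, here is a sketch of the two simplest random variants: dropping a random subset of non-clicks, or duplicating randomly chosen clicks. The function names, the `neg_per_pos` parameter and the toy labels are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def undersample(X: Sequence, y: Sequence[int], neg_per_pos: float = 1.0, seed: int = 0) -> Tuple[List, List[int]]:
    """Keep all positives and a random subset of negatives (neg_per_pos negatives per positive)."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    n_neg = min(len(neg), round(neg_per_pos * len(pos)))
    keep = sorted(pos + rng.sample(neg, n_neg))
    return [X[i] for i in keep], [y[i] for i in keep]


def oversample(X: Sequence, y: Sequence[int], seed: int = 0) -> Tuple[List, List[int]]:
    """Duplicate randomly chosen positives until both classes have the same count."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    extra = [rng.choice(pos) for _ in range(len(neg) - len(pos))] if pos else []
    keep = sorted(pos + neg + extra)
    return [X[i] for i in keep], [y[i] for i in keep]


y = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]   # hypothetical labels with the 2:8 ratio
X = list(range(len(y)))              # stand-in feature rows
X_under, y_under = undersample(X, y)
X_over, y_over = oversample(X, y)
```

Both would be applied to the training split only. Because undersampling changes the base click rate, probabilities from a model trained on the resampled data would need recalibrating if absolute CTR values are required.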
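The abstract says the class_weight parameter is used but not which setting. Assuming scikit-learn's `class_weight="balanced"` (an assumption), each class c gets weight n_samples / (n_classes * n_c), the rule scikit-learn documents for that option. With a 2:8 ratio, clicks get four times the weight of non-clicks, so both classes contribute the same total weight to the loss. The helper below is a hypothetical re-implementation of that formula:

```python
from collections import Counter
from typing import Dict, Sequence


def balanced_class_weight(y: Sequence[int]) -> Dict[int, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in sorted(counts.items())}


y = [1] * 2 + [0] * 8            # hypothetical labels with the abstract's 2:8 ratio
weights = balanced_class_weight(y)   # {0: 0.625, 1: 2.5}
```

Total weight per class is 8 * 0.625 = 2 * 2.5 = 5.0, which is what keeps the majority class from dominating.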
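To show how such class weights act inside logistic regression, the sketch below trains a class-weighted logistic regression by plain stochastic gradient descent on sparse one-hot rows, each row being the list of its active column indices. It is a from-scratch illustration, not the solver the thesis used; the learning rate, epoch count, column meanings and data are hypothetical, and {0: 0.625, 1: 2.5} are the balanced weights for a 2:8 split.

```python
import math
from typing import Dict, List, Sequence


def sigmoid(z: float) -> float:
    """Logistic function, split by sign so math.exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: List[float], active: Sequence[int]) -> float:
    """Predicted CTR for a row given as the indices of its active one-hot columns."""
    return sigmoid(w[-1] + sum(w[j] for j in active))


def train_weighted_lr(rows: Sequence[Sequence[int]], y: Sequence[int], n_features: int,
                      class_weight: Dict[int, float], lr: float = 0.1, epochs: int = 200) -> List[float]:
    """Plain SGD on class-weighted log loss; w[-1] is the bias term."""
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        for active, label in zip(rows, y):
            # d(weighted log loss)/d(logit) = weight * (p - label); each active column has x_j = 1
            g = class_weight[label] * (predict(w, active) - label)
            for j in active:
                w[j] -= lr * g
            w[-1] -= lr * g
    return w


# Hypothetical one-hot columns: 0 = pos_A, 1 = pos_B, 2 = dev_X, 3 = dev_Y
rows = [[1, 2], [1, 3], [0, 2], [0, 2], [0, 2], [0, 2], [0, 3], [0, 3], [0, 3], [1, 2]]
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
w = train_weighted_lr(rows, y, n_features=4, class_weight={0: 0.625, 1: 2.5})
```

Each update scales the usual log-loss gradient (p - y) by the class weight of the sample's label, which is the effect class weighting has on the training objective.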
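The abstract notes that GBDT is often combined with logistic regression for CTR prediction. One common form of that combination (assumed here; the thesis does not spell it out) uses the trained trees as a feature transform: each tree sends a sample to one leaf, the leaf is one-hot encoded, and the per-tree encodings are concatenated into one sparse binary vector on which logistic regression is trained. The sketch covers only that encoding step, with made-up leaf assignments for a hypothetical 3-tree model. In scikit-learn, `GradientBoostingClassifier.apply` returns the node index of the leaf each sample reaches in every tree; those node ids would need remapping to 0-based leaf numbers before being passed to this hypothetical helper.

```python
from typing import List, Sequence


def leaves_to_one_hot(leaf_ids: Sequence[Sequence[int]], leaves_per_tree: Sequence[int]) -> List[List[int]]:
    """Turn per-tree leaf ids into global indices of a sparse one-hot vector.

    leaf_ids[i][t] is the 0-based leaf of tree t that sample i lands in; tree t owns
    columns offsets[t] .. offsets[t] + leaves_per_tree[t] - 1.
    """
    offsets = [0]
    for n in leaves_per_tree[:-1]:
        offsets.append(offsets[-1] + n)
    encoded = []
    for sample in leaf_ids:
        if len(sample) != len(leaves_per_tree):
            raise ValueError("one leaf id per tree expected")
        for t, leaf in enumerate(sample):
            if not 0 <= leaf < leaves_per_tree[t]:
                raise ValueError(f"leaf {leaf} out of range for tree {t}")
        encoded.append([offsets[t] + leaf for t, leaf in enumerate(sample)])
    return encoded


# Hypothetical output of a 3-tree GBDT with 4, 3 and 4 leaves per tree.
leaves_per_tree = [4, 3, 4]
leaf_ids = [[2, 0, 3], [0, 2, 1]]
features = leaves_to_one_hot(leaf_ids, leaves_per_tree)   # [[2, 4, 10], [0, 6, 8]]
n_columns = sum(leaves_per_tree)                          # 11
```

Every sample ends up with exactly one active column per tree, and the resulting index lists can be fed to any logistic regression that accepts sparse one-hot input.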