Study On Representation Learning For Advertisement Text And CTR Prediction Approach

Posted on: 2020-02-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z L Jiang
GTID: 1488306497966419
Subject: Computer Science and Technology
Abstract/Summary:
Online advertising can spread information about goods and services to every corner of the world at low cost. It has gradually grown into an online advertising market driven by computing and technology. Click-through rate (CTR) prediction is a key part of an advertising system. It must handle complex information about the advertisement, the context and the user, and it must identify latent patterns and rules in that information. For these reasons it is a difficult problem in industry, and it has attracted extensive interest from both industry and academia. Research on advertising CTR prediction has important theoretical and practical significance. It can reduce advertising placement costs, improve the user experience and increase the revenue of media websites.

In recent years, research on CTR prediction has developed rapidly both in China and abroad, but several problems remain unsolved. First, advertisement text is short and compactly structured, and its words are strongly correlated, so it is difficult to represent effectively. Second, most recent CTR prediction models adopt deep learning or model ensembles, which has greatly improved prediction accuracy. However, these models still face two problems when the data is noisy, or when it combines multi-field categorical data with text data: their accuracy suffers, or the data is difficult to model effectively. To address these problems, this dissertation studies three aspects: representation learning for advertisement text data, CTR prediction for noisy data, and CTR prediction for multi-field categorical data. The main contributions are as follows:

(1) Existing methods do not fully mine the latent semantic information in advertisement text. To address this, the Biterm Topic Model (BTM), which captures the topics of short texts efficiently, is used to model the topic features of advertisement texts. The number of topics strongly influences the resulting topic vectors, and BTM cannot determine a reasonable number of topics automatically. An automatic topic-number optimization method based on density clustering is therefore designed to obtain high-quality topic feature vectors.

(2) Traditional word semantic representation models consider only local context. To overcome this limitation, a word semantic representation method that fuses topic features is proposed. It models global and local context jointly. This preserves the word-order information of the window context and also strengthens the influence of topic information on each word. Mining word semantics from these multiple angles yields semantically rich word representations. On this basis, two sentence-level short-text representation methods are designed to produce semantically rich embeddings of advertisement text. These embeddings lay the foundation for the subsequent CTR prediction models.

(3) The third contribution addresses advertising CTR prediction in scenarios containing noisy data and text data. It proposes a CTR prediction method oriented to noisy data, which exploits the strength of fuzzy number theory in handling noise and uncertainty. The method introduces fuzzy parameters into the neural network. This enlarges the search space, extends the relationship between the input data and its higher-order features, and describes the uncertainty in the data. A fuzzy deep neural network is built by stacking multiple fuzzy sub-models. It can handle the complex uncertain relationships in advertisement data and extract more discriminative high-order abstract features. As a result, it improves CTR prediction accuracy to a certain extent in scenarios containing noisy and text data.

(4) The fourth contribution addresses advertising CTR prediction in scenarios with a large amount of multi-field categorical data and text data. It proposes a CTR prediction method oriented to multi-field categorical data. The method combines two strengths: embedding mapping networks and factorization models handle sparse vectors efficiently, and tree models handle continuous vectors. On one side, an embedding mapping network and a factorization machine model the categorical advertisement data, and a stacked denoising autoencoder (SDAE) then extracts higher-order features from it. On the other side, a gradient boosting decision tree (GBDT) extracts high-order features from the advertisement text data. Finally, the two sets of high-order features are concatenated to predict the CTR. The method efficiently handles modeling tasks that involve both multi-field categorical data and text data, and it obtains better CTR prediction results.
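BTM performs well on short texts because it models biterms over the whole corpus. A biterm is an unordered pair of words that co-occur in the same short context. This avoids modeling word co-occurrence inside each sparse document. The sketch below illustrates only that first step. It assumes the common convention that an entire short ad text is one context window. The function names `extract_biterms` and `count_biterms` are hypothetical. Topic inference itself is not shown; the original BTM uses Gibbs sampling for it.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) in one short text.

    The whole text is treated as a single context window (an assumption),
    and each pair is stored in sorted order so that (a, b) and (b, a)
    count as the same biterm.
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def count_biterms(corpus):
    """Aggregate biterm counts over a corpus of tokenized short texts."""
    counts = Counter()
    for tokens in corpus:
        counts.update(extract_biterms(tokens))
    return counts


# Two toy "ad texts", already tokenized.
ads = [["cheap", "flights"], ["flights", "cheap", "hotel"]]
print(count_biterms(ads))
```

Pooling biterms over the corpus lets the pair ("cheap", "flights") reach a count of 2, even though each individual ad contains it only once. This corpus-level view sidesteps the sparsity of individual short texts.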
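The abstract does not describe the density-clustering procedure used in contribution (1) to choose the number of topics. The sketch below offers one plausible reading, purely as an assumption. First, train BTM with a deliberately large number of topics. Then cluster the resulting topic-word distributions with DBSCAN under cosine distance. Finally, take the number of distinct groups (clusters plus isolated topics) as the topic count. The helper names, the `eps` and `min_pts` values and the toy vectors are all hypothetical.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, with -1 marking noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # Includes i itself, as in the usual DBSCAN definition.
        return [j for j in range(len(points))
                if cosine_distance(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:  # noise reachable from a core point: border point
                labels[j] = cluster
                continue
            if labels[j] is not None:  # already assigned to a cluster
                continue
            labels[j] = cluster
            j_seeds = neighbours(j)
            if len(j_seeds) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_seeds)
    return labels


def estimate_topic_number(topic_word_vectors, eps=0.05, min_pts=2):
    """Hypothetical heuristic: clusters of near-duplicate topics + isolated topics."""
    labels = dbscan(topic_word_vectors, eps, min_pts)
    n_clusters = len({label for label in labels if label != -1})
    n_isolated = labels.count(-1)
    return n_clusters + n_isolated, labels


# Toy topic-word distributions from an over-sized BTM run (4-word vocabulary).
topics = [
    [0.70, 0.20, 0.05, 0.05],
    [0.68, 0.22, 0.05, 0.05],  # near-duplicate of topic 0
    [0.05, 0.05, 0.70, 0.20],
    [0.05, 0.05, 0.72, 0.18],  # near-duplicate of topic 2
    [0.25, 0.25, 0.25, 0.25],  # an isolated, diffuse topic
]
k, labels = estimate_topic_number(topics)
print(k, labels)
```

In this toy run, topics 0/1 and 2/3 collapse into two clusters and topic 4 stays isolated, so the estimate is 3. A real procedure would presumably refit BTM with the estimated count and possibly iterate; that loop is omitted here.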
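The keywords point to triangular fuzzy numbers (TFNs) as the fuzzy parameters in contribution (3), but the abstract gives no formulas. The sketch below is an illustrative assumption, not the dissertation's layer. It gives a single neuron TFN-valued weights and a TFN-valued bias. Crisp inputs are propagated through standard TFN arithmetic: addition is component-wise, and multiplying by a negative crisp value swaps the lower and upper bounds. The result is defuzzified by its centroid (low + mode + high) / 3 before a sigmoid. The names `TFN` and `fuzzy_neuron` are hypothetical, and training is not shown.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TFN:
    """Triangular fuzzy number (low, mode, high) with low <= mode <= high."""
    low: float
    mode: float
    high: float

    def __add__(self, other):
        # TFN addition is component-wise.
        return TFN(self.low + other.low, self.mode + other.mode, self.high + other.high)

    def scale(self, c):
        # Multiplying by a crisp c < 0 reverses the order, so low and high swap.
        if c >= 0:
            return TFN(c * self.low, c * self.mode, c * self.high)
        return TFN(c * self.high, c * self.mode, c * self.low)

    def centroid(self):
        # Centroid defuzzification of a triangular membership function.
        return (self.low + self.mode + self.high) / 3.0


def fuzzy_neuron(x, weights, bias):
    """Hypothetical fuzzy neuron: crisp inputs, TFN weights and bias.

    Returns the fuzzy pre-activation and a crisp sigmoid output computed
    from its centroid.
    """
    z = bias
    for xi, wi in zip(x, weights):
        z = z + wi.scale(xi)
    return z, 1.0 / (1.0 + math.exp(-z.centroid()))


weights = [TFN(0.1, 0.2, 0.3), TFN(-0.4, -0.2, 0.0)]
bias = TFN(-0.1, 0.0, 0.1)
z, y = fuzzy_neuron([2.0, -1.0], weights, bias)
print(z, round(y, 4))
```

The spread (high - low) of the fuzzy pre-activation is one way such a neuron can carry uncertainty forward, rather than collapsing each weight to a single point value.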
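Contribution (4) models sparse categorical data with an embedding mapping network and a factorization machine (FM). A standard second-order FM scores a feature vector x as w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j. Computed directly, the pairwise term costs O(k n^2). The identity 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2] reduces this to O(k n). With one-hot inputs, each active feature's latent vector v_i is exactly its embedding, which is why FMs and embedding layers fit together naturally. The sketch below compares the naive and fast forms on a hypothetical two-field one-hot input. All parameter values are made up, and the dissertation's exact FM configuration is not given in the abstract.

```python
import math


def fm_pairwise_naive(x, V):
    """Sum over i < j of <v_i, v_j> * x_i * x_j, computed directly in O(k n^2)."""
    total = 0.0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            total += dot * x[i] * x[j]
    return total


def fm_pairwise_fast(x, V):
    """Same quantity via 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2], O(k n)."""
    total = 0.0
    for f in range(len(V[0])):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        total += s * s - s_sq
    return 0.5 * total


def fm_predict(x, w0, w, V):
    """Second-order FM score passed through a sigmoid to give a click probability."""
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-(linear + fm_pairwise_fast(x, V))))


# Hypothetical one-hot input: field "gender" (2 values) + field "ad category" (3 values).
x = [1.0, 0.0, 0.0, 1.0, 0.0]
V = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]  # k = 2 latent factors
w0, w = -1.0, [0.2, -0.1, 0.05, 0.3, 0.0]
print(fm_pairwise_naive(x, V), fm_pairwise_fast(x, V), fm_predict(x, w0, w, V))
```

Only features 0 and 3 are active, so the pairwise term reduces to <v_0, v_3> = 0.1 * 0.7 + 0.2 * 0.8 = 0.23, and both forms agree.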
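In contribution (4), the abstract says only that GBDT extracts high-order features from the text data. It does not say how. A widely used approach in CTR systems, popularized by Facebook's ad click prediction work, works in three steps. First, record which leaf each boosted tree routes a sample to. Second, one-hot encode those leaf indices. Third, use the result as features. The sketch below assumes that encoding. It hand-writes two tiny trees instead of training them, then concatenates the leaf codes with a stand-in SDAE output vector. All names, trees and numbers are hypothetical.

```python
def leaf_index(node, x):
    """Route a sample down one tree and return the id of the leaf it reaches.

    Nodes are tuples: ("split", feature, threshold, left, right) or ("leaf", id).
    """
    while node[0] == "split":
        _, feature, threshold, left, right = node
        node = left if x[feature] <= threshold else right
    return node[1]


def gbdt_leaf_features(trees, leaves_per_tree, x):
    """One-hot encode the leaf reached in every tree and concatenate the codes."""
    encoded = []
    for tree, n_leaves in zip(trees, leaves_per_tree):
        one_hot = [0.0] * n_leaves
        one_hot[leaf_index(tree, x)] = 1.0
        encoded.extend(one_hot)
    return encoded


def fuse_features(sdae_features, trees, leaves_per_tree, text_vector):
    """Concatenate categorical-side SDAE features with text-side GBDT leaf features."""
    return list(sdae_features) + gbdt_leaf_features(trees, leaves_per_tree, text_vector)


# Two hand-written trees standing in for a trained GBDT over a 2-d text embedding.
tree_a = ("split", 0, 0.5, ("leaf", 0), ("split", 1, 0.2, ("leaf", 1), ("leaf", 2)))
tree_b = ("split", 1, 0.7, ("leaf", 0), ("leaf", 1))
sdae_out = [0.3, -0.2]   # stand-in for the SDAE's top hidden layer
text_vec = [0.8, 0.1]    # stand-in for an advertisement text embedding
fused = fuse_features(sdae_out, [tree_a, tree_b], [3, 2], text_vec)
print(fused)
```

The fused vector would then feed a final classifier. The abstract does not specify which classifier is used, so that step is left open here.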
Keywords/Search Tags: CTR, Word Semantic Representation, Triangular Fuzzy Number, SDAE, GBDT