Real-Time Bidding By Deep Reinforcement Learning In Display Advertising

Posted on: 2020-05-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y Chen
GTID: 2428330590961158
Subject: Engineering
Abstract/Summary:
Real-time bidding (RTB) is an important mechanism in online display advertising. It allows advertisers to bid on each display ad impression in real time. The most important component of RTB is the demand-side platform (DSP), which acts on behalf of advertisers. To place ads automatically and optimally and to maximize advertising revenue, a DSP needs a learning algorithm that bids on each ad impression intelligently in real time. Most previous work treats the bid decision as a static optimization problem, either valuing each impression independently or setting a bid price for each segment of ad volume. However, thousands or more heterogeneous bidders usually compete for the same ad opportunities, and advertisers themselves may change campaign settings such as budget and target audience. This makes the marketplace highly dynamic and unpredictable, so such static strategies struggle to meet advertisers' goals in practice. To address these challenges, this thesis proposes a novel bidding strategy called Deep Reinforcement Learning to Bid (DRLB). In DRLB, the bid decision process is formulated as a reinforcement learning problem in which the state is represented by the auction information and the campaign's real-time parameters, and an action is the bid price to set. Because of the large scale of the problem, a model-free reinforcement learning algorithm, Deep Q-Network (DQN), is used to solve it. The analysis in this thesis shows that the immediate reward from the environment is misleading under a critical resource constraint. The thesis therefore introduces a reward function design methodology for reinforcement learning problems with constraints. Given the large scale of the problem, it employs a deep neural network called RewardNet to learn an appropriate reward so that the optimal policy can be learned effectively. In addition, to address the exploration vs. exploitation dilemma, the thesis proposes an adaptive exploration policy that makes the model converge to the optimal solution faster. Experiments on the iPinYou dataset demonstrate the effectiveness of the DRLB model and of the two proposed innovations.
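The formulation above can be illustrated with a toy sketch. In this sketch the state combines campaign parameters with auction information, and the action is a bid price. This is not the thesis's implementation. The following are all hypothetical:

- the environment;
- the state layout `(round, remaining budget, impression value)`;
- the bid grid `BIDS`;
- the assumed second-price auction;
- the helper names `new_impression`, `auction`, `greedy`, `train` and `evaluate`.

The Q-function is a lookup table updated with the one-step Q-learning target, which is the target DQN fits with a deep network. An ordinary decaying ε-greedy schedule stands in for the thesis's adaptive exploration policy. RewardNet is not modeled: the reward is simply the value of each won impression.

```python
import random
from collections import defaultdict

# Hypothetical toy setting, not taken from the thesis.
T = 10                 # auction rounds per episode
BUDGET = 10            # campaign budget at the start of each episode
BIDS = [0, 1, 2, 3]    # discrete action space: candidate bid prices


def new_impression(rng):
    """Draw the value of the next impression (a made-up 0..2 score)."""
    return rng.randint(0, 2)


def auction(rng, bid, budget):
    """Run one assumed second-price auction; return (won, cost)."""
    bid = min(bid, budget)               # never bid more than the remaining budget
    market_price = rng.randint(0, 3)     # highest competing bid (random stand-in)
    if bid > market_price:
        return True, market_price        # winner pays the second price
    return False, 0


def greedy(q, state):
    """Pick the bid with the highest Q-value (ties go to the lowest bid)."""
    return max(BIDS, key=lambda a: q[(state, a)])


def train(episodes=3000, alpha=0.1, gamma=1.0,
          eps_start=1.0, eps_min=0.05, eps_decay=0.999, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)               # Q[(state, action)], defaults to 0.0
    eps = eps_start
    for _ in range(episodes):
        budget, value = BUDGET, new_impression(rng)
        for t in range(T):
            # State: campaign parameters (round, budget left) + auction info (value).
            state = (t, budget, value)
            if rng.random() < eps:       # explore
                action = rng.choice(BIDS)
            else:                        # exploit
                action = greedy(q, state)
            won, cost = auction(rng, action, budget)
            reward = value if won else 0
            budget -= cost
            value = new_impression(rng)
            next_state = (t + 1, budget, value)
            if t + 1 == T:               # terminal step: no bootstrapping
                target = reward
            else:
                target = reward + gamma * max(q[(next_state, a)] for a in BIDS)
            # One-step Q-learning (TD) update; DQN fits this target with a network.
            q[(state, action)] += alpha * (target - q[(state, action)])
        eps = max(eps_min, eps * eps_decay)  # plain decaying epsilon-greedy
    return q, eps


def evaluate(q, episodes=500, seed=1):
    """Average total impression value won per episode by the greedy policy."""
    rng = random.Random(seed)
    total = 0
    for _ in range(episodes):
        budget = BUDGET
        for t in range(T):
            value = new_impression(rng)
            won, cost = auction(rng, greedy(q, (t, budget, value)), budget)
            total += value if won else 0
            budget -= cost
    return total / episodes


if __name__ == "__main__":
    q_table, final_eps = train()
    print("greedy policy value:", evaluate(q_table))
```

Replacing the table `q` with a neural network trained on the same target gives DQN proper. That change also needs the experience replay buffer and target network that DQN adds. A learned reward such as RewardNet would presumably take the place of `reward` in the target.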
Keywords/Search Tags: display advertising, real-time bidding, bidding strategy, deep reinforcement learning