| The proliferation of mobile devices has driven the successful development of the mobile Internet. The blossoming of media on the Internet has inspired a new business model in the online advertising market: real-time bidding (RTB) advertising. As the mainstream transaction paradigm of online advertising, RTB can significantly improve the efficiency and transparency of advertising market transactions and enhance the overall revenue of users, media, and advertisers in the advertising ecosystem. At the same time, the development of artificial intelligence has elevated the technical research behind online advertising to a new level, making it a focus of research in information systems and data mining in recent years. This thesis focuses on the bidding strategy of a single advertiser within a demand-side platform in RTB. It proposes bidding strategy optimization methods that use reinforcement learning to improve the advertiser’s advertising revenue. The research results of this thesis are as follows. (1) This thesis first investigates several representative bidding strategies and analyzes in detail the modeling principles of their Markov decision processes and the design concepts of their core elements. It then derives a unified bidding function for reinforcement learning-based bidding strategies, composed of a base bidding function and a bidding factor obtained by reinforcement learning. This function is compatible with model-free reinforcement learning methods. In addition, this thesis empirically examines how the design of the state, action, and reward functions in the Markov decision process affects the bidding performance of model-free reinforcement learning-based bidding strategies, which provides practical guidance for developing intelligent bidding systems. Finally, the thesis presents suggestions for designing reinforcement learning-based bidding strategies. (2) This thesis proposes a novel budget-constrained bidding strategy optimization method using model-free parameterized action space
reinforcement learning. The method trains an agent that uses a hybrid action space to generate bidding factors. The agent adapts to the highly dynamic RTB environment by fusing actions from discrete and continuous action spaces, so that the bidding factor stays close to its optimal value. To characterize the state of the environment more accurately, the method also uses a cascading state representation that simultaneously captures information about the current and historical RTB environment. In addition, to guide the agent toward the optimal strategy, the method designs a comparison reward function, which introduces weighting factors on returns and costs and selects these weighting factors adaptively based on the auction results to achieve a flexible reward and punishment mechanism. Experiments on a real dataset show the superior bidding performance of the method. (3) This thesis proposes a bidding strategy based on ad quality thresholds. The method introduces an ad quality threshold for screening ad impressions: the bidding agent participates in the auction for an ad impression only when its predicted click-through rate is higher than the ad quality threshold. The initial threshold for the ad delivery period is learned from historical data using statistical methods based on the ad quality priority principle. The real-time ad quality threshold is subsequently adjusted at each time step by a threshold adjustment model built on reinforcement learning. The experimental results show that the method obtains better bidding performance than the baseline model. |
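
The two mechanisms described in the abstract, a unified bidding function and ad quality threshold screening, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not state how the base bid and the bidding factor are combined, so a simple product is assumed, and the function names `unified_bid` and `bid_with_quality_threshold` are hypothetical. The bidding factor and the threshold are passed in as plain numbers standing in for values that the thesis obtains through reinforcement learning.

```python
def unified_bid(base_bid: float, bidding_factor: float) -> float:
    """Combine a base bid with a bidding factor.

    ASSUMPTION: the combination is multiplicative; the abstract only says
    the function consists of a base bidding function and a learned factor.
    """
    # A bid can never be negative, so clamp at zero.
    return max(0.0, base_bid * bidding_factor)


def bid_with_quality_threshold(
    pctr: float,
    threshold: float,
    base_bid: float,
    bidding_factor: float,
) -> float:
    """Bid only when the predicted click-through rate exceeds the threshold.

    `threshold` stands in for the ad quality threshold; in the thesis it is
    adjusted at each time step, here it is a fixed hypothetical input.
    """
    if pctr <= threshold:
        return 0.0  # skip the auction for low-quality impressions
    return unified_bid(base_bid, bidding_factor)


# Hypothetical values: an impression above the threshold receives a bid,
# one below the threshold is skipped.
high_quality_bid = bid_with_quality_threshold(0.002, 0.001, 100.0, 1.5)
low_quality_bid = bid_with_quality_threshold(0.0005, 0.001, 100.0, 1.5)
```

Under these assumptions, an agent would only need to update `bidding_factor` and `threshold` between time steps; the screening and bidding logic itself stays unchanged.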