Research And Implementation Of Dynamic Pricing Algorithm For Online Ride Sharing Based On Deep Reinforcement Learning

Posted on: 2022-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: M J Liu
GTID: 2518306602494804
Subject: Dynamic pricing
Abstract/Summary:
The emergence of online ride-sharing has largely satisfied people's demand for a comfortable travel environment and high-quality service. However, the imbalance between supply and demand, in both time and space, is a common problem in urban transportation networks. Dynamic pricing is one of the effective measures for balancing supply and demand and increasing the platform's revenue. The platform's current pricing strategy affects the destination distribution of orders and thus the future geographical distribution of drivers. The pricing strategy therefore needs to be forward-looking: it should optimize the platform's current revenue while taking future revenue into account. However, most existing dynamic pricing algorithms optimize only the platform's current or short-term revenue and lack this long-term perspective. To address this shortcoming, this thesis uses deep reinforcement learning to optimize the platform's long-term revenue by formulating forward-looking pricing strategies. The main research work is as follows.

First, this thesis proposes a dynamic pricing method based on centralized control that optimizes the long-term revenue of a ride-sharing platform. The dynamic pricing problem for online ride-sharing is first modeled as a Markov decision process. Optimizing the order price alone leads to high and unstable prices and an uncontrollable future geographical distribution of vehicles, so the scheduling ratio of available vehicles is optimized jointly with the price. The actions are defined in a continuous action space, which avoids discretizing the action values and makes the output actions better match real scenarios. In addition, a new reward function is designed for this Markov decision process. It includes the immediate revenue and an order conversion response rate defined in this thesis. Compared with using immediate revenue alone as the reward, the new reward function makes the strategy converge faster and yields a larger revenue increase. Because a continuous action space requires thorough exploration during learning, the SAC (Soft Actor-Critic) algorithm, which explores well and is suited to continuous action spaces, is used to optimize the centralized dynamic pricing strategy.

Second, the centralized-control method has a high-dimensional action space, which requires a large network model and makes exploration costly for the agent. This thesis therefore further proposes a multi-agent dynamic pricing method for online ride-sharing. The dynamic pricing problem is modeled as a partially observable Markov decision process. Each region in the traffic network is assigned an agent that optimizes the order price and the vehicle scheduling ratio from that region to the other regions. To make the agents cooperate, the reward at each time step is shared among all agents. To further strengthen cooperation, a KL divergence term that represents the supply-demand balance of the traffic network at the next time step is added to the reward function, which optimizes the joint strategy. In addition, parameter sharing is used to train the strategies of all agents simultaneously: each agent's policy can be trained on the experience samples of all agents, which makes training more effective.

Finally, comparative experiments on real-world datasets verify the effectiveness of the two proposed methods. Compared with the benchmark algorithms, the centralized-control dynamic pricing algorithm increases the platform's total revenue by 22.3% and reduces the average order price by 10.3%. The designed reward function makes the strategies converge faster and to a better result, and the model is insensitive to the hyper-parameter in the reward function. An analysis of the learned pricing strategy shows that the proposed method effectively balances supply and demand in the transportation network. The multi-agent algorithm increases the platform's revenue by 17.9% and reduces the average order price by 9.1%. The multi-agent setting significantly reduces the dimension of the action space, requires smaller networks, and is more suitable for real traffic scenarios. Experiments show that the multi-agent algorithm achieves effective collaboration between agents through reward sharing, and that optimizing the joint strategy further improves model performance.
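
The abstract does not give the exact form of the shared reward or of the KL divergence term, so the following is only a minimal sketch of the idea under stated assumptions. It assumes that supply and demand are summarized as per-region counts at the next time step, that the KL divergence is taken from the demand distribution to the supply distribution, and that it enters the reward as a weighted penalty. The function names and the `balance_weight` parameter are hypothetical and do not come from the thesis.

```python
import math


def normalize(counts, eps=1e-8):
    """Turn per-region counts into a probability distribution.

    A small eps is added to every region so that empty regions
    do not cause a division by zero or log(0) later on.
    """
    smoothed = [c + eps for c in counts]
    total = sum(smoothed)
    return [c / total for c in smoothed]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def shared_reward(revenue, demand_counts, supply_counts, balance_weight=1.0):
    """Hypothetical shared reward: revenue minus a supply-demand imbalance penalty.

    Assumption: the direction KL(demand || supply), the subtraction,
    and balance_weight are illustrative choices, not the thesis's definition.
    """
    demand = normalize(demand_counts)
    supply = normalize(supply_counts)
    return revenue - balance_weight * kl_divergence(demand, supply)


# Every agent would receive the same scalar, matching the reward sharing
# described in the abstract.
reward_for_all_agents = shared_reward(
    revenue=100.0,
    demand_counts=[10, 20, 30],   # hypothetical orders per region
    supply_counts=[30, 20, 10],   # hypothetical idle vehicles per region
)
```

In this sketch, supply that is distributed across regions in the same proportions as demand gives a penalty close to zero, while a mismatched distribution lowers the reward that all agents share.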
Keywords/Search Tags: Dynamic pricing, Supply and demand balance, Reinforcement learning, Revenue maximization