
Research On Adaptive Matching Timing Strategy In Online Ride-hailing Based On Reinforcement Learning

Posted on: 2023-12-15    Degree: Master    Type: Thesis
Country: China    Candidate: Y P Deng    Full Text: PDF
GTID: 2558307118496414    Subject: Computer Science and Technology
Abstract/Summary:
Online ride-hailing has become an important mode of transportation, served by many platforms such as Didi and Uber, and improving the total revenue of the platform is a key issue. In the order matching process of online ride-hailing, drivers and passengers arrive dynamically, so the numbers of available drivers and waiting passengers change over time. The more passengers and drivers there are to be matched, the more likely it is to find high-revenue matching combinations; a platform that adopts a fixed, uniform matching strategy ignores the possibility that delaying matching may bring higher revenue. In addition, matching information such as the numbers of passengers and drivers and their locations differs across regions. If the entire region adopts a uniform matching interval, these differences are ignored and the total revenue of the platform is reduced. To find matching times that improve the total revenue of the platform, we design an adaptive matching timing strategy for online ride-hailing. First, the order matching process of online ride-hailing is modeled as a Markov Decision Process (MDP), and a reinforcement learning algorithm is designed that adjusts the matching time according to changes in the environment. Second, considering that matching information differs across sub-regions, this thesis divides the entire region into multiple non-overlapping sub-regions and makes different matching decisions for them based on multi-agent reinforcement learning, where each region independently determines its matching time.

The main contributions of this thesis are as follows:

(1) We first introduce the basic setting of the online ride-hailing system, including the order matching process and the interactions among drivers, passengers, and the platform. We also formalize the drivers and the orders and give the equation for calculating the platform's revenue.

(2) To find matching times that yield higher platform revenue, we define the dynamic matching decision problem and model it as an MDP. We then design a Dynamic Matching Decision Process (DMDP) algorithm based on reinforcement learning to solve this problem (a minimal sketch of such a timing agent is given after this abstract). Finally, we run experiments to verify the effectiveness of the DMDP algorithm, comparing it with three algorithms, Restricted Q-Learning (RQL), GREEDY, and UNIFORM, in terms of the total revenue of the platform, the order response rate, the pick-up distance, and the average additional distance per order. The DMDP algorithm performs best with respect to the total revenue of the platform, exceeding the RQL algorithm by 2.55% and the GREEDY and UNIFORM algorithms by 15.13% and 20.53%, respectively. The experimental results show that the DMDP algorithm also performs well on the other three metrics.

(3) Different regions have different matching information. To enable these regions to be matched asynchronously and obtain higher revenue, we divide the entire region into multiple non-overlapping sub-regions according to the richness of their matching information; the sub-regions cooperate with each other while making decisions independently. We also model this problem as an MDP and propose the Multi-Regional Differentiated Matching Decision Process (MRDMDP) algorithm, based on multi-agent reinforcement learning, to solve it. Since dividing the entire region may leave idle vehicles in some sub-regions and reduce vehicle utilization, we further propose a Repositioning (REPOS) algorithm that works together with the multi-regional algorithms to reposition idle vehicles. To verify the effectiveness of the MRDMDP algorithm, we conduct experiments comparing it with the other multi-regional algorithms, Multi-Regional RQL, Multi-Regional GREEDY, and Multi-Regional UNIFORM. The MRDMDP algorithm achieves higher total platform revenue than these algorithms by 4.33%, 19.63%, and 31.15%, respectively. The results also show that the REPOS algorithm, combined with a multi-regional algorithm, makes better use of idle vehicles across regions and further improves the performance of these algorithms; MRDMDP_REPOS achieves the highest total platform revenue, exceeding the other algorithms by 3.41%, 11.91%, and 22.69%, respectively.

In this thesis, we analyze matching-timing strategies in online ride-hailing. A reinforcement learning algorithm is designed to solve the dynamic matching decision problem; it adjusts the matching time according to the changing environment and improves the total revenue of the platform. To explore the effect of different matching times in regions with different matching information, we design a multi-agent reinforcement learning algorithm to solve the multi-regional differentiated matching decision problem; it makes matching decisions for these regions asynchronously, further improving the total revenue of the platform. This thesis can provide insights for online ride-hailing platforms in designing matching-timing strategies.
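To make the matching-timing decision concrete, the sketch below shows a tabular Q-learning agent that, at each decision epoch, chooses between delaying the batch and matching immediately. The state features, their discretization, the hyperparameters, and the class name MatchTimingAgent are illustrative assumptions for this sketch, not the thesis's DMDP implementation.

```python
import random
from collections import defaultdict

# Minimal sketch (assumed interface, not the thesis code): a tabular
# Q-learning agent that decides whether to trigger batch matching now
# or delay it to accumulate more drivers/passengers.

class MatchTimingAgent:
    DELAY, MATCH_NOW = 0, 1

    def __init__(self, alpha=0.1, gamma=0.95, epsilon=0.1):
        # Q-values for the two actions in each discretized state.
        self.q = defaultdict(lambda: [0.0, 0.0])
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def state(self, n_drivers, n_passengers, time_slot):
        # Coarse discretization of supply, demand, and time of day
        # (bucket sizes are arbitrary choices for illustration).
        return (min(n_drivers // 5, 10), min(n_passengers // 5, 10), time_slot)

    def act(self, s):
        # Epsilon-greedy action selection over {DELAY, MATCH_NOW}.
        if random.random() < self.epsilon:
            return random.randint(0, 1)
        return max((self.DELAY, self.MATCH_NOW), key=lambda a: self.q[s][a])

    def update(self, s, a, reward, s_next):
        # Standard one-step Q-learning backup; the reward would be the
        # platform revenue realized from the orders matched at this step.
        target = reward + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.alpha * (target - self.q[s][a])
```

In the multi-regional setting described for MRDMDP, each sub-region would hold its own instance of such an agent and choose its matching time independently, with the per-step reward taken from the revenue of orders matched in that region.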
Keywords/Search Tags: Online Ride-Hailing, Adaptive Matching, Reinforcement Learning, Reposition