As China's socio-economic development transforms, the transportation industry has entered a stage of high-quality growth. Demand-responsive services, such as express delivery and demand-responsive buses, are an indispensable part of transportation service. Efficiently solving the multi-vehicle routing problem with soft time windows is one of the core challenges of fine-grained transportation service, and the solution quality of this problem has important implications for service level, energy-related emissions, and transportation efficiency. However, both exact and heuristic algorithms require a large amount of computing time to solve this problem. As transportation demand increases, traditional methods face a trade-off between computing time and solution quality. To address this issue, this paper proposes a machine learning framework that extracts spatio-temporal information, models the problem with multi-agent reinforcement learning, and decodes the routes. After long-term offline training, the algorithm can be deployed online and generates high-quality solutions for new instances within seconds. The main contributions are summarized as follows:

(1) A multi-agent attention model based on reinforcement learning is proposed to solve the multi-vehicle routing problem with soft time windows. The model encodes the spatio-temporal information, and its decoder generates the vehicle routes iteratively according to a fixed-order decision method. Furthermore, a free-match decision method is designed for the multi-depot problem. The algorithm generates solutions in seconds on instances of different scales and outperforms Google OR-Tools and classic baselines by 2.4%-13.8%.

(2) A joint embedded learning method is proposed to solve the multi-vehicle pickup and delivery problem with soft time windows. In this method, a joint embedding in the encoder aggregates the spatio-temporal information, and a multi-vehicle attention network with a masking procedure in the decoder generates the routes. The results improve on the benchmarks by 2.4%-8.0%, and solutions are generated in seconds. In addition, a route generation method based on parallel computing is proposed to optimize the drop-off candidate decision in demand-responsive bus service; it generates routes within seconds and saves about 30% of the total travel cost.

(3) A two-stage learning-based method is proposed to solve the relatively large-scale pickup and delivery problem with soft time windows. In the first stage, a graph convolutional network assigns customers to vehicles according to their coordinates and time window information; in the second stage, the routes are generated by the encoder-decoder framework. The results improve on the classic methods by 2.1%-6.3%.

Overall, this paper proposes a series of reinforcement learning methods that, once trained offline, solve routing problems within seconds. The results show that these methods outperform a range of classic methods while requiring only a short computing time. In addition, the experiments verify the generalization ability of the models under varying customer numbers, vehicle capacities, driving speeds, and other factors.
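To make the notion of a soft time window concrete, the following is a minimal sketch of how the cost of a single vehicle route might be evaluated. It is not the model proposed in this paper: the function name `route_cost`, the linear penalty coefficients `early_penalty` and `late_penalty`, the constant `speed`, and the assumption that service begins on arrival without waiting are all illustrative choices. Under these assumptions, arriving outside a customer's window is allowed but adds a penalty to the travel distance.

```python
import math

def route_cost(depot, customers, route,
               early_penalty=0.5, late_penalty=1.0, speed=1.0):
    """Travel distance plus soft time window penalties for one route.

    depot:     (x, y) coordinates of the depot
    customers: dict mapping customer id -> (x, y, earliest, latest)
    route:     ordered list of customer ids visited by one vehicle
    All penalty coefficients and the speed are hypothetical values.
    """
    position, time = depot, 0.0
    distance = 0.0
    penalty = 0.0
    for cid in route:
        x, y, earliest, latest = customers[cid]
        leg = math.dist(position, (x, y))
        distance += leg
        time += leg / speed
        # Soft window: violations are permitted but penalized linearly.
        penalty += early_penalty * max(0.0, earliest - time)
        penalty += late_penalty * max(0.0, time - latest)
        position = (x, y)
    # The vehicle returns to the depot at the end of the route.
    distance += math.dist(position, depot)
    return distance + penalty


if __name__ == "__main__":
    depot = (0.0, 0.0)
    customers = {
        1: (3.0, 4.0, 0.0, 4.0),    # window [0, 4]
        2: (3.0, 0.0, 10.0, 12.0),  # window [10, 12]
    }
    print(route_cost(depot, customers, [1, 2]))
    print(route_cost(depot, customers, [2, 1]))
```

Both visiting orders in this toy instance cover the same distance, so the difference in cost comes entirely from the time window penalties. This is the kind of trade-off a learned routing policy has to balance.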