Font Size: a A A

Research And Simulation Application Of Deep Reinforcement Learning For Vehicle Routing Problem With Soft-time Windows

Posted on:2024-05-05Degree:MasterType:Thesis
Country:ChinaCandidate:Y J LongFull Text:PDF
GTID:2542307124459924Subject:Engineering
Abstract/Summary:PDF Full Text Request
In urban traffic planning,Vehicle Routing Problem(VRP)can help the government optimize public transport routes,reduce traffic congestion and improve urban traffic efficiency.In other fields,such as e-commerce and medical distribution,the VRP problem also has important application value.The Vehicle Routing Problem with Soft Time Windows(VRPSTW)is a very complex combinatorial optimization problem,and the traditional heuristic algorithm often needs a lot of computational resources and time to find the optimal solution.Deep reinforcement learning can improve solution efficiency by learning more efficient strategies and rules.It can be solved in real-time environment and handle large-scale data,so it can find better solutions faster.In addition,deep reinforcement learning can deal with problems adaptively and can handle different problem instances without the need to redesign the algorithm,so it can be more flexible to deal with different VRP problems.Therefore,it is very reasonable and effective to select deep reinforcement learning to solve VRPSTW problem.This paper conducts an in-depth study on VRPSTW problem,constructs two models based on deep reinforcement learning,and verifies the effectiveness of the algorithm experimentally.The main work are as follows:(1)An AM model based on improved attention mechanism is proposed to solve VRPSTW problem.Specifically,our AAM model is applied in practical logistics distribution problems.In solving VRPSTW task with more realistic significance,the AAM model is trained by Actor-Critic based reinforcement learning algorithm.The encoder generates information about all the input nodes and the attention mechanism is used in the decoder to generate the probability distribution of the next input to select the location of the next customer.Encoders include Multi-Head Attention layer(MHA)and fully connected feedforward sublayer(FF).The decoder represents the context information decoded by the current node according to a context node information.The context node is mainly constructed through the embedding information of the encoder and the node output information of the decoder before the time step t.(2)A AAM model based on dynamic attention is proposed to solve the VRPSTW problem.In order to improve the solving speed of AAM model,dynamic attention mechanism was used to optimize AAM model,namely AAM-D model.In this model,node features are dynamically updated,which can be updated according to the decisions of the model in different construction steps,so as to better reflect the state of the instance.In order to extract the input sequence information better,the method of node updating in encoder is improved based on AAM model.In the traditional attention mechanism,the characteristics of the input instance are fixed by the encoder and do not change with the decision of the model.In the AAM-D model,the feature representation of the input instance is dynamically updated,which can be updated according to the decision of the model in different construction steps,so as to better reflect the state of the instance and capture the structural characteristics of the instance.(3)A VRPSTW experimental platform based on deep reinforcement learning is designed.The AAM-D model constructed in Chapter 4 is applied,the trained model is integrated into the application program,and a simulation experiment platform is developed to solve the VRPSTW problem.In this system,node data generation and export,node position visualization,optimal route scheme generation,result visualization analysis and export and other functions are realized,which provides a more convenient visualization experiment platform for later researchers.
Keywords/Search Tags:Deep Reinforcement Learning, Vehicle Routing Problem with Soft-Time Windows, Combinatorial Optimization Problems, Actor-Critic Algorithm, Policy Gradient Algorithm
PDF Full Text Request
Related items