Font Size: a A A

Research On The Traveling Salesman Problem Based On Deep Reinforcement Learning

Posted on:2022-11-18Degree:MasterType:Thesis
Country:ChinaCandidate:H J ChenFull Text:PDF
GTID:2518306614458374Subject:Investment
Abstract/Summary:PDF Full Text Request
The traveling salesman problem is designed to choose the smallest paths to traverse all city nodes.It is the classic NP hard problem.If the time is multinomial,it is very hard to get an exact result.So it requires a lot of calculations.Because the the emergence of deep reinforcement learning has been greatly shrinking.The ability of autonomous computing has been greatly improved,and it has become relatively simple to solve the complex traveling salesman problem.Therefore,we combine reinforcement learning and deep learning to solve a series of problems of traveling salesmen.We consider using a Markov Decision process as model.And we can solve the traveling salesman traversal node process.First,the environment of the problem is abstracted into a graph structure,and deep learning is used to process the nodes in the legend.Then the state in the Markov decision process is represented by the partial solution,and the action is represented by the node to be traversed.As a response to the action,the reward value is fed back to the model after the traversal of the node is completed.We train the prediction of the problems through the training model.We must adjust the parameters of the neural network.And then we find the solution to the traveling salesman series of problems.First,we propose a deep reinforcement learning based traveling salesman problem research model RLSTsp(The model of classic traveling salesman problems based on the deep reinforcement learning).In order to better extract the node information in the instance graph,first we use a graph convolutional neural network.So we can obtain the vector representation of all nodes.Then,the strategy function is set to guide the output of the node.And the loss function is calculated by using the reward value obtained in the process of traversing the node.Finally,the loss function is optimized using the reinforcement learning network,and the goal is to find a route worthy of the optimal path.And in the process,we use the beam search algorithm to further get the solution method.The results show that our model performance is generally as good as open source high-performance heuristics and achieves near-optimal results on graphs of different scales.Secondly,we propose the deep reinforcement learning based dynamic traveling salesman problem research model RLSDTsp(The model of dynamic traveling salesman problems based on the deep reinforcement learning).Based on the original network,the time factor is integrated.In view of the dynamic nature of the network,we use a graph convolutional neural network to aggregate dynamic node information and node neighborhoods to obtain node representations at each moment.Afterwards,we use the recurrent neural to study.So we can get the timing information.Finally,the reward value is obtained by combining the node representation vector and the policy function.The reinforcement learning model can help getting the result.When the RLSDTsp model outputs the node sequence,it only needs to input the dynamic graph information and select the type of traveling salesman problem to be processed,and then several optimal node sequences for the corresponding problem can be determined.The effectiveness of learning heuristics is found by comparing the RLSDTsp model with other contrasting models,suggesting that the framework is a very promising tool for the traveling salesman problem at the dynamic graph scale.Finally,we propose a distributed reinforcement learning model DRLSDTsp(The model of dynamic traveling salesman problems based on the distributed reinforcement learning).First,we simply perform one-dimensional convolution on nodes to obtain representation vectors for all nodes,and generate a single embedding vector for the entire graph.Then,in order to get the different features of the input sequence,a multi-head attention mechanism is introduced to fit the features of the nodes.A policy solution is obtained by processing vector representations and unsynchronized outputs.Finally,the distributed reinforcement learning network uses n different threads to learn the node's policy function.Explore the environment and accumulate the action information of the tour into the experience cache.Different training networks extract important data from the cache and update the policy asynchronously,so that the model can more quickly estimate the possibility of each node being selected as the optimal solution.Through extensive experimental evaluations,we demonstrate that the framework can better learn the dynamic traveling salesman problem and achieve good performance on graphs with different numbers of nodes.
Keywords/Search Tags:deep reinforcement learning, traveling salesman problem, graph neural network, attention mechanism
PDF Full Text Request
Related items