
Research And Optimization Of Deep Reinforcement Learning Methods For Solving TSP Problem

Posted on: 2024-05-20
Degree: Master
Type: Thesis
Country: China
Candidate: Z Q Wang
Full Text: PDF
GTID: 2568307124960059
Subject: Electronic information
Abstract/Summary:
The traveling salesman problem is a well-known NP-hard problem that has attracted wide attention in the field of combinatorial optimization. Its practical applications in transportation, production scheduling, and integrated circuit design make it a crucial research topic. However, traditional methods for solving the problem suffer from low solution quality and heavy demands on computing resources. In recent years, deep reinforcement learning has been widely applied in fields such as autonomous driving, unmanned aerial vehicles, and gaming, demonstrating remarkable decision-making and learning abilities. Therefore, to address the slow solving speed and low solution quality of traditional algorithms, and taking the specific characteristics of the traveling salesman problem into account, this thesis proposes and optimizes a deep reinforcement learning approach to the problem.

(1) To solve the traveling salesman problem, this thesis proposes a Hybrid Graph Pointer Network (HGPN) based on the graph pointer network, and constructs the HGPN-RRB deep reinforcement learning model by combining the HGPN with the REINFORCE with Rollout Baseline (RRB) algorithm. The encoder fuses a Transformer encoder with graph embeddings to generate multiple embeddings for the feature context, better capturing the information in the graph structure. An additional decoder is added to form a multi-decoder structure, which improves the agent's decision-making throughout the learning phase. In the overall solution process, first, during encoding, multi-head attention, a feedforward neural network, and a graph neural network are used to extract features from the input sequence. Second, during decoding, the influence of aggregators with different calculation methods on model convergence is explored, and the sum aggregator is selected to speed up convergence. Third, during training, the RRB algorithm is used to train the HGPN, so that the model does not depend on labeled data when learning the network parameters. Finally, the HGPN-RRB model is used to solve traveling salesman problems of different scales. The experimental results show that the HGPN-RRB model achieves better solution quality and faster solving speed than traditional algorithms and several deep reinforcement learning models.
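To make the training procedure in (1) concrete, the following is a minimal sketch of one REINFORCE-with-Rollout-Baseline update step for a TSP policy. The `policy_net`/`baseline_net` interface (returning a tour and its log-probability for a batch of coordinates) and all names are illustrative assumptions, not code from the thesis.

```python
import torch

def tour_length(coords, tour):
    # coords: [B, N, 2] node coordinates; tour: [B, N] permutation of node indices
    ordered = coords.gather(1, tour.unsqueeze(-1).expand(-1, -1, 2))
    nxt = torch.roll(ordered, shifts=-1, dims=1)
    return (ordered - nxt).norm(dim=-1).sum(dim=1)  # [B] closed-tour lengths

def rrb_training_step(policy_net, baseline_net, coords, optimizer):
    # Sample a tour from the trainable policy together with its log-probability.
    tour, log_prob = policy_net(coords, decode="sample")   # [B, N], [B]
    cost = tour_length(coords, tour)

    # Greedy rollout of the frozen baseline copy provides the baseline cost.
    with torch.no_grad():
        bl_tour, _ = baseline_net(coords, decode="greedy")
        bl_cost = tour_length(coords, bl_tour)

    # REINFORCE gradient: advantage = sampled cost minus rollout-baseline cost.
    loss = ((cost - bl_cost) * log_prob).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return cost.mean().item()
```

Because the baseline is a greedy rollout of a frozen copy of the policy rather than a learned value network, no labeled optimal tours are needed, which matches the label-free training described above.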
(2) To improve the solving speed and generalization ability of the HGPN-RRB model, this thesis proposes the RRBLC algorithm, which extends the RRB algorithm with a local search algorithm and curriculum learning, and combines it with the HGPN to construct the HGPN-RRBLC deep reinforcement learning model. On the one hand, a joint local search technique combining a random OPT algorithm with a local insertion algorithm is proposed to improve computational efficiency and prevent the algorithm from falling into local optima; on this basis, a new simple baseline is constructed together with the rollout baseline to reduce the variance generated during training and yield a more efficient training algorithm. On the other hand, curriculum learning is introduced into the training algorithm to adjust the difficulty of the training instances as training proceeds, improving the generalization of the model. The experimental results show that the HGPN-RRBLC model outperforms the comparison algorithms in solving the traveling salesman problem in terms of solving speed and solution quality, and achieves good generalization. In addition, this thesis validates the HGPN-RRBLC model on real-world data sets, and the results show that it can be applied to practical problems.

(3) To improve the stability of the model, this thesis constructs multiple starting points to satisfy translation invariance, uses different node sequences to represent feasible solutions, and places different initial starting points in the encoding and decoding stages, so that every node can be selected by the policy network. In the inference stage, a four-distance expansion inference method is introduced to optimize the sequence vectors output by the trained model. The experimental results show that the multiple-starting-point construction and the four-distance expansion inference method effectively improve solution quality.
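As an illustration of the multi-start idea in (3), the following minimal sketch decodes the same instance greedily from every possible starting node and keeps the shortest tour found. The `policy_net` interface with a `start_node` argument is an assumption for illustration only, and the thesis's four-distance expansion inference step is not reproduced here.

```python
import torch

def closed_tour_length(coords, tour):
    # coords: [1, N, 2] node coordinates; tour: [1, N] permutation of node indices
    ordered = coords.gather(1, tour.unsqueeze(-1).expand(-1, -1, 2))
    return (ordered - torch.roll(ordered, -1, dims=1)).norm(dim=-1).sum(dim=1)

def multi_start_greedy(policy_net, coords):
    # Decode the instance once per starting node and keep the best tour.
    n = coords.size(1)
    best_tour, best_cost = None, float("inf")
    with torch.no_grad():
        for start in range(n):
            tour, _ = policy_net(coords, decode="greedy", start_node=start)
            cost = closed_tour_length(coords, tour).item()
            if cost < best_cost:
                best_tour, best_cost = tour, cost
    return best_tour, best_cost
```

Running one decode per starting node trades extra inference time for solution quality, which is consistent with the improvement reported for the multiple-starting-point construction.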
Keywords/Search Tags: Traveling Salesman Problem, Deep Reinforcement Learning, Graph Pointer Network, Transformer, REINFORCE