| fc-cvrp is one of variants of capacitated vehicle routing problem.there are numbers of homogeneous vehicles and customer points with a single depot in fc-cvrp.The vehicles all start from depot to visit customers and all customers are available and only visited once.After all of customers are visited,the vehicles return to depot which means that the distribution service is finished.Different from the traditional vehicle routing problem minimizing the path length,this problem aims at minimizing fuel consumption.Moreover,the relationship between fuel consumption and distance,vehicle current load is nonlinear.Therefore,when selecting the next access point,the driver needs to weigh the distance from the current point to the next point and the current load of the vehicle to minimize the total fuel consumption of the final path.Because the objective function of the problem is nonlinear,the solution space of the problem is large and the solution is complicated.In addition,the mainstream algorithms--exact algorithm and meta-heuristic algorithm,are difficult to use the prior knowledge of solving the problem instance to solve the new problem instance.In recent years,the performance of deep learning algorithm in operational research optimization problem is increasingly excellent,fast and high quality solution.In order to solve the above difficulties,this paper uses deep reinforcement learning to solve fv-cvrp.In the first place,fc-cvrp is summarized as a serial-to-sequence problem.Secondly,we build the corresponding Markov model and demonstrate the Markov nature of the problem,and propose a deep reinforcement learning algorithm--Route Planner.According to the characteristics of the problem,a strategic network framework is built based on Transformer model,and encoders and decoders in the framework are designed and their hyperparameters are optimized.In this paper,negative rewards are proposed as the basis of training,and the loss function of the Route Planner algorithm is designed according to the strategy gradient operator with baseline.The goal of the training is to obtain a strategy that considers the distance between two points and the current load of the vehicle,so as to reasonably select the access nodes.Finally,the Route Planner achieves a higher solution quality in a short time.We use the instance generation method in the literature,and each of the three different sizes of problem instances generates one hundred test problem instances.The baseline algorithms used in this thesis included precise algorithm based on Cplex solver,the heuristic algorithm Saving,BI and meta-heuristic algorithm SA in literature.The results show that the Route Planner algorithm achieves better results in a short time(1s),and performs well on the test data set’s solution mean,win rate and solution time.Through algorithm comparison,this paper verifies the effectiveness and superiority of the Route Planner algorithm in high response scenarios.Moreover,the Route Planner algorithm is visualized and a large number of data experiments are carried out.The visualization of the training curve verifies the convergence of Route Planner training.Visualization of solution paths explores the characteristics and tendencies of constructing solutions.The data experiment verifies the transfer learning ability of Route Planner,and then helps to converge the training and test the generalization performance of Route Planner.It is confirmed that Route Planner can get better results by sampling randomly for longer periods of time.Wilkerson’s symbolic rank test verifies the difference between the Route Planner algorithm and the better-performing baseline algorithm. |