| With the growing number of vehicles in recent years, urban traffic congestion in China has continued to rise, and the most practical way to relieve traffic pressure is to build a more efficient traffic signal control system. By scale of control, traffic signal systems can be divided into point control, line control, and regional control. Point control generally focuses on improving signal control at a single intersection through algorithmic optimization; coordination between intersections is often ignored, so the signal control strategy cannot reach a joint optimum. Regional traffic control usually decomposes the road network into a combination of several arterial road networks for study. Research on signal control for arterial road networks is therefore of greater practical relevance. This paper focuses on highly scalable coordinated signal control methods for arterial road networks. The main work of the paper is as follows: (1) The NSCP (Nonstationary Converging Policies) algorithm is improved and applied to signal control on arterial road networks. The Q-learning framework of the original algorithm is replaced with DQN, and a convolutional neural network is used to strengthen the algorithm's feature extraction and storage capabilities. The states and actions in the algorithm are given more reasonable definitions. The road network partitioning method is changed so that the algorithm can solve for the optimal joint action strategy. Finally, road network environments with different numbers of intersections and different intersection spacings are built, and the algorithm is tested under different traffic intensities to demonstrate the accuracy and effectiveness of the improved NSCP algorithm. (2) The MAX-PLUS algorithm is improved and applied to signal control on arterial road networks, adopting a method that uses the coordination graph 
to pass messages between intersections and to select actions based on those messages. In addition to using DQN and a convolutional neural network to optimize the algorithm's structure, the control module is improved with a double deep Q-network (Double DQN) and a prioritized sampling strategy, which reduces the risk of overestimation and mitigates problems such as insufficient training on samples and suboptimal solutions being adopted in place of the optimal one. To address the shortcomings of earlier reward calculation methods, two new reward calculation methods are proposed that incorporate factors such as lanes and distances, improving the effectiveness of the algorithm's control. Road network environments with different numbers of intersections and different intersection spacings are built, and the algorithm is tested under different traffic intensities to demonstrate the accuracy and effectiveness of the improved MAX-PLUS algorithm. A comparative analysis of the experimental data shows that the improved NSCP algorithm is more stable, while the improved MAX-PLUS algorithm is more effective. |
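
The message passing behind MAX-PLUS action selection can be sketched on a small arterial, modeled here as a chain of intersections. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names (`max_plus_chain`, `joint_payoff`), the two-action setting, and all payoff numbers are hypothetical. In a learned controller the payoff tables would presumably come from Q-value estimates; here they are made-up constants.

```python
from itertools import product


def max_plus_chain(local, pair, iterations=None):
    """Max-plus action selection on a chain-shaped coordination graph.

    local[i][a]   : payoff of intersection i for taking action a
    pair[i][a][b] : payoff of edge (i, i+1) for action a at i and b at i+1
    All values are illustrative assumptions, not taken from the paper.
    """
    n = len(local)
    n_actions = len(local[0])
    if iterations is None:
        iterations = n  # enough rounds for messages to cross the chain

    # msgs[(i, j)][b] is the message from i to neighbour j about j's action b.
    msgs = {}
    for i in range(n - 1):
        msgs[(i, i + 1)] = [0.0] * n_actions
        msgs[(i + 1, i)] = [0.0] * n_actions

    def neighbours(i):
        return [k for k in (i - 1, i + 1) if 0 <= k < n]

    def edge_payoff(i, a, j, b):
        # pair is indexed by the left endpoint of the edge.
        return pair[i][a][b] if i < j else pair[j][b][a]

    for _ in range(iterations):
        new = {}
        for (i, j) in msgs:
            new[(i, j)] = [
                max(
                    local[i][a]
                    + edge_payoff(i, a, j, b)
                    + sum(msgs[(k, i)][a] for k in neighbours(i) if k != j)
                    for a in range(n_actions)
                )
                for b in range(n_actions)
            ]
        msgs = new  # synchronous update of all messages

    # Each intersection picks the action maximizing its local payoff
    # plus all incoming messages.
    actions = []
    for i in range(n):
        scores = [
            local[i][a] + sum(msgs[(k, i)][a] for k in neighbours(i))
            for a in range(n_actions)
        ]
        actions.append(max(range(n_actions), key=scores.__getitem__))
    return actions


def joint_payoff(local, pair, actions):
    """Total payoff of a joint action, used to check the result."""
    total = sum(local[i][a] for i, a in enumerate(actions))
    total += sum(
        pair[i][actions[i]][actions[i + 1]] for i in range(len(actions) - 1)
    )
    return total


# Three intersections, two hypothetical signal phases each.
local_payoff = [[1.0, 0.0], [0.0, 0.5], [0.2, 0.0]]
# Edges reward neighbouring intersections for choosing matching phases.
pair_payoff = [
    [[2.0, 0.0], [0.0, 2.0]],
    [[1.5, 0.0], [0.0, 1.5]],
]

best = max_plus_chain(local_payoff, pair_payoff)
print(best)  # → [0, 0, 0]
```

On a chain (or any tree-shaped graph) this message passing recovers the joint action with the highest total payoff, which can be confirmed on small cases by enumerating all joint actions with `itertools.product`. On graphs with cycles max-plus is only approximate, and ties between equally good actions would need an explicit tie-breaking rule that this sketch omits.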