Traffic lights play a crucial role in controlling the flow of road vehicles. At present, traffic lights on urban roads mostly adopt fixed-timing, fixed-phase control strategies, which struggle to accommodate varying traffic conditions. Designing a control scheme that adjusts traffic light changes in real time according to the traffic flow at an intersection has therefore become a research hotspot in the field of intelligent transportation. However, traffic flow at urban intersections is dynamic and variable, which makes it difficult to study directly. To design a suitable dynamic control scheme for traffic lights, deep reinforcement learning is introduced to abstract the intersection traffic light control problem into a reinforcement learning model, and different deep reinforcement learning algorithms are adopted for single intersections and for road networks to achieve coordinated control of traffic lights. The main contributions of this thesis are as follows:

(1) A traffic light control model based on the D3QN algorithm is proposed to address the overestimation and slow convergence of deep reinforcement learning algorithms in single-intersection traffic light control. Considering the influence of vehicles in different states on traffic light control, a vehicle information matrix is proposed as the input to the neural network, and a new reward function is defined to reflect more accurately the quality of the actions taken by the agent. The algorithm and each improvement were evaluated under different traffic flows; the results show that the proposed D3QN algorithm and its improvements reduce the average queue length at the intersection, verifying their effectiveness.

(2) To handle the complexity of interaction between multiple intersections in a road network, the single-agent reinforcement learning algorithm is extended to a multi-agent system. The MADDPG algorithm adopts centralized
training with distributed execution: centralized training of the Critic network enables each agent to take the other agents into account during learning, while distributed execution enables each agent to make decisions independently. To address the slow training of the model, a parallel prioritized experience replay (PPER) mechanism is introduced to improve the utilization efficiency of training samples. Experiments on traffic light control in a four-intersection road network show that the proposed PPER-MADDPG algorithm reduces the overall queue length of the road network, verifying its effectiveness.

(3) A traffic simulation system based on deep reinforcement learning was built, and the algorithms in this thesis were implemented in Python. The system provides a visual interface for traffic light control and stores the data generated during training, providing a simulation platform for studying traffic light control problems.
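D3QN in (1) combines Double DQN's decoupled target computation with a dueling value/advantage head. The two ideas can be sketched in a few lines of NumPy; the function names, shapes, and batch layout here are illustrative assumptions, not the thesis's implementation:

```python
import numpy as np

def dueling_q(value, advantages):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean over actions of A(s, a')."""
    # Subtracting the mean advantage makes V and A identifiable.
    return value + advantages - advantages.mean(axis=-1, keepdims=True)

def double_dqn_target(reward, gamma, q_online_next, q_target_next, done):
    """Double DQN target: the online net selects a*, the target net evaluates it.

    This decoupling is what reduces the Q-value overestimation mentioned in (1).
    """
    best = np.argmax(q_online_next, axis=-1)                      # action selection
    q_eval = np.take_along_axis(q_target_next, best[:, None], axis=-1).squeeze(-1)
    return reward + gamma * (1.0 - done) * q_eval
```

For example, with advantages `[1.0, 3.0]` and value `1.0`, the dueling head yields Q-values `[0.0, 2.0]`: the relative preference between actions is preserved while the state value is shared.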
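The abstract does not spell out the vehicle information matrix or the reward function of (1). A common encoding in this line of work, shown purely as an assumed sketch, discretizes each approach lane into cells holding occupancy and normalized speed, and rewards the agent for shrinking the total queue between decision steps:

```python
import numpy as np

def vehicle_matrix(vehicles, n_lanes, n_cells, cell_len, max_speed):
    """Hypothetical lane-by-cell state: channel 0 = occupancy, channel 1 = speed.

    `vehicles` is a list of (lane index, distance to stop line in metres, speed).
    """
    m = np.zeros((2, n_lanes, n_cells))
    for lane, pos, speed in vehicles:
        cell = min(int(pos // cell_len), n_cells - 1)  # clip far vehicles to last cell
        m[0, lane, cell] = 1.0
        m[1, lane, cell] = speed / max_speed
    return m

def queue_reward(prev_queues, curr_queues):
    """Hypothetical reward: positive when the total queue length shrinks."""
    return sum(prev_queues) - sum(curr_queues)
```

A matrix of this shape captures both where vehicles are and whether they are moving, which is one way to reflect "vehicles in different states" as the abstract requires.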
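The prioritized replay underlying PPER in (2) biases sampling toward transitions with large TD error, so informative samples are reused more often. A minimal single-process sketch follows; the "parallel" aspect (multiple workers feeding the buffer) is omitted, and all names are assumptions:

```python
import numpy as np

class PrioritizedReplay:
    """Sketch of proportional prioritization: P(i) proportional to |TD error_i|^alpha."""

    def __init__(self, alpha=0.6):
        self.alpha = alpha
        self.data = []
        self.prio = []

    def add(self, transition, td_error):
        # Small epsilon keeps zero-error transitions sampleable.
        self.data.append(transition)
        self.prio.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self, k, rng=None):
        if rng is None:
            rng = np.random.default_rng(0)
        p = np.asarray(self.prio)
        idx = rng.choice(len(self.data), size=k, p=p / p.sum())
        return [self.data[i] for i in idx]
```

In a full implementation, sampled transitions would also carry importance-sampling weights to correct the bias this non-uniform sampling introduces.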