Traffic light control is a core component of intelligent transportation systems. Efficient control strategies can alleviate traffic congestion, reduce the environmental pollution caused by vehicle exhaust, and improve the traffic efficiency of urban roads. Because traffic conditions change dynamically, manually designed strategies cannot make optimal decisions based on real-time traffic. Reinforcement learning has developed rapidly in recent years; such methods can continuously learn in dynamic environments and obtain optimal strategies adapted to the current traffic. However, their shortcomings are also significant: due to the characteristics of the underlying algorithms, the learned strategies tend to be too aggressive, and data utilization is very low. To address these shortcomings, this thesis applies deep reinforcement learning to explore optimal strategies for traffic light control. The main research work is as follows.

First, we propose a model-free reinforcement learning traffic light control algorithm to optimize the traffic capacity of vehicles at an intersection. The traffic light control problem is formalized as a Markov Decision Process. To address the overly aggressive strategies produced by reinforcement-learning-based traffic light control, we combine ensemble learning to counteract overestimation of the value function and stabilize the learned strategy, and we randomly select target networks from the ensemble so that the learned control strategy does not become overly conservative. In addition, this thesis adopts an update-to-data (UTD) ratio greater than one, performing multiple rounds of policy updates after each step of agent-environment interaction to improve data utilization.
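The sketch below illustrates, in PyTorch, one way such a randomized-ensemble value update with a UTD ratio greater than one could look. It is an illustrative assumption rather than the thesis implementation: the network architecture, the hyper-parameters N_ENSEMBLE, N_TARGET_SUBSET, UTD_RATIO, and the replay_buffer.sample interface are all hypothetical, and taking the minimum over a randomly sampled subset of target heads (in the spirit of REDQ) is only one possible reading of "randomly selecting the target network".

```python
import random
import torch
import torch.nn as nn

# Hypothetical hyper-parameters (not taken from the thesis): ensemble size,
# size of the random target subset, update-to-data (UTD) ratio, discount factor.
N_ENSEMBLE, N_TARGET_SUBSET, UTD_RATIO, GAMMA = 5, 2, 10, 0.99


def make_q_net(state_dim: int, n_actions: int) -> nn.Module:
    """A small MLP Q-network; the real architecture is thesis-specific."""
    return nn.Sequential(
        nn.Linear(state_dim, 128), nn.ReLU(),
        nn.Linear(128, 128), nn.ReLU(),
        nn.Linear(128, n_actions),
    )


class RandomEnsembleDQN:
    """Ensemble of Q-networks with randomized target selection and UTD > 1."""

    def __init__(self, state_dim: int, n_actions: int):
        self.q_nets = [make_q_net(state_dim, n_actions) for _ in range(N_ENSEMBLE)]
        self.target_nets = [make_q_net(state_dim, n_actions) for _ in range(N_ENSEMBLE)]
        for q, t in zip(self.q_nets, self.target_nets):
            t.load_state_dict(q.state_dict())
        self.optims = [torch.optim.Adam(q.parameters(), lr=1e-3) for q in self.q_nets]

    def update(self, replay_buffer):
        # UTD ratio > 1: several gradient updates per environment step,
        # which raises data utilization without collecting more transitions.
        for _ in range(UTD_RATIO):
            s, a, r, s2, done = replay_buffer.sample(batch_size=64)  # assumed API
            # Randomly pick a small subset of target heads and take their minimum:
            # this curbs value overestimation (the source of overly aggressive
            # strategies) without making the policy overly conservative.
            subset = random.sample(self.target_nets, N_TARGET_SUBSET)
            with torch.no_grad():
                q_next = torch.stack([t(s2) for t in subset]).min(dim=0).values
                target = r + GAMMA * (1.0 - done) * q_next.max(dim=1).values
            for q_net, optim in zip(self.q_nets, self.optims):
                q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
                loss = nn.functional.mse_loss(q_sa, target)
                optim.zero_grad()
                loss.backward()
                optim.step()
        # Target networks would be synchronized periodically or via soft updates.
```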
On this basis, when a centralized traffic light control algorithm faces a multi-intersection environment, the large number of model parameters and the high-dimensional state-action space make the strategy difficult to converge. This thesis therefore further proposes a multi-agent traffic light control algorithm. The traffic light control problem over multiple intersections is formalized as a Partially Observable Markov Decision Process, with one agent controlling the traffic light at each intersection; the intersections must cooperate to learn the globally optimal strategy. This thesis uses a graph attention neural network to model the relationship between a target intersection and the intersections in its neighborhood, and adopts mean-field theory so that agents can share parameters. All the data obtained by the agents interacting with the intersection environment is then stored in a single experience pool, so that every agent can randomly sample from it for training, which improves the agents' learning efficiency.

This thesis carries out comparative experiments against benchmark algorithms on both simulated and real-world data, and the effectiveness of the proposed algorithms is verified. On the two real-world datasets, the proposed Multi-RELight algorithm shortens the queue length of waiting vehicles by 30.94% on average compared with the best-performing benchmark algorithm. The randomized-ensemble network structure and the improved data utilization make the traffic light control strategy converge faster and more stably. The cooperation in the multi-agent model is better suited to the large-scale multi-intersection situations found in the real world, and the agents learn more efficiently. The experimental results show that the multi-agent reinforcement learning method can learn the globally optimal traffic light control strategy through cooperation between the target agent and the agents in its neighborhood.
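As a companion to the multi-agent description above, the sketch below shows one minimal way a neighborhood attention module and a shared experience pool could be wired together in PyTorch. The module name NeighborAttention, the single attention head, the embedding size, and the plain-list buffer are illustrative assumptions, not the Multi-RELight implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NeighborAttention(nn.Module):
    """Single-head attention over a target intersection and its neighbors.

    Shapes are illustrative: obs_self is [batch, obs_dim] and obs_neigh is
    [batch, n_neighbors, obs_dim]; the thesis architecture may differ.
    """

    def __init__(self, obs_dim: int, embed_dim: int = 64):
        super().__init__()
        self.proj = nn.Linear(obs_dim, embed_dim)
        self.attn = nn.Linear(2 * embed_dim, 1)

    def forward(self, obs_self: torch.Tensor, obs_neigh: torch.Tensor) -> torch.Tensor:
        h_self = self.proj(obs_self)                      # [B, E]
        h_neigh = self.proj(obs_neigh)                    # [B, K, E]
        # Score each neighbor against the target intersection's embedding.
        h_self_exp = h_self.unsqueeze(1).expand_as(h_neigh)
        scores = self.attn(torch.cat([h_self_exp, h_neigh], dim=-1))  # [B, K, 1]
        weights = F.softmax(scores, dim=1)
        # Attention-weighted summary of the neighborhood, fused with self features.
        neigh_summary = (weights * h_neigh).sum(dim=1)    # [B, E]
        return torch.relu(h_self + neigh_summary)


# Parameter sharing: every agent uses the same encoder instance, and all agents
# push their transitions into one shared experience pool so that any agent can
# sample from the data gathered at every intersection.
shared_encoder = NeighborAttention(obs_dim=16)
shared_replay_buffer = []  # list of (obs, action, reward, next_obs) tuples
```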