| In modern China, increasing traffic congestion constrains the economic and environmental sustainability of many cities. Optimizing traffic flow can improve road capacity and thus reduce congestion, and the main means of altering traffic flow is traffic signal control. Research on and implementation of traffic signal control methods therefore have great practical significance. The revolutionary development of artificial intelligence has brought better solutions to the traffic signal control problem. Deep reinforcement learning, which is data-driven, model-free, and self-learning, does not need to consider the internal mechanism of the traffic system when it is applied to traffic-related problems. It not only reduces the difficulty of solving the problem but also adapts much better to traffic scenarios. With the emergence and development of multi-agent reinforcement learning, global optimization of regional signal control is gradually replacing local optimization of single intersections and has become a hot research topic. This thesis investigates deep reinforcement learning and its application to traffic signal control in depth, then proposes and implements a multi-agent deep deterministic policy gradient-based traffic cyclic signal (MADDPG-TCS) control algorithm. The proposed MADDPG-TCS algorithm can control traffic signals effectively and stably to improve traffic efficiency. The main contributions of this thesis are as follows. First, to improve road capacity and relieve traffic congestion, a reinforcement learning model is established and applied to traffic cyclic signal control. The deep deterministic policy gradient (DDPG) algorithm is used to iteratively learn the signal phase durations, with vehicle-based states as input. This model is well suited to the current situation, in which radar detection and video detection are the main sources of traffic data. Second, a cooperative action of relative signal differences is
defined. The coupling between intersections in the traffic network is also taken into account. The regional road network is regarded as an undirected graph and, for simplicity, each intersection receives information only from the intersections connected to it by road. This approach solves the following problems that arise when the MADDPG algorithm is applied directly to regional traffic cyclic signal control: (i) within the same region, the actions of the different agents controlling the traffic signals at intersections are not synchronized; (ii) as the number of intersections grows, the dimensionality of the data obtained by each agent increases dramatically, and information from other intersections interferes with unrelated agents, which makes it difficult for the MADDPG algorithm to converge. Finally, to verify the performance of the proposed MADDPG-TCS algorithm, traffic signal control for a region containing three intersections of identical structure is simulated in SUMO. To improve training efficiency, this thesis optimizes the training process by ignoring the data from the initial time steps, adjusting the sampling and update strategy, and running simulations in parallel. The following conclusions are drawn from the algorithm implementation and comparative analysis: (i) the MADDPG-TCS algorithm balances exploration and exploitation well for each agent during training, which leads to good convergence and stability; (ii) the MADDPG-TCS algorithm makes full use of the information from collaborating intersections; (iii) compared with the fixed-timing method and the Webster method, the MADDPG-TCS algorithm significantly improves vehicle queue length and vehicle delay time. |
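
The neighbor-restricted information sharing described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the chain topology `A - B - C`, the two-value observation vectors, and the helper names `build_neighbors` and `local_view` are all assumptions made for illustration. The sketch only shows how treating the road network as an undirected graph keeps each agent's input limited to its own observation plus those of directly connected intersections.

```python
from typing import Dict, List, Tuple


def build_neighbors(edges: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Build sorted adjacency lists for an undirected road graph."""
    adjacency: Dict[str, set] = {}
    for a, b in edges:
        # Undirected: record the connection in both directions.
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return {node: sorted(adj) for node, adj in adjacency.items()}


def local_view(
    agent: str,
    observations: Dict[str, List[float]],
    neighbors: Dict[str, List[str]],
) -> List[float]:
    """Concatenate the agent's own observation with its neighbors' observations."""
    view = list(observations[agent])
    for other in neighbors[agent]:
        view.extend(observations[other])
    return view


# Hypothetical three-intersection region arranged as a chain (assumed topology).
edges = [("A", "B"), ("B", "C")]

# Hypothetical per-intersection observations (e.g. two queue measurements each).
observations = {"A": [3.0, 1.0], "B": [0.0, 4.0], "C": [2.0, 2.0]}

neighbors = build_neighbors(edges)
view_a = local_view("A", observations, neighbors)  # A sees itself and B only
view_b = local_view("B", observations, neighbors)  # B sees itself, A, and C
```

In this sketch the input size of an agent grows with the number of its road-connected neighbors rather than with the total number of intersections in the region, which is the dimensionality concern raised in point (ii) of the abstract.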