| With the continuous development of the social economy, the travel demand of urban residents keeps increasing, and so does car ownership. By the end of 2021, the number of motor vehicles in China had reached 393 million (more than 300 million of them automobiles), and the number of drivers had reached 479 million. More than 30 million motor vehicles are newly registered and more than 20 million drivers are newly licensed each year. Car ownership in several megacities accounts for more than 10% of the national total, and Beijing alone has more than 6 million cars. However, the construction and upgrading of urban transportation infrastructure cannot keep pace with this growth, so urban traffic congestion is becoming increasingly serious, which makes traffic congestion control technology particularly important. At present, traffic flow consists mainly of human-driven vehicles: vehicle behavior depends on drivers' decisions, and traffic information cannot be delivered to drivers in real time. This severely limits the effectiveness of microscopic, vehicle-level congestion control methods, so traffic congestion still has to be tackled from the perspective of macroscopic traffic flow control. Because traffic signal controllers are already installed at intersections throughout urban road networks, traffic signal control is one of the most important means of congestion control. Traffic signal control is an important and challenging real-world problem whose main goal is to minimize the time vehicles spend passing through intersections by coordinating the movements of different vehicle streams. Traditional traffic signal control derives fixed rules from mathematical assumptions and controls intersections statically.
Even though large volumes of traffic data, powerful computing resources, and many advanced intelligent transportation technologies are now available, intersection signal control remains at a relatively primitive stage and cannot effectively combine these technologies to improve its performance. Existing traffic signal control still has the following shortcomings: 1) real-world traffic conditions are complex and changeable, and a mathematical model can hardly describe or account for all of these factors, so the actual control deviates from the real situation; 2) most signal control schemes remain at the stage of isolated (single-intersection) control, and the lack of effective cooperation among intersections leads to poor global performance. Reinforcement learning-based traffic signal control can adapt to the traffic flow dynamically and learn the real traffic state from traffic data collected in real time, thereby avoiding the deviations that model-based methods may exhibit in real environments. Learning agents can also exchange information and learn jointly, which allows them to cooperate effectively. Reinforcement learning therefore has great potential for large-scale urban traffic signal control. However, current reinforcement learning methods still suffer from the credit assignment problem among multiple agents and from low data efficiency during learning. This thesis studies multi-agent reinforcement learning algorithms suited to traffic signal control. To address credit assignment among multiple agents, it introduces mean field theory and an entropy regularization-based credit assignment method into the temporal difference reinforcement learning framework. This mitigates the curse of dimensionality in large-scale traffic signal control and balances the learning process of each agent, so that the agents act consistently and achieve better overall performance.
In addition, online reinforcement learning is data-inefficient: the algorithm must interact with the environment to collect a large number of sample trajectories for training, which incurs a large time overhead and hinders its deployment and model transfer in large-scale traffic signal control. To solve this problem, this thesis introduces a meta-learning method combined with a knowledge embedding model that assists the decision-making of reinforcement learning and thereby improves its data efficiency. Compared with traditional traffic signal control methods, the proposed method performs markedly better on metrics such as average traffic delay and average vehicle queue length at intersections. Compared with other multi-agent reinforcement learning baselines, it converges faster and is more data-efficient. |
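The mean-field idea mentioned in the abstract can be illustrated with the generic mean-field Q-learning update from the multi-agent reinforcement learning literature. The following is a minimal Python sketch under stated assumptions, not the thesis's algorithm: each intersection's action is assumed to be a signal phase index, its neighbors' phases are summarized by the average of their one-hot vectors (the "mean action"), Q is assumed to be a hypothetical linear function of state features and the mean action, and the next-state value is the expectation under a Boltzmann policy. The entropy regularization-based credit assignment and the meta-learning component are not modeled. All function names and parameters (`mean_action`, `q_value`, `boltzmann`, `mf_td_update`, `alpha`, `gamma`, `beta`) are illustrative.

```python
import math
from typing import List, Sequence

Weights = List[List[float]]  # hypothetical: one weight vector per own phase


def mean_action(neighbor_actions: Sequence[int], n_actions: int) -> List[float]:
    """Average of the neighbors' one-hot phase choices (the mean action)."""
    counts = [0.0] * n_actions
    for a in neighbor_actions:
        counts[a] += 1.0
    k = max(len(neighbor_actions), 1)  # avoid division by zero with no neighbors
    return [c / k for c in counts]


def q_value(weights: Weights, state: Sequence[float], action: int,
            mean_act: Sequence[float]) -> float:
    """Assumed linear Q(s, a, a_bar) over the concatenated features [s, a_bar]."""
    features = list(state) + list(mean_act)
    return sum(w * x for w, x in zip(weights[action], features))


def boltzmann(qs: Sequence[float], beta: float) -> List[float]:
    """Softmax policy with inverse temperature beta (max-shifted for stability)."""
    m = max(qs)
    exps = [math.exp(beta * (q - m)) for q in qs]
    z = sum(exps)
    return [e / z for e in exps]


def mf_td_update(weights: Weights, state, action, mean_act, reward,
                 next_state, next_mean_act,
                 alpha=0.1, gamma=0.9, beta=1.0) -> float:
    """One semi-gradient TD(0) step on the mean-field Q; mutates `weights` in place."""
    n_actions = len(weights)
    next_qs = [q_value(weights, next_state, b, next_mean_act) for b in range(n_actions)]
    pi = boltzmann(next_qs, beta)
    # Mean-field value: expected Q at the next state under the Boltzmann policy.
    v_next = sum(p * q for p, q in zip(pi, next_qs))
    td_error = reward + gamma * v_next - q_value(weights, state, action, mean_act)
    features = list(state) + list(mean_act)
    weights[action] = [w + alpha * td_error * x
                       for w, x in zip(weights[action], features)]
    return td_error


if __name__ == "__main__":
    # Hypothetical 2-phase intersection with 4 neighbors and 2 queue-length features.
    w = [[0.0] * 4 for _ in range(2)]
    a_bar = mean_action([0, 1, 1, 1], n_actions=2)
    err = mf_td_update(w, [2.0, 1.0], 0, a_bar, reward=-3.0,
                       next_state=[1.0, 1.0], next_mean_act=[0.5, 0.5])
    print(a_bar, err)
```

Because each agent conditions only on the mean action of its neighbors instead of their joint action, the input size grows with the number of phases rather than exponentially with the number of neighboring intersections; this is the sense in which a mean-field approximation eases the curse of dimensionality.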