In recent years, deep reinforcement learning methods have been applied to many complex real-world problems and have emerged as a promising technology for traffic control. To find the best traffic signal control strategy, reinforcement learning learns how to make optimal decisions in different states by continuously interacting with the traffic environment. This approach can be deployed in traffic signal control to learn control strategies that reduce traffic congestion. However, many practical constraints still limit the application of reinforcement learning to real intersection signal control, such as traffic detection technology, communication delay, scalability, stability, and the choice of a suitable deep reinforcement learning (DRL) algorithm. Traffic signal control with DRL therefore remains challenging. This thesis studies the application of deep reinforcement learning to traffic signal control. The research work includes the following. First, for a single-intersection scenario, we propose a signal control method based on the Deep Recurrent Q-Network (DRQN) algorithm. It extends the deep Q-learning algorithm with a recurrent neural network so that it can also learn effectively in partially observable environments. Practical constraints are taken into account in the design of the state space, action space, and reward function, making the method more suitable for traffic light control at single intersections in the real world. We use the SUMO simulation software to conduct multiple sets of comparative experiments in traffic scenarios with different traffic flow distributions. The results show that the proposed agent adapts to a variety of traffic scenarios and outperforms static traffic signal control systems in low-, medium-, and high-density traffic, improving overall traffic efficiency by more than 50%.

Second, in the context of arterial traffic signal control, we extend the single-agent Proximal Policy Optimization (PPO) algorithm to the multi-agent domain. An
adaptive arterial signal coordination control method based on Parameter Sharing Proximal Policy Optimization (PS-PPO) is proposed to reduce arterial traffic delays. Most existing traffic signal control methods based on Multi-Agent Reinforcement Learning (MARL) rely on unrealistic assumptions to improve their performance in complex and dynamic traffic scenarios. To reduce the need for these assumptions and enhance the practicality of the algorithm, this study applies a parameter-sharing training protocol, which alleviates the slow convergence caused by non-stationarity and reduces computational overhead while providing high scalability and stability. We also design a new action space that uses a lead-lag phase scheme to improve the flexibility of coordinating multiple signals, and propose a reward function that effectively avoids queue overflow. Extensive simulation results show that, compared with traditional methods and state-of-the-art reinforcement learning methods, PS-PPO performs more stably in both synthetic and real-world arterial corridors, achieving the highest reward at the end of training and requiring the least time to converge. It also offers clear advantages in computational performance and effectively prevents traffic overflow. Therefore, the proposed algorithm alleviates urban arterial traffic congestion more effectively than traditional methods and existing MARL-based methods.
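The parameter-sharing idea underlying PS-PPO can be illustrated with a deliberately simplified sketch. Everything here is hypothetical and not the thesis's actual implementation: the policy is a plain linear function instead of a neural network, the update is a toy advantage-weighted step rather than the clipped PPO objective, and the observation dimensions and reward signal are invented for illustration. What the sketch does show is the core protocol: every intersection agent acts on its own local observation, but all of them read and update a single shared parameter set.

```python
# Sketch of parameter sharing across intersection agents (hypothetical
# simplified setting; a real PS-PPO agent would use a neural network
# policy trained with the clipped PPO surrogate objective).
import random

random.seed(0)

N_AGENTS = 3    # e.g. three signalized intersections along an arterial
OBS_DIM = 4     # per-intersection observation (e.g. approach queue lengths)
N_ACTIONS = 2   # e.g. keep current phase vs. switch phase (illustrative)

# One parameter set shared by ALL agents: adding intersections does not
# multiply the model size, which is the scalability argument.
shared_weights = [[0.0] * OBS_DIM for _ in range(N_ACTIONS)]

def logits(obs):
    """Policy scores computed with the single shared weight matrix."""
    return [sum(w * o for w, o in zip(row, obs)) for row in shared_weights]

def act(obs):
    """Greedy action under the shared policy (deterministic for clarity)."""
    scores = logits(obs)
    return scores.index(max(scores))

def update(obs, action, advantage, lr=0.1):
    """Every agent's experience updates the SAME weights: the
    parameter-sharing training protocol in a nutshell."""
    for j in range(OBS_DIM):
        shared_weights[action][j] += lr * advantage * obs[j]

# Toy rollout: each intersection observes its queues and acts
# independently, but all updates flow into the shared parameters.
for step in range(100):
    for agent in range(N_AGENTS):
        obs = [random.random() for _ in range(OBS_DIM)]
        a = act(obs)
        # Hypothetical advantage: pretend switching (action 1) is better
        # when total queue length is high.
        advantage = 1.0 if (a == 1) == (sum(obs) > 2.0) else -1.0
        update(obs, a, advantage)

# Because the policy is shared, all agents behave identically when
# given the same observation.
same_obs = [0.9] * OBS_DIM
print([act(same_obs) for _ in range(N_AGENTS)])
```

A design consequence worth noting: because all agents share one policy, behaviour at different intersections differs only through their local observations, which is what keeps training stable (no per-agent non-stationarity from independently drifting policies) at the cost of per-intersection specialization.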