| As the number of cars increases, traffic congestion has become a public problem in the development of cities around the world. Road construction can alleviate congestion to a certain extent, but it is constrained by factors such as cost, land, and time. Establishing an intelligent traffic signal control system is one of the most cost-effective ways to address this problem. In this thesis, deep reinforcement learning is applied to signal control at a single intersection to improve traffic conditions. The main work of the thesis is as follows: (1) Q-learning is combined with a shallow neural network and applied to single-intersection signal control. The traffic state is defined according to the number of queued vehicles. The shallow neural network is used to fit the Q-function, and the difference between two reward definitions is discussed. The experimental results show that the signal control method based on Q-learning with a shallow neural network outperforms two classic signal control methods: fixed-time control and longest-queue-first control. (2) A single-intersection signal control method based on deep Q-learning is studied. The traffic state is redefined according to the locations of the vehicles. A deep convolutional neural network replaces the previous shallow neural network. Experience replay is introduced to improve the stability of the algorithm. The experimental results show that, in the same traffic environment, the signal control method based on deep Q-learning further improves traffic efficiency. (3) The single-intersection signal control method based on deep Q-learning is improved by taking differences in vehicle length into account. Two new traffic state representations are proposed. In addition, when the agent selects an action, a simulated annealing strategy replaces the previous ε-greedy exploration strategy, which shortens the training time and speeds up the convergence of the algorithm. (4) A single-intersection signal control method based on the Deep Deterministic Policy Gradient (DDPG) algorithm is studied. A new signal phase execution strategy is proposed in which the execution time of the next phase is output at each decision point. The experimental results show that, compared with the signal control method based on deep Q-learning, the DDPG-based method achieves a shorter average vehicle delay and a smaller average number of queued vehicles per lane, because its phase execution time can take continuous values. |
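To illustrate the exploration change described in item (3), the following is a minimal sketch of one common form of simulated-annealing action selection for Q-learning, based on a Metropolis-style acceptance rule. The abstract does not give the exact rule used in the thesis, so the function names `sa_select_action` and `cool`, the geometric cooling schedule, and all numeric values are assumptions made for illustration only.

```python
import math
import random


def sa_select_action(q_values, temperature, rng=random):
    """Choose an action index using a Metropolis-style acceptance rule.

    Hypothetical helper: a random candidate action is accepted with
    probability exp((Q[candidate] - Q[greedy]) / T); otherwise the
    greedy action is returned.
    """
    greedy = max(range(len(q_values)), key=lambda a: q_values[a])
    candidate = rng.randrange(len(q_values))
    # Q[candidate] <= Q[greedy], so the exponent is <= 0 and the
    # acceptance probability lies in [0, 1].
    accept_prob = math.exp((q_values[candidate] - q_values[greedy]) / temperature)
    return candidate if rng.random() < accept_prob else greedy


def cool(temperature, rate=0.95, floor=1e-3):
    """Geometric cooling with a lower bound (assumed schedule)."""
    return max(floor, temperature * rate)


if __name__ == "__main__":
    rng = random.Random(0)
    q = [0.0, 5.0, 1.0]  # illustrative Q-values for three signal phases
    temperature = 10.0
    for episode in range(200):
        action = sa_select_action(q, temperature, rng)
        temperature = cool(temperature)
    print("final temperature:", temperature, "last action:", action)
```

At a high temperature almost every random candidate is accepted, so the agent explores broadly; as the temperature falls, the rule increasingly returns the greedy action. Unlike ε-greedy with a fixed ε, the exploration rate here also depends on how much worse the candidate's Q-value is than the best one.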