| As urban modernization accelerates, urban traffic infrastructure has gradually improved, and road networks have grown in scale and complexity. Although the rapid development of urban traffic makes daily travel more convenient, it has also brought a series of problems: for example, automobile emissions degrade urban air quality, and traffic congestion occurs frequently. Congestion cannot be completely solved by changing road planning or improving road infrastructure alone. With the rise of artificial intelligence, intelligent transportation systems offer a new way to address urban traffic congestion. In recent years, deep reinforcement learning has been rapidly adopted in research on intelligent transportation systems and has achieved some success. DQN is a typical deep reinforcement learning algorithm that is widely used across transportation applications. However, existing intelligent transportation systems still have significant limitations. First, real road network conditions are complex and changeable, and it is difficult for a system to perceive the temporal and spatial features of the global environment. Second, catastrophic forgetting in the model's value estimation while perceiving road network features remains unsolved. Third, for congested areas of the road network that deserve particular focus, the feature extraction capability and attention of convolutional neural networks need to be strengthened. To address these problems, this study designs an empirical policy module on top of a Double DQN traffic light timing optimization model, and proposes a deep reinforcement learning model based on the empirical policy network to perceive the state of
the traffic road network and make optimal decisions. The historical average Q values generated by the empirical policy module take part in the Q-value calculation of the Double DQN model and in the parameter updates of multiple modules, which alleviates error accumulation in action decisions and yields more stable action outputs. In addition, to strengthen the extraction of and attention to local road network features, the model uses a 3D convolutional neural network to extract temporal and spatial features, and adopts a channel-wise attention mechanism, SENet, to increase attention to congested road areas. SENet models the interdependence between feature channels through squeeze, excitation, and scale operations, which perform feature compression, activation, and weight assignment respectively, and thereby sharpens the focus on key features of congested roads. Combining these two improvements yields a traffic light timing optimization model based on deep reinforcement learning with an empirical policy. The number of waiting vehicles and the average vehicle waiting time in the road network are used as evaluation metrics, and the SUMO traffic simulation platform is used to conduct ablation experiments and comparison experiments against several deep reinforcement learning models. The experimental results show that the proposed method significantly reduces the number of waiting vehicles and the average waiting time in the road network, improving the efficiency and stability of traffic light timing and further verifying the reliability and effectiveness of the approach. |
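
The squeeze, excitation, and scale steps attributed to SENet above can be illustrated with a minimal NumPy sketch. It is not the study's implementation: the feature-map layout `(C, T, H, W)`, the reduction ratio `r`, the random weights `w1` and `w2`, and the function name `se_block` are all assumptions made for illustration, and bias terms are omitted for brevity.

```python
import numpy as np

def se_block(features, w1, w2):
    """Channel attention over a 3D-conv feature map (hypothetical helper).

    features: array of shape (C, T, H, W)
    w1: bottleneck weights of shape (C, C // r)
    w2: expansion weights of shape (C // r, C)
    """
    # Squeeze: global average over time and space gives one value per channel.
    squeezed = features.mean(axis=(1, 2, 3))          # shape (C,)
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate in (0, 1).
    hidden = np.maximum(squeezed @ w1, 0.0)           # shape (C // r,)
    gates = 1.0 / (1.0 + np.exp(-(hidden @ w2)))      # shape (C,)
    # Scale: reweight every channel by its gate.
    scaled = features * gates[:, None, None, None]
    return scaled, gates

# Assumed toy sizes: 8 channels, 4 time steps, a 6x6 spatial grid, ratio r = 2.
rng = np.random.default_rng(0)
C, T, H, W, r = 8, 4, 6, 6, 2
features = rng.standard_normal((C, T, H, W))
w1 = rng.standard_normal((C, C // r)) * 0.1
w2 = rng.standard_normal((C // r, C)) * 0.1

scaled, gates = se_block(features, w1, w2)
```

In a trained model the two weight matrices would be learned, so that channels responding to congested road areas would receive gates closer to 1 and the remaining channels would be attenuated.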