| With rapid urbanization, traffic congestion has become a serious social problem, while rising energy consumption and exhaust emissions have aggravated the damage to urban ecology. Intersection traffic signal control that accounts for vehicle emissions has therefore become an important topic. However, the decision complexity of traffic signal control increases dramatically in the dynamic traffic environment of an urban road network. As an efficient approach to control optimization in complex systems, reinforcement learning is increasingly applied to traffic signal control. A serious challenge is to use road network traffic information to accurately model the state and reward functions of the reinforcement learning model while coordinating traffic signals across multiple intersections to improve traffic conditions in the road network. To address these challenges, this paper focuses on constructing road network traffic features to generate control strategies for individual agents, as well as on coordinated control among multiple agents. The main contents are as follows: (1) The Phase-based Empirical Storage Double Deep Q Network (PES-DDQN) algorithm is proposed to solve the single-intersection traffic signal control problem. For complex dynamic traffic situations, this paper proposes a phase-based empirical storage mechanism to cope with unbalanced traffic flow. Vehicle emissions are also taken into account in the design of the state and reward functions, with the aim of improving intersection traffic conditions while minimizing vehicle emissions. (2) For the coordinated control of multiple signals in a dynamic traffic environment with multiple intersections, this paper proposes Multi-agent Coordinated Policy Optimization (MACoPO), a two-layer coordination algorithm based on multi-agent reinforcement learning. MACoPO consists of local cooperation, which adjusts the weights of an agent's individual reward and its neighborhood agents' rewards by using local cooperation factors (LCF), and global coordination, which updates the LCF to maximize the global reward. The two-layer coordination mechanism enables neighboring intersections in the road network to cooperate with each other, thus achieving global optimization. Vehicle emissions are again taken into account in the design of the state and reward functions, with the aim of improving intersection traffic conditions while minimizing vehicle emissions. The effectiveness of the proposed methods is verified through a large number of simulation experiments on a simulated road network and a real road network, both constructed within the reinforcement learning traffic simulation platform built in this paper. |
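
The local cooperation step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact weighting, so the form below is an assumption: the LCF is treated as an angle, and each agent's own reward is mixed with the mean reward of its neighboring intersections through cosine and sine weights. The function name `coordinated_reward` and the toy reward values are hypothetical.

```python
import math


def coordinated_reward(own_reward, neighbor_rewards, lcf_radians):
    """Mix an agent's own reward with its neighbors' mean reward.

    Assumed form (not taken from the abstract):
        cos(LCF) * own_reward + sin(LCF) * mean(neighbor_rewards)
    LCF = 0 gives a purely selfish agent; LCF = pi/2 gives an agent
    that only cares about its neighborhood.
    """
    if neighbor_rewards:
        neighborhood = sum(neighbor_rewards) / len(neighbor_rewards)
    else:
        # An isolated intersection has no neighborhood term.
        neighborhood = 0.0
    return (math.cos(lcf_radians) * own_reward
            + math.sin(lcf_radians) * neighborhood)


# Toy rewards (hypothetical): negative values standing in for delay/emissions.
own = -4.0
neighbors = [-1.0, -3.0]

selfish = coordinated_reward(own, neighbors, 0.0)
cooperative = coordinated_reward(own, neighbors, math.pi / 2)
balanced = coordinated_reward(own, neighbors, math.pi / 4)
```

In this sketch the global coordination layer would correspond to adjusting `lcf_radians` so that the sum of rewards over all intersections increases; how MACoPO performs that update is not specified in the abstract and is not modeled here.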