With the rapid development of China's economy and accelerating urbanization, the number of motor vehicles has increased sharply, and the resulting problems, such as traffic congestion and environmental pollution, are becoming increasingly serious. Implementing the "public transport priority" development strategy can effectively alleviate urban traffic problems and improve the environment of the urban passenger transport system. Affected by road traffic, passenger demand, and many other factors, bus operation is highly unstable and can even exhibit "bus bunching", which harms bus operating efficiency, reliability, and the high-quality development of the industry. The common remedy is to introduce static or dynamic control strategies. These mainly consider local information, namely the current headway and the timetable, and ignore the global coordination of the whole bus fleet and the long-term effects of control. To address this, a multi-agent reinforcement learning model is proposed to develop a dynamic and flexible bus speed control strategy. In this model, each bus is an agent that not only coordinates with the vehicles immediately ahead of and behind it, but also exchanges information with the other vehicles in the fleet. The bus operation environment is analyzed, and its main elements are abstracted and modeled. Reinforcement learning algorithms are described and analyzed, an algorithm framework suited to the problem studied in this paper is selected, and Proximal Policy Optimization (PPO) is chosen as the base algorithm; on this basis, the bus fleet model is constructed and the bus operation simulation procedure is designed. This provides the basis for subsequent model construction and simulation verification. The causes of headway disorder during bus operation are derived and analyzed, and a speed control strategy that aims to stabilize the headway is put forward. The PPO algorithm is introduced, and it is
extended to Multi-Agent Proximal Policy Optimization (MAPPO), on which the MAPPO speed control strategy model is built. A simulation experiment is designed to compare the speed control strategy with a no-control strategy and a simple holding control strategy, verifying the effectiveness of the model. The headway-stabilizing speed control strategy has a deficiency: excessive deceleration by a vehicle means that passengers waiting at stations cannot be picked up in time. To address this, the causes of the imbalance in the number of passengers on board the vehicles are derived and analyzed, and a speed control strategy that considers expected passenger balance is proposed. Based on the MAPPO model, a Monitor neural network is introduced, and Multi-Agent Proximal Policy Optimization with Monitor (MAPPO-M) is proposed to establish the MAPPO-M speed control strategy model. A comparative simulation experiment against the MAPPO model is designed to verify the necessity and effectiveness of the improvement.
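For readers unfamiliar with PPO, the standard clipped surrogate objective that PPO-family methods such as MAPPO build on can be sketched as follows. This is only an illustration of the generic PPO objective, not the loss used in this work: the function name `ppo_clip_objective`, the clipping range `eps=0.2`, and the sample inputs are assumptions introduced here.

```python
def ppo_clip_objective(ratios, advantages, eps=0.2):
    """Mean clipped surrogate objective of standard PPO.

    ratios     -- probability ratios pi_new(a|s) / pi_old(a|s), one per sample
    advantages -- advantage estimates, one per sample
    eps        -- clipping range (0.2 is an assumed, commonly used value)
    """
    if len(ratios) != len(advantages) or not ratios:
        raise ValueError("ratios and advantages must be non-empty and equal in length")

    total = 0.0
    for r, a in zip(ratios, advantages):
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped_r = max(1.0 - eps, min(1.0 + eps, r))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(r * a, clipped_r * a)
    return total / len(ratios)


if __name__ == "__main__":
    # Hypothetical values for three samples, for illustration only.
    print(ppo_clip_objective([1.5, 0.5, 1.0], [1.0, -1.0, 2.0]))
```

In a multi-agent setting, an objective of this form would typically be evaluated per agent (here, per bus) on that agent's own samples; how the agents share information and how the Monitor network enters the update are not specified in this abstract and are left out of the sketch.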