| Complex road traffic and the large number of vehicles have created a strong demand for traffic analysis. Many researchers have built traffic models for different road conditions for description and simulation, most of which focus on macroscopic analysis of traffic flow on urban main roads and highways. However, few researchers have studied microscopic models that describe the laws of individual motion. At unsignalized intersections, the capillaries of urban traffic, right-of-way conflicts lead vehicles to exhibit varying degrees of game behavior, which reduces intersection efficiency and makes these intersections common accident sites. To address this game behavior at unsignalized intersections, this thesis uses reinforcement learning and game theory to model the problem and study it by simulation. The thesis describes and models the road environment and the individual traffic participants. It encapsulates the abstracted road scene as an environment module that computes state features and reward values and then updates the state of each traffic participant, and it abstracts three types of traffic participants (motor vehicles, non-motor vehicles, and pedestrians) as reinforcement learning agents (hereinafter referred to as agents). It introduces the DDPG (Deep Deterministic Policy Gradient) algorithm to build a single-agent decision model, and then the MADDPG (Multi-Agent DDPG) algorithm to build a multi-agent game model, which obtains feedback from the road environment through discrete interactions. Finally, it trains the agents' microscopic decision-making models. The thesis trains independent traffic strategies in single-agent scenarios, uses a Markov game model in multi-agent scenarios to simulate mixed traffic, and finally obtains the microscopic model of an agent under the traffic game. To support training of the simulation model, the thesis also develops a reinforcement learning Python package that provides computational support for the simulation scenes. The thesis finds that a traffic agent driven by the reinforcement learning model behaves randomly in the early learning stage. Guided by the various rewards, it gradually forms a relatively fixed behavior strategy. During game learning, as game experience accumulates, an agent tends to evolve toward strategies with greater reward and finally reaches a new equilibrium state. To some extent, this process simulates human learning, in which people constantly receive environmental feedback and adjust their own behavior. It removes the need to assume agent behavior patterns in advance and yields a modeling method that does not rely on hypotheses or large amounts of data. The reinforcement learning approach adopted in this thesis ensures that agents consistently find a relatively fixed optimal behavior strategy after learning in a specific scenario, which demonstrates its theoretical and practical value for microscopic traffic simulation. However, the model still leaves room for improvement in multi-scene adaptability and computational optimization. As hardware develops, agent-based modeling will capture more detail about individual traffic participants at lower cost, resulting in higher simulation accuracy. |
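
The environment module described above (compute state features and reward values, then update each traffic participant's state) can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the thesis's actual package: the names `Agent` and `IntersectionEnv`, the one-dimensional lane geometry, the reward terms, and all numeric constants are hypothetical. A trained DDPG or MADDPG policy would supply the accelerations passed to `step`.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical traffic participant driving along one lane toward a crossing point at 0."""
    position: float  # signed distance along the lane in metres; the crossing point is 0.0
    speed: float     # metres per second, never negative


class IntersectionEnv:
    """Toy two-agent environment; the two lanes cross at position 0.0 (assumed geometry)."""

    def __init__(self, dt=0.1, conflict_radius=2.0, max_speed=15.0, collision_penalty=10.0):
        self.dt = dt                                # length of one discrete interaction step (s)
        self.conflict_radius = conflict_radius      # half-width of the conflict zone (m)
        self.max_speed = max_speed                  # speed cap (m/s)
        self.collision_penalty = collision_penalty  # illustrative penalty, not from the thesis
        self.agents = []

    def reset(self):
        """Place both agents on their approaches and return the initial state features."""
        self.agents = [Agent(-20.0, 5.0), Agent(-25.0, 5.0)]
        return self._observe()

    def _observe(self):
        """State features per agent: own position, own speed, other's position, other's speed."""
        features = []
        for i, me in enumerate(self.agents):
            other = self.agents[1 - i]
            features.append((me.position, me.speed, other.position, other.speed))
        return features

    def step(self, accelerations):
        """Apply one acceleration per agent, update states, and return (features, rewards, done)."""
        for agent, accel in zip(self.agents, accelerations):
            agent.speed = min(max(agent.speed + accel * self.dt, 0.0), self.max_speed)
            agent.position += agent.speed * self.dt

        # A conflict occurs when both agents are inside the conflict zone at the same time.
        collided = all(abs(a.position) < self.conflict_radius for a in self.agents)

        rewards = []
        for agent in self.agents:
            reward = agent.speed * self.dt  # reward forward progress made in this step
            if collided:
                reward -= self.collision_penalty
            rewards.append(reward)

        # The episode ends on a conflict or once both agents have cleared the conflict zone.
        cleared = all(a.position > self.conflict_radius for a in self.agents)
        done = collided or cleared
        return self._observe(), rewards, done
```

In a multi-agent setting, each agent would receive only its own feature tuple and reward from `step`, which matches the discrete interaction loop outlined in the abstract.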