| China's urban rail transit system has developed rapidly in recent years and plays a significant role in people's daily lives. As the number and speed of urban rail trains increase and their operating environments and conditions grow more complex, manual driving can no longer meet the demand for higher operation quality. It is therefore urgent to develop Automatic Train Operation (ATO) to meet the requirements of intelligent development of urban rail trains. Current research on ATO control methods focuses mainly on target speed curve tracking and on strategy optimization for fixed-parameter models, which cannot adapt well to the increasingly complex operating environment and diverse operational requirements of trains. To address these problems, this thesis studies the following. (1) Taking a single urban rail train as the research object, the inter-station kinematic model of the train and the corresponding operating constraints are established by analyzing the train's traction characteristics, braking characteristics, and running resistance. Based on the maximum principle, the energy-saving operating conditions of the train are analyzed to obtain energy-saving driving strategies for the case of multiple switching points between stations. Finally, by analyzing the principle and structure of reinforcement learning, a Markov decision process for urban rail trains is formulated, and on this basis a reinforcement learning control model for automatic train operation is established. (2) An ATO optimization control method based on the DQN algorithm is proposed. Based on the train kinematic model, a multi-objective optimization function is developed for inter-station operation. A reference system with time planning as an active constraint is designed to improve the safety, punctuality, and parking accuracy of inter-station operation and
reduce the energy consumption of inter-station operation. On this basis, the train controller is built using the Q-network of the reinforcement learning algorithm, and the parameter update method of the controller is specified. Through continuous interaction with the train operation environment, the neural network parameters are continuously optimized so that the controller can better adapt to the complex operating environment of the train. Finally, a simulation experiment is designed. The simulation results show that the algorithm offers better control performance than traditional target speed curve tracking algorithms and can dynamically adjust the inter-station operation strategy in real time to meet different planned running times between stations. (3) To improve the iterative convergence of the DQN algorithm, the DDPG algorithm with an Actor-Critic network structure is used to improve the DQN network structure: a policy network with continuous traction/braking action output is added to the single Q-network structure of DQN. The periodic historical data generated during operation are used for continuous self-learning of the ATO controller, so that the controller is continuously updated to adapt to the changing train operating environment. Finally, a single-train multi-station running time optimization model is proposed, which dynamically adjusts the redundant time between multiple stations. The simulation results show that the TR-DDPG algorithm can dynamically adjust the operation strategy across multiple stations on a single-train line. By dynamically allocating redundant time among the stations, the total traction energy consumption across the stations on the line is effectively reduced. |
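
To make the ideas in points (1) and (2) more concrete, the following is a minimal sketch of how an inter-station kinematic step and a multi-objective reward might be written for a reinforcement learning controller. It is not the thesis's actual model: the train mass, maximum force, quadratic resistance coefficients, reward weights, and all function names are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TrainState:
    position: float  # metres from the departure station
    speed: float     # metres per second
    time: float      # seconds since departure


# Hypothetical parameters (assumptions, not taken from the thesis).
MASS = 2.0e5                # train mass in kg
MAX_FORCE = 2.0e5           # maximum traction/braking force in N
RESIST = (2000.0, 50.0, 5.0)  # a + b*v + c*v^2 running resistance, in N


def resistance(speed: float) -> float:
    """Quadratic running resistance as a function of speed."""
    a, b, c = RESIST
    return a + b * speed + c * speed ** 2


def step(state: TrainState, action: float, dt: float = 1.0):
    """Advance the train by dt seconds.

    action is a continuous command in [-1, 1]:
    positive means traction, negative means braking.
    Returns the next state and the traction energy used (J).
    """
    action = max(-1.0, min(1.0, action))
    force = action * MAX_FORCE
    accel = (force - resistance(state.speed)) / MASS
    new_speed = max(0.0, state.speed + accel * dt)  # train cannot roll backwards
    new_pos = state.position + 0.5 * (state.speed + new_speed) * dt
    energy = max(force, 0.0) * (new_pos - state.position)
    return TrainState(new_pos, new_speed, state.time + dt), energy


def reward(state, energy, speed_limit, target_pos, planned_time, done,
           weights=(10.0, 1.0, 1.0, 1e-7)):
    """Multi-objective reward: safety, punctuality, parking accuracy, energy."""
    w_safe, w_time, w_stop, w_energy = weights
    r = -w_energy * energy                      # energy consumption
    if state.speed > speed_limit:
        r -= w_safe                             # safety: overspeed penalty
    if done:
        r -= w_time * abs(state.time - planned_time)     # punctuality
        r -= w_stop * abs(state.position - target_pos)   # parking accuracy
    return r
```

In such a setup, a DQN controller would pick from a discretized set of `action` values, whereas a DDPG actor would output `action` directly as a continuous value; the planned running time enters only through the terminal punctuality term in this sketch.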