
Research On Intelligent Control Strategy Of Urban Rail Train ATO Based On Deep Reinforcement Learning

Posted on: 2023-05-24
Degree: Master
Type: Thesis
Country: China
Candidate: Z L Jin
Full Text: PDF
GTID: 2532306848480234
Subject: Transportation engineering
Abstract/Summary:
The ATO (Automatic Train Operation) system is an important part of the intelligent development of railways and of automatic operation control in urban rail transit. The on-board ATO system calculates the target running speed from the operation command and the MA (Movement Authority), combined with the railway line parameters, and then adjusts the output traction and braking commands so that the train tracks the target running speed. A reasonable ATO control strategy can ensure safe and punctual train operation, improve parking accuracy and passenger comfort, reduce driver fatigue, and reduce traction energy consumption. At present, most research is based on train modeling and uses bionic optimization algorithms to calculate the target running profile of the train, so the control strategy cannot be adjusted in real time according to the running state of the train. Considering the complex and changeable train running environment, and drawing on the adaptive, model-free nature and strong decision-making ability of reinforcement learning, this thesis studies the combination of DRL (Deep Reinforcement Learning) with automatic train control. The main research contents are as follows.

Firstly, considering the operating characteristics of urban rail transit trains, a force analysis of the train is carried out based on the single-particle model, and a train operation model is established. The basic resistance parameters of the train are obtained by system identification from train running data. The validity and accuracy of the train operation model are verified against actual train data, and the model serves as the source of training data and the simulation environment for the subsequent experiments.

Secondly, following the Markov decision model of reinforcement learning, the train speed, distance and remaining running time are taken as the state space, and the train traction/braking force level is taken as the action space. According to the performance evaluation indices of the ATO system, a continuous reward function is designed around the four main control objectives of punctuality, safety, energy efficiency and accurate stopping, which sets the learning direction of the algorithm. At the same time, according to actual train running conditions, the ε-greedy exploration strategy is combined with drivers' driving experience, which restricts the exploration space of the algorithm, increases the number of effective samples, and improves learning efficiency and training speed.

Then, combining deep learning with the value-function and policy-function solution methods of reinforcement learning, two algorithms are used to solve for the energy-saving train control strategy: DQN (Deep Q Network) and DDPG (Deep Deterministic Policy Gradient). The DQN algorithm uses a neural network to extract features of the train running state and trains the network on historical running data to approximate the actual action-value function. The DDPG algorithm uses an Actor-Critic structure, which combines the advantages of the value-function and policy-function solutions. The Critic network evaluates, by value-function solution, the action policy output for the current train state. The Actor network outputs the action policy for the current state by policy-function solution, and the policy is then adjusted according to the Critic network's evaluation.

Finally, based on the above research and the line data of Changsha Metro Line 2, the algorithms are simulated and verified. The simulation results show that the DDPG algorithm reduces energy consumption more than the DQN algorithm and the PG (Policy Gradient) algorithm while still achieving punctual, safe and comfortable operation and accurate stopping. After training, the DDPG algorithm is used to simulate the control strategy in three scenarios: adjustment of the planned travel schedule, temporary adjustment of the arrival time during operation, and a failure of the traction system. The results show that the algorithm can adjust the control strategy in real time according to feedback on the current running state of the train, keeping train operation as punctual, safe, comfortable and accurate in stopping as possible, and that it has good versatility and real-time performance.
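
The state, action and exploration design described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the mass, force, resistance coefficients, force levels, and the pruning rule in `allowed_levels` are all hypothetical values and names chosen for illustration, and a plain dictionary of action values stands in for the DQN or DDPG networks.

```python
import random
from dataclasses import dataclass

# Hypothetical parameters; none of these values come from the thesis.
MASS_KG = 200_000.0                    # train mass (point-mass model)
MAX_FORCE_N = 200_000.0                # peak traction/braking force
DAVIS = (2000.0, 50.0, 8.0)            # basic resistance a + b*v + c*v^2 (N, v in m/s)
LEVELS = [-1.0, -0.5, 0.0, 0.5, 1.0]   # discrete traction/braking force levels
DT = 1.0                               # simulation time step (s)
MAX_DECEL = MAX_FORCE_N / MASS_KG      # full-braking deceleration (m/s^2)


@dataclass
class State:
    speed: float       # m/s
    position: float    # m travelled since departure
    time_left: float   # s remaining until the scheduled arrival


def step(state, level):
    """Advance the point-mass train model by one time step."""
    a, b, c = DAVIS
    resistance = a + b * state.speed + c * state.speed ** 2
    accel = (level * MAX_FORCE_N - resistance) / MASS_KG
    speed = max(0.0, state.speed + accel * DT)   # the train never rolls backwards
    position = state.position + 0.5 * (state.speed + speed) * DT
    return State(speed, position, state.time_left - DT)


def allowed_levels(state, speed_limit, target_m):
    """Hypothetical driver-experience rule that prunes implausible actions."""
    remaining = target_m - state.position
    braking_dist = state.speed ** 2 / (2 * MAX_DECEL)
    if state.speed >= speed_limit or remaining <= braking_dist:
        return [l for l in LEVELS if l <= 0.0]   # coast or brake only
    if state.speed < 0.5 * speed_limit:
        return [l for l in LEVELS if l >= 0.0]   # no braking well below the limit
    return list(LEVELS)


def choose_action(q_values, state, speed_limit, target_m, epsilon, rng=random):
    """Epsilon-greedy selection restricted to the driver-approved actions."""
    allowed = allowed_levels(state, speed_limit, target_m)
    if rng.random() < epsilon:
        return rng.choice(allowed)                   # explore inside the pruned set
    return max(allowed, key=lambda l: q_values[l])   # exploit the best allowed action
```

Restricting both the random and the greedy branch to `allowed_levels` is one way to shrink the exploration space so that fewer samples are spent on actions a driver would never take. In a full agent, `q_values` would come from the learned action-value network rather than a fixed dictionary.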
Keywords/Search Tags: Automatic Train Operation, Energy-efficient control strategy, Deep Q Learning, Deep Deterministic Policy Gradient